The Cauchy density function enjoys a notorious reputation in mathematics, serving as a common counterexample to identify incomplete proofs in probability theory. At the same time it has also been a consistent source of frustration for statistical computation. In this case study I review various ways of implementing the Cauchy distribution, from the nominal implementation to alternative implementations aimed at ameliorating these difficulties, and demonstrate their relative performance.

1 The Cauchy Distribution

The Cauchy distribution seems innocent enough, with a probability density function given by the rational function, \[ \pi(x) = \frac{1}{\pi \, s} \frac{ s^{2} }{ (x - m)^{2} + s^{2}}. \] Here the location, \(m\), quantifies where the corresponding probability distribution concentrates while the scale, \(s\), quantifies the breadth of that distribution.

c_light <- c("#DCBCBC")
c_light_highlight <- c("#C79999")
c_mid <- c("#B97C7C")
c_mid_highlight <- c("#A25050")
c_dark <- c("#8F2727")
c_dark_highlight <- c("#7C0000")

x <- seq(-10, 10, 0.001)
plot(x, dcauchy(x, location = 0, scale = 1), type="l", col=c_dark_highlight, lwd=2,
     main="", xlab="x", ylab="Probability Density", yaxt='n')

This probability density function, however, defines a probability distribution with extremely long tails that places significant probability in neighborhoods far away from \(x = m\). We can see just how heavy these tails are by comparing the quantile function of the standard Cauchy density to that of a standard Gaussian density,

x <- seq(0, 1, 0.001)
plot(x, qcauchy(x, location = 0, scale = 1), type="l", col=c_dark_highlight, lwd=2,
     main="", xlab="Probability", ylab="Quantile")
lines(x, qnorm(x, 0, 1), type="l", col=c_light_highlight, lwd=2)

text(x=0.9, y=250, labels="Cauchy", col=c_dark_highlight)
text(x=0.9, y=-50, labels="Normal", col=c_light_highlight)

\(95\%\) of the probability of the Gaussian distribution is contained within a distance of \(x = 1.6\) from \(x = m = 0\), but we would have to go to \(x = 6.3\) to contain the same probability for the Cauchy distribution. In order to contain \(99\%\) of the probability we would need to go to only \(x = 2.3\) for the Gaussian distribution, but all the way to \(x = 31.8\) for the Cauchy distribution!

These extremely heavy tails yield all kinds of surprising behavior for distributions defined with Cauchy density functions. All of the even moments of the distribution are infinite whereas all of the odd moments are not even well defined. In order to avoid pathological behavior we have to restrict ourselves to characterizing these Cauchy distributions with quantiles.

The Cauchy density was once recommended as the default for weakly informative prior densities, but the behavior of these heavy tails, especially the weak containment of probability around the location, \(x = m\), has proven to be too ungainly in practice. Consequently we have since moved towards recommending Gaussian densities for weakly informative prior densities. Still, because the Cauchy density arises as a component in more sophisticated prior densities, such as the horseshoe and its generalizations, understanding how to best fit the it remains important.

2 The Nominal Implementation of the Cauchy Density Function

The heavy tails of the Cauchy density function make it notoriously difficult to estimate its expectation values. In particular Random Walk Metropolis, the Metropolis-Adjusted Langevin Algorithm, and even static Hamiltonian Monte Carlo all fail to provide accurately estimates for these expectations. The problem is that for accurate estimation any algorithm will have to explore the massive extent of the heavy tails, but once out in those tails most algorithms have difficulty returning back to the bulk of the distribution around \(m = 0\).

How does the dynamic Hamiltonian Monte Carlo method used in Stan fare? It’s easy enough to check, here with a product of fifty Cauchy density functions and the indicator function for the interval \(-1 \le x \le 1\),

writeLines(readLines("cauchy_nom.stan"))

parameters {
  vector[50] x;
}

model {
  x ~ cauchy(0, 1);
}

generated quantities {
  real I = fabs(x[1]) < 1 ? 1 : 0;
}

Pushing the program through Stan,

library(rstan)

Loading required package: ggplot2

Loading required package: StanHeaders

rstan (Version 2.18.1, GitRev: 2e1f913d3ca3)

For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

util <- new.env()
source('stan_utility.R', local=util)
source('plot_utility.R', local=util)

fit_nom <- stan(file='cauchy_nom.stan', seed=4938483,
                warmup=1000, iter=11000, control=list(max_treedepth=20))

util$check_all_diagnostics(fit_nom, max_depth=20)

[1] "n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "Tail khats for parameter x[1] are 0.656038231612564 and 0.656038231612564!"
[1] "Tail khats for parameter x[2] are 0.763325645356467 and 0.763325645356467!"
[1] "Tail khats for parameter x[3] are 0.734399748837252 and 0.734399748837252!"
[1] "Tail khats for parameter x[4] are 0.689803096064259 and 0.689803096064259!"
[1] "Tail khats for parameter x[5] are 0.711650043372225 and 0.711650043372225!"
[1] "Tail khats for parameter x[6] are 0.763114704958051 and 0.763114704958051!"
[1] "Tail khats for parameter x[7] are 0.729253426382291 and 0.729253426382291!"
[1] "Tail khats for parameter x[8] are 0.839431515520477 and 0.839431515520477!"
[1] "Tail khats for parameter x[9] are 0.698458597133945 and 0.698458597133945!"
[1] "Tail khats for parameter x[10] are 0.654650834751838 and 0.654650834751838!"
[1] "Tail khats for parameter x[11] are 0.738482528565819 and 0.738482528565819!"
[1] "Tail khats for parameter x[12] are 0.715332438317195 and 0.715332438317195!"
[1] "Tail khats for parameter x[13] are 0.848345838005493 and 0.848345838005493!"
[1] "Tail khats for parameter x[14] are 0.837531018504202 and 0.837531018504202!"
[1] "Tail khats for parameter x[15] are 0.668441148501527 and 0.668441148501527!"
[1] "Tail khats for parameter x[16] are 0.676632538450146 and 0.676632538450146!"
[1] "Tail khats for parameter x[17] are 0.73970827788225 and 0.73970827788225!"
[1] "Tail khats for parameter x[18] are 0.688050249525289 and 0.688050249525289!"
[1] "Tail khats for parameter x[19] are 0.790103252082743 and 0.790103252082743!"
[1] "Tail khats for parameter x[20] are 0.761731815073177 and 0.761731815073177!"
[1] "Tail khats for parameter x[21] are 0.661814422360661 and 0.661814422360661!"
[1] "Tail khats for parameter x[22] are 0.741641245025071 and 0.741641245025071!"
[1] "Tail khats for parameter x[23] are 0.761187419849707 and 0.761187419849707!"
[1] "Tail khats for parameter x[24] are 0.678026690633304 and 0.678026690633304!"
[1] "Tail khats for parameter x[25] are 0.776539340326557 and 0.776539340326557!"
[1] "Tail khats for parameter x[26] are 0.662069075849418 and 0.662069075849418!"
[1] "Tail khats for parameter x[27] are 0.76641564541638 and 0.76641564541638!"
[1] "Tail khats for parameter x[28] are 0.69687121183678 and 0.69687121183678!"
[1] "Tail khats for parameter x[29] are 0.737822828717598 and 0.737822828717598!"
[1] "Tail khats for parameter x[30] are 0.753320587538848 and 0.753320587538848!"
[1] "Tail khats for parameter x[31] are 0.76094464575522 and 0.76094464575522!"
[1] "Tail khats for parameter x[32] are 0.73804992823803 and 0.73804992823803!"
[1] "Tail khats for parameter x[33] are 0.697983458192785 and 0.697983458192785!"
[1] "Tail khats for parameter x[34] are 0.769859512309621 and 0.769859512309621!"
[1] "Tail khats for parameter x[35] are 0.669966413821939 and 0.669966413821939!"
[1] "Tail khats for parameter x[36] are 0.694629465971976 and 0.694629465971976!"
[1] "Tail khats for parameter x[37] are 0.740834073237828 and 0.740834073237828!"
[1] "Tail khats for parameter x[38] are 0.746030882620063 and 0.746030882620063!"
[1] "Tail khats for parameter x[39] are 0.81864019372431 and 0.81864019372431!"
[1] "Tail khats for parameter x[40] are 0.755351342364494 and 0.755351342364494!"
[1] "Tail khats for parameter x[41] are 0.837778274963811 and 0.837778274963811!"
[1] "Tail khats for parameter x[42] are 0.708510582352759 and 0.708510582352759!"
[1] "Tail khats for parameter x[43] are 0.787901735950942 and 0.787901735950942!"
[1] "Tail khats for parameter x[44] are 0.708497741026517 and 0.708497741026517!"
[1] "Tail khats for parameter x[45] are 0.667225719438454 and 0.667225719438454!"
[1] "Tail khats for parameter x[46] are 0.68172988470105 and 0.68172988470105!"
[1] "Tail khats for parameter x[47] are 0.697653241570047 and 0.697653241570047!"
[1] "Tail khats for parameter x[48] are 0.817499201532475 and 0.817499201532475!"
[1] "Tail khats for parameter x[49] are 0.764357011552379 and 0.764357011552379!"
[1] "Tail khats for parameter x[50] are 0.752715754673428 and 0.752715754673428!"
[1] "  Tail khat above 0.5 indicates that the parameter is probably not square integrable"
[1] "0 of 40000 iterations ended with a divergence (0%)"
[1] "0 of 40000 iterations saturated the maximum tree depth of 20 (0%)"
[1] "E-FMI indicated no pathological behavior"

After increasing the maximum treedepth to a pretty high value we see no indications of pathological fitting behavior. We do, however, see \(\hat{k}\) warnings which indicate that the \(x\) parameters probably don’t have finite means and variances and hence well-defined effective sample sizes!

Consequently we shouldn’t try to estimate the means or the estimated effective samples sizes of these parameters and instead focus on quantiles of each \(x\) or the mean of the included indicator function, \(I\).

Indeed Stan is able to recover the \(5\%\), \(50\%\), and \(95\%\) quantiles of each parameter quite accurately,

util$plot_estimated_quantiles(fit_nom, "Nominal Parameterization")

The variable integration times in dynamic Hamiltonian Monte Carlo allow extremely long trajectories once we’re out in the tails of the Cauchy density function. These trajectories very slowly, but surely, carry us back into the bulk of the distribution no matter how far into the tails we have sojourned. The dynamic nature of the trajectories is important here – once we are deep enough into the tails any static but finite integration time will not be long enough to ensure that we return!

Considering that most algorithms fail to accurately fit distributions defined with Cauchy density functions, this is a pretty remarkable achievement for Stan. That said, the long trajectories required for this accuracy come at a significant computational cost. Consequently let’s consider alternative implementations of the Cauchy distribution that can be fit with fewer computational resources.

3 First Alternative Implementation

An interesting property of the Cauchy density function is that it arises as a scale mixture of Gaussian density functions, \[ \text{Cauchy}(x \mid m, s) = \int \mathrm{d} \tau \, \text{Normal} \, \left(x \mid m, \tau^{-\frac{1}{2}} \right) \, \text{Gamma} \, \left(\tau \mid \frac{1}{2}, \frac{s^{2}}{2} \right). \] In other words, if \[ x_{a} \sim \text{Normal} \, \left(0, 1 \right) \] and \[ x_{b} \sim \text{Gamma} \, \left(\frac{1}{2}, \frac{s^{2}}{2} \right) \] then \[ x = m + \frac{ x_{a} }{ \sqrt{x_{b}} } \] follows a distribution specified by a \(\text{Cauchy}(m, s)\) density function.
Note that I am using the Stan conventions for the normal and gamma density function parameterizations.

This property suggests a parameter expansion approach to implementing the Cauchy distribution where we fit \(x_{a}\) and \(x_{b}\) and then derive \(x\) deterministically. Although this requires twice the number of parameters, the resulting joint density function is much more concentrated than the Cauchy density function and hence significantly easier to fit.

x <- seq(-3, 3, 0.05)
y <- seq(-9, 1, 0.05)

n_x <- length(x)
n_y <- length(y)
z <- matrix(nrow=n_x, ncol=n_y)
for (i in 1:n_x) for (j in 1:n_y)
  z[i, j] <- dnorm(x[i], 0, 1) * dgamma(exp(y[j]), 0.5, 1 / 0.5) * exp(y[j])

contour(x, y, z, levels=seq(0.05, 1, 0.05) * max(z), drawlabels=FALSE,
        main="First Alternative", xlab="x_a", ylab="log(x_b)",
        col=c_dark_highlight, lwd=2)

This alternative implementation is straightforward to write as a Stan program,

writeLines(readLines("cauchy_alt_1.stan"))

parameters {
  vector[50] x_a;
  vector<lower=0>[50] x_b;
}

transformed parameters {
  vector[50] x = x_a ./ sqrt(x_b);
}

model {
  x_a ~ normal(0, 1);
  x_b ~ gamma(0.5, 0.5);
}

generated quantities {
  real I = fabs(x[1]) < 1 ? 1 : 0;
}

and readily fit,

fit_1 <- stan(file='cauchy_alt_1.stan', seed=4938483,
              warmup=1000, iter=11000)

util$check_all_diagnostics(fit_1)

[1] "n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "Tail khats for parameter x[1] are 0.656038231612564 and 0.656038231612564!"
[1] "Tail khats for parameter x[2] are 0.763325645356467 and 0.763325645356467!"
[1] "Tail khats for parameter x[3] are 0.734399748837252 and 0.734399748837252!"
[1] "Tail khats for parameter x[4] are 0.689803096064259 and 0.689803096064259!"
[1] "Tail khats for parameter x[5] are 0.711650043372225 and 0.711650043372225!"
[1] "Tail khats for parameter x[6] are 0.763114704958051 and 0.763114704958051!"
[1] "Tail khats for parameter x[7] are 0.729253426382291 and 0.729253426382291!"
[1] "Tail khats for parameter x[8] are 0.839431515520477 and 0.839431515520477!"
[1] "Tail khats for parameter x[9] are 0.698458597133945 and 0.698458597133945!"
[1] "Tail khats for parameter x[10] are 0.654650834751838 and 0.654650834751838!"
[1] "Tail khats for parameter x[11] are 0.738482528565819 and 0.738482528565819!"
[1] "Tail khats for parameter x[12] are 0.715332438317195 and 0.715332438317195!"
[1] "Tail khats for parameter x[13] are 0.848345838005493 and 0.848345838005493!"
[1] "Tail khats for parameter x[14] are 0.837531018504202 and 0.837531018504202!"
[1] "Tail khats for parameter x[15] are 0.668441148501527 and 0.668441148501527!"
[1] "Tail khats for parameter x[16] are 0.676632538450146 and 0.676632538450146!"
[1] "Tail khats for parameter x[17] are 0.73970827788225 and 0.73970827788225!"
[1] "Tail khats for parameter x[18] are 0.688050249525289 and 0.688050249525289!"
[1] "Tail khats for parameter x[19] are 0.790103252082743 and 0.790103252082743!"
[1] "Tail khats for parameter x[20] are 0.761731815073177 and 0.761731815073177!"
[1] "Tail khats for parameter x[21] are 0.661814422360661 and 0.661814422360661!"
[1] "Tail khats for parameter x[22] are 0.741641245025071 and 0.741641245025071!"
[1] "Tail khats for parameter x[23] are 0.761187419849707 and 0.761187419849707!"
[1] "Tail khats for parameter x[24] are 0.678026690633304 and 0.678026690633304!"
[1] "Tail khats for parameter x[25] are 0.776539340326557 and 0.776539340326557!"
[1] "Tail khats for parameter x[26] are 0.662069075849418 and 0.662069075849418!"
[1] "Tail khats for parameter x[27] are 0.76641564541638 and 0.76641564541638!"
[1] "Tail khats for parameter x[28] are 0.69687121183678 and 0.69687121183678!"
[1] "Tail khats for parameter x[29] are 0.737822828717598 and 0.737822828717598!"
[1] "Tail khats for parameter x[30] are 0.753320587538848 and 0.753320587538848!"
[1] "Tail khats for parameter x[31] are 0.76094464575522 and 0.76094464575522!"
[1] "Tail khats for parameter x[32] are 0.73804992823803 and 0.73804992823803!"
[1] "Tail khats for parameter x[33] are 0.697983458192785 and 0.697983458192785!"
[1] "Tail khats for parameter x[34] are 0.769859512309621 and 0.769859512309621!"
[1] "Tail khats for parameter x[35] are 0.669966413821939 and 0.669966413821939!"
[1] "Tail khats for parameter x[36] are 0.694629465971976 and 0.694629465971976!"
[1] "Tail khats for parameter x[37] are 0.740834073237828 and 0.740834073237828!"
[1] "Tail khats for parameter x[38] are 0.746030882620063 and 0.746030882620063!"
[1] "Tail khats for parameter x[39] are 0.81864019372431 and 0.81864019372431!"
[1] "Tail khats for parameter x[40] are 0.755351342364494 and 0.755351342364494!"
[1] "Tail khats for parameter x[41] are 0.837778274963811 and 0.837778274963811!"
[1] "Tail khats for parameter x[42] are 0.708510582352759 and 0.708510582352759!"
[1] "Tail khats for parameter x[43] are 0.787901735950942 and 0.787901735950942!"
[1] "Tail khats for parameter x[44] are 0.708497741026517 and 0.708497741026517!"
[1] "Tail khats for parameter x[45] are 0.667225719438454 and 0.667225719438454!"
[1] "Tail khats for parameter x[46] are 0.68172988470105 and 0.68172988470105!"
[1] "Tail khats for parameter x[47] are 0.697653241570047 and 0.697653241570047!"
[1] "Tail khats for parameter x[48] are 0.817499201532475 and 0.817499201532475!"
[1] "Tail khats for parameter x[49] are 0.764357011552379 and 0.764357011552379!"
[1] "Tail khats for parameter x[50] are 0.752715754673428 and 0.752715754673428!"
[1] "  Tail khat above 0.5 indicates that the parameter is probably not square integrable"
[1] "0 of 40000 iterations ended with a divergence (0%)"
[1] "0 of 40000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-FMI indicated no pathological behavior"

There are no indications of pathological behavior with Stan’s default settings, although we are again warned to be careful about interpreting expectations of the reconstructed Cauchy parameters. The quantiles of each of those recovered Cauchy parameters, however, once again prove to be no problem,

util$plot_estimated_quantiles(fit_1, "First Alternative")

4 Second Alternative Implementation

This first alternative implementation immediately suggests a second. We can avoid the division in the recovery of \(x\) by giving \(x_{b}\) an inverse gamma density function. Here we let \[ x_{a} \sim \text{Normal} \, \left(0, 1 \right) \] and \[ x_{b} \sim \text{Inv-Gamma} \, \left(\frac{1}{2}, \frac{s^{2}}{2} \right) \] from which \[ x = m + x_{a} \cdot \sqrt{x_{b}} \] will follow a distribution specified by a \(\text{Cauchy}(m, s)\) density function.

Although this may seem like a small change, the division operator and its derivatives are significantly slower to evaluate than the multiplication operator and its derivatives. Hence the change can yield nontrivial performance improvements.

This small change yields a joint density function that mirrors the joint density function of the first alternative implementation and its pleasant geometry.

x <- seq(-3, 3, 0.05)
y <- seq(-1, 9, 0.05)

n_x <- length(x)
n_y <- length(y)
z <- matrix(nrow=n_x, ncol=n_y)
for (i in 1:n_x) for (j in 1:n_y)
  z[i, j] <- dnorm(x[i], 0, 1) * dgamma(exp(-y[j]), 0.5, 1 / 0.5) * exp(-y[j])

contour(x, y, z, levels=seq(0.05, 1, 0.05) * max(z), drawlabels=FALSE,
        main="Second Alternative", xlab="x_a", ylab="log(x_b)",
        col=c_dark_highlight, lwd=2)

The corresponding Stan program is given by a small tweak

writeLines(readLines("cauchy_alt_2.stan"))

parameters {
  vector[50] x_a;
  vector<lower=0>[50] x_b;
}

transformed parameters {
  vector[50] x = x_a .* sqrt(x_b);
}

model {
  x_a ~ normal(0, 1);
  x_b ~ inv_gamma(0.5, 0.5);
}

generated quantities {
  real I = fabs(x[1]) < 1 ? 1 : 0;
}

and the fit

fit_2 <- stan(file='cauchy_alt_2.stan', seed=4938483,
              warmup=1000, iter=11000)

util$check_all_diagnostics(fit_2)

[1] "n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "Tail khats for parameter x[1] are 0.656038231612564 and 0.656038231612564!"
[1] "Tail khats for parameter x[2] are 0.763325645356467 and 0.763325645356467!"
[1] "Tail khats for parameter x[3] are 0.734399748837252 and 0.734399748837252!"
[1] "Tail khats for parameter x[4] are 0.689803096064259 and 0.689803096064259!"
[1] "Tail khats for parameter x[5] are 0.711650043372225 and 0.711650043372225!"
[1] "Tail khats for parameter x[6] are 0.763114704958051 and 0.763114704958051!"
[1] "Tail khats for parameter x[7] are 0.729253426382291 and 0.729253426382291!"
[1] "Tail khats for parameter x[8] are 0.839431515520477 and 0.839431515520477!"
[1] "Tail khats for parameter x[9] are 0.698458597133945 and 0.698458597133945!"
[1] "Tail khats for parameter x[10] are 0.654650834751838 and 0.654650834751838!"
[1] "Tail khats for parameter x[11] are 0.738482528565819 and 0.738482528565819!"
[1] "Tail khats for parameter x[12] are 0.715332438317195 and 0.715332438317195!"
[1] "Tail khats for parameter x[13] are 0.848345838005493 and 0.848345838005493!"
[1] "Tail khats for parameter x[14] are 0.837531018504202 and 0.837531018504202!"
[1] "Tail khats for parameter x[15] are 0.668441148501527 and 0.668441148501527!"
[1] "Tail khats for parameter x[16] are 0.676632538450146 and 0.676632538450146!"
[1] "Tail khats for parameter x[17] are 0.73970827788225 and 0.73970827788225!"
[1] "Tail khats for parameter x[18] are 0.688050249525289 and 0.688050249525289!"
[1] "Tail khats for parameter x[19] are 0.790103252082743 and 0.790103252082743!"
[1] "Tail khats for parameter x[20] are 0.761731815073177 and 0.761731815073177!"
[1] "Tail khats for parameter x[21] are 0.661814422360661 and 0.661814422360661!"
[1] "Tail khats for parameter x[22] are 0.741641245025071 and 0.741641245025071!"
[1] "Tail khats for parameter x[23] are 0.761187419849707 and 0.761187419849707!"
[1] "Tail khats for parameter x[24] are 0.678026690633304 and 0.678026690633304!"
[1] "Tail khats for parameter x[25] are 0.776539340326557 and 0.776539340326557!"
[1] "Tail khats for parameter x[26] are 0.662069075849418 and 0.662069075849418!"
[1] "Tail khats for parameter x[27] are 0.76641564541638 and 0.76641564541638!"
[1] "Tail khats for parameter x[28] are 0.69687121183678 and 0.69687121183678!"
[1] "Tail khats for parameter x[29] are 0.737822828717598 and 0.737822828717598!"
[1] "Tail khats for parameter x[30] are 0.753320587538848 and 0.753320587538848!"
[1] "Tail khats for parameter x[31] are 0.76094464575522 and 0.76094464575522!"
[1] "Tail khats for parameter x[32] are 0.73804992823803 and 0.73804992823803!"
[1] "Tail khats for parameter x[33] are 0.697983458192785 and 0.697983458192785!"
[1] "Tail khats for parameter x[34] are 0.769859512309621 and 0.769859512309621!"
[1] "Tail khats for parameter x[35] are 0.669966413821939 and 0.669966413821939!"
[1] "Tail khats for parameter x[36] are 0.694629465971976 and 0.694629465971976!"
[1] "Tail khats for parameter x[37] are 0.740834073237828 and 0.740834073237828!"
[1] "Tail khats for parameter x[38] are 0.746030882620063 and 0.746030882620063!"
[1] "Tail khats for parameter x[39] are 0.81864019372431 and 0.81864019372431!"
[1] "Tail khats for parameter x[40] are 0.755351342364494 and 0.755351342364494!"
[1] "Tail khats for parameter x[41] are 0.837778274963811 and 0.837778274963811!"
[1] "Tail khats for parameter x[42] are 0.708510582352759 and 0.708510582352759!"
[1] "Tail khats for parameter x[43] are 0.787901735950942 and 0.787901735950942!"
[1] "Tail khats for parameter x[44] are 0.708497741026517 and 0.708497741026517!"
[1] "Tail khats for parameter x[45] are 0.667225719438454 and 0.667225719438454!"
[1] "Tail khats for parameter x[46] are 0.68172988470105 and 0.68172988470105!"
[1] "Tail khats for parameter x[47] are 0.697653241570047 and 0.697653241570047!"
[1] "Tail khats for parameter x[48] are 0.817499201532475 and 0.817499201532475!"
[1] "Tail khats for parameter x[49] are 0.764357011552379 and 0.764357011552379!"
[1] "Tail khats for parameter x[50] are 0.752715754673428 and 0.752715754673428!"
[1] "  Tail khat above 0.5 indicates that the parameter is probably not square integrable"
[1] "0 of 40000 iterations ended with a divergence (0%)"
[1] "0 of 40000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-FMI indicated no pathological behavior"

proceeds with no issues save for the warnings about the recovered Cauchy parameters. As before, we accurately recover the quantiles of each Cauchy distributed component in our model,

util$plot_estimated_quantiles(fit_2, "Second Alternative")

5 Third Alternative Implementation

The final alternative implementation that we will consider utilizes the inverse cumulative distribution function of the Cauchy density. In particular, if \[ \tilde{x} \sim \text{Uniform} \, \left(0, 1 \right) \] then \[ x = m + s \cdot \tan \, \left(\pi \left(\tilde{x} - \frac{1}{2} \right) \right) \] follows a distribution specified by a \(\text{Cauchy}(m, s)\) density function.

The latent density of \(\text{logit}(\tilde{x})\), which is what Stan ultimately utilizes, exhibits much more reasonable tails than the Cauchy distribution,

x <- seq(-10, 10, 0.001)
plot(x, exp(-x) / (1 + exp(-x))**2, type="l", col=c_dark_highlight, lwd=2,
     main="Third Alternative", xlab="logit(x_tilde)", 
     ylab="Probability Density", yaxt='n')

This final implementation requires only a sparse Stan program,

writeLines(readLines("cauchy_alt_3.stan"))

parameters {
  vector<lower=0, upper=1>[50] x_tilde;
}

transformed parameters {
vector[50] x = tan(pi() * (x_tilde - 0.5));
}

model {
  // Implicit uniform prior on the x_tilde
}

generated quantities {
  real I = fabs(x[1]) < 1 ? 1 : 0;
}

and the fit proceeds without any indications of problems,

fit_3 <- stan(file='cauchy_alt_3.stan', seed=4938483,
              warmup=1000, iter=11000)

util$check_all_diagnostics(fit_3)

[1] "n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "Tail khats for parameter x[1] are 0.656038231612564 and 0.656038231612564!"
[1] "Tail khats for parameter x[2] are 0.763325645356467 and 0.763325645356467!"
[1] "Tail khats for parameter x[3] are 0.734399748837252 and 0.734399748837252!"
[1] "Tail khats for parameter x[4] are 0.689803096064259 and 0.689803096064259!"
[1] "Tail khats for parameter x[5] are 0.711650043372225 and 0.711650043372225!"
[1] "Tail khats for parameter x[6] are 0.763114704958051 and 0.763114704958051!"
[1] "Tail khats for parameter x[7] are 0.729253426382291 and 0.729253426382291!"
[1] "Tail khats for parameter x[8] are 0.839431515520477 and 0.839431515520477!"
[1] "Tail khats for parameter x[9] are 0.698458597133945 and 0.698458597133945!"
[1] "Tail khats for parameter x[10] are 0.654650834751838 and 0.654650834751838!"
[1] "Tail khats for parameter x[11] are 0.738482528565819 and 0.738482528565819!"
[1] "Tail khats for parameter x[12] are 0.715332438317195 and 0.715332438317195!"
[1] "Tail khats for parameter x[13] are 0.848345838005493 and 0.848345838005493!"
[1] "Tail khats for parameter x[14] are 0.837531018504202 and 0.837531018504202!"
[1] "Tail khats for parameter x[15] are 0.668441148501527 and 0.668441148501527!"
[1] "Tail khats for parameter x[16] are 0.676632538450146 and 0.676632538450146!"
[1] "Tail khats for parameter x[17] are 0.73970827788225 and 0.73970827788225!"
[1] "Tail khats for parameter x[18] are 0.688050249525289 and 0.688050249525289!"
[1] "Tail khats for parameter x[19] are 0.790103252082743 and 0.790103252082743!"
[1] "Tail khats for parameter x[20] are 0.761731815073177 and 0.761731815073177!"
[1] "Tail khats for parameter x[21] are 0.661814422360661 and 0.661814422360661!"
[1] "Tail khats for parameter x[22] are 0.741641245025071 and 0.741641245025071!"
[1] "Tail khats for parameter x[23] are 0.761187419849707 and 0.761187419849707!"
[1] "Tail khats for parameter x[24] are 0.678026690633304 and 0.678026690633304!"
[1] "Tail khats for parameter x[25] are 0.776539340326557 and 0.776539340326557!"
[1] "Tail khats for parameter x[26] are 0.662069075849418 and 0.662069075849418!"
[1] "Tail khats for parameter x[27] are 0.76641564541638 and 0.76641564541638!"
[1] "Tail khats for parameter x[28] are 0.69687121183678 and 0.69687121183678!"
[1] "Tail khats for parameter x[29] are 0.737822828717598 and 0.737822828717598!"
[1] "Tail khats for parameter x[30] are 0.753320587538848 and 0.753320587538848!"
[1] "Tail khats for parameter x[31] are 0.76094464575522 and 0.76094464575522!"
[1] "Tail khats for parameter x[32] are 0.73804992823803 and 0.73804992823803!"
[1] "Tail khats for parameter x[33] are 0.697983458192785 and 0.697983458192785!"
[1] "Tail khats for parameter x[34] are 0.769859512309621 and 0.769859512309621!"
[1] "Tail khats for parameter x[35] are 0.669966413821939 and 0.669966413821939!"
[1] "Tail khats for parameter x[36] are 0.694629465971976 and 0.694629465971976!"
[1] "Tail khats for parameter x[37] are 0.740834073237828 and 0.740834073237828!"
[1] "Tail khats for parameter x[38] are 0.746030882620063 and 0.746030882620063!"
[1] "Tail khats for parameter x[39] are 0.81864019372431 and 0.81864019372431!"
[1] "Tail khats for parameter x[40] are 0.755351342364494 and 0.755351342364494!"
[1] "Tail khats for parameter x[41] are 0.837778274963811 and 0.837778274963811!"
[1] "Tail khats for parameter x[42] are 0.708510582352759 and 0.708510582352759!"
[1] "Tail khats for parameter x[43] are 0.787901735950942 and 0.787901735950942!"
[1] "Tail khats for parameter x[44] are 0.708497741026517 and 0.708497741026517!"
[1] "Tail khats for parameter x[45] are 0.667225719438454 and 0.667225719438454!"
[1] "Tail khats for parameter x[46] are 0.68172988470105 and 0.68172988470105!"
[1] "Tail khats for parameter x[47] are 0.697653241570047 and 0.697653241570047!"
[1] "Tail khats for parameter x[48] are 0.817499201532475 and 0.817499201532475!"
[1] "Tail khats for parameter x[49] are 0.764357011552379 and 0.764357011552379!"
[1] "Tail khats for parameter x[50] are 0.752715754673428 and 0.752715754673428!"
[1] "  Tail khat above 0.5 indicates that the parameter is probably not square integrable"
[1] "0 of 40000 iterations ended with a divergence (0%)"
[1] "0 of 40000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-FMI indicated no pathological behavior"

Once again by modifying the geometry of the target distribution we accurately recover the quantiles of the Cauchy distribution without the excessive cost of its nominal implementation,

util$plot_estimated_quantiles(fit_3, "Third Alternative")

6 Comparing Performance of the Various Implementations

To quantify how much better these alternative implementations perform let’s consider the only well-defined performance metric for Markov chain Monte Carlo: the number of effective samples per computational cost. Here we use run time as a proxy for computational cost and consider the effective sample size per time for the indicator function constructed in each of our models which, unlike the independent Cauchy components, has a finite mean and variance and hence well-defined effective sample size.

r_nom <- (summary(fit_nom, probs=NA)$summary)[51,5] /
         sum(get_elapsed_time(fit_nom)[,2])
r_1 <- (summary(fit_1, probs=NA)$summary)[151,5] /
        sum(get_elapsed_time(fit_1)[,2])
r_2 <- (summary(fit_2, probs=NA)$summary)[151,5] /
         sum(get_elapsed_time(fit_2)[,2])
r_3 <- (summary(fit_3, probs=NA)$summary)[101,5] /
        sum(get_elapsed_time(fit_3)[,2])

plot(1:4, c(r_nom, r_1, r_2, r_3), type="l", lwd=2, col=c_dark,
     main="", xaxt = "n", xlab="", ylab="ESS / Time (s)",
     xlim=c(0, 5), ylim=c(0, 4000))
axis(1, at=1:4, labels=c("Nom", "Alt 1", "Alt 2", "Alt 3"))

Immediately we see that the alternative implementations are drastically better than the nominal implementation. Moreover, the second alternative implementation is almost twice as fast as the the first. Perhaps surprisingly, using an inverse Gamma distribution to avoid a division ends up being a significant improvement.

We can also compare the geometry of these implementations by considering only the number of gradient evaluations per iteration and ignoring the actual cost of each evaluation.

probs <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)

sampler_params <- get_sampler_params(fit_nom, inc_warmup=FALSE)
n_grad_nom <- quantile(do.call(rbind, sampler_params)[,'n_leapfrog__'],
                       probs = probs)

sampler_params <- get_sampler_params(fit_1, inc_warmup=FALSE)
n_grad_1 <- quantile(do.call(rbind, sampler_params)[,'n_leapfrog__'],
                     probs = probs)

sampler_params <- get_sampler_params(fit_2, inc_warmup=FALSE)
n_grad_2 <- quantile(do.call(rbind, sampler_params)[,'n_leapfrog__'],
                     probs = probs)

sampler_params <- get_sampler_params(fit_3, inc_warmup=FALSE)
n_grad_3 <- quantile(do.call(rbind, sampler_params)[,'n_leapfrog__'],
                     probs = probs)

idx <- c(1, 1, 2, 2, 3, 3, 4, 4)
x <- c(0.5, 1.5, 1.5, 2.5, 2.5, 3.5, 3.5, 4.5)

cred <- data.frame(n_grad_nom, n_grad_1, n_grad_2, n_grad_3)
pad_cred <- do.call(cbind, lapply(idx, function(n) cred[1:9,n]))

plot(1, type="n", main="", xaxt = "n", xlab="",
     ylab="Gradient Evaluations Per Iteration",
     xlim=c(0.5, 4.5), ylim=c(0, 8000))
axis(1, at=1:4, labels=c("Nom", "Alt 1", "Alt 2", "Alt 3"))

polygon(c(x, rev(x)), c(pad_cred[1,], rev(pad_cred[9,])),
        col = c_light, border = NA)
polygon(c(x, rev(x)), c(pad_cred[2,], rev(pad_cred[8,])),
        col = c_light_highlight, border = NA)
polygon(c(x, rev(x)), c(pad_cred[3,], rev(pad_cred[7,])),
        col = c_mid, border = NA)
polygon(c(x, rev(x)), c(pad_cred[4,], rev(pad_cred[6,])),
        col = c_mid_highlight, border = NA)
lines(x, pad_cred[5,], col=c_dark, lwd=2)

As expected the alternative implementations require far fewer gradient evaluations per iteration as they don’t exhibit the heavy tails which require increasingly longer and longer trajectories.

Zooming in to the alternative implementations,

probs <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)

idx <- c(1, 1, 2, 2, 3, 3)
x <- c(0.5, 1.5, 1.5, 2.5, 2.5, 3.5)

cred <- data.frame(n_grad_1, n_grad_2, n_grad_3)
pad_cred <- do.call(cbind, lapply(idx, function(n) cred[1:9,n]))

plot(1, type="n", main="", xaxt = "n", xlab="",
     ylab="Gradient Evaluations Per Iteration",
     xlim=c(0.5, 3.5), ylim=c(0, 50))
axis(1, at=1:3, labels=c("Alt 1", "Alt 2", "Alt 3"))

polygon(c(x, rev(x)), c(pad_cred[1,], rev(pad_cred[9,])),
        col = c_light, border = NA)
polygon(c(x, rev(x)), c(pad_cred[2,], rev(pad_cred[8,])),
        col = c_light_highlight, border = NA)
polygon(c(x, rev(x)), c(pad_cred[3,], rev(pad_cred[7,])),
        col = c_mid, border = NA)
polygon(c(x, rev(x)), c(pad_cred[4,], rev(pad_cred[6,])),
        col = c_mid_highlight, border = NA)
lines(x, pad_cred[5,], col=c_dark, lwd=2)

we see that the third implementation requires the fewest gradient evaluations and hence features the nicest geometry. That said, the tangent function is more expensive then the cost of evaluating the gamma and inverse gamma density functions and in terms of overall efficiency the second implementation proves superior.

7 The Half Cauchy Density Function

These alternative implementations can also be modified to work for the half Cauchy density function that is defined over only positive values. In particular, for the first and second alternative implementations we just have to constrain \(x_{a}\) to be positive. In the third we just have to set \[ x = \tan \, \left(\frac{\pi}{2} \tilde{x} \right) \] to yield a \(\text{Half-Cauchy}(0, 1)\) distribution.

To demonstrate let’s compare a nominal half Cauchy implementation,

writeLines(readLines("half_cauchy_nom.stan"))

parameters {
  vector<lower=0>[50] x;
}

model {
  x ~ cauchy(0, 1);
}

fit_half_nom <- stan(file='half_cauchy_nom.stan', seed=4938483,
                     warmup=1000, iter=11000)

util$check_all_diagnostics(fit_half_nom)

[1] "n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "Tail khats for parameter x[1] are 0.656038231612564 and 0.656038231612564!"
[1] "Tail khats for parameter x[2] are 0.763325645356467 and 0.763325645356467!"
[1] "Tail khats for parameter x[3] are 0.734399748837252 and 0.734399748837252!"
[1] "Tail khats for parameter x[4] are 0.689803096064259 and 0.689803096064259!"
[1] "Tail khats for parameter x[5] are 0.711650043372225 and 0.711650043372225!"
[1] "Tail khats for parameter x[6] are 0.763114704958051 and 0.763114704958051!"
[1] "Tail khats for parameter x[7] are 0.729253426382291 and 0.729253426382291!"
[1] "Tail khats for parameter x[8] are 0.839431515520477 and 0.839431515520477!"
[1] "Tail khats for parameter x[9] are 0.698458597133945 and 0.698458597133945!"
[1] "Tail khats for parameter x[10] are 0.654650834751838 and 0.654650834751838!"
[1] "Tail khats for parameter x[11] are 0.738482528565819 and 0.738482528565819!"
[1] "Tail khats for parameter x[12] are 0.715332438317195 and 0.715332438317195!"
[1] "Tail khats for parameter x[13] are 0.848345838005493 and 0.848345838005493!"
[1] "Tail khats for parameter x[14] are 0.837531018504202 and 0.837531018504202!"
[1] "Tail khats for parameter x[15] are 0.668441148501527 and 0.668441148501527!"
[1] "Tail khats for parameter x[16] are 0.676632538450146 and 0.676632538450146!"
[1] "Tail khats for parameter x[17] are 0.73970827788225 and 0.73970827788225!"
[1] "Tail khats for parameter x[18] are 0.688050249525289 and 0.688050249525289!"
[1] "Tail khats for parameter x[19] are 0.790103252082743 and 0.790103252082743!"
[1] "Tail khats for parameter x[20] are 0.761731815073177 and 0.761731815073177!"
[1] "Tail khats for parameter x[21] are 0.661814422360661 and 0.661814422360661!"
[1] "Tail khats for parameter x[22] are 0.741641245025071 and 0.741641245025071!"
[1] "Tail khats for parameter x[23] are 0.761187419849707 and 0.761187419849707!"
[1] "Tail khats for parameter x[24] are 0.678026690633304 and 0.678026690633304!"
[1] "Tail khats for parameter x[25] are 0.776539340326557 and 0.776539340326557!"
[1] "Tail khats for parameter x[26] are 0.662069075849418 and 0.662069075849418!"
[1] "Tail khats for parameter x[27] are 0.76641564541638 and 0.76641564541638!"
[1] "Tail khats for parameter x[28] are 0.69687121183678 and 0.69687121183678!"
[1] "Tail khats for parameter x[29] are 0.737822828717598 and 0.737822828717598!"
[1] "Tail khats for parameter x[30] are 0.753320587538848 and 0.753320587538848!"
[1] "Tail khats for parameter x[31] are 0.76094464575522 and 0.76094464575522!"
[1] "Tail khats for parameter x[32] are 0.73804992823803 and 0.73804992823803!"
[1] "Tail khats for parameter x[33] are 0.697983458192785 and 0.697983458192785!"
[1] "Tail khats for parameter x[34] are 0.769859512309621 and 0.769859512309621!"
[1] "Tail khats for parameter x[35] are 0.669966413821939 and 0.669966413821939!"
[1] "Tail khats for parameter x[36] are 0.694629465971976 and 0.694629465971976!"
[1] "Tail khats for parameter x[37] are 0.740834073237828 and 0.740834073237828!"
[1] "Tail khats for parameter x[38] are 0.746030882620063 and 0.746030882620063!"
[1] "Tail khats for parameter x[39] are 0.81864019372431 and 0.81864019372431!"
[1] "Tail khats for parameter x[40] are 0.755351342364494 and 0.755351342364494!"
[1] "Tail khats for parameter x[41] are 0.837778274963811 and 0.837778274963811!"
[1] "Tail khats for parameter x[42] are 0.708510582352759 and 0.708510582352759!"
[1] "Tail khats for parameter x[43] are 0.787901735950942 and 0.787901735950942!"
[1] "Tail khats for parameter x[44] are 0.708497741026517 and 0.708497741026517!"
[1] "Tail khats for parameter x[45] are 0.667225719438454 and 0.667225719438454!"
[1] "Tail khats for parameter x[46] are 0.68172988470105 and 0.68172988470105!"
[1] "Tail khats for parameter x[47] are 0.697653241570047 and 0.697653241570047!"
[1] "Tail khats for parameter x[48] are 0.817499201532475 and 0.817499201532475!"
[1] "Tail khats for parameter x[49] are 0.764357011552379 and 0.764357011552379!"
[1] "Tail khats for parameter x[50] are 0.752715754673428 and 0.752715754673428!"
[1] "  Tail khat above 0.5 indicates that the parameter is probably not square integrable"
[1] "0 of 40000 iterations ended with a divergence (0%)"
[1] "0 of 40000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-FMI indicated no pathological behavior"

to the second alternative implementation,

writeLines(readLines("half_cauchy_alt.stan"))

parameters {
  vector<lower=0>[50] x_a;
  vector<lower=0>[50] x_b;
}

transformed parameters {
  vector[50] x = x_a .* sqrt(x_b);
}

model {
  x_a ~ normal(0, 1);
  x_b ~ inv_gamma(0.5, 0.5);
}

fit_half_reparam <- stan(file='half_cauchy_alt.stan', seed=4938483,
                         warmup=1000, iter=11000)

util$check_all_diagnostics(fit_half_reparam)

[1] "n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "Tail khats for parameter x[1] are 0.656038231612564 and 0.656038231612564!"
[1] "Tail khats for parameter x[2] are 0.763325645356467 and 0.763325645356467!"
[1] "Tail khats for parameter x[3] are 0.734399748837252 and 0.734399748837252!"
[1] "Tail khats for parameter x[4] are 0.689803096064259 and 0.689803096064259!"
[1] "Tail khats for parameter x[5] are 0.711650043372225 and 0.711650043372225!"
[1] "Tail khats for parameter x[6] are 0.763114704958051 and 0.763114704958051!"
[1] "Tail khats for parameter x[7] are 0.729253426382291 and 0.729253426382291!"
[1] "Tail khats for parameter x[8] are 0.839431515520477 and 0.839431515520477!"
[1] "Tail khats for parameter x[9] are 0.698458597133945 and 0.698458597133945!"
[1] "Tail khats for parameter x[10] are 0.654650834751838 and 0.654650834751838!"
[1] "Tail khats for parameter x[11] are 0.738482528565819 and 0.738482528565819!"
[1] "Tail khats for parameter x[12] are 0.715332438317195 and 0.715332438317195!"
[1] "Tail khats for parameter x[13] are 0.848345838005493 and 0.848345838005493!"
[1] "Tail khats for parameter x[14] are 0.837531018504202 and 0.837531018504202!"
[1] "Tail khats for parameter x[15] are 0.668441148501527 and 0.668441148501527!"
[1] "Tail khats for parameter x[16] are 0.676632538450146 and 0.676632538450146!"
[1] "Tail khats for parameter x[17] are 0.73970827788225 and 0.73970827788225!"
[1] "Tail khats for parameter x[18] are 0.688050249525289 and 0.688050249525289!"
[1] "Tail khats for parameter x[19] are 0.790103252082743 and 0.790103252082743!"
[1] "Tail khats for parameter x[20] are 0.761731815073177 and 0.761731815073177!"
[1] "Tail khats for parameter x[21] are 0.661814422360661 and 0.661814422360661!"
[1] "Tail khats for parameter x[22] are 0.741641245025071 and 0.741641245025071!"
[1] "Tail khats for parameter x[23] are 0.761187419849707 and 0.761187419849707!"
[1] "Tail khats for parameter x[24] are 0.678026690633304 and 0.678026690633304!"
[1] "Tail khats for parameter x[25] are 0.776539340326557 and 0.776539340326557!"
[1] "Tail khats for parameter x[26] are 0.662069075849418 and 0.662069075849418!"
[1] "Tail khats for parameter x[27] are 0.76641564541638 and 0.76641564541638!"
[1] "Tail khats for parameter x[28] are 0.69687121183678 and 0.69687121183678!"
[1] "Tail khats for parameter x[29] are 0.737822828717598 and 0.737822828717598!"
[1] "Tail khats for parameter x[30] are 0.753320587538848 and 0.753320587538848!"
[1] "Tail khats for parameter x[31] are 0.76094464575522 and 0.76094464575522!"
[1] "Tail khats for parameter x[32] are 0.73804992823803 and 0.73804992823803!"
[1] "Tail khats for parameter x[33] are 0.697983458192785 and 0.697983458192785!"
[1] "Tail khats for parameter x[34] are 0.769859512309621 and 0.769859512309621!"
[1] "Tail khats for parameter x[35] are 0.669966413821939 and 0.669966413821939!"
[1] "Tail khats for parameter x[36] are 0.694629465971976 and 0.694629465971976!"
[1] "Tail khats for parameter x[37] are 0.740834073237828 and 0.740834073237828!"
[1] "Tail khats for parameter x[38] are 0.746030882620063 and 0.746030882620063!"
[1] "Tail khats for parameter x[39] are 0.81864019372431 and 0.81864019372431!"
[1] "Tail khats for parameter x[40] are 0.755351342364494 and 0.755351342364494!"
[1] "Tail khats for parameter x[41] are 0.837778274963811 and 0.837778274963811!"
[1] "Tail khats for parameter x[42] are 0.708510582352759 and 0.708510582352759!"
[1] "Tail khats for parameter x[43] are 0.787901735950942 and 0.787901735950942!"
[1] "Tail khats for parameter x[44] are 0.708497741026517 and 0.708497741026517!"
[1] "Tail khats for parameter x[45] are 0.667225719438454 and 0.667225719438454!"
[1] "Tail khats for parameter x[46] are 0.68172988470105 and 0.68172988470105!"
[1] "Tail khats for parameter x[47] are 0.697653241570047 and 0.697653241570047!"
[1] "Tail khats for parameter x[48] are 0.817499201532475 and 0.817499201532475!"
[1] "Tail khats for parameter x[49] are 0.764357011552379 and 0.764357011552379!"
[1] "Tail khats for parameter x[50] are 0.752715754673428 and 0.752715754673428!"
[1] "  Tail khat above 0.5 indicates that the parameter is probably not square integrable"
[1] "0 of 40000 iterations ended with a divergence (0%)"
[1] "0 of 40000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-FMI indicated no pathological behavior"

Comparing the 10th identical component of our model we see that the two implementations yield equivalent results,

x <- extract(fit_half_nom)$x[,10]
p1 <- hist(x[x < 25], breaks=seq(0, 25, 0.25), plot=FALSE)
p1$counts <- p1$counts / sum(p1$counts)

x <- extract(fit_half_reparam)$x[,10]
p2 <- hist(x[x < 25], breaks=seq(0, 25, 0.25), plot=FALSE)
p2$counts <- p2$counts / sum(p2$counts)

c_light_trans <- c("#DCBCBC80")
c_light_highlight_trans <- c("#C7999980")
c_dark_trans <- c("#8F272780")
c_dark_highlight_trans <- c("#7C000080")

plot(p1, col=c_dark_trans, border=c_dark_highlight_trans,
     main="", xlab="x[10]", yaxt='n', ylab="")
plot(p2, col=c_light_trans, border=c_light_highlight_trans, add=T)

8 Conclusion

Although the nominal implementation of the Cauchy density function frustrates computational algorithms with either bias or slow execution, there exist multiple alternative implementations that yield an equivalent distribution without the excessive cost.

The practical consequences of these implementations, however, may not ultimately be all that significant. Having a Cauchy distribution in the generative model does not necessarily imply that the heavy tails will persist into the posterior distribution. In many cases even a little bit of data can tame the heavy tails, resulting in a pleasant posterior geometry regardless of which implementation of the Cauchy we use. Nevertheless it is up the the user to remain vigilant and monitor for indications of heavy tails and motivation for alternative implementations.

9 Acknowledgements

I thank Aki Vehtari and Junpeng Lao for helpful comments.

A very special thanks to everyone supporting me on Patreon: Aki Vehtari, Alan O’Donnell, Andre Zapico, Andrew Rouillard, Austin Rochford, Avraham Adler, Bo Schwartz Madsen, Bryan Yu, Cat Shark, Charles Naylor, Christopher Howlin, Colin Carroll, Daniel Simpson, David Pascall, David Roher, David Stanard, Ed Cashin, Eddie Landesberg, Elad Berkman, Eric Jonas, Ethan Goan, Finn Lindgren, Granville Matheson, Hernan Bruno, J Michael Burgess, Joel Kronander, Jonas Beltoft Gehrlein, Joshua Mayer, Justin Bois, Lars Barquist, Luiz Carvalho, Marek Kwiatkowski, Matthew Kay, Maurits van der Meer, Maxim Kesin, Michael Dillon, Michael Redman, Noah Silverman, Ole Rogeberg, Oscar Olvera, Paul Oreto, Peter Heinrich, Putra Manggala, Ravin Kumar, Riccardo Fusaroli, Richard Torkar, Robert Frost, Robin Taylor, Sam Petulla, Sam Zorowitz, Seo-young Kim, Seth Axen, Sharan Banagiri, Simon Duane, Stephen Oates, Stijn, Vladislavs Dovgalecs, and yolha.

10 Original Computing Environment

writeLines(readLines(file.path(Sys.getenv("HOME"), ".R/Makevars")))

CC=clang

CXXFLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function -Wno-macro-redefined -Wno-unneeded-internal-declaration
CXX=clang++ -arch x86_64 -ftemplate-depth-256

CXX14FLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function -Wno-macro-redefined -Wno-unneeded-internal-declaration -Wno-unknown-pragmas
CXX14=clang++ -arch x86_64 -ftemplate-depth-256

devtools::session_info("rstan")

─ Session info ──────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 os       macOS Sierra 10.12.6        
 system   x86_64, darwin15.6.0        
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2018-12-03                  

─ Packages ──────────────────────────────────────────────────────────────
 package      * version   date       lib source        
 assertthat     0.2.0     2017-04-11 [1] CRAN (R 3.5.0)
 backports      1.1.2     2017-12-13 [1] CRAN (R 3.5.0)
 base64enc      0.1-3     2015-07-28 [1] CRAN (R 3.5.0)
 BH             1.66.0-1  2018-02-13 [1] CRAN (R 3.5.0)
 callr          3.0.0     2018-08-24 [1] CRAN (R 3.5.0)
 cli            1.0.1     2018-09-25 [1] CRAN (R 3.5.0)
 colorspace     1.3-2     2016-12-14 [1] CRAN (R 3.5.0)
 crayon         1.3.4     2017-09-16 [1] CRAN (R 3.5.0)
 desc           1.2.0     2018-05-01 [1] CRAN (R 3.5.0)
 digest         0.6.18    2018-10-10 [1] CRAN (R 3.5.0)
 fansi          0.4.0     2018-10-05 [1] CRAN (R 3.5.0)
 ggplot2      * 3.1.0     2018-10-25 [1] CRAN (R 3.5.1)
 glue           1.3.0     2018-07-17 [1] CRAN (R 3.5.0)
 gridExtra      2.3       2017-09-09 [1] CRAN (R 3.5.0)
 gtable         0.2.0     2016-02-26 [1] CRAN (R 3.5.0)
 inline         0.3.15    2018-05-18 [1] CRAN (R 3.5.0)
 labeling       0.3       2014-08-23 [1] CRAN (R 3.5.0)
 lattice        0.20-35   2017-03-25 [1] CRAN (R 3.5.1)
 lazyeval       0.2.1     2017-10-29 [1] CRAN (R 3.5.0)
 loo            2.0.0     2018-04-11 [1] CRAN (R 3.5.0)
 magrittr       1.5       2014-11-22 [1] CRAN (R 3.5.0)
 MASS           7.3-50    2018-04-30 [1] CRAN (R 3.5.1)
 Matrix         1.2-14    2018-04-13 [1] CRAN (R 3.5.1)
 matrixStats    0.54.0    2018-07-23 [1] CRAN (R 3.5.0)
 mgcv           1.8-24    2018-06-23 [1] CRAN (R 3.5.1)
 munsell        0.5.0     2018-06-12 [1] CRAN (R 3.5.0)
 nlme           3.1-137   2018-04-07 [1] CRAN (R 3.5.1)
 pillar         1.3.0     2018-07-14 [1] CRAN (R 3.5.0)
 pkgbuild       1.0.2     2018-10-16 [1] CRAN (R 3.5.0)
 plyr           1.8.4     2016-06-08 [1] CRAN (R 3.5.0)
 prettyunits    1.0.2     2015-07-13 [1] CRAN (R 3.5.0)
 processx       3.2.0     2018-08-16 [1] CRAN (R 3.5.0)
 ps             1.2.0     2018-10-16 [1] CRAN (R 3.5.0)
 R6             2.3.0     2018-10-04 [1] CRAN (R 3.5.0)
 RColorBrewer   1.1-2     2014-12-07 [1] CRAN (R 3.5.0)
 Rcpp           0.12.19   2018-10-01 [1] CRAN (R 3.5.0)
 RcppEigen      0.3.3.4.0 2018-02-07 [1] CRAN (R 3.5.0)
 reshape2       1.4.3     2017-12-11 [1] CRAN (R 3.5.0)
 rlang          0.3.0     2018-10-22 [1] CRAN (R 3.5.0)
 rprojroot      1.3-2     2018-01-03 [1] CRAN (R 3.5.0)
 rstan        * 2.18.1    2018-10-16 [1] CRAN (R 3.5.0)
 scales         1.0.0     2018-08-09 [1] CRAN (R 3.5.0)
 StanHeaders  * 2.18.0    2018-10-07 [1] CRAN (R 3.5.0)
 stringi        1.2.4     2018-07-20 [1] CRAN (R 3.5.0)
 stringr        1.3.1     2018-05-10 [1] CRAN (R 3.5.0)
 tibble         1.4.2     2018-01-22 [1] CRAN (R 3.5.0)
 utf8           1.1.4     2018-05-24 [1] CRAN (R 3.5.0)
 viridisLite    0.3.0     2018-02-01 [1] CRAN (R 3.5.0)
 withr          2.1.2     2018-03-15 [1] CRAN (R 3.5.0)

[1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library

Fitting The Cauchy Distribution

Michael Betancourt

January 2018