This case study has been deprecated; the current version can be found here.

Part 1: Introduction to Gaussian Processes
Part 2: Optimizing Gaussian Processes Hyperparameters
Part 3: Bayesian Inference of Gaussian Processes Hyperparameters

Often data decompose into variates, \(y\), and covariates, \(x\), with the covariates being particularly straightforward to measure. A regression model quantifies the statistical relationship between the variates and covariates so that if new covariates are measured then the corresponding variates can be inferred from the model.

Given the variate-covariate decomposition, the likelihood is more naturally written as \[ \pi (y, x \mid \theta, \phi) = \pi(y \mid x, \theta, \phi) \, \pi(x \mid \theta, \phi).\] If we assume that the distribution of the covariates does not depend on the parameters influencing the distribution of the variates, \[ \pi (y, x \mid \theta, \phi) = \pi(y \mid x, \theta) \, \pi(x \mid \phi),\] then our regression model is completely quantified by the conditional likelihood \(\pi(y \mid x, \theta)\).

In general a conditional likelihood can be constructed by first assuming a model for the statistical variation in the variates, \(\pi(y \mid \theta_1, \ldots, \theta_N)\), and then replacing one or more of the parameters with deterministic functions of the covariates, yielding \(\pi(y \mid f(x, \theta_1), \theta_2, \ldots, \theta_N)\). For example, we might model the measurement variation in the variates as Gaussian with standard deviation \(\sigma\) and a mean given by a linear function of the covariates, \(\mu = \beta \cdot x + \alpha\).
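
As a minimal illustration of that construction, the following R sketch evaluates such a conditional likelihood for a handful of variate-covariate pairs; the particular values of alpha, beta, sigma, and the data themselves are arbitrary choices made purely for this example.

# Hypothetical intercept, slope, and measurement scale for illustration only
alpha <- 1
beta <- 2
sigma <- 0.5

# A few covariate-variate pairs
x <- c(-1, 0, 1)
y <- c(-0.8, 1.2, 2.9)

# Pointwise conditional likelihoods pi(y_n | x_n, alpha, beta, sigma)
lik <- dnorm(y, mean = beta * x + alpha, sd = sigma)

# Joint conditional likelihood, assuming independent observations
prod(lik)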

This is a fine model if the data are consistent with a linear relationship, but it becomes much less satisfactory when more complex interactions are evident in the data.

We can model more complex regression interactions by including a more complex mean function, such as a high-order polynomial of the input covariates, but these regression models quickly become ungainly to fit and often prone to overfitting, where the model misinterprets the statistical variation in the variates as deterministic behavior.

Bayesian methods are a particularly useful way to regularize overfitting, but principled prior assignment on such polynomial expansions can be awkward. Fortunately we can avoid parametric functions entirely and make inferences over entire spaces of functions using Gaussian processes.

In this series of case studies we will learn about the very basics of Gaussian processes, how to implement them in Stan, and how they can be robustly incorporated into Bayesian models to avoid subtle but pathological behavior.

1 Gaussian Processes

Formally, Gaussian processes define probability distributions over infinite-dimensional Hilbert spaces of functions. A loose way of interpreting these spaces is to have each dimension correspond to the function output, \(f(x)\), at one of the infinitely many covariate inputs. The Gaussian process defines a distribution over those infinite outputs. In particular, if we sample from a Gaussian process then we obtain an infinite number of point outputs that stitch together to recover an entire function.

As with a finite-dimensional Gaussian distribution, Gaussian processes are specified with a mean and covariance, \[ f \sim \mathcal{GP}(m, k),\] only these are now given by functions instead of vectors and matrices.

1.1 Mean Functions

The mean function of a Gaussian process, \(m(x)\), defines the base function around which all of the realizations of the Gaussian process will be distributed. In other words, if we average an infinite number of realizations of the Gaussian process then we recover the mean function.

Typically we don’t have any strong information about the mean of our regression model in which case we default to the zero function that returns zero for all covariate inputs. Even if we do have information about the average function behavior, however, we can always decouple it from the Gaussian process by linearity. In other words, we will often use Gaussian processes not to model the regression directly but rather to model residual deviation around some parametric regression model.
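
Schematically, with \(\mu(x, \gamma)\) denoting some parametric regression model with its own parameters \(\gamma\) (the symbols here are just for illustration), this decoupling takes the form \[ f(x) = \mu(x, \gamma) + g(x), \qquad g \sim \mathcal{GP}(0, k), \] so that the Gaussian process \(g\) captures only the residual deviation around the parametric component.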

In this case study we will assume a zero mean function.

1.2 Covariance Functions

The covariance function or covariance kernel, \(k(x_1, x_2)\), controls how the realizations of a Gaussian process vary around the mean function. The larger the covariance between two covariate inputs, \(x_1\) and \(x_2\), the less their functional outputs, \(f(x_1)\) and \(f(x_2)\), can vary relative to each other.

There are many covariance functions in common use, and they can be combined to yield even more: not only is the sum of two covariance functions also a covariance function, so too is their product. Consequently we can build quite sophisticated Gaussian processes from just a few base covariance functions.

Perhaps the most common covariance function, and the one we will use in this case study exclusively, is the exponentiated quadratic kernel, \[ k(x_1, x_2) = \alpha^{2} \exp \left( - \frac{ (x_1 - x_2)^{2} }{\rho^{2}} \right).\] Unfortunately naming conventions for this kernel are not entirely consistent across fields or even time, and the reader might also find it referred to as a squared exponential kernel, a quadratic exponential kernel, a radial basis function or RBF kernel, and a Gaussian kernel, amongst other names.

The exponentiated quadratic kernel supports smooth functions from the covariate inputs to a one-dimensional variate output. The marginal standard deviation, \(\alpha\), controls the expected variation in the variate output while the length scale, \(\rho\), controls how quickly the variate output varies with respect to the distance between covariates. In other words, the marginal standard deviation controls how strongly the functions wiggle and the length scale controls how quickly they wiggle.

Keep in mind that the influences of these two hyperparameters are not independent. Even if the length scale is very long, for example, we can increase the marginal standard deviation to specify a Gaussian process with rapid wiggles. Consequently care is needed when interpreting the consequences of a given set of hyperparameters.

1.3 Gaussian Processes in Practice

The concept of a Gaussian process is certainly appealing, but even if the mathematics is well-posed the infinite-dimensional functions supported by Gaussian processes will be too ungainly to manipulate in practice. Fortunately, Gaussian processes have an incredibly useful marginalization property that allows for the relevant parts of those functions to be manipulated with finite computational resources.

In any practical application we only ever consider a finite set of covariates, for example the covariates in our observed data and the covariates at which we want to model variate values. Over such a finite set of inputs a Gaussian process always marginalizes to a multivariate Gaussian distribution with mean components \[\mu_{i} = m(x_{i})\] and covariance matrix components \[\Sigma_{ij} = k(x_{i}, x_{j}).\] The marginalized covariance matrix is also known as the Gram matrix of the covariance kernel.
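
To make this marginalization concrete, here is a small R sketch, assuming a zero mean function and the exponentiated quadratic kernel exactly as written above; the helper eq_kernel and the chosen inputs and hyperparameters are purely illustrative.

# Exponentiated quadratic kernel as written above (illustrative helper)
eq_kernel <- function(x1, x2, alpha, rho) {
  alpha^2 * exp(-(x1 - x2)^2 / rho^2)
}

# A finite set of covariate inputs
x <- seq(-2, 2, by = 0.5)
N <- length(x)

# Gram matrix Sigma_ij = k(x_i, x_j), with a small jitter for numerical stability
Sigma <- outer(x, x, eq_kernel, alpha = 1, rho = 1) + diag(1e-10, N)

# One realization of the marginal multivariate Gaussian (zero mean function)
L <- t(chol(Sigma))             # chol() returns the upper triangular factor in R
f <- as.vector(L %*% rnorm(N))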

Consequently we can perform all of the manipulations of a Gaussian process that we need in practice using just the familiar properties of multivariate Gaussian distributions.

2 Gaussian Process Regression

Given a regression model such as \[\pi(y \mid x, \sigma) = \mathcal{N} (y \mid f(x), \sigma)\] or \[\pi(y \mid x, \phi) = \mathrm{NegBin2}(y \mid f(x), \phi)\] we introduce a Gaussian process as a prior over the functions \(f(x)\) which defines a joint model over the statistical model for the variates and its functional dependency on the covariates. Given observations we can then learn a Gaussian process posterior over those functional dependencies most consistent with the data.

Conveniently, a Gaussian observation model is conjugate to a Gaussian process prior. In other words if \[y \sim \mathcal{N} (f(x), \sigma)\] \[f \sim \mathcal{GP}(m, k)\] then the posterior over functions is itself a Gaussian process with analytic mean and covariance functions. Moreover, the variates \(y\) marginally follow a Gaussian process with the same mean function \(m\) but with the covariance function modified to \[k'(x_1, x_2) = k(x_1, x_2) + \sigma^{2} \delta(x_1 - x_2),\] where \(\delta(0) = 1\) and vanishes for all other values.

Given observed variate-covariate pairs \(\{y, x\}\) and lone covariates \(x'\) we can then construct a joint multivariate Gaussian distribution over all of the covariates, and then condition on \(y\) to construct the marginal posterior distribution for unobserved variates \(y'\) corresponding to the \(x'\).
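
For reference, this conditioning step uses only the standard multivariate Gaussian identities. Assuming a zero mean function and writing \(K\), \(K_{*}\), and \(K_{**}\) for the Gram matrices of \(k\) over the observed covariates, between the observed and new covariates, and over the new covariates, respectively, the latent function values at the new covariates satisfy \[ f' \mid y \sim \mathcal{N} \left( K_{*}^{T} \left( K + \sigma^{2} I \right)^{-1} y, \; K_{**} - K_{*}^{T} \left( K + \sigma^{2} I \right)^{-1} K_{*} \right), \] with the posterior predictive distribution for \(y'\) adding the measurement variation \(\sigma\) back on top of \(f'\). This is the computation implemented in the prediction function of the Stan program below.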

If the statistical model for the variates is not Gaussian then we do not have such a clean analytic result. Instead we have to represent the Gaussian process as a latent multivariate Gaussian distribution, joint over the covariates with observed and unobserved variates, and approximate the posterior for the unobserved variates, say with the Markov chain Monte Carlo methods provided by Stan.

3 Gaussian Processes in Stan

With this admittedly cursory introduction to Gaussian processes we are now ready to consider their implementation in Stan. As noted above, once we have constructed the marginal mean vector and covariance matrix, everything reduces to common multivariate Gaussian manipulations.

We’ll begin by sampling from a Gaussian process prior and then from both analytic and non-analytic Gaussian process posteriors.

3.1 Simulating From A Gaussian Process Prior

Sampling a function from a Gaussian process prior, and then simulating data corresponding to that function, is a straightforward application of a multivariate Gaussian random number generator.

First we take care of some preliminary setup,

library(rstan)
Loading required package: ggplot2
Loading required package: StanHeaders
rstan (Version 2.16.2, packaged: 2017-07-03 09:24:58 UTC, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
source("gp_utility.R")

and then define the Gaussian process hyperparameters, \(\alpha\) and \(\rho\), the Gaussian measurement variation \(\sigma\), and covariates at which we will evaluate the Gaussian process.

alpha_true <- 3
rho_true <- 5.5
sigma_true <- 2

N_total <- 501
x_total <- 20 * (0:(N_total - 1)) / (N_total - 1) - 10

simu_data <- list(alpha=alpha_true, rho=rho_true, sigma=sigma_true,
                  N=N_total, x=x_total)

To sample from the Gaussian process we simply construct the marginal covariance matrix, here using the built-in cov_exp_quad function, and then simulate data by adding Gaussian variation around the sampled function. Note that we add a nugget or jitter of \(10^{-10}\) to the diagonal of the marginal covariance matrix before computing its Cholesky decomposition in order to stabilize the numerical calculations. Often the nugget is taken to be the square root of the floating point precision, which would be \(10^{-8}\) for double precision calculations.
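
To see why the jitter matters, consider a quick R sketch with nearly coincident covariate inputs, for which the exponentiated quadratic Gram matrix becomes numerically rank deficient; the kernel helper and inputs are illustrative only, and depending on the platform the unjittered decomposition may or may not happen to succeed.

# Illustrative exponentiated quadratic kernel
eq_kernel <- function(x1, x2, alpha, rho) {
  alpha^2 * exp(-(x1 - x2)^2 / rho^2)
}

# Nearly coincident inputs yield (numerically) duplicated rows in the Gram matrix
x <- c(0, 1e-8, 1, 1 + 1e-8, 2)
K <- outer(x, x, eq_kernel, alpha = 3, rho = 5.5)

# Without a nugget the Cholesky decomposition can fail...
try(chol(K))

# ...while a small nugget on the diagonal restores numerical positive definiteness
L <- chol(K + diag(1e-10, length(x)))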

writeLines(readLines("simu_gauss.stan"))
data {
  int<lower=1> N;
  real x[N];

  real<lower=0> rho;
  real<lower=0> alpha;
  real<lower=0> sigma;
}

transformed data {
  matrix[N, N] cov =   cov_exp_quad(x, alpha, rho)
                     + diag_matrix(rep_vector(1e-10, N));
  matrix[N, N] L_cov = cholesky_decompose(cov);
}

parameters {}
model {}

generated quantities {
  vector[N] f = multi_normal_cholesky_rng(rep_vector(0, N), L_cov);
  vector[N] y;
  for (n in 1:N)
    y[n] = normal_rng(f[n], sigma);
}
simu_fit <- stan(file='simu_gauss.stan', data=simu_data, iter=1,
            chains=1, seed=494838, algorithm="Fixed_param")

SAMPLING FOR MODEL 'simu_gauss' NOW (CHAIN 1).
Iteration: 1 / 1 [100%]  (Sampling)

 Elapsed Time: 0 seconds (Warm-up)
               0.000233 seconds (Sampling)
               0.000233 seconds (Total)

From the 501 sampled points let’s reserve 11 evenly spaced points as observed data and leave the remaining 490 as held-out data.

f_total <- extract(simu_fit)$f[1,]
y_total <- extract(simu_fit)$y[1,]

true_realization <- data.frame(f_total, x_total)
names(true_realization) <- c("f_total", "x_total")

observed_idx <- c(50*(0:10)+1)
N <- length(observed_idx)
x <- x_total[observed_idx]
y <- y_total[observed_idx]

plot(x_total, f_total, type="l", lwd=2, xlab="x", ylab="y",
     xlim=c(-10, 10), ylim=c(-10, 10))
points(x_total, y_total, col="white", pch=16, cex=0.6)
points(x_total, y_total, col=c_mid_teal, pch=16, cex=0.4)
points(x, y, col="white", pch=16, cex=1.2)
points(x, y, col="black", pch=16, cex=0.8)

In general we will be interested in predictions over both the observed data and the held-out data.

N_predict <- N_total
x_predict <- x_total
y_predict <- y_total

Per good Stan workflow we save these simulated data in their own file.

stan_rdump(c("N", "x", "y",
             "N_predict", "x_predict", "y_predict",
             "observed_idx"), file="gp.data.R")
data <- read_rdump("gp.data.R")

stan_rdump(c("f_total", "x_total", "sigma_true"), file="gp.truth.R")

For future comparison, we can also construct the prior data generating process conditioned on the true realization of the Gaussian process.

f_data <- list(sigma=sigma_true, N=N_total, f=f_total)
dgp_fit <- stan(file='simu_gauss_dgp.stan', data=f_data, iter=1000, warmup=0,
                chains=1, seed=5838298, refresh=1000, algorithm="Fixed_param")

SAMPLING FOR MODEL 'simu_gauss_dgp' NOW (CHAIN 1).
Iteration:   1 / 1000 [  0%]  (Sampling)
Iteration: 1000 / 1000 [100%]  (Sampling)

 Elapsed Time: 0 seconds (Warm-up)
               0.056767 seconds (Sampling)
               0.056767 seconds (Total)
plot_gp_pred_quantiles(dgp_fit, data, true_realization,
                       "True Data Generating Process Quantiles")

The concentric intervals here cover probabilities of 20%, 40%, 60%, and 80%.

3.2 Simulating From a Gaussian Process Posterior with Gaussian Observations

Simulating from a Gaussian process posterior is straightforward given the analytic manipulations available. We simply use the modified kernel \(k'\) to construct a multivariate Gaussian distribution joint over all of the covariate values and then analytically condition on those covariates with observed variates.

writeLines(readLines("predict_gauss.stan"))
functions {
  vector gp_pred_rng(real[] x2,
                     vector y1, real[] x1,
                     real alpha, real rho, real sigma, real delta) {
    int N1 = rows(y1);
    int N2 = size(x2);
    vector[N2] f2;
    {
      matrix[N1, N1] K =   cov_exp_quad(x1, alpha, rho)
                         + diag_matrix(rep_vector(square(sigma), N1));
      matrix[N1, N1] L_K = cholesky_decompose(K);

      vector[N1] L_K_div_y1 = mdivide_left_tri_low(L_K, y1);
      vector[N1] K_div_y1 = mdivide_right_tri_low(L_K_div_y1', L_K)';
      matrix[N1, N2] k_x1_x2 = cov_exp_quad(x1, x2, alpha, rho);
      vector[N2] f2_mu = (k_x1_x2' * K_div_y1);
      matrix[N1, N2] v_pred = mdivide_left_tri_low(L_K, k_x1_x2);
      matrix[N2, N2] cov_f2 =   cov_exp_quad(x2, alpha, rho) - v_pred' * v_pred
                              + diag_matrix(rep_vector(delta, N2));
      f2 = multi_normal_rng(f2_mu, cov_f2);
    }
    return f2;
  }
}

data {
  int<lower=1> N;
  real x[N];
  vector[N] y;

  int<lower=1> N_predict;
  real x_predict[N_predict];

  real<lower=0> rho;
  real<lower=0> alpha;
  real<lower=0> sigma;
}

transformed data {
  matrix[N, N] cov =   cov_exp_quad(x, alpha, rho)
                     + diag_matrix(rep_vector(1e-10, N));
  matrix[N, N] L_cov = cholesky_decompose(cov);
}

parameters {}
model {}

generated quantities {
  vector[N_predict] f_predict = gp_pred_rng(x_predict, y, x, alpha, rho, sigma, 1e-10);
  vector[N_predict] y_predict;

  for (n in 1:N_predict)
    y_predict[n] = normal_rng(f_predict[n], sigma);
}
pred_data <- list(alpha=alpha_true, rho=rho_true, sigma=sigma_true, N=N, x=x, y=y,
                  N_predict=N_predict, x_predict=x_predict)
pred_fit <- stan(file='predict_gauss.stan', data=pred_data, iter=1000, warmup=0,
                     chains=1, seed=5838298, refresh=1000, algorithm="Fixed_param")

SAMPLING FOR MODEL 'predict_gauss' NOW (CHAIN 1).
Iteration:   1 / 1000 [  0%]  (Sampling)
Iteration: 1000 / 1000 [100%]  (Sampling)

 Elapsed Time: 0 seconds (Warm-up)
               6.34754 seconds (Sampling)
               6.34754 seconds (Total)

We can readily visualize the posterior Gaussian process, either with sampled realizations,

plot_gp_realizations(pred_fit, data, true_realization,
                     "Posterior Realizations")

or quantiles of those realizations as a function of the input covariate,

plot_gp_quantiles(pred_fit, data, true_realization,
                  "Posterior Quantiles")

Similarly, we can visualize the posterior predictive distribution that incorporates the Gaussian measurement variation with realizations,

plot_gp_realizations(pred_fit, data, true_realization,
                     "Posterior Predictive Realizations")

or quantiles,

plot_gp_pred_quantiles(pred_fit, data, true_realization,
                  "Posterior Predictive Quantiles")

3.3 Simulating From a Gaussian Process Conditional on Non-Gaussian Observations

When the observation model is non-Gaussian the posterior Gaussian process no longer has a closed form kernel. In this case we have to construct the multivariate Gaussian distribution joint over all of the covariates within the model itself, and allow the fit to explore the conditional realizations.

Consider, for example, a Poisson observation model where the Gaussian process models the log rate,

writeLines(readLines("simu_poisson.stan"))
data {
  int<lower=1> N;
  real x[N];

  real<lower=0> rho;
  real<lower=0> alpha;
}

transformed data {
  matrix[N, N] cov =   cov_exp_quad(x, alpha, rho)
                     + diag_matrix(rep_vector(1e-10, N));
  matrix[N, N] L_cov = cholesky_decompose(cov);
}

parameters {}
model {}

generated quantities {
  vector[N] f = multi_normal_cholesky_rng(rep_vector(0, N), L_cov);
  int y[N];
  for (n in 1:N)
    y[n] = poisson_log_rng(f[n]);
}
simu_fit <- stan(file='simu_poisson.stan', data=simu_data, iter=1,
            chains=1, seed=494838, algorithm="Fixed_param")

SAMPLING FOR MODEL 'simu_poisson' NOW (CHAIN 1).
Iteration: 1 / 1 [100%]  (Sampling)

 Elapsed Time: 0 seconds (Warm-up)
               0.000265 seconds (Sampling)
               0.000265 seconds (Total)
f_total <- extract(simu_fit)$f[1,]
y_total <- extract(simu_fit)$y[1,]

true_realization <- data.frame(exp(f_total), x_total)
names(true_realization) <- c("f_total", "x_total")

observed_idx <- c(50*(0:10)+1)
N <- length(observed_idx)
x <- x_total[observed_idx]
y <- y_total[observed_idx]

data <- list("N"=N, "x"=x, "y"=y,
             "N_predict"=N_predict, "x_predict"=x_total, "y_predict"=y_total)

plot(x_total, exp(f_total), type="l", lwd=2, xlab="x", ylab="y",
     xlim=c(-10, 10), ylim=c(0, 10))
points(x_total, y_total, col="white", pch=16, cex=0.6)
points(x_total, y_total, col=c_mid_teal, pch=16, cex=0.4)
points(x, y, col="white", pch=16, cex=1.2)
points(x, y, col="black", pch=16, cex=0.8)

One way of implementing the model in Stan is to index out of the full vector of covariates those covariates that have observed variates in order to specify the observation model. Because the model is no longer conjugate we have to fit the latent Gaussian process with Markov chain Monte Carlo in Stan,

writeLines(readLines("predict_poisson.stan"))
data {
  int<lower=1> N_predict;
  real x_predict[N_predict];

  int<lower=1> N_observed;
  int<lower=1, upper=N_predict> observed_idx[N_observed];
  int y_observed[N_observed];

  real<lower=0> rho;
  real<lower=0> alpha;
}

transformed data {
  matrix[N_predict, N_predict] cov =   cov_exp_quad(x_predict, alpha, rho)
                     + diag_matrix(rep_vector(1e-10, N_predict));
  matrix[N_predict, N_predict] L_cov = cholesky_decompose(cov);
}

parameters {
  vector[N_predict] f_tilde;
}

transformed parameters {
  vector[N_predict] log_f_predict = L_cov * f_tilde;
}

model {
  f_tilde ~ normal(0, 1);
  y_observed ~ poisson_log(log_f_predict[observed_idx]);
}

generated quantities {
  vector[N_predict] f_predict = exp(log_f_predict);
  vector[N_predict] y_predict;
  for (n in 1:N_predict)
    y_predict[n] = poisson_log_rng(log_f_predict[n]);
}

Note that I have exploited the non-centered parameterization of the latent multivariate Gaussian which takes advantage of the fact that \[ \mathbf{f} \sim \mathcal{N} ( \boldsymbol{\mu}, \boldsymbol{\Sigma})\] is equivalent to \[ \tilde{\mathbf{f}} \sim \mathcal{N} (0, 1) \] \[ \mathbf{f} = \boldsymbol{\mu} + \mathbf{L} \tilde{\mathbf{f}} \] where \[ \boldsymbol{\Sigma} = \mathbf{L} \mathbf{L}^{T} \] Although these forms define the same distribution for \(\mathbf{f}\), the latter induces a nicer posterior geometry when the information contained in the data is sparse.
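
As a quick sanity check of that equivalence, the following R sketch, with an arbitrary mean and covariance chosen just for illustration, pushes independent standard normal draws through a Cholesky factor and verifies that the empirical mean and covariance of the resulting samples recover the targets.

set.seed(123)

# Arbitrary target mean and covariance for illustration
mu <- c(1, -2, 0.5)
Sigma <- matrix(c(2.0, 0.6, 0.3,
                  0.6, 1.0, 0.2,
                  0.3, 0.2, 1.5), nrow = 3)
L <- t(chol(Sigma))   # Sigma = L %*% t(L)

# Non-centered draws: f = mu + L * f_tilde with f_tilde ~ N(0, 1)
S <- 1e5
f_tilde <- matrix(rnorm(3 * S), nrow = 3)
f <- mu + L %*% f_tilde

# Empirical summaries approximately recover mu and Sigma
rowMeans(f)
cov(t(f))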

pred_data <- list(alpha=alpha_true, rho=rho_true,
                  N_predict=N_predict, x_predict=x_predict,
                  N_observed=N, y_observed=y, observed_idx=observed_idx)
pred_fit <- stan(file='predict_poisson.stan', data=pred_data, seed=5838298, refresh=1000)
Warning: There were 13 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: There were 165 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
Warning: Examine the pairs() plot to diagnose sampling problems

As in the Gaussian observation case, once we have fit the posterior Gaussian process we can visualize it with sampled realizations

plot_gp_realizations(pred_fit, data, true_realization,
                     "Posterior Realizations")

or quantiles of those realizations as a function of the input covariate,

plot_gp_quantiles(pred_fit, data, true_realization,
                  "Posterior Quantiles")

Similarly, we can visualize the posterior predictive distribution that incorporates the Poisson measurement variation with realizations,

plot_gp_realizations(pred_fit, data, true_realization,
                     "Posterior Predictive Realizations")

or quantiles,

plot_gp_pred_quantiles(pred_fit, data, true_realization,
                  "Posterior Predictive Quantiles")

4 Conclusion

Gaussian processes provide a flexible means of modeling non-parametric regression behavior, whether the observation model is Gaussian or non-Gaussian. Moreover, given a choice of mean function and covariance function the implementation of a Gaussian process is straightforward.

But how exactly do we specify a covariance function? Not only do we have to choose a functional form, we also have to select the hyperparameters. The specific values of these hyperparameters, however, can have a drastic effect on the performance of the resulting Gaussian process. In order to ensure optimal performance we will have to infer the hyperparameters from the observed data. Unfortunately that turns out to be no easy feat.

In Parts 2 and 3 of this case study we’ll investigate how to infer Gaussian process hyperparameters with maximum marginal likelihood and Bayesian inference, paying particular attention to what can go wrong and how we can maintain robust performance.

Part 2: Optimizing Gaussian Processes Hyperparameters Part 3: Bayesian Inference of Gaussian Processes Hyperparameters

5 Acknowledgements

The insights motivating this case study came from a particularly fertile research project with Dan Simpson, Rob Trangucci, and Aki Vehtari.

I thank Dan Simpson, Aki Vehtari, and Rob Trangucci for many helpful comments on the case study.

6 Original Computing Environment

writeLines(readLines(file.path(Sys.getenv("HOME"), ".R/Makevars")))
CXXFLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function -Wno-macro-redefined

CC=clang
CXX=clang++ -arch x86_64 -ftemplate-depth-256
devtools::session_info("rstan")
Session info -------------------------------------------------------------
 setting  value                       
 version  R version 3.4.2 (2017-09-28)
 system   x86_64, darwin15.6.0        
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/New_York            
 date     2017-11-12                  
Packages -----------------------------------------------------------------
 package      * version   date       source        
 BH             1.65.0-1  2017-08-24 CRAN (R 3.4.1)
 colorspace     1.3-2     2016-12-14 CRAN (R 3.4.0)
 dichromat      2.0-0     2013-01-24 CRAN (R 3.4.0)
 digest         0.6.12    2017-01-27 CRAN (R 3.4.0)
 ggplot2      * 2.2.1     2016-12-30 CRAN (R 3.4.0)
 graphics     * 3.4.2     2017-10-04 local         
 grDevices    * 3.4.2     2017-10-04 local         
 grid           3.4.2     2017-10-04 local         
 gridExtra      2.3       2017-09-09 CRAN (R 3.4.1)
 gtable         0.2.0     2016-02-26 CRAN (R 3.4.0)
 inline         0.3.14    2015-04-13 CRAN (R 3.4.0)
 labeling       0.3       2014-08-23 CRAN (R 3.4.0)
 lattice        0.20-35   2017-03-25 CRAN (R 3.4.2)
 lazyeval       0.2.1     2017-10-29 CRAN (R 3.4.2)
 magrittr       1.5       2014-11-22 CRAN (R 3.4.0)
 MASS           7.3-47    2017-02-26 CRAN (R 3.4.2)
 Matrix         1.2-11    2017-08-21 CRAN (R 3.4.2)
 methods      * 3.4.2     2017-10-04 local         
 munsell        0.4.3     2016-02-13 CRAN (R 3.4.0)
 plyr           1.8.4     2016-06-08 CRAN (R 3.4.0)
 R6             2.2.2     2017-06-17 CRAN (R 3.4.0)
 RColorBrewer   1.1-2     2014-12-07 CRAN (R 3.4.0)
 Rcpp           0.12.13   2017-09-28 CRAN (R 3.4.2)
 RcppEigen      0.3.3.3.0 2017-05-01 CRAN (R 3.4.0)
 reshape2       1.4.2     2016-10-22 CRAN (R 3.4.0)
 rlang          0.1.4     2017-11-05 CRAN (R 3.4.2)
 rstan        * 2.16.2    2017-07-03 CRAN (R 3.4.1)
 scales         0.5.0     2017-08-24 CRAN (R 3.4.1)
 StanHeaders  * 2.16.0-1  2017-07-03 CRAN (R 3.4.1)
 stats        * 3.4.2     2017-10-04 local         
 stats4         3.4.2     2017-10-04 local         
 stringi        1.1.5     2017-04-07 CRAN (R 3.4.0)
 stringr        1.2.0     2017-02-18 CRAN (R 3.4.0)
 tibble         1.3.4     2017-08-22 CRAN (R 3.4.1)
 tools          3.4.2     2017-10-04 local         
 utils        * 3.4.2     2017-10-04 local         
 viridisLite    0.2.0     2017-03-24 CRAN (R 3.4.0)