Writing

Preprints of my research work are posted on the arXiv as much as possible. Highlights include a long but comprehensive introduction to statistical computing and Hamiltonian Monte Carlo targeted at applied researches, and a more theoretical treatment of the geometric foundations of Hamiltonian Monte Carlo.

Recently I have been taking advantage of the notebook environments knitr and Jupyter to develop resources demonstrating important concepts in probabilistic modeling with Stan.
The ultimate goal is a reasonably self-contained treatment demonstrating how to build, evaluate, and utilize probabilistic models that capture domain expertise in applied analyses; the most important prerequisites are a familiarity with calculus and linear algebra.

While relatively mature these resources are still very much dynamic and improving as I receive feedback from readers, so please don’t hesitate to send comments through email or pull requests on the case study GitHub repositories linked below.

For those new to Bayesian modeling I recommend the following progression.

Probability Theory

Foundations of Probability Theory
Conditional Probability Theory
Probability Theory on Product Spaces
Sampling and Monte Carlo
Common Families of Probabilty Densities
Probabilistic Computation
Markov Chain Monte Carlo
Hamiltonian Monte Carlo (coming soon!)

Modeling and Inference

Modeling and Inference
Generative Modeling
Introduction to Stan
Prior Modeling Strategies (coming soon!)
Identifiability and Degeneracies
Principled Model Building Workflow

Modeling Technniques

Hierarchical Modeling
Factor Modeling
Modeling Sparsity
Regression Modeling (coming soon!)
Ordinal Regression
Gaussian Process Modeling

Case Studies

Falling

All available resources, including citation information, are listed below.

(What's the Probabilistic Story) Modeling Glory?

Generative modeling is often suggested as a useful approach for designing probabilistic models that capture the relevant structure of a given application. The specific details of this approach, however, is left vague enough to limit how useful it can actually be in practice. In this case study I present an explicit definition of generative modeling as a way to bridge implicit domain expertise and explicit probabilistic models, motivating a wealth of useful model critique and construction techniques.

View (HTML)
betanalpha/knitr_case_studies/generative_modeling/ (GitHub)
Dependences: R, knitr
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2021). (What’s the Probabilistic Story) Modeling Glory?. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/generative_modeling, commit 99e07946e73d4c45ea757c4cde7882495e6519a8.

Sparsity Blues

As measurements become more complex they often convolve meaningful phenomena with more extraneous phenomena. In order to limit the impact of these irrelevant phenomena on our inferences, and isolate the relevant phenomena, we need models that encourage sparse inferences. In this case study I review the basics of Bayesian inferential sparsity and some of the various strategies for building prior models that incorporate sparsity assumptions.

View (HTML)
betanalpha/knitr_case_studies/modeling_sparsity/ (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2021). Sparsity Blues. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/modeling_sparsity, commit e6bb565825cd3c2533a4b4fee6f1b68a5728956a.

Sampling

Probability theory teaches us that the only well-posed way to extract information from a probability distribution is through expectation values. Unfortunately computing expectation values from abstract probability distributions is no easy feat. In some cases, however, we can readily approximate expectation values by averaging functions of interest over special collections of points known as samples. In this case study we carefully define the concept of a sample from a probability distribution and show how those samples can be used to approximate expectation values through the Monte Carlo method.

View (HTML)
betanalpha/knitr_case_studies/sampling/ (GitHub)
Dependences: R, knitr
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2021). Sampling. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/sampling, commit eb27fbdb5063220f3d335024d2ae5e2bfdf70955.

Factor Modeling

Hierarchical models are a natural way to model heterogeneity across exchangeable contexts, but they become less appropriate when additional information discriminates those contexts. In particular any labels that categorize individual contexts into groups immediately obstructs the full exchangeability, and modeling heterogeneity consistently with these groupings becomes much more subtle. When the contexts are subject to multiple, overlapping categorizations the challenge becomes even more difficult. In this case study we investigate how to generalize exchangeability in the presence of factors that categorize individual contexts and then develop modeling techniques compatible with this generalization.

View (HTML)
betanalpha/knitr_case_studies/factor_modeling/ (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2021). Factor Modeling. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/factor_modeling, commit 6e4566309163ee79f8b7c907e2efce969a96bc54.

Hierarchical Modeling

Hierarchical modeling is a powerful technique for modeling heterogeneity and, consequently, it is becoming increasingly ubiquitous in contemporary applied statistics. Unfortunately that ubiquitous application has not brought with it an equivalently ubiquitous understanding for how awkward these models can be to fit in practice. In this case study we dive deep into hierarchical models, from their theoretical motivations to their inherent degeneracies and the strategies needed to ensure robust computation. We will learn not only how to use hierarchical models but also how to use them robustly.

View (HTML)
betanalpha/knitr_case_studies/hierarchical_modeling/ (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2020). Hierarchical Modeling. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/hierarchical_modeling, commit 27c1d260e9ceca710465dc3b02f59f59b729ca43.

Robust Gaussian Process Modeling

Gaussian processes are a powerful approach for modeling functional behavior, but as with any tool that power can be wielded responsibly only with deliberate care. In this case study I introduce the basic usage of Gaussian processes in modeling applications, and how to implement them in Stan, before considering some of their subtle but ubiquitous inferential pathologies and the techniques for addressing those pathologies with principled domain expertise.

View (HTML)
betanalpha/knitr_case_studies/gaussian_processes/ (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2020). Robust Gaussian Process Modeling. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/gaussian_processes, commit e10083abbcdb65c745f840ab9d2da58229fa9af3.

Identity Crisis

Under ideal conditions only a small neighborhood of model configurations will be consistent with both the observed data and the domain expertise we encode in our prior model, resulting in a posterior distribution that strongly concentrates along each parameter. This not only yields precise inferences and but also facilitates accurate estimation of those inferences. Under more realistic conditions, however, our measurements and domain expertise can be much less informative, allowing our posterior distribution to stretch across more expansive, complex neighborhoods of the model configuration space. These intricate uncertainties complicate not only the utility of our inferences but also our ability to quantify those inferences computationally. In this case study we will explore the theoretical concept of identifiability and its more geometric counterpart degeneracy that better generalizes to applied statistical practice. We will also discuss the principled ways in which we can identify and then compensate for degenerate inferences before demonstrating these strategies in a series of pedagogical examples.

View (HTML)
betanalpha/knitr_case_studies/identifiability (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2020). Identity Crisis. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/identifiability, commit b71d65d44731ce90bbfc769f3cbc8355efac262f.

Towards A Principled Bayesian Workflow (RStan)

Given the posterior distribution derived from a probabilistic model and a particular observation, Bayesian inference is straightforward to implement: inferences, and any decisions based upon them, follow immediately in the form of posterior expectations. Building such a probabilistic model that is satisfactory in a given application, however, is a far more open-ended challenge. In order to ensure robust analyses we need a principled workflow that guides the development of a probabilistic model that is consistent with both our domain expertise and any observed data while also being amenable to accurate computation. In this case study I introduce a principled workflow for building and evaluating probabilistic models in Bayesian inference.

View (HTML)
betanalpha/knitr_case_studies/principled_bayesian_workflow (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2020). Towards A Principled Bayesian Workflow (RStan). Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/principled_bayesian_workflow, commit aeab31509b8e37ff05b0828f87a3018b1799b401.

An Introduction to Stan

Stan is a comprehensive software ecosystem aimed at facilitating the application of Bayesian inference. It features an expressive probabilistic programming language for specifying sophisticated Bayesian models backed by extensive math and algorithm libraries to support automated computation. In this case study I present a thorough introduction to the Stan ecosystem with a particular focus on the modeling language. After a motivating introduction we will review the Stan ecosystem and the fundamentals of the Stan modeling language and the RStan interface. Finally I will demonstrate some more advanced features and debugging techniques in a series of exercises.

View (HTML)
betanalpha/knitr_case_studies/stan_intro (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2020). An Introduction to Stan. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/stan_intro, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Probability Theory on Product Spaces

Once we are given a well-defined probability distribution probability theory tells us how we can manipulate it for our applied needs. What the theory does not tell us, however, is how to construct a useful probability distribution in the first place. The challenge of building probability distributions in high-dimensional spaces becomes much more manageable when the ambient spaces are product spaces constructed from lower-dimensional component spaces. This product structure serves as scaffolding on which we can build up a probability distribution piece by piece. In this case study I introduce probability theory on product spaces, with an emphasis on how this structure facilitates the construction of probability distributions in practice while also motivating convenient notation, vocabulary, and visualizations.

View (HTML)
betanalpha/knitr_case_studies/probability_on_product_spaces (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2020). Probability Theory on Product Spaces. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/probability_on_product_spaces, commit dfb96237152414a4c8c1a5d6c8639da9915c2378.

Falling (In Love With Principled Modeling)

The concept of measurement and inference is often introduced through what are supposed to be simple experiments, such as inferring gravity from the time it takes objects to fall from various heights. The realizations of these experiments in practice, however, are often much more complex than their idealized designs imply, and any principled statistical analysis will have to go into much more detail than one might expect. At the same time these simple experiments can provide an elegant demonstration of many of they key concepts of a principled Bayesian workflow where an initial model based on the theoretical experimental design is continuously expanded to capture all of the important features exhibited by the realization of the experiment. In this case study I attempt to infer the local gravitational acceleration from falling ball measurements, along the way illustrating strategies for identifying model defects and motivating principled model development.

View (HTML)
betanalpha/knitr_case_studies/falling (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2020). Falling (In Love With Principled Modeling). Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/falling, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Markov chain Monte Carlo

Markov chain Monte Carlo is one of our best tools in the desperate struggle against high-dimensional probabilistic computation, but its fragility makes it dangerous to wield without adequate training. In order to use the method responsibly, and ensure accurate quantification of the target distribution, practitioners need to know not just how it works under ideal circumstances but also how it breaks under less ideal, but potentially more common, circumstances. Most importantly a practitioner needs to be able to identify when the method breaks and they shouldn’t trust the results that it gives. In other words a responsible user of Markov chain Monte Carlo needs to know how to manage its risks. In this case study I introducing the critical concepts needed to understand how employ Markov chain Monte Carlo responsibly.

View (HTML)
betanalpha/knitr_case_studies/markov_chain_monte_carlo (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2020). Markov chain Monte Carlo. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/markov_chain_monte_carlo, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Probabilistic Computation

In practice we define probability distributions implicitly through their computational consequences. More formally we define them algorithmically through methods that return expectation values. The sophisticated probability distributions that arise in applied analyses, however, do not often admit exact results and we have to focus instead on numerical algorithms which only _estimate_ the exact expectation values returned by a given probability distribution. Practical probabilistic computation considers the construction of algorithms with well-behaved and well-quantified errors that allow us to understand when we can trust the corrupted answers they give. In this case study I introduce the basics of probabilistic computation with a focus on the challenges that arise as we attempt to scale to problems in more than a few dimenions.

View (HTML)
betanalpha/knitr_case_studies/probabilistic_computation (GitHub)
Dependences: R, knitr
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2019). Probabilistic Computation. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/probabilistic_computation, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Probabilistic Building Blocks

Probability theory defines how we can manipulate and utilize probability distributions, but it provides no guidance for which probability distributions we might want to utilize in a given application. Fortunately we can exploit conditional probability theory to build up high-dimensional probability distributions from many one-dimensional probability distributions about which we can more easily reason. With only a modest collection of probability distributions at their disposal a practitioner has to potential to construct a wide array of sophisticated probability distributions. A cache of interpretable, well-understood probability distributions is a critical piece of any practitioner’s statistical toolkit. In this case study I review common families of probability density functions which provide the foundational elements of this toolkit.

View (HTML)
betanalpha/knitr_case_studies/probability_densities (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2019). Probabilistic Building Blocks. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/probability_densities, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Ordinal Regression

Regression models quantify statistical correlations by allowing latent effects to moderate the distribution of an observed outcome. Ordinal regression models the influence of a latent effect on an ordinal outcome consisting of discrete but ordered categories. In thise cases study I review the mathematical structure of ordinal regression models and their practical implementation in Stan. In the process I also derive a principled prior model that ensures robustness even when ordinal data are only weakly informative.

View (HTML)
betanalpha/knitr_case_studies/ordinal_regression (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2019). Ordinal Regression. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/ordinal_regression, commit 0a987836177477d3fb31e92227854b267aba2f77.

Probabilistic Modeling and Statistical Inference

In this case study we’ll review the foundations of statistical models and statistical inference that advise principled decision making. We’ll place a particular emphasis on Bayesian inference, which utilizes probability theory and statistical modeling to encapsulate information from observed data and our own domain expertise.

View (HTML)
betanalpha/knitr_case_studies/modeling_and_inference (GitHub)
Dependences: R, knitr
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2019). Probabilistic Modeling and Statistical Inference. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/modeling_and_inference, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Underdetermined Linear Regression

A linear regression is underdetermined when there are fewer observations than parameters. In this case the likelihood function does not concentrate on a compact neighborhood but rather an underdetermined hyperplane of degenerate model configurations. The corresponding posterior density has a surprisingly interesting geometry, even with weakly-informative prior densities that ensure a well-defined fit. In this short note I walk through the nature of this geometry and why underdetermined regressions are so hard to fit.

View (HTML)
betanalpha/knitr_case_studies/underdetermined_linear_regression (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2018). Underdetermined Linear Regression. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/underdetermined_linear_regression, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Towards A Principled Bayesian Workflow (PyStan)

Given the posterior distribution derived from a probabilistic model and a particular observation, Bayesian inference is straightforward to implement: inferences, and any decisions based upon them, follow immediately in the form of posterior expectations. Building such a probabilistic model that is satisfactory in a given application, however, is a far more open-ended challenge. In order to ensure robust analyses we need a principled workflow that guides the development of a probabilistic model that is consistent with both our domain expertise and any observed data while also being amenable to accurate computation. In this case study I introduce a principled workflow for building and evaluating probabilistic models in Bayesian inference.

View (HTML)
betanalpha/jupyter_case_studies/principled_bayesian_workflow (GitHub)
Dependences: Python, Jupyter, PyStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2018). Towards A Principled Bayesian Workflow (PyStan). Retrieved from https://github.com/betanalpha/jupyter_case_studies/tree/master/principled_bayesian_workflow, commit 2580fdeac38f77859b2d1c60f9c4a37237864e63.

Conditional Probability Theory (For Scientists and Engineers)

This case study will introduce a conceptual understanding of conditional probability theory and its applications. We’ll begin with a discussion of marginal probability distributions before introducing conditional probability distributions as their complement. Then we’ll examine how different conditional probability distributions can be related to each other through Bayes’ Theorem before considering how all of these objects manifest in probability mass function and probability density function representations. Finally we’ll review some of the important practical applications of the theory.

View (HTML)
betanalpha/knitr_case_studies/conditional_probability_theory (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2018). Conditional Probability Theory (For Scientists and Engineers). Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/conditional_probability_theory, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Probability Theory (For Scientists and Engineers)

Formal probability theory is a rich and complex field of mathematics with a reputation for being confusing if not outright impenetrable. Much of that intimidation, however, is due not to the abstract mathematics but rather how they are employed in practice. In this case study I attempt to untangle this pedagogical knot to illuminate the basic concepts and manipulations of probability theory and how they can be implemented in practice. Our ultimate goal is to demystify what we can calculate in probability theory and how we can perform those calculations in practice.

View (HTML)
betanalpha/knitr_case_studies/probability_theory (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2018). Probability Theory (For Scientists and Engineers). Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/probability_theory, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Fitting the Cauchy

In this case study I review various ways of implementing the Cauchy distribution, from the nominal implementation to alternative implementations aimed at ameliorating these difficulties, and demonstrate their relative performance.

View (HTML)
betanalpha/knitr_case_studies/fitting_the_cauchy (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2018). Fitting the Cauchy. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/fitting_the_cauchy, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

The QR Decomposition for Regression Models

This case study reviews the QR decomposition, a technique for decorrelating covariates and, consequently, the resulting posterior distribution in regression models.

View (HTML)
betanalpha/knitr_case_studies/qr_regression (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2017). The QR Decomposition for Regression Models. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/qr_regression, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Robust PyStan Workflow

This case study demonstrates a proper PyStan workflow that ensures robust inferences with the default dynamic Hamiltonian Monte Carlo algorithm.

View (HTML)
betanalpha/jupyter_case_studies/pystan_workflow (GitHub)
Dependences: Python, Jupyter, PyStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2017). Robust PyStan Workflow. Retrieved from https://github.com/betanalpha/jupyter_case_studies/tree/master/pystan_workflow, commit 2580fdeac38f77859b2d1c60f9c4a37237864e63.

Robust RStan Workflow

This case study demonstrates a proper RStan workflow that ensures robust inferences with the default dynamic Hamiltonian Monte Carlo algorithm.

View (HTML)
betanalpha/knitr_case_studies/rstan_workflow (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2017). Robust RStan Workflow. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/rstan_workflow, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Diagnosing Biased Inference with Divergences

This case study discusses the subtleties of accurate Markov chain Monte Carlo estimation and how divergences can be used to identify biased estimation in practice.

View (HTML)
betanalpha/knitr_case_studies/divergences_and_bias (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2017). Diagnosing Biased Inference with Divergences. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/divergences_and_bias, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

Identifying Bayesian Mixture Models

This case study discusses the common pathologies of Bayesian mixture models as well as some strategies for identifying and overcoming them.

View (HTML)
betanalpha/knitr_case_studies/identifying_mixture_models (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2017). Identifying Bayesian Mixture Models. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/identifying_mixture_models, commit b474ec1a5a79347f7c9634376c866fe3294d657a.

How the Shape of a Weakly Informative Prior Affects Inferences

This case study reviews the basics of weakly-informative priors and how the choice of a specific shape of such a prior affects the resulting posterior distribution.

View (HTML)
betanalpha/knitr_case_studies/weakly_informative_shapes (GitHub)
Dependences: R, knitr, RStan
Code License: BSD (3 clause), Text License: CC BY-NC 4.0
Cite As: Betancourt, Michael (2017). How the Shape of a Weakly Informative Prior Affects Inferences. Retrieved from https://github.com/betanalpha/knitr_case_studies/tree/master/weakly_informative_shapes, commit b474ec1a5a79347f7c9634376c866fe3294d657a.