Generative modeling is often suggested as a useful approach for designing probabilistic models that capture the relevant structure of a given application. The specific details of this approach, however, are left vague enough to limit how useful it can actually be in practice. In this case study I present an explicit definition of generative modeling as a way to bridge implicit domain expertise and explicit probabilistic models, motivating a wealth of useful model critique and construction techniques.

1 Generating What Now?

An immediate problem with the concept of "generative modeling" is that the term isn't always used consistently. In particular what makes a model "generative" can be very different in fields dominated by machine learning and fields dominated by contemporary applied statistics. To construct any self-consistent formalization of the term we have to be careful to differentiate between these two colloquial uses.

1.1 Generative As Sampling

In machine learning a generative model is typically defined as a probabilistic model of all quantities that vary from observation to observation; in other words a model over the entire observational space, \(Y\) [1]. For example given an observational space parameterized by two variables \(y = (y_{1}, y_{2})\) the conditional model \(\pi(y_{1} | y_{2} ; \theta)\) would not be generative because it lacks a probabilistic model for \(y_{2}\). Such incomplete model specifications, often denoted discriminative models, commonly arise in regression modeling where the observational space \(Y \times X\) separates into variates, \(y \in Y\), and covariates, \(x \in X\), and only the conditional relationship \(\pi(y | x; \theta)\) is modeled.

One immediate benefit of a model \(\pi(y; \theta)\) that spans the entire observational space is that, at least in theory, we can construct exact sampling mechanisms for all of the variables in the observational space, \[ \tilde{y} \sim \pi(y; \theta), \] for each model configuration \(\theta\). For example the non-generative conditional model \(\pi(y_{1} | y_{2} ; \theta)\) admits sampling mechanisms for \(y_{1}\) only once \(y_{2}\) has been fixed to a particular value, but we have no natural way to choose such a value. The generative model \(\pi(y_{1} | y_{2} ; \theta) \pi(y_{2}; \theta)\), however, admits a sampling mechanism for \(y_{2}\), which then enables ancestral sampling of both variables.
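To make this concrete, here is a minimal sketch of ancestral sampling in Python, assuming a hypothetical toy model in which \(y_{2}\) is normally distributed and \(y_{1}\) given \(y_{2}\) is normally distributed with a mean proportional to \(y_{2}\); all of the distributions and parameter values here are illustrative assumptions rather than part of any particular application.

```python
import numpy as np

rng = np.random.default_rng(8675309)

# Hypothetical model configuration theta = (mu, tau, alpha, sigma).
mu, tau, alpha, sigma = 0.0, 1.0, 0.5, 0.75

# The conditional model pi(y1 | y2; theta) alone cannot be sampled until y2
# is fixed to some externally supplied value.  The generative model
# pi(y1 | y2; theta) * pi(y2; theta), however, admits ancestral sampling:
# first y2, then y1 given that sample.
y2_tilde = rng.normal(mu, tau)                  # y2 ~ pi(y2; theta)
y1_tilde = rng.normal(alpha * y2_tilde, sigma)  # y1 ~ pi(y1 | y2; theta)
```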

Indeed the phrase "generative model" is often used colloquially to imply probabilistic models equipped with, if not outright defined by, an explicit, exact sampling mechanism from which we can "generate" samples in practice. I will refer to this notion as procedurally generative.

One way to construct explicit, exact sampling algorithms is to utilize a conditional decomposition of the observational model \(\pi(y; \theta)\) into a sequence of one-dimensional conditional probability distributions, \[ \pi(y_{1}, \ldots, y_{n}, \ldots, y_{N}; \theta) = \left[ \prod_{n = 2}^{N} \pi(y_{n} | y_{1}, \ldots, y_{n - 1}; \theta) \right] \, \pi(y_{1}; \theta), \] where each \(y_{n}\) is a one-dimensional variable. If we can efficiently generate exact samples from these one-dimensional distributions then we can generate exact samples from the entire model through ancestral sampling.
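The following sketch demonstrates this general ancestral sampling procedure for a hypothetical decomposition in which \(\pi(y_{1}; \theta)\) is a unit normal and each subsequent conditional distribution is normal with a mean that depends on all of the previous samples through their running average; the specific conditionals are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(54838)

N = 10                 # number of one-dimensional variables
rho, sigma = 0.8, 1.0  # hypothetical model configuration

ys = np.empty(N)
ys[0] = rng.normal(0.0, 1.0)           # y1 ~ pi(y1; theta)
for n in range(1, N):
    mean_n = rho * np.mean(ys[:n])     # depends on y1, ..., y_{n - 1}
    ys[n] = rng.normal(mean_n, sigma)  # yn ~ pi(yn | y1, ..., y_{n - 1}; theta)
```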

In contemporary machine learning ancestral sampling mechanisms have largely given way to pushforward sampling mechanisms. Here the observational model \(\pi(y; \theta)\) is defined only implicitly as the pushforward of some simple base distribution, \(\pi(x)\), along a family of complicated transformations, \(\phi_{\theta} : X \rightarrow Y\), \[ \pi(y; \theta) = (\phi_{\theta})_{*} \pi(x). \] If the base distribution is engineered to admit an efficient exact sampling mechanism then we can immediately generate exact samples from these pushforward distributions by applying the transformations to base samples, \[ \begin{aligned} \tilde{x} &\sim \pi(x) \\ \tilde{y} &= \phi_{\theta}(\tilde{x}). \end{aligned} \] These families of complex transformations are often built out of popular machine learning techniques for function approximation, such as neural networks and kernel methods, resulting in popular methods such as Generative Adversarial Networks [2].
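A minimal sketch of pushforward sampling, assuming a two-dimensional standard normal base distribution and a small nonlinear map standing in for the complicated transformation \(\phi_{\theta}\), might look like the following; both choices are hypothetical placeholders rather than anything one would use in practice.

```python
import numpy as np

rng = np.random.default_rng(29201)

# Hypothetical model configuration theta = (W, b).
W = np.array([[1.5, -0.3],
              [0.2,  0.8]])
b = np.array([0.5, -1.0])

def phi_theta(x):
    # A stand-in for a complicated transformation, for example a neural network.
    return np.tanh(W @ x) + b

x_tilde = rng.normal(0.0, 1.0, size=2)  # x ~ pi(x), the base distribution
y_tilde = phi_theta(x_tilde)            # y = phi_theta(x), a sample from pi(y; theta)
```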

A procedurally generative Bayesian model \(\pi(y, \theta)\) requires an exact sampling mechanism that samples the variables in both the observational space and the model configuration space. For example by exploiting the conditional decomposition \[ \pi(y, \theta) = \pi(y | \theta) \, \pi(\theta) \] we can build a procedurally generative Bayesian model by combining a procedurally generative observational model \(\pi(y | \theta)\), as discussed above, with a procedurally generative prior model \(\pi(\theta)\). Because we cannot generate samples from unnormalizable prior models, such as those specified by uniform density functions over the real line, these improper prior models immediately obstruct procedural generativity.
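A short sketch of this construction, assuming a hypothetical half-normal prior model over a scale parameter and a normal observational model conditioned on that scale, shows how the two exact sampling mechanisms compose.

```python
import numpy as np

rng = np.random.default_rng(48927)

# theta ~ pi(theta), a hypothetical half-normal prior model over a scale.
theta_tilde = np.abs(rng.normal(0.0, 1.0))

# y ~ pi(y | theta), a hypothetical normal observational model with five components.
y_tilde = rng.normal(0.0, theta_tilde, size=5)

# An improper prior model, such as one specified by a uniform density over
# the entire real line, would offer no way to generate theta_tilde at all.
```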

That said procedural generation doesn't require that we utilize this particular conditional decomposition. Any conditional decomposition of the full Bayesian model \(\pi(y, \theta)\) into well-defined, one-dimensional conditional probability distributions will admit ancestral sampling, and there are many of these conditional decompositions to consider. For example a relatively low-dimensional model \(\pi(y_{1}, y_{2}, \theta_{1}, \theta_{2})\) admits \(4! = 24\) different one-dimensional conditional decompositions, one for each ordering of the four one-dimensional variables!
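A quick enumeration, using placeholder labels for the four one-dimensional variables, makes this counting explicit.

```python
import itertools

# Each ordering of the four one-dimensional variables induces a distinct
# decomposition into one-dimensional conditional probability distributions.
variables = ["y1", "y2", "theta1", "theta2"]

decompositions = []
for ordering in itertools.permutations(variables):
    factors = [f"pi({v} | {', '.join(ordering[:i])})" if i > 0 else f"pi({v})"
               for i, v in enumerate(ordering)]
    decompositions.append(" * ".join(reversed(factors)))

print(len(decompositions))  # 24
print(decompositions[0])    # pi(theta2 | y1, y2, theta1) * ... * pi(y1)
```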