Transforming Probability Spaces
In Chapter 2 we learned how basic structures, such as metrics and topologies, are transformed when we transform the underlying set. In this chapter we’ll learn how measure-theoretic structures are transformed, including probability distributions, expectation values, and probability density functions.
1 Transforming \sigma-Algebras
We’ve already seen how to transform subsets in Chapter 2. Given two spaces X and Y and any function f : X \rightarrow Y we can always push forward subsets in X to subsets in Y by combining the output of each point in the input subset (Figure 1 (a)), \begin{alignat*}{6} f_{*} :\; & 2^{X} & &\rightarrow& \; & 2^{Y} & \\ & \mathsf{x} & &\mapsto& & f_{*} \mathsf{x} = \{ f(x) \mid x \in \mathsf{x} \} &. \end{alignat*} Similarly we can pull back subsets in Y to subsets in X by combining the preimages of every output point (Figure 1 (b)), \begin{alignat*}{6} f^{*} :\; & 2^{Y} & &\rightarrow& \; & 2^{X} & \\ & \mathsf{y} & &\mapsto& & f^{*} \mathsf{y} = \{ x \in X \mid f(x) \in \mathsf{y} \} &. \end{alignat*}
There is an asymmetry between these two induced transformations, however, when we consider set operations. Both the pushforward and pullback set maps are compatible with the union operation, \begin{align*} f_{*}(\cup_{i} \mathsf{x}_{i}) &= \cup_{i} f_{*}(\mathsf{x}_{i}) \\ f^{*}(\cup_{i} \mathsf{y}_{i}) &= \cup_{i} f^{*}(\mathsf{y}_{i}), \end{align*} but only the pullback map is always compatible with the intersection operation, f^{*}(\cap_{i} \mathsf{y}_{i}) = \cap_{i} f^{*}(\mathsf{y}_{i}). In general the intersection of any collection of input subsets pushes forward to only a subset of the intersection of the individual pushforward subsets (Figure 2), f_{*}(\cap_{i} \mathsf{x}_{i}) \subseteq \cap_{i} f_{*}(\mathsf{x}_{i}).
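The asymmetry is easy to verify on a small finite example. The following Python sketch assumes a hypothetical non-injective function on three points, stored as a dictionary; the helper names `push` and `pull` are illustrative only.

```python
def push(f, xs):
    """Pushforward of an input subset: the images of every point in xs."""
    return {f[x] for x in xs}

def pull(f, ys):
    """Pullback of an output subset: every input whose image lands in ys."""
    return {x for x in f if f[x] in ys}

# Hypothetical function on X = {1, 2, 3} that sends both 1 and 2 to "a".
f = {1: "a", 2: "a", 3: "b"}

x1, x2 = {1, 3}, {2, 3}
y1, y2 = {"a"}, {"a", "b"}

# Unions are preserved in both directions.
assert push(f, x1 | x2) == push(f, x1) | push(f, x2)
assert pull(f, y1 | y2) == pull(f, y1) | pull(f, y2)

# Intersections are preserved by the pullback...
assert pull(f, y1 & y2) == pull(f, y1) & pull(f, y2)

# ...but the pushforward yields only a subset in general.
lhs = push(f, x1 & x2)           # pushforward of the intersection
rhs = push(f, x1) & push(f, x2)  # intersection of the pushforwards
assert lhs < rhs                 # strict inclusion for this f
```

The strict inclusion appears because the points 1 and 2 collide at "a": the two input subsets share only the point 3, yet both of their images contain "a".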
This has an immediate consequence for \sigma-algebras: the pushforward of an intersection of measurable input subsets isn’t necessarily a measurable output subset. Consequently a \sigma-algebra of measurable subsets on X doesn’t always push forward into a \sigma-algebra of measurable subsets on Y.
On the other hand a \sigma-algebra on Y does always pull back to a well-behaved \sigma-algebra on X. If \mathcal{Y} is a \sigma-algebra on Y then f^{*} \mathcal{Y} = \{ f^{*}(\mathsf{y}) \mid \mathsf{y} \in \mathcal{Y} \} is referred to as the pullback \sigma-algebra along f or the \sigma-algebra generated by f.
In order for a function f: X \rightarrow Y to preserve the structure of two measurable spaces (X, \mathcal{X}) and (Y, \mathcal{Y}), every measurable subset \mathsf{y} \in \mathcal{Y} needs to pull back to a measurable subset in \mathcal{X}, f^{*}(\mathsf{y}) \in \mathcal{X}. Equivalently f preserves measurable structure only when f^{*} \mathcal{Y} \subseteq \mathcal{X}. Note that this does not require that every measurable subset \mathsf{x} \in \mathcal{X} pushes forward to a measurable subset in \mathcal{Y}; we can safely ignore measurable input subsets without compromising the \sigma-algebra on the output space.
Functions that preserve measurable structure are known as (\mathcal{X}, \mathcal{Y})-measurable functions. When the \sigma-algebras on the input and output space are unambiguous this is often shortened to just measurable functions. I will also use the more compact notation f : (X, \mathcal{X}) \rightarrow (Y, \mathcal{Y}) to denote (\mathcal{X}, \mathcal{Y})-measurable functions.
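On finite spaces both the pullback \sigma-algebra and the measurability condition f^{*} \mathcal{Y} \subseteq \mathcal{X} can be checked by brute force. The Python sketch below assumes a hypothetical three-point input space and represents each \sigma-algebra as a set of frozensets; all of the names are illustrative.

```python
from itertools import combinations

def power_set(points):
    """All subsets of a finite set, i.e. the largest sigma-algebra on it."""
    points = list(points)
    return {frozenset(c)
            for r in range(len(points) + 1)
            for c in combinations(points, r)}

def pull(f, ys):
    """Pullback of an output subset along f."""
    return frozenset(x for x in f if f[x] in ys)

def pullback_sigma_algebra(f, sigma_y):
    """The collection { f^*(y) : y in sigma_y }."""
    return {pull(f, ys) for ys in sigma_y}

def is_measurable(f, sigma_x, sigma_y):
    """f is measurable when the pullback sigma-algebra sits inside sigma_x."""
    return pullback_sigma_algebra(f, sigma_y) <= sigma_x

# Hypothetical function from X = {1, 2, 3} to Y = {"a", "b"}.
f = {1: "a", 2: "a", 3: "b"}
sigma_y = power_set({"a", "b"})

# Two candidate sigma-algebras on X.
fine = power_set({1, 2, 3})
coarse = {frozenset(), frozenset({1}), frozenset({2, 3}), frozenset({1, 2, 3})}

generated = pullback_sigma_algebra(f, sigma_y)
print(is_measurable(f, fine, sigma_y))    # measurable
print(is_measurable(f, coarse, sigma_y))  # not measurable: {1, 2} is missing
```

Here the \sigma-algebra generated by f cannot distinguish the points 1 and 2, and f fails to be measurable with respect to `coarse` because the pullback of {"a"} is not one of its elements.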
We’ve already encountered measurable functions in Chapter 5 when introducing measure-informed integration. A real-valued function f : X \rightarrow \mathbb{R} can be integrated on the measure space (X, \mathcal{X}, \mu) if every half-open interval on the output space pulls back to a measurable subset on the input space, f^{*}( \, (-\infty, x] \, ) \in \mathcal{X}. This condition, however, is equivalent to every subset in the Borel \sigma-algebra of the real line, \mathsf{y} \in \mathcal{B}_{\mathbb{R}}, pulling back to a measurable subset on the input space, f^{*}( \mathsf{y} ) \in \mathcal{X}.
In other words what we referred to as “\mathcal{X}-measurable real-valued functions” in Chapter 5 are more formally (\mathcal{X}, \mathcal{B}_{\mathbb{R}})-measurable functions. The former notation takes the Borel \sigma-algebra on the real line for granted, while the latter makes it more explicit. This is a common shorthand – references to “measurable functions” without any specification almost always imply Borel \sigma-algebras on the input and output spaces.
Fortunately this shorthand isn’t too problematic in practice because we will almost always be working with measures defined over Borel \sigma-algebras derived from the topological structure of the relevant spaces. Consequently a function f : X \rightarrow Y mapping the Borel measurable space (X, \mathcal{B}_{X}) into the Borel measurable space (Y, \mathcal{B}_{Y}) might be described as (\mathcal{B}_{X}, \mathcal{B}_{Y})-measurable, Borel measurable, or even just “measurable”.
Continuous functions that respect the topological structure of the input and output spaces are always Borel measurable, but so too are functions that are only piece-wise continuous. Ultimately Borel measurability is a much weaker condition than topological continuity because open subsets in the output space can pull back to not only open subsets in the input space, but also closed subsets in the input space and even any subset that we can derive from countable unions and intersections of open and closed subsets in the input space.
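A step function gives a quick worked example of a function that is Borel measurable without being continuous. The function h below is a hypothetical illustration and does not appear elsewhere in the text.

```latex
\begin{align*}
h : \mathbb{R} &\rightarrow \mathbb{R}
\\
x &\mapsto
\begin{cases}
0, & x < 0 \\
1, & x \ge 0
\end{cases}
\\[1ex]
h^{*}( \, (1/2, 3/2) \, ) &= [0, \infty)
\\[1ex]
h^{*}( \mathsf{y} ) &\in
\{ \emptyset, \; (-\infty, 0), \; [0, \infty), \; \mathbb{R} \}
\quad \text{for every } \mathsf{y} \in \mathcal{B}_{\mathbb{R}}
\end{align*}
```

The open interval (1/2, 3/2) pulls back to a subset that is closed but not open, so h is not continuous. Every possible pullback, however, is one of four Borel subsets, determined by whether \mathsf{y} contains 0 and whether it contains 1, and so h is Borel measurable.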
When working with finite-dimensional spaces in practice it is safe to assume that not only all but the most pathological subsets are measurable but also that all but the most pathological functions are measurable. Infinite-dimensional spaces are another matter, but those spaces will largely be outside of the scope of this book.
2 Transforming Measures
Conveniently the pullback of measurable subsets allows us to push forward a measure on the input space to a compatible measure on the output space. Given a (\mathcal{X}, \mathcal{Y})-measurable function f : X \rightarrow Y any measure \mu : \mathcal{X} \rightarrow [0, \infty] defines a pushforward measure by the allocations \begin{alignat*}{6} f_{*} \mu :\; & \mathcal{Y} & &\rightarrow& \; & [0, \infty] & \\ & \mathsf{y} & &\mapsto& & f_{*} \mu (\mathsf{y}) = \mu(f^{*}(\mathsf{y})) &. \end{alignat*} In words the pushforward measure allocated to any measurable subset on the output space \mathsf{y} \in \mathcal{Y} is computed by pulling the subset back to the input space f^{*}(\mathsf{y}) \in \mathcal{X} and then evaluating the initial measure, \mu(f^{*}(\mathsf{y})) (Figure 4).
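The definition translates almost directly into code when the input space is finite. This Python sketch assumes that the measure is specified by hypothetical allocations to each atomic subset, with every subset taken to be measurable; the function and variable names are illustrative.

```python
def pull(f, ys):
    """Pullback of an output subset: all inputs whose image lands in ys."""
    return {x for x in f if f[x] in ys}

def measure(atoms, subset):
    """Evaluate a measure on a finite space from its atomic allocations."""
    return sum(atoms[x] for x in subset)

def pushforward(f, atoms):
    """Return the set function y -> mu(f^*(y))."""
    def pushed(ys):
        return measure(atoms, pull(f, ys))
    return pushed

# Hypothetical function and input measure on X = {1, 2, 3}.
f = {1: "a", 2: "a", 3: "b"}
mu = {1: 0.5, 2: 1.5, 3: 2.0}

f_mu = pushforward(f, mu)

print(f_mu({"a"}))       # mu({1, 2})
print(f_mu({"b"}))       # mu({3})
print(f_mu({"a", "b"}))  # mu(X), the total measure
```

Note that the pushforward never evaluates f on subsets of the input space. It only ever pulls output subsets back, which is why measurability of f is exactly the condition needed for the construction to be well-defined.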
The exact interpretation of a pushforward measure will depend on the interpretation of the input measure and the transformation. Consider, for example, a probability distribution \pi defined on the input space that we interpret as quantifying uncertainty. This probability distribution captures our uncertainty about the inputs to a function f while the pushforward probability distribution f_{*} \pi quantifies the corresponding uncertainty in the output of the function. In other words the pushforward transformation propagates the initial uncertainty through the deterministic mapping.
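One way to see this propagation in action is with samples, an approximation strategy that is not part of the discussion above and is used here only for illustration. The Python sketch below assumes a standard normal input distribution and the hypothetical transformation f(x) = x^2, and compares two estimates of the same pushforward probability.

```python
import random

def f(x):
    # Hypothetical deterministic transformation; chosen only for illustration.
    return x * x

rng = random.Random(8675309)
n = 100_000

# Samples from the input distribution pi, assumed here to be a standard normal.
inputs = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Pushing each sample through f gives samples from the pushforward f_* pi.
outputs = [f(x) for x in inputs]

# Estimate f_* pi([0, 1]) from the output samples...
p_output = sum(0.0 <= y <= 1.0 for y in outputs) / n

# ...and pi(f^*([0, 1])) = pi([-1, 1]) from the input samples.
p_input = sum(-1.0 <= x <= 1.0 for x in inputs) / n

print(p_output, p_input)
```

The two estimates agree because an output sample falls in [0, 1] exactly when the corresponding input sample falls in the pullback [-1, 1], which mirrors the definition f_{*} \pi(\mathsf{y}) = \pi(f^{*}(\mathsf{y})).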
At the same time certain functions can endow the corresponding pushforward measures with particular interpretations.
2.1 Pushforward Terminology
Pushforward measures are ubiquitous in applied probability theory, although they are often better known by other names.
For example consider a finite input space X, X = \{ \blacksquare, \clubsuit, \bigcirc, \diamondsuit, \triangle, \bowtie \}, a finite output space Y, Y = \{ \heartsuit, \spadesuit, \bigstar \}, and a function f : X \rightarrow Y defined by the relations (Figure 5 (a)) \begin{align*} f(\blacksquare) &= \spadesuit \\ f(\clubsuit) &= \heartsuit \\ f(\bigcirc) &= \bigstar \\ f(\diamondsuit) &= \spadesuit \\ f(\triangle) &= \heartsuit \\ f(\bowtie) &= \spadesuit. \end{align*}
These relationships between input and output points become particularly well-organized when we arrange the input elements into a table, with each row collecting all of the input elements that map to a particular output element (Figure 5 (b)). Conveniently the pushforward measure allocated to each output atomic subset is then given by summing the input atomic subset allocations in the corresponding row (Figure 5 (c)). In other words the pushforward allocations fit nicely into the margins of the table.
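The tabular construction can be sketched in a few lines of Python. The function below follows the relations defined above, with the symbols replaced by string labels, but the atomic probability allocations are hypothetical values chosen only so that the arithmetic is easy to follow.

```python
from fractions import Fraction

# The function f from the example, with symbols written out as labels.
f = {
    "square": "spade", "club": "heart", "circle": "star",
    "diamond": "spade", "triangle": "heart", "bowtie": "spade",
}

# Hypothetical atomic allocations for an input probability distribution.
pi = {
    "square": Fraction(2, 12), "club": Fraction(1, 12),
    "circle": Fraction(3, 12), "diamond": Fraction(1, 12),
    "triangle": Fraction(4, 12), "bowtie": Fraction(1, 12),
}

# Each row of the table collects the inputs that map to one output element.
rows = {}
for x, y in f.items():
    rows.setdefault(y, []).append(x)

# The pushforward allocation for each output is the sum across its row.
marginal = {y: sum(pi[x] for x in xs) for y, xs in rows.items()}

# Print the table with the pushforward allocations in the right-hand margin.
for y, xs in rows.items():
    cells = ", ".join(f"{x}: {pi[x]}" for x in xs)
    print(f"{y:>6} | {cells} | {marginal[y]}")
```

With these allocations the three row sums are 1/3, 5/12, and 1/4, which add up to one as they must for a pushforward probability distribution.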
Historically these kinds of graphical organizations motivated the term marginal measure to describe pushforward measures, or marginal probability distribution in the case of an input probability distribution. Today this terminology is common even when the input and output spaces are not finite, and the tabular representation of functions isn’t quite as useful (Figure 6).