Bayes-by Shower

Author

Michael Betancourt

Published

May 2025

Most epidemiological analyses are subject to a variety of subtle challenges. For example many health outcomes are influenced by exposures not only directly but also indirectly; if we cannot quantify both consistently then we will not be able to make accurate predictions for how health will vary under the various circumstances of interest. Moreover, because these exposures will generally vary across the individuals within any given population the corresponding health outcomes will vary from one population to another.

In this case study we’ll see how Bayesian modeling and inference can be used to manage these difficulties and extract productive insights.

1 Setup

First and foremost we have to set up the local R environment.

par(family="serif", las=1, bty="l",
    cex.axis=1, cex.lab=1, cex.main=1,
    xaxs="i", yaxs="i", mar = c(5, 5, 3, 1))
library(rstan)
rstan_options(auto_write = TRUE)            # Cache compiled Stan programs
options(mc.cores = parallel::detectCores()) # Parallelize chains
parallel:::setDefaultClusterOptions(setup_strategy = "sequential")

Next we’ll load some utility functions into the local environment to facilitate the implementation of Bayesian inference.

util <- new.env()

First we have a suite Markov chain Monte Carlo diagnostics and estimation tools; this code and supporting documentation are both available on GitHub.

source('mcmc_analysis_tools_rstan.R', local=util)

Second we have a suite of probabilistic visualization functions based on Markov chain Monte Carlo estimation. Again the code and supporting documentation are available on GitHub.

source('mcmc_visualization_tools.R', local=util)

2 Data Exploration

For this analysis we will be analyzing fertility across a cohort of male patients. Each patient was recruited through a referral for fertility preservation consultation and observed for the same twelve month period. Many of the referrals were also coincident with a cancer diagnosis.

data <- read_rdump("data/conception.data.R")

names(data)
 [1] "k_trt" "y"     "K_stg" "k_art" "K_rel" "K_trt" "k_tox" "N"     "k_stg"
[10] "k_rel" "K_tox"

The main component of the data is a collection of binary observations indicating whether or not each patient in the observed cohort conceived a child during a particular year. Specifically y_{n} = 0 indicates that the nth patient did not conceive a child while y_{n} = 1 indicates that they conceived at least one child.

About two-thirds of the patients conceived a child.

par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))

util$plot_line_hist(data$y, -0.5, 1.5, 1,
                    xlab="Observed Conception Status")

We can also clarify the relative frequencies of these fertility outcomes by normalizing the summed counts.

par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))

util$plot_line_hist(data$y, -0.5, 1.5, 1, prob=TRUE,
                    xlab="Observed Conception Status")

These fertility outcomes are complemented by clinical and demographic information about the individual patients in the cohort (Table 1). Note that these categorical variables are ordered, with a clear notion of increasing severity. Moreover all of these variables are indexed from 1, even those with only two values; this is helpful when working with 1-indexed programming languages like Stan.

Table 1: The observed data includes clinical and demographic information for each patient.
Variable Name Description Values Value Descriptions
`k_rel` Relationship Status 1 Stable Female Partner
2 No Stable Female Partner
`k_stg` Cancer Stage 1 No Cancer
2 Early Stage Cancer
3 Late Stage Cancer
`k_trt` Treatment Status 1 No Treatment
2 Active Treatment
`k_tox` Treatment Toxicity Status 1 None
2 Low
3 Medium
4 High
par(mfrow=c(2, 2), mar=c(5, 5, 1, 1))

util$plot_line_hist(data$k_rel, 0.5, data$K_rel + 0.5, 1,
                    xlab="Observed Relationship Status")

util$plot_line_hist(data$k_stg, 0.5, data$K_stg + 0.5, 1,
                    xlab="Observed Cancer Stage")

util$plot_line_hist(data$k_trt, 0.5, data$K_trt + 0.5, 1,
                    xlab="Observed Treatment Status")

util$plot_line_hist(data$k_tox, 0.5, data$K_tox + 0.5, 1,
                    xlab="Observed Toxicity Status")

Finally some of the patients in the observed cohort have taken advantage of assisted reproductive technologies, or ART, which can drastically increase fertility. Similar to the fertility outcomes, and unlike the clinical and demographic information, ART participation is 0-1 coded, with 0 indicating no participation and 1 participation.

par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))

util$plot_line_hist(data$k_art, -0.5, 1.5, 1,
                    xlab="Observed ART Status")

Note that these data are not actually real but rather have been simulated from an epidemiologically-motivated true model for demonstration purposes. Consequently these observations are a bit more well-behaved than real data tends to be. Moreover there are no privacy concerns.

3 Model 1

Any epidemiological system is inherently complex. To avoid being overwhelmed by this complexity we’ll start with a relatively simple observational model.

3.1 The Observational Model

Let’s assume that the conception probability is homogeneous across all patients in the observed cohort, p( y_{1}, \ldots, y_{N} \mid q_{C}) = \prod_{n = 1}^{N} p(y_{n} \mid q_{C}) = \prod_{n = 1}^{N} \text{Bernoulli}(y_{n} \mid q_{C}).

3.2 The Prior Model

To elevate this observational model into a full Bayesian model we need to complement it with a prior model for the conception probability.

Prior modeling in a demonstration is always tricky because few, if any, readers will share the same domain expertise. For this analysis let’s just say that the available domain expertise is inconsistent with conception probabilities below 0.10 and above 0.95; these values are not outright impossible but much more extreme than the intermediate values. I will denote this soft constraint as 0.10 \lessapprox q_{C} \lessapprox 0.95.

Next we have to find a probability distribution over the unit interval that is consistent with these thresholds. Conveniently the beta family of probability density functions specifies a diversity of candidate probability distributions over the unit interval.

To select a prior model from these candidates we need to define consistency, although we don’t have to be too precious. I like to define consistency with tail probability conditions, \begin{align*} \delta &= \pi( \, [0.00, 0.10] \, ) = \int_{0.00}^{0.10} \mathrm{d} q_{C} \, \text{beta}(q_{C} \mid \alpha, \beta) \\ 1 - \delta &= \pi( \, [0.95, 1.00] \, ) = \int_{0.95}^{1.00} \mathrm{d} q_{C} \, \text{beta}(q_{C} \mid \alpha, \beta). \end{align*} with \delta = 0.01.

One nice benefit of this definition of consistency is that we can numerically solve for the parameters \alpha and \beta that identify the compatible beta probability density function.

prior\_tune\_beta.stan
functions {
  // Differences between beta tail probabilities
  // and target probabilities
  vector tail_delta(vector y, vector theta,
                    array[] real x_r, array[] int x_i) {
    vector[2] deltas;
    deltas[1] =     beta_cdf(theta[1] | exp(y[1]), exp(y[2])) - 0.01;
    deltas[2] = 1 - beta_cdf(theta[2] | exp(y[1]), exp(y[2])) - 0.01;
    return deltas;
  }
}

data {
  real<lower=0, upper=1>     q_low;  // Lower threshold
  real<lower=q_low, upper=1> q_high; // Upper threshold
}

transformed data {
  vector[2] y_guess = [log(5), log(5)]'; // Initial guess at beta parameters
  vector[2] theta = [q_low, q_high]';    // Target quantiles
  vector[2] y;
  array[0] real x_r;
  array[0] int x_i;

  // Find beta parameters that ensure
  // 1% probability below lower threshold
  // and 1% probability above upper threshold
  y = algebra_solver(tail_delta, y_guess, theta, x_r, x_i);

  print("alpha = ", exp(y[1]));
  print("beta = ", exp(y[2]));
}

generated quantities {
  real alpha = exp(y[1]);
  real beta = exp(y[2]);
}
q_low <- 0.1
q_high <- 0.95

stan(file='stan_programs/prior_tune_beta.stan',
     data=list('q_low' = q_low, 'q_high' = q_high),
     iter=1, warmup=0, chains=1,
     seed=4838282, algorithm="Fixed_param")
alpha = 2.52283
beta = 2.02117

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: Iteration: 1 / 1 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0 seconds (Warm-up)
Chain 1:                0 seconds (Sampling)
Chain 1:                0 seconds (Total)
Chain 1: 
Inference for Stan model: anon_model.
1 chains, each with iter=1; warmup=0; thin=1; 
post-warmup draws per chain=1, total post-warmup draws=1.

      mean se_mean sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
alpha 2.52      NA NA 2.52 2.52 2.52 2.52  2.52     0  NaN
beta  2.02      NA NA 2.02 2.02 2.02 2.02  2.02     0  NaN
lp__  0.00      NA NA 0.00 0.00 0.00 0.00  0.00     0  NaN

Samples were drawn using (diag_e) at Fri May  2 15:47:38 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))

qs <- seq(0, 1, 0.001)
dens <- dbeta(qs, 2.5, 2.0)
plot(qs, dens, type="l", col=util$c_dark, lwd=2,
     xlab="Conception Probability",
     ylab="Prior Density", yaxt='n')

q98 <- seq(q_low, q_high, 0.001)
dens <- dbeta(q98, 2.5, 2.0)
q98 <- c(q98, q_high, q_low)
dens <- c(dens, 0, 0)

polygon(q98, dens, col=util$c_dark, border=NA)

abline(v=q_low,  lwd=3, lty=2, col='#DDDDDD')
abline(v=q_high, lwd=3, lty=2, col='#DDDDDD')

3.3 Posterior Quantification

Putting everything together we can summarize the structure of the full Bayesian model with a directed graph (Figure 1) and implement the full Bayesian model in a Stan program.

Figure 1: A directed graphical model visually summarizes the narratively generative structure of our initial model.
model1.stan
data {
  // Number of observations
  int<lower=1> N;

  // Observed conception status
  // y = 0: No conception
  // y = 1: Conception
  array[N] int<lower=0, upper=1> y;
}

parameters {
  real<lower=0, upper=1> q_C; // Conception probability
}

model {
  // Prior model
  target += beta_lpdf(q_C | 2.5, 2.0); // 0.10 <~ q_C <~ 0.95

  // Observational model
  target += bernoulli_lpmf(y | q_C);

  // Also valid but slightly less efficient
  // for (n in 1:N) {
  //   target += bernoulli_lpmf(y[n] | q_C);
  // }
}

generated quantities {
  // Posterior predictive data
  array[N] int<lower=0, upper=1> y_pred;

  for (n in 1:N) {
    y_pred[n] = bernoulli_rng(q_C);
  }
}

With this Stan program we can then run Markov chain Monte Carlo to extract information about the posterior distribution that can be used to estimate posterior expectation values.

fit <- stan(file="stan_programs/model1.stan",
            data=data, seed=8438338,
            warmup=1000, iter=2024, refresh=0)

Before doing anything else we have to check for any signs that the posterior computation might be inaccurate. Fortunately there are no diagnostic warnings suggesting any problems.

diagnostics <- util$extract_hmc_diagnostics(fit)
util$check_all_hmc_diagnostics(diagnostics)
  All Hamiltonian Monte Carlo diagnostics are consistent with reliable
Markov chain Monte Carlo.
samples1 <- util$extract_expectand_vals(fit)
base_samples <- util$filter_expectands(samples1,
                                       c('q_C'))
util$check_all_expectand_diagnostics(base_samples)
All expectands checked appear to be behaving well enough for reliable
Markov chain Monte Carlo estimation.

3.4 Retrodictive Checks

Next we need to evaluate the adequacy of our model by comparing the behavior of the observed data to the posterior predictive distribution with a posterior retrodictive check.

Here we’ll consider a posterior retrodictive check using the histogram summary statistic that we that we used when exploring the data. Fortunately there is no sign of tension between the observed histogram and the probability distribution of posterior predictive histograms.

par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))

util$plot_hist_quantiles(samples1, 'y_pred', -0.5, 1.5, 1,
                         baseline_values=data$y,
                         xlab="Observed Conception Status")

3.5 Posterior Insights

Flush with confidence in our modeling assumptions, at least in the context of the lone summary statistic we considered, we can examine the resulting posterior inferences.

The posterior distribution concentrates on conception probabilities near 2/3, consistent with the empirical behavior. One immediate benefit of the Bayesian approach is that the posterior distribution captures all of the conception probabilities consistent with the observed data, quantifying uncertainty in our insights.

par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))
util$plot_expectand_pushforward(samples1[['q_C']],
                                25,
                                display_name="Conception Probability")

When examining inferences for probability parameters it’s often helpful to plot the full range of possible values and offer as much context as possible.

par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))
util$plot_expectand_pushforward(samples1[['q_C']],
                                200, flim=c(0, 1),
                                display_name="Conception Probability")

4 Model 2

In practice the adequacy of a model is determined by the criteria we use to critique it. Our initial model adequately captures the aggregate conception behavior across the observed cohort, but that doesn’t mean it will be able to capture finer details.

For example there’s no reason why patient fertility should not vary across most of the available clinical and demographic categories. A male patient in a stable relationship with a female partner is more likely to conceive than one who is not. Similarly more aggressive cancer, and the ancillary toxicity common to most cancer treatments, is likely to reduce fertility and hence conception probability.

Fertility should also vary with treatment, but only as a side effect of treatment toxicity. Because we’re modeling treatment toxicity directly we don’t need to consider heterogeneity across the treatment groups.

The key question for a practical analysis is not whether or not the variations in conception probability are zero but rather whether or not they are large enough to manifest in the observed data. Because our initial model assumes homogeneous conception probabilities, the posterior predictive conception behavior should be the same no matter how we partition the patients. If the heterogeneity in fertility is strong enough then the observed behaviors will fall outside of the posterior predictive uncertainties, indicating the inadequacy of our initial homogeneous model.

There are many ways to stratify retrodictive comparisons across categories. Here I’m going to use the conditional mean summary statistic introduced in Section 2.5 of my Taylor modeling chapter and implemented in my recommended visualization tools.

par(mfrow=c(1, 3), mar=c(5, 5, 1, 1))

pred_names <- sapply(1:data$N, function(n) paste0('y_pred[', n, ']'))

util$plot_conditional_mean_quantiles(samples1, pred_names, data$k_rel,
                                     0.5, data$K_rel + 0.5, 1, data$y,
                                     xlab="Observed Relationship Status",
                                     ylab="Average Conception Status")

util$plot_conditional_mean_quantiles(samples1, pred_names, data$k_stg,
                                     0.5, data$K_stg + 0.5, 1, data$y,
                                     xlab="Observed Cancer Stage",
                                     ylab="Average Conception Status")

util$plot_conditional_mean_quantiles(samples1, pred_names, data$k_tox,
                                     0.5, data$K_tox + 0.5, 1, data$y,
                                     xlab="Observed Toxicity Status",
                                     ylab="Average Conception Status")

Indeed we see clear disagreement between the observed and posterior predictive behavior. This indicates that we need to incorporate systematic variation in fertility in order to adequately model this cohort of patients.

4.1 Observational Model

One of the most productive ways to model variation across a population is to define an interpretable baseline and then model deviations around that baseline. Here we’ll take our baseline to be the subset of patients in a stable relationship, with no cancer diagnosis, and no treatment toxicity, and use the parameter q_{C_{0}} to model the baseline conception probability.

An immediate benefit of this choice of baseline is that fertility should always decrease as we move from the baseline to more extreme patient characteristics. Increasing treatment toxicity, for instance, should never increase fertility. Consequently we can model the conception probability in other patient characteristics as proportional decreases from the baseline conception probability.

More formally for any clinical or demographic grouping we will model the conception probability in the baseline group as q = q_{C_{0}} \, \delta_{1} for \delta_{1} = 1, or equivalently q = q_{C_{0}} \, \exp(-\alpha_{1}) for \alpha_{1} = 0. Then we can model the conception probability in the least extreme group beyond the baseline as q = q_{C_{0}} \, \delta_{2} for 0 \le \delta_{1} \le \delta_{2}, or equivalently q = q_{C_{0}} \, \exp(-\alpha_{2}) for \alpha_{2} \ge \alpha_{1}. Similarly the conception probability in the second least extreme group becomes q = q_{C_{0}} \, \delta_{3} for \delta_{2} \le \delta_{3}, or equivalently q = q_{C_{0}} \, \exp(-\alpha_{3}) for \alpha_{3} \ge \alpha_{2}. I will refer to the \alpha_{k} for each clinical or demographic grouping as impairment parameters.

In order to model the variation across K different groups we need a collection of K positive and ordered parameters that start at zero. 0 = \alpha_{1} < \alpha_{2} < \ldots < \alpha_{k} < \ldots < \alpha_{K}. This multivariate constraint can be tricky to maintain in practice.

Finally we repeat this construction three times to capture the variation in fertility across each of the three patient characteristics under consideration (Figure 2), \begin{align*} q_{C, n} &= q_{C_{0}} \, \exp(-\alpha_{\text{stg}, n} ) \, \exp(-\alpha_{\text{rel}, n} ) \, \exp(-\alpha_{\text{tox}, n} ) \\ &= q_{C_{0}} \, \exp(-\alpha_{\text{stg}, n} -\alpha_{\text{rel}, n} -\alpha_{\text{tox}, n} ). \end{align*} where \begin{align*} \alpha_{\text{stg}, n} &= \boldsymbol{\alpha}_{\text{rel}} [ k_{\text{stg}, n} ] \\ \alpha_{\text{rel}, n} &= \boldsymbol{\alpha}_{\text{rel}} [ k_{\text{rel}, n} ] \\ \alpha_{\text{tox}, n} &= \boldsymbol{\alpha}_{\text{rel}} [ k_{\text{tox}, n} ] \end{align*}

Figure 2: We model fertility heterogeneity by allowing conception probability to vary across cancer stage, relationship status, and treatment toxicity groups. More precisely the conception probability for each patient q_{C, n} is modeled as a baseline conception probability q_{C_{0}} coupled with proportional decreases depending on group membership.

4.2 Prior Model

In this new model we are no longer modeling a population-wide conception probability but rather the conception probability for only the baseline group of patients who are in stable relationships and have not been diagnosed with cancer. If we have additional domain expertise about this smaller group then we can incorporate it into a more informative prior model.

Here let’s say that our domain expertise is inconsistent with baseline conception probabilities below 0.5 and above 0.95, 0.50 \lessapprox q_{C_{0}} \lessapprox 0.95.

q_low <- 0.5
q_high <- 0.95

stan(file='stan_programs/prior_tune_beta.stan',
     data=list('q_low' = q_low, 'q_high' = q_high),
     iter=1, warmup=0, chains=1,
     seed=4838282, algorithm="Fixed_param")
alpha = 12.6454
beta = 3.74419

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: Iteration: 1 / 1 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0 seconds (Warm-up)
Chain 1:                0 seconds (Sampling)
Chain 1:                0 seconds (Total)
Chain 1: 
Inference for Stan model: anon_model.
1 chains, each with iter=1; warmup=0; thin=1; 
post-warmup draws per chain=1, total post-warmup draws=1.

       mean se_mean sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
alpha 12.65      NA NA 12.65 12.65 12.65 12.65 12.65     0  NaN
beta   3.74      NA NA  3.74  3.74  3.74  3.74  3.74     0  NaN
lp__   0.00      NA NA  0.00  0.00  0.00  0.00  0.00     0  NaN

Samples were drawn using (diag_e) at Fri May  2 15:47:44 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
par(mfrow=c(1, 1), mar=c(5, 5, 5, 1))

qs <- seq(0, 1, 0.001)
dens <- dbeta(qs, 12.7, 3.7)
plot(qs, dens, type="l", col=util$c_dark, lwd=2,
     xlab="Baseline Conception Probability",
     ylab="Prior Density", yaxt='n')

q98 <- seq(q_low, q_high, 0.001)
dens <- dbeta(q98, 12.7, 3.7)
q98 <- c(q98, q_high, q_low)
dens <- c(dens, 0, 0)

polygon(q98, dens, col=util$c_dark, border=NA)

abline(v=q_low,  lwd=3, lty=2, col='#DDDDDD')
abline(v=q_high, lwd=3, lty=2, col='#DDDDDD')

To construct a prior model for these new fertility impairment parameters we need to elicit any available domain expertise about the reasonable proportional decreases across groups. For example let’s say that our domain expertise is inconsistent with any decreases below 5\%, 0.05 \lessapprox \delta \lessapprox 1. This requires \begin{align*} 0.05 &\lessapprox \quad\quad \delta \;\; \lessapprox 1 \\ 0.05 &\lessapprox \exp(\alpha) \lessapprox 1 \\ -\log(1) &\lessapprox \quad\quad \alpha \;\;\lessapprox -\log(0.05) \\ 0 &\lessapprox \quad\quad \alpha \;\; \lessapprox -\log(0.05). \end{align*} We can achieve this prior containment with the half-normal prior model p( \alpha ) = \text{half-normal}( \alpha \mid 0, -\log(0.05) / 2.57) that contains 99% of the prior probability between 0 and -\log(0.05).

q_high <- -log(0.05)

par(mfrow=c(1, 1), mar=c(5, 5, 5, 1))

qs <- seq(0, 3.5, 0.001)
dens <- 2 * dnorm(qs, 0, q_high / 2.57 )
plot(qs, dens, type="l", col=util$c_dark, lwd=2,
     xlab="Conception Impairment",
     ylab="Prior Density", yaxt='n')

q98 <- seq(0, q_high, 0.001)
dens <- 2 * dnorm(q98, 0, q_high / 2.57 )
q98 <- c(0, q98, -log(0.05))
dens <- c(0, dens, 0)

polygon(q98, dens, col=util$c_dark, border=NA)

abline(v=q_high,  lwd=3, lty=2, col='#DDDDDD')

4.3 Posterior Quantification

The updated observational and prior models snap together into a more elaborate full Bayesian model (Figure 3).

Figure 3: Our second model replaces the homogeneous conception probability parameter from the first model with the varying outputs of a deterministic function of the clinical and demographic group memberships of each patient.

Note that in the Stan programming language a normal log probability density function is equivalent to a half-normal log probability density function so long as the input variable is constrained to be positive. Consequently we can implement half-normal prior models for the impairment parameters using a normal probability density function.

In order to maintain the assumed constraints on the impairment parameters for each non-baseline group, 0 < \alpha_{2} < \ldots < \alpha_{k} < \ldots < \alpha_{K}, we use the Stan programming language’s positive_ordered variable type. We can then ensure the assumed constraint for all groups including the baselines, 0 = \alpha_{1} < \alpha_{2} < \ldots < \alpha_{k} < \ldots < \alpha_{K}, by prepending the positive_ordered variable with a zero.

model2.stan
data {
  // Number of observations
  int<lower=1> N;

  // Relationship status
  // k = 1: Stable partner
  // k = 2: No partner
  int<lower=1> K_rel;

  // Cancer stage
  // k = 1: No cancer
  // k = 2: Early stage cancer
  // k = 3: Advanced stage cancer
  int<lower=1> K_stg;

  // Toxicity status
  // k = 1: None
  // k = 2: Low
  // k = 3: Medium
  // k = 4: High
  int<lower=1> K_tox;

  // Observed conception status
  // y = 0: No conception
  // y = 1: Conception
  array[N] int<lower=0, upper=1> y;

  // Observed relationship status;
  array[N] int<lower=1, upper=K_rel> k_rel;

  // Observed cancer stage;
  array[N] int<lower=1, upper=K_stg> k_stg;

  // Observed toxicity status;
  array[N] int<lower=1, upper=K_tox> k_tox;
}

parameters {
  // Probability of conception for baseline patients in a stable
  // relationship, no cancer, and no toxicity
  real<lower=0, upper=1> q_C_0;

  // Proportional decreases in conception probability due to
  // non-baseline relationship status, cancer stage, and toxicity
  // status.
  positive_ordered[K_rel - 1] alpha_rel;
  positive_ordered[K_stg - 1] alpha_stg;
  positive_ordered[K_tox - 1] alpha_tox;
}

transformed parameters {
  vector[K_rel] alpha_rel_buff = append_row([0]', alpha_rel);
  vector[K_stg] alpha_stg_buff = append_row([0]', alpha_stg);
  vector[K_tox] alpha_tox_buff = append_row([0]', alpha_tox);
}

model {
  // Prior model
  target += beta_lpdf(q_C_0 | 12.7, 3.7); // 0.50 <~ q_C_0 <~ 0.95

  target += normal_lpdf(alpha_rel | 0, 3 / 2.32); // 0 <~ alpha <~ -log(0.05)
  target += normal_lpdf(alpha_stg | 0, 3 / 2.32); // 0 <~ alpha <~ -log(0.05)
  target += normal_lpdf(alpha_tox | 0, 3 / 2.32); // 0 <~ alpha <~ -log(0.05)

  // Observational model
  target += bernoulli_lpmf(y | q_C_0 * exp(-alpha_rel_buff[k_rel]
                                           -alpha_stg_buff[k_stg]
                                           -alpha_tox_buff[k_tox]));
}

generated quantities {
  // Proportional decreases in conception probability
  vector[K_rel] gamma_rel_buff = exp(-alpha_rel_buff);
  vector[K_stg] gamma_stg_buff = exp(-alpha_stg_buff);
  vector[K_tox] gamma_tox_buff = exp(-alpha_tox_buff);

  // Posterior predictive data
  array[N] int<lower=0, upper=1> y_pred;

  for (n in 1:N) {
    y_pred[n] = bernoulli_rng(q_C_0 * exp(-alpha_rel_buff[k_rel[n]]
                                          -alpha_stg_buff[k_stg[n]]
                                          -alpha_tox_buff[k_tox[n]]));
  }
}
fit <- stan(file="stan_programs/model2.stan",
            data=data, seed=8438339,
            warmup=1000, iter=2024, refresh=0)

The computational diagnostics don’t suggest any problems with our posterior quantification.

diagnostics <- util$extract_hmc_diagnostics(fit)
util$check_all_hmc_diagnostics(diagnostics)
  All Hamiltonian Monte Carlo diagnostics are consistent with reliable
Markov chain Monte Carlo.
samples2 <- util$extract_expectand_vals(fit)
base_samples <- util$filter_expectands(samples2,
                                       c('q_C_0', 'alpha_rel' ,
                                         'alpha_stg', 'alpha_tox'),
                                       check_arrays=TRUE)
util$check_all_expectand_diagnostics(base_samples)
All expectands checked appear to be behaving well enough for reliable
Markov chain Monte Carlo estimation.

4.4 Retrodictive Checks

The retrodictive performance aggregated across the entire population continues to be strong.

par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))

util$plot_hist_quantiles(samples2, 'y_pred', -0.5, 1.5, 1,
                         baseline_values=data$y,
                         xlab="Observed Conception Status")

Now, however, the conditional retrodictive checks across the patient characteristics also look good. This suggests that our model of the variation is adequate, at least for this particular data set.

par(mfrow=c(1, 3), mar=c(5, 5, 1, 1))

pred_names <- sapply(1:data$N, function(n) paste0('y_pred[', n, ']'))

util$plot_conditional_mean_quantiles(samples2, pred_names, data$k_rel,
                                     0.5, data$K_rel + 0.5, 1, data$y,
                                     xlab="Observed Relationship Status",
                                     ylab="Average Conception Status")

util$plot_conditional_mean_quantiles(samples2, pred_names, data$k_stg,
                                     0.5, data$K_stg + 0.5, 1, data$y,
                                     xlab="Observed Cancer Stage",
                                     ylab="Average Conception Status")

util$plot_conditional_mean_quantiles(samples2, pred_names, data$k_tox,
                                     0.5, data$K_tox + 0.5, 1, data$y,
                                     xlab="Observed Toxicity Status",
                                     ylab="Average Conception Status")

4.5 Posterior Insights

The expanded model offers a variety of posterior behaviors to consider. Note that in order to learn the baseline conception probability, and hence variations away from that baseline, we needed to have patients who are not diagnosed with cancer in the observed cohort. If such a cohort is not possible then we would need to inform q_{C_{0}} using a strong prior model informed by domain expertise, previous studies, or a combination of the two.

par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))
util$plot_expectand_pushforward(samples2[['q_C_0']],
                                25,
                                display_name=paste("Baseline",
                                                   "Probability",
                                                   "of Conception"))

Because of their non-linear influence on the patient conception probabilities the impairment parameters can be tricky to correctly interpret.

par(mfrow=c(1, 3), mar=c(5, 5, 1, 1))

names <- sapply(1:data$K_rel,
                function(k) paste0('alpha_rel_buff[', k, ']'))
util$plot_disc_pushforward_quantiles(samples2, names,
                                     xlab="Observed Relationship Status",
                                     ylab="Conception Impairment",
                                     display_ylim=c(-0.05, 1))

names <- sapply(1:data$K_stg,
                function(k) paste0('alpha_stg_buff[', k, ']'))
util$plot_disc_pushforward_quantiles(samples2, names,
                                     xlab="Observed Cancer Stage",
                                     ylab="Conception Impairment",
                                     display_ylim=c(-0.05, 1))

names <- sapply(1:data$K_tox,
                function(k) paste0('alpha_tox_buff[', k, ']'))
util$plot_disc_pushforward_quantiles(samples2, names,
                                     xlab="Observed Toxicity Status",
                                     ylab="Conception Impairment",
                                     display_ylim=c(-0.05, 1))

Fortunately we can always propagate our posterior inferences to the more interpretable proportional decreases, \gamma_{k} = \exp( - \alpha_{k} ).

par(mfrow=c(1, 3), mar=c(5, 5, 1, 1))

names <- sapply(1:data$K_rel,
                function(k) paste0('gamma_rel_buff[', k, ']'))
util$plot_disc_pushforward_quantiles(samples2, names,
                                     xlab="Observed Relationship Status",
                                     ylab=paste("Proportional",
                                                "Probability Decrease"),
                                     display_ylim=c(0, 1))

names <- sapply(1:data$K_stg,
                function(k) paste0('gamma_stg_buff[', k, ']'))
util$plot_disc_pushforward_quantiles(samples2, names,
                                     xlab="Observed Cancer Stage",
                                     ylab=paste("Proportional",
                                                "Probability Decrease"),
                                     display_ylim=c(0, 1))

names <- sapply(1:data$K_tox,
                function(k) paste0('gamma_tox_buff[', k, ']'))
util$plot_disc_pushforward_quantiles(samples2, names,
                                     xlab="Observed Toxicity Status",
                                     ylab=paste("Proportional",
                                                "Probability Decrease"),
                                     display_ylim=c(0, 1))

5 Model 3

Because all of the patient characteristics in the observed cohort are known we did not have to model them to infer heterogeneity in fertility within the cohort. If patient characteristics are prone to missingness or we want to inform behaviors for other patient cohorts, however, then we need to model the patient characteristics. The latter is particularly important if we want to consider hypothetical, sometimes referred to as counterfactual or out-of-sample, cohorts subject to potential interventions.

5.1 Model 3a

Because consistently modeling all of the patient characteristics at the same time can be challenging we’ll walk through the process as deliberately as possible.

5.1.1 Patient Characteristic Observational Model

In general the patient characteristics of any population are defined by a joint probability distribution. For a population of patients characterized by relationship status, cancer stage, treatment status, and treatment toxicity status this means modeling the joint probability density function p( \text{relationship}, \text{stage}, \text{treatment}, \text{toxicity} ).

Modeling joint probability distributions, and all of the complex couplings between the component variables they can manifest, can be overwhelming. One way to make them more manageable is to decompose them into lower-dimensional conditional probability distributions. When this decomposition follows the data generating process these conditional probability distributions become more interpretable and hence more straightforward to model.

For example consider the conditional decomposition (Figure 4) \begin{align*} p( \text{relationship}&, \text{stage}, \text{treatment}, \text{toxicity} ) \\ &=\;\, p( \text{toxicity} \mid \text{relationship}, \text{stage}, \text{treatment} ) \\ &\quad \cdot p( \text{treatment} \mid \text{relationship}, \text{stage} ) \\ &\quad \cdot p( \text{relationship} \mid \text{stage} ) \\ &\quad \cdot p( \text{stage} ). \end{align*}

Figure 4: Conditionally decomposing the joint patient characteristic model allows us to focus on smaller, more interpretable component models.

Treatment toxicity certainly depends on whether or not a patient is undergoing treatment in the first place. Moreover it can depend on the stage of cancer, with more aggressive cancers making a patient more vulnerable to treatment side effects. On the other hand treatment toxicity is not directly influenced by the relationship status of a patient. Consequently the first conditional probability simplifies to p( \text{toxicity} \mid \text{relationship}, \text{stage}, \text{treatment} ) = p( \text{toxicity} \mid \text{stage}, \text{treatment} ).

In theory treatment status could depend on relationship status; for example a long term partner might increase the probability of seeking treatment. For this analysis, however, we will assume that treatment within this particular cohort is determined entirely by clinicians and hence is largely independent of relationship status, p( \text{treatment} \mid \text{relationship}, \text{stage} ) = p( \text{treatment} \mid \text{stage} ).

With these simplifications our patient characteristic model becomes (Figure 5) \begin{align*} p( \text{relationship}&, \text{stage}, \text{treatment}, \text{toxicity} ) \\ &=\;\, p( \text{toxicity} \mid \text{stage}, \text{treatment} ) \\ &\quad \cdot p( \text{treatment} \mid \text{stage} ) \\ &\quad \cdot p( \text{relationship} \mid \text{stage} ) \\ &\quad \cdot p( \text{stage} ). \end{align*}

Figure 5: Our domain expertise eliminates some of the conditional dependencies, simplifying this conditionally decomposition of the joint patient characteristic model.

Finally some of the conditional probabilities are known precisely. If treatment always follows a cancer diagnosis then p( \text{treatment} \mid \text{no cancer} ) = p( k_{\text{trt}} = 2 \mid k_{\text{stg}} = 1 ) = 0, or equivalently p( \text{no treatment} \mid \text{no cancer} ) = p( k_{\text{trt}} = 1 \mid k_{\text{stg}} = 1 ) = 1. Similarly without an active treatment there cannot be any treatment toxicity. Consequently p( \text{no toxicity} \mid \text{stage}, \text{treatment} ) = p( k_{\text{tox}} = 1 \mid k_{\text{stg}}, k_{\text{trt}} = 1) = 1 for all stages.

In practice all of the remaining discrete probabilities can be implemented as simplices. For example p( \text{stage} ) contains three probabilities, one for each stage, which can be implemented with a three-component simplex variable \mathbf{q}_{\text{stg}} = ( q_{\text{stg} = 1}, q_{\text{stg} = 2}, q_{\text{stg} = 3}) with 0 \le q_{\text{stg} = k} \le 1 and \sum_{k = 1}^{K_{\text{stg}}} q_{\text{stg} = k} = 1. Similarly p( \text{relationship} \mid \text{stage} ) can be implemented with three, two-component simplex variables \begin{align*} \mathbf{q}_{\text{rel} \mid \text{stg} = 1} &= ( q_{\text{rel} = 1 \mid \text{stg} = 1}, q_{\text{rel} = 2 \mid \text{stg} = 1} ) \\ \mathbf{q}_{\text{rel} \mid \text{stg} = 2} &= ( q_{\text{rel} = 1 \mid \text{stg} = 2}, q_{\text{rel} = 2 \mid \text{stg} = 2} ) \\ \mathbf{q}_{\text{rel} \mid \text{stg} = 2} &= ( q_{\text{rel} = 1 \mid \text{stg} = 3}, q_{\text{rel} = 2 \mid \text{stg} = 3} ). \end{align*}

The remaining patient characteristics probabilities have to be inferred from observed data using an appropriate categorical observational model (Figure 6), \begin{align*} &\text{categorical}(k_{\text{stg}, n} \mid \mathbf{q}_{\text{stg}}) \\ &\text{categorical}(k_{\text{rel}, n} \, \mid \mathbf{q}_{\text{rel} \mid \text{stg} = k_{\text{stg}, n}}) \\ &\text{categorical}(k_{\text{trt}, n} \, \mid \mathbf{q}_{\text{trt} \mid \text{stg} = k_{\text{stg}, n}}) \\ &\text{categorical}(k_{\text{tox}, n} \mid \mathbf{q}_{\text{tox} \mid \text{stg} = k_{\text{stg}, n}, \text{rel} = k_{\text{rel}, n}}). \end{align*}

Figure 6: The joint patient model is parameterized by a collection of conditional probabilities, each of which can be implemented with simplex variable types.

5.1.2 Patient Characteristic Prior Model

The Dirichlet family of probability density functions provides a convenient diversity of probabilistic models over simplices, which in turn is particularly useful for developing a principled prior model for our patient characteristic model.

For example the Dirichlet probability density function \text{Dirichlet}( q_{1}, \ldots, q_{K} \mid \gamma_{1}, \ldots, \gamma_{K}) with \gamma_{k} = 1 for all k is uniform over the (K - 1)-simplex. This is useful when we have limited domain expertise.

On the other hand the Dirichlet model with \gamma_{k} = \rho_{k} / \tau + 1 concentrates around the baseline probabilities \rho_{1}, \ldots, \rho_{K} with the value \tau determining the strength of that concentration. These configurations are useful when our domain expertise is more informative.

The Dirichlet family can also be configured to concentrate on more extreme behaviors. For example if any of the \gamma_{k} are less than one then the resulting Dirichlet probability density function will concentrate on at least one simplex boundary.

When the properties of any particular Dirichlet model are not clear from inspecting the gamma parameters we can always build intuition by studying samples.

library(colormap)
nom_colors <- c("#DCBCBC", "#C79999", "#B97C7C",
                "#A25050", "#8F2727", "#7C0000")
line_colors <- colormap(colormap=nom_colors, nshades=25)

plot_simplex_samples <- function(gammas, baseline=FALSE, main="") {
  K <- length(gammas)

  plot(1, type="n", main=main,
       xlim=c(0.5, K + 0.5), xlab="Component",
       ylim=c(0, 1), ylab="Probability")

  idxs <- rep(1:K, each=2)

  xs <- sapply(1:length(idxs),
               function(k) if(k %% 2 == 1) idxs[k] - 0.5
                           else            idxs[k] + 0.5)

  for (s in 1:25) {
    q <- rgamma(K, gammas, 1)
    q <- q / sum(q)

    for (k in 1:K) {
      idx1 <- 2 * k - 1
      idx2 <- 2 * k
      lines(xs[idx1:idx2], rep(q[k], 2), col=line_colors[s], lwd=3)
    }
  }

  if (baseline) {
    rho <- (gammas - 1)
    rho <- rho / sum(rho)
    for (k in 1:K) {
      idx1 <- 2 * k - 1
      idx2 <- 2 * k
      lines(xs[idx1:idx2], rep(rho[k], 2), col=util$c_mid_teal, lwd=6)
      lines(xs[idx1:idx2], rep(rho[k], 2), col=util$c_mid_teal, lwd=3)
    }
  }
}
par(mfrow=c(1, 1), mar=c(5, 5, 5, 1))

plot_simplex_samples(c(5, 7, 8, 3, 9, 5), baseline=TRUE)

What do we know about the cohort?

Because patients were included only from referrals for fertility preservation consultation, patients with cancer diagnoses should not dominate. The more domain expertise we have about referral rates the more precisely we can tune an appropriate prior model; here we will assume a weak concentration towards no cancer diagnosis, \boldsymbol{\gamma}_{\text{stg}} = \frac{ \left( \frac{3}{5}, \frac{2}{5}, 0 \right) }{ \frac{1}{5} } + \mathbf{1} = \left(4, 3, 1 \right).

par(mfrow=c(1, 1), mar=c(5, 5, 5, 1))

plot_simplex_samples(c(4, 3, 1), baseline=TRUE, main="Cancer Stage")

In general relationships are strained more and more as cancer progresses to more advanced stages, although here we don’t have too much a priori understanding of just how much, \begin{align*} \boldsymbol{\gamma}_{\text{rel} \mid \text{stg} = 1} &= \frac{ \left( 1, 0 \right) }{ \frac{1}{3} } + \mathbf{1} = \left(4, 1 \right), \\ \boldsymbol{\gamma}_{\text{rel} \mid \text{stg} = 2} &= \frac{ \left( 1, 0 \right) }{ 1 } + \mathbf{1} = \left(2, 1 \right), \\ \boldsymbol{\gamma}_{\text{rel} \mid \text{stg} = 3} &= \left(1, 1 \right). \end{align*}

par(mfrow=c(1, 3), mar=c(5, 5, 5, 1))

plot_simplex_samples(c(4, 1), baseline=TRUE,
                     main="Relationship Status\nNo Cancer Diagnosis")

plot_simplex_samples(c(2, 1), baseline=TRUE,
                     main="Relationship Status\nEarly Stage Cancer")

plot_simplex_samples(c(1, 1), baseline=FALSE,
                     main="Relationship Status\nAdvanced Stage Cancer")

Similarly we expect that treatment becomes more likely as cancer progresses, \begin{align*} \boldsymbol{\gamma}_{\text{trt} \mid \text{stg} = 2} &= \frac{ \left( 0, 1 \right) }{ \frac{1}{2} } + \mathbf{1} = \left(1, 3 \right), \\ \boldsymbol{\gamma}_{\text{trt} \mid \text{stg} = 3} &= \left(0.5, 4 \right). \end{align*}

par(mfrow=c(1, 2), mar=c(5, 5, 5, 1))

plot_simplex_samples(c(1, 3), baseline=TRUE,
                     main="Relationship Status\nEarly Stage Cancer")

plot_simplex_samples(c(0.5, 4), baseline=FALSE,
                     main="Relationship Status\nAdvanced Stage Cancer")

Finally toxicity should increase with cancer stage, but only if a patient is being actively treated, \begin{align*} \boldsymbol{\gamma}_{\text{tox} \mid \text{stg} = 2, \text{trt} = 2} &= \frac{ \left( \frac{3}{6}, \frac{2}{6}, \frac{1}{6}, 0 \right) }{ \frac{1}{6} } + \mathbf{1} = \left(4, 3, 2, 1 \right), \\ \boldsymbol{\gamma}_{\text{tox} \mid \text{stg} = 2, \text{trt} = 2} &= \frac{ \left( \frac{1}{5}, \frac{2}{5}, \frac{2}{5}, 0 \right) }{ \frac{1}{5} } + \mathbf{1} = \left(2, 3, 3, 1 \right), \\ \boldsymbol{\gamma}_{\text{tox} \mid \text{stg} = 3, \text{trt} = 2} &= \frac{ \left( 0, \frac{1}{5}, \frac{2}{5}, \frac{2}{5} \right) }{ \frac{1}{5} } + \mathbf{1} = \left(1, 2, 3, 3 \right). \end{align*}

par(mfrow=c(1, 1), mar=c(5, 5, 5, 1))

title <- "Relationship Status\nNo Cancer Diagnosis, No Treatment"
plot_simplex_samples(c(4, 3, 2, 1), baseline=TRUE,
                     main=title)

par(mfrow=c(1, 1), mar=c(5, 5, 5, 1))

title <- "Relationship Status\nEarly Stage Cancer, Active Treatment"
plot_simplex_samples(c(2, 3, 3, 1), baseline=TRUE,
                     main=title)

par(mfrow=c(1, 1), mar=c(5, 5, 5, 1))
title <- "Relationship Status\nAdvanced Stage Cancer, Active Treatment"
plot_simplex_samples(c(1, 2, 3, 3), baseline=FALSE,
                     main=title)

Overall the full prior model is relatively diffuse, but it does suppress the more unrealistic patient characteristics.

5.1.3 Posterior Quantification

If the fertility heterogeneity is independent of how the patient characteristic groups are populated then our previous model and new patient characteristic model simply compose together (Figure 7). This is by no means a trivial assumption in practice. For example it would be violated if referrals into the observed cohort favored patients who suffered from fertility issues.

Figure 7: When fertility is independent of how the cohort was selected we can simply compose a patient fertility model with a patient characteristic model.

All of these patient characteristic probabilities can be conveniently implemented with the simplex variable types in Stan. The only challenge is organizing all of the simplices effectively.

Also note that when we generate posterior predictive simulations in the generated quantities block we simulate new patient characteristics as well as new fertility outcomes.

model3a.stan
data {
  // Number of observations
  int<lower=1> N;

  // Number of predictions
  int<lower=1> N_pred;

  // Relationship status
  // k = 1: Stable partner
  // k = 2: No partner
  int<lower=1> K_rel;

  // Cancer stage
  // k = 1: No cancer
  // k = 2: Early stage cancer
  // k = 3: Advanced stage cancer
  int<lower=1> K_stg;

  // Treatment status
  // k = 1: No treatment
  // k = 2: Treatment
  int<lower=1> K_trt;

  // Toxicity status
  // k = 1: None
  // k = 2: Low
  // k = 3: Medium
  // k = 4: High
  int<lower=1> K_tox;

  // Observed conception status
  // y = 0: No conception
  // y = 1: Conception
  array[N] int<lower=0, upper=1> y;

  // Observed relationship status;
  array[N] int<lower=1, upper=K_rel> k_rel;

  // Observed cancer stage;
  array[N] int<lower=1, upper=K_stg> k_stg;

  // Observed treatment status;
  array[N] int<lower=1, upper=K_trt> k_trt;

  // Observed toxicity status;
  array[N] int<lower=1, upper=K_tox> k_tox;
}

parameters {
  // Marginal probability of cancer stage
  simplex[K_stg] q_stg;

  // Conditional probability of relationship status given cancer stage
  array[K_stg] simplex[K_rel] q_rel;

  // Conditional probability of treatment status given active cancer stage
  array[K_stg - 1] simplex[K_trt] q_trt_active_stg;

  // Conditional probability of toxicity status given cancer stage and
  // active treatment status
  array[K_stg] simplex[K_tox] q_tox_active_trt;

  // Probability of conception for baseline patients in a stable
  // relationship, no cancer, and no toxicity
  real<lower=0, upper=1> q_C_0;

  // Proportional decreases in conception probability due to
  // non-baseline relationship status, cancer stage, and toxicity
  // status.
  positive_ordered[K_rel - 1] alpha_rel;
  positive_ordered[K_stg - 1] alpha_stg;
  positive_ordered[K_tox - 1] alpha_tox;
}

transformed parameters {
  vector[K_rel] alpha_rel_buff = append_row([0]', alpha_rel);
  vector[K_stg] alpha_stg_buff = append_row([0]', alpha_stg);
  vector[K_tox] alpha_tox_buff = append_row([0]', alpha_tox);

  // Conditional probability of treatment status given cancer stage
  array[K_stg] simplex[K_trt] q_trt = append_array({ [1.0, 0.0]' },
                                                   q_trt_active_stg);

  // Conditional probability of toxicity status given cancer stage and
  // treatment status
  array[K_stg, K_trt] simplex[K_tox] q_tox;
  q_tox[, 1] = { [1, 0, 0, 0]',
                 [1, 0, 0, 0]',
                 [1, 0, 0, 0]' };
  q_tox[, 2] = q_tox_active_trt;
}

model {
  // Prior model
  target += dirichlet_lpdf(q_stg | [4, 3, 1]');
  target += dirichlet_lpdf(q_rel[1] | [4, 1]');
  target += dirichlet_lpdf(q_rel[2] | [2, 1]');
  target += dirichlet_lpdf(q_rel[3] | [1, 1]');
  target += dirichlet_lpdf(q_trt_active_stg[1] | [1, 3]');
  target += dirichlet_lpdf(q_trt_active_stg[2] | [0.5, 4]');
  target += dirichlet_lpdf(q_tox_active_trt[1] | [4, 3, 2, 1]');
  target += dirichlet_lpdf(q_tox_active_trt[2] | [2, 3, 3, 1]');
  target += dirichlet_lpdf(q_tox_active_trt[3] | [1, 2, 3, 3]');

  target += beta_lpdf(q_C_0 | 12.7, 3.7); // 0.50 <~ q_C_0 <~ 0.95

  target += normal_lpdf(alpha_rel | 0, 3 / 2.32); // 0 <~ alpha <~ -log(0.05)
  target += normal_lpdf(alpha_stg | 0, 3 / 2.32); // 0 <~ alpha <~ -log(0.05)
  target += normal_lpdf(alpha_tox | 0, 3 / 2.32); // 0 <~ alpha <~ -log(0.05)

  // Observational model
  for (n in 1:N) {
    target += categorical_lpmf(k_stg[n] | q_stg);
    target += categorical_lpmf(k_rel[n] | q_rel[k_stg[n]]);
    target += categorical_lpmf(k_trt[n] | q_trt[k_stg[n]]);
    target += categorical_lpmf(k_tox[n] | q_tox[k_stg[n],
                                                k_trt[n]]);
  }

  target += bernoulli_lpmf(y | q_C_0 * exp(-alpha_rel_buff[k_rel]
                                           -alpha_stg_buff[k_stg]
                                           -alpha_tox_buff[k_tox]));
}

generated quantities {
  // Posterior predictive data
  array[N_pred] real<lower=0, upper=1> q_pred;
  array[N_pred] int<lower=0, upper=1>  y_pred;

  array[N_pred] int<lower=1, upper=K_rel> k_rel_pred;
  array[N_pred] int<lower=1, upper=K_stg> k_stg_pred;
  array[N_pred] int<lower=1, upper=K_trt> k_trt_pred;
  array[N_pred] int<lower=1, upper=K_tox> k_tox_pred;

  for (n in 1:N_pred) {
    k_stg_pred[n] = categorical_rng(q_stg);
    k_rel_pred[n] = categorical_rng(q_rel[k_stg_pred[n]]);
    k_trt_pred[n] = categorical_rng(q_trt[k_stg_pred[n]]);
    k_tox_pred[n] = categorical_rng(q_tox[k_stg_pred[n],
                                          k_trt_pred[n]]);

    q_pred[n] = q_C_0 * exp(-alpha_rel_buff[k_rel_pred[n]]
                            -alpha_stg_buff[k_stg_pred[n]]
                            -alpha_tox_buff[k_tox_pred[n]]);
    y_pred[n] = bernoulli_rng(q_pred[n]);
  }
}
data$N_pred <- 5000

fit <- stan(file="stan_programs/model3a.stan",
            data=data, seed=8438338,
            warmup=1000, iter=2024, refresh=0)

Although our model is quickly increasing in complexity the computational diagnostics suggest that our posterior computation is keeping up.

diagnostics <- util$extract_hmc_diagnostics(fit)
util$check_all_hmc_diagnostics(diagnostics)
  All Hamiltonian Monte Carlo diagnostics are consistent with reliable
Markov chain Monte Carlo.
samples3a <- util$extract_expectand_vals(fit)
base_samples <- util$filter_expectands(samples3a,
                                       c('q_stg', 'q_trt_active_stg',
                                         'q_tox_active_trt','q_C_0',
                                         'alpha_rel', 'alpha_stg',
                                         'alpha_tox'),
                                       check_arrays=TRUE)
util$check_all_expectand_diagnostics(base_samples)
All expectands checked appear to be behaving well enough for reliable
Markov chain Monte Carlo estimation.

5.1.4 Retrodictive Checks

Posterior retrodictive checks for the patient characteristics all look good, suggesting that the assumptions underlying our patient characteristic model is adequately capturing the details of the observed data.

par(mfrow=c(2, 2), mar=c(5, 5, 1, 1))

util$plot_hist_quantiles(samples3a, 'k_rel_pred',
                         0.5, data$K_rel + 0.5, 1,
                         baseline_values=data$k_rel,
                         xlab="Relationship Status")

util$plot_hist_quantiles(samples3a, 'k_stg_pred',
                         0.5, data$K_stg + 0.5, 1,
                         baseline_values=data$k_stg,
                         xlab="Cancer Stage")

util$plot_hist_quantiles(samples3a, 'k_trt_pred',
                         0.5, data$K_trt + 0.5, 1,
                         baseline_values=data$k_trt,
                         xlab="Treatment Status")

util$plot_hist_quantiles(samples3a, 'k_tox_pred',
                         0.5, data$K_tox + 0.5, 1,
                         baseline_values=data$k_tox,
                         xlab="Toxicity Status")

Moreover the observed conception status is consistent with the full posterior predictive conception status, where we consider variation in the clinical, demographic, and fertility behaviors.

par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))

util$plot_hist_quantiles(samples3a, 'y_pred', -0.5, 1.5, 1,
                         baseline_values=data$y,
                         xlab="Observed Conception Status")

Note that we cannot consider conditional retrodictive checks here as we did previously because the observed data and posterior predictions are now based on different realizations of the patient characteristic variables.

5.1.5 Posterior Insights

Content with the adequacy of our model we can examine the patient characteristic inferences one by one. About half of the observed cohort suffers from cancer.

par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))

names <- sapply(1:data$K_stg,
                function(k) paste0('q_stg[', k, ']'))
util$plot_disc_pushforward_quantiles(samples3a, names,
                                     xlab="Observed Cancer Stage",
                                     ylab="Probability",
                                     display_ylim=c(0, 1))

Stable relationships become less common as cancer progresses.

par(mfrow=c(1, 3), mar=c(5, 5, 1, 1))

for (ks in 1:data$K_stg) {
  names <- sapply(1:data$K_rel,
                  function(k) paste0('q_rel[', ks, ',', k, ']'))
  util$plot_disc_pushforward_quantiles(samples3a, names,
                                       xlab="Observed Relationship Status",
                                       ylab="Probability",
                                       display_ylim=c(0, 1),
                                       main=paste("Cancer Stage ", ks))
}

Conversely treatment becomes more common as cancer progresses.

par(mfrow=c(1, 3), mar=c(5, 5, 1, 1))

for (ks in 1:data$K_stg) {
  names <- sapply(1:data$K_trt,
                  function(k) paste0('q_trt[', ks, ',', k, ']'))
  util$plot_disc_pushforward_quantiles(samples3a, names,
                                       xlab="Observed Treatment Status",
                                       ylab="Probability",
                                       display_ylim=c(0, 1),
                                       main=paste("Cancer Stage ", ks))
}

Treatment toxicity becomes more severe as cancer progresses, at least for patients actively undergoing treatment.

par(mfrow=c(2, 3), mar=c(5, 5, 1, 1))

for (kt in 1:data$K_trt) {
  for (ks in 1:data$K_stg) {
    names <- sapply(1:data$K_tox,
                    function(k) paste0('q_tox[', ks, ',', kt, ',', k, ']'))
    util$plot_disc_pushforward_quantiles(samples3a, names,
                                       xlab="Observed Toxicity Status",
                                       ylab="Probability",
                                       display_ylim=c(0, 1),
                                       main=paste("Cancer Stage ", ks,
                                                  ", Treatment ",  kt))
  }
}