par(family="serif", las=1, bty="l",
cex.axis=1, cex.lab=1, cex.main=1,
xaxs="i", yaxs="i", mar = c(5, 5, 3, 1))
Bae’s Theorem
In 2006 Netflix announced the infamous Netflix Prize. The competition challenged anyone to use a data set of customer movie reviews to inform predictions for a second, held-out set of customer movie reviews. Superficially the Netflix challenge was a mixed success, although from a broader perspective it demonstrated the perils of poorly-chosen metrics for predictive performance and the subtleties of data privacy. Wikipedia summarizes the history well.
Beyond the Netflix Prize itself the corresponding data set is a nice example of some of the problems that can arise in a wide range of practical applications. For instance not only are customer preferences limited to five star ratings but also the interpretation of those ratings is ambiguous and typically not consistent across customers. Some customers are generous with their five star ratings while others are stingy with not only their five star ratings but also their four star and sometimes even three star ratings.
In addition to the idiosyncratic rating scales, any analysis of this data also has to contend with idiosyncratic customer preferences. Because not every customer will agree on the quality of a given movie we have to decide whether we want to try to learn an aggregate preference across the entire population or the individual customer preferences.
In this chapter I develop a Bayesian analysis of a subset of the Netflix training data set, not in an attempt to win the Netflix Prize decades too late but rather to demonstrate some strategies for approaching these analysis challenges.
Importantly this analysis will not be the first time that Netflix has been associated with Bayesian inference. In 2016 Amy Hogan (@alittlestats) presented her influential Bae’s Theorem, p( \text{chill} \mid \text{Netflix} ) = \frac{ p( \text{Netflix} \mid \text{chill} ) \, p( \text{chill} ) } { p( \text{Netflix} ) }.
1 Setup
As always we begin by setting up our local R
environment.
library(rstan)
rstan_options(auto_write = TRUE) # Cache compiled Stan programs
options(mc.cores = parallel::detectCores()) # Parallelize chains
parallel:::setDefaultClusterOptions(setup_strategy = "sequential")

util <- new.env()
source('mcmc_analysis_tools_rstan.R', local=util)
source('mcmc_visualization_tools.R', local=util)
2 Data Exploration
The full Netflix Prize training data set consisted of 100,480,507 customer-movie pairs, each accompanied by an ordinal rating between one and five “stars”, with one being the worst rating and five being the best. The observed ratings spanned 480,189 anonymized customers and 17,770 movies.
To allow for a more manageable demonstration I reduced the full data set by considering only the first 1000 movies and then randomly subsampling 100 customers and 200 movies with probabilities proportional to the total number of ratings. This left 2,415 total ratings.
Finally, to facilitate the model implementations and add another layer of anonymizing obfuscation, I relabeled the selected customers and movies with contiguous indices.
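As a concrete illustration, here is a minimal sketch of that reduction in R, assuming a hypothetical data frame raw with columns customer, movie, and rating; these names, and the raw data format itself, are illustrative and not the exact preprocessing code behind data/ratings.data.R.

# Keep only the first 1000 movies
raw <- subset(raw, movie <= 1000)

# Subsample customers and movies with probabilities proportional
# to their total number of ratings
customer_counts <- table(raw$customer)
sel_customers <- sample(names(customer_counts), 100,
                        prob=as.numeric(customer_counts))

movie_counts <- table(raw$movie)
sel_movies <- sample(names(movie_counts), 200,
                     prob=as.numeric(movie_counts))

sub <- subset(raw, customer %in% sel_customers & movie %in% sel_movies)

# Relabel the selected customers and movies with contiguous indices,
# obscuring the original labels
sub$customer_idx <- as.integer(factor(sub$customer))
sub$movie_idx <- as.integer(factor(sub$movie))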
data <- read_rdump('data/ratings.data.R')
cat(sprintf("%s Customers", data$N_customers))
100 Customers
cat(sprintf("%s Movies", data$N_movies))
200 Movies
cat(sprintf("%s Total Ratings", data$N_ratings))
2415 Total Ratings
Despite the data subsampling favoring customers with more ratings, most of the selected customers rated only a few movies. Overall the training data set is relatively sparse, with only a few customers contributing most of the ratings.
par(mfrow=c(1, 1), mar=c(5, 5, 2, 1))
util$plot_line_hist(table(data$customer_idxs),
                    -0.5, 95.5, 5,
                    xlab="Number of Ratings Per Customer")
Similarly most of the selected movies have only a few ratings.
par(mfrow=c(1, 1), mar=c(5, 5, 2, 1))
util$plot_line_hist(table(data$movie_idxs),
                    -0.5, 55.5, 2,
                    xlab="Number of Ratings Per Movie")
Warning in check_bin_containment(bin_min, bin_max, values): 2 values (1.0%)
fell above the binning.
This sparsity is even more evident if we visualize the customer-movie pairings.
xs <- seq(1, data$N_movies, 1)
ys <- seq(1, data$N_customers, 1)
zs <- matrix(0, nrow=data$N_movies, ncol=data$N_customers)

for (n in 1:data$N_ratings) {
  zs[data$movie_idxs[n], data$customer_idxs[n]] <- 1
}
par(mfrow=c(1, 1), mar = c(5, 5, 1, 1))
image(xs, ys, zs, col=c("white", util$c_dark_teal),
xlab="Movie", ylab="Customer")
The observed ratings are slightly biased towards large values, with far more four star ratings than two star ratings. From the data alone, however, we cannot determine whether or not this is because most movies that had been rated were relatively good or because most customers were just generous with their ratings.
par(mfrow=c(1, 1), mar=c(5, 5, 2, 1))
util$plot_line_hist(data$ratings,
                    -0.5, 6.5, 1, xlab="Ratings")
That said there is substantial heterogeneity in the rating behavior across customers. Customer 70, for example, gave many high ratings while Customer 23 gave many low ratings.
par(mfrow=c(2, 3), mar=c(5, 5, 1, 1))
for (c in c(7, 23, 40, 70, 77, 100)) {
  util$plot_line_hist(data$ratings[data$customer_idxs == c],
                      -0.5, 6.5, 1,
                      xlab="Rating", main=paste('Customer', c))
}
Similarly we see strong variation in the observed ratings across movies. Movies 117 and 180, for example, are particularly well-reviewed.
par(mfrow=c(2, 3), mar=c(5, 5, 1, 1))
for (m in c(33, 53, 61, 80, 117, 180)) {
  util$plot_line_hist(data$ratings[data$movie_idxs == m],
                      -0.5, 6.5, 1,
                      xlab="Rating", main=paste('Movie', m))
}
When critiquing any model of these ratings we want to be able to interrogate this variation in rating behavior across customers and movies. Even with the subsampled data, however, visualizing a histogram of ratings for each customer and movie would be far too ungainly. A more scalable, albeit less informative, approach is to compute a scalar summary of the ratings within each group, and then construct a histogram of those stratified summaries.
The only problem here is identifying useful summary statistics. Because ordinal spaces are not equipped with a distinguished metric, empirical moments are ill-defined; in particular given any order-preserving map f : \{ 1, 2, 3, 4, 5 \} \rightarrow \mathbb{R} we can construct a distinct empirical mean, \hat{\mu}_{f} = \frac{1}{N} \sum_{n = 1}^{N} f(r_{n}), and corresponding higher-order empirical moments.
That said, most of these empirical moments share similar qualitative behaviors. If a stratified histogram is peaked towards central values, for example, then most empirical means will end up somewhere near that peak. Even if empirical moments capture different behaviors, however, they can still be useful for comparing observed and posterior predictive behaviors.
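To make the metric dependence concrete, consider two order-preserving maps applied to the same ratings; this is just a toy demonstration and not part of the analysis below.

ratings <- c(1, 3, 4, 4, 5)

f_linear <- function(r) r    # Equal unit distances
f_convex <- function(r) r^2  # Stretches distances at the high end

mean(f_linear(ratings))  # 3.4
mean(f_convex(ratings))  # 13.4

Both summaries rank a high-rating stratum above a low-rating one, but their numerical values, and their sensitivities to extreme ratings, differ.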
For this case study I’ll assume a metric with equal unit distances between neighboring ordinal elements, equivalent to taking f to be the identity, with the resulting empirical mean \hat{\mu} = \frac{1}{N} \sum_{n = 1}^{N} r_{n}. We can then stratify this mean by customers and movies and histogram the resulting ensemble of summaries.
par(mfrow=c(1, 2), mar=c(5, 5, 2, 1))
mean_rating_customer <-
  sapply(1:data$N_customers,
         function(c) mean(data$ratings[data$customer_idxs == c]))

util$plot_line_hist(mean_rating_customer,
                    0, 6, 1,
                    xlab="Customer-wise Average Ratings")

mean_rating_movie <-
  sapply(1:data$N_movies,
         function(m) mean(data$ratings[data$movie_idxs == m]))

util$plot_line_hist(mean_rating_movie,
                    0, 6, 0.5,
                    xlab="Movie-wise Average Ratings")
Of course an empirical mean captures only some of the rating behavior within each stratum. We can capture more with a summary that is sensitive to the dispersion of ratings in each stratum, such as the empirical variance or the empirical entropy. One nice advantage of the empirical entropy relative to the empirical variance is that it does not require a choice of metric over the ordinal values.
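For example the empirical entropy of the ratings in a stratum depends only on the empirical rating frequencies and not on any distances between the ratings. A minimal sketch, with the helper function name my own:

empirical_entropy <- function(vals) {
  p <- table(factor(vals, levels=1:5)) / length(vals)
  p <- p[p > 0]
  -sum(p * log(p))
}

entropy_rating_customer <-
  sapply(1:data$N_customers,
         function(c)
           empirical_entropy(data$ratings[data$customer_idxs == c]))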
Here let’s go with the empirical variance based on the same assumptions as our empirical mean, or rather a modified empirical variance that defaults to zero when a stratum contains only one value.
safe_var <- function(vals) {
  if (length(vals) == 1)
    0
  else
    var(vals)
}
par(mfrow=c(1, 2), mar=c(5, 5, 2, 1))
var_rating_customer <-
  sapply(1:data$N_customers,
         function(c) safe_var(data$ratings[data$customer_idxs == c]))

util$plot_line_hist(var_rating_customer,
                    0, 5, 0.5,
                    xlab="Customer-wise Rating Variances")

var_rating_movie <-
  sapply(1:data$N_movies,
         function(m) safe_var(data$ratings[data$movie_idxs == m]))

util$plot_line_hist(var_rating_movie,
                    0, 5, 0.5,
                    xlab="Movie-wise Rating Variances")
The main limitation with these stratified summary statistics is that they are sensitive to only the marginal variation across movies and customers. In other words they are sensitive to heterogeneity across movies or customers but not heterogeneity across movies and customers at the same time. Unfortunately because each customer-movie pair has at most one rating, and most have no ratings, we can’t just stratify summary statistics by both customer and movie.
One potential compromise is to construct empirical covariances for each pair of customers or movies. For example given our assumed metric the empirical covariance between two movies m_{1} and m_{2} is defined by \hat{\rho}_{m_{1} m_{2}} = \hat{\rho}_{m_{2} m_{1}} = \frac{1}{N_{C} - 1} \sum_{c = 1}^{N_{C}} (r_{c m_{1}} - \hat{\mu}_{m_{1}}) \cdot (r_{c m_{2}} - \hat{\mu}_{m_{2}}) where N_{C} is the number of customers, r_{c m} is the rating given to movie m by customer c, and \hat{\mu}_{m} = \frac{1}{N_{C}} \sum_{c = 1}^{N_{C}} r_{c m}.
The immediate issue is that, because not every customer rates every movie, many of the r_{c m} in these sums will be undefined. All we can do here is limit the sums to the customers who have rated both movies.
More formally let \mathsf{c}_{m} denote the set of customers who have rated movie m and N_{C, m} denote the number of elements in that set. Similarly let \mathsf{c}_{m_{1} m_{2}} denote the set of customers who have rated both movies m_{1} and m_{2} with N_{C, m_{1} m_{2}} the number of elements in that set. Then we can define \hat{\rho}_{m_{1} m_{2}} = \hat{\rho}_{m_{2} m_{1}} = \frac{1}{N_{C, m_{1} m_{2}} - 1} \sum_{c \in \mathsf{c}_{m_{1} m_{2}}} (r_{c m_{1}} - \hat{\mu}_{m_{1}}) \cdot (r_{c m_{2}} - \hat{\mu}_{m_{2}}) with \hat{\mu}_{m} = \frac{1}{N_{C, m}} \sum_{c \in \mathsf{c}_{m}} r_{c m} the movie-wise empirical means that we’ve already constructed.
All of this said, given the relative sparsity of the observed ratings the set \mathsf{c}_{m_{1} m_{2}} will be empty for most pairs of movies. Even fewer pairs of movies will have the N_{C, m_{1} m_{2}} > 1 needed for \hat{\rho}_{m_{1} m_{2}} to be well-defined, let alone an N_{C, m_{1} m_{2}} large enough for \hat{\rho}_{m_{1} m_{2}} to provide an informative summary.
We can avoid any ill-defined or poorly informative empirical covariances by including only those movie pairs with sufficiently large N_{C, m_{1} m_{2}} in the final histogram. This also has the added benefit of reducing the total number of summaries that we have to bin into the final histogram visualization. Here I will require N_{C, m_{1} m_{2}} > 7.
Now that we’ve carefully laid out the math, the implementation is relatively straightforward. First we loop over the observed ratings twice, incrementing the partial sums for each pair of movies when appropriate. This gives us \hat{\Sigma}_{m_{1} m_{2}} = \sum_{c \in \mathsf{c}_{m_{1} m_{2}}} (r_{c m_{1}} - \hat{\mu}_{m_{1}}) \cdot (r_{c m_{2}} - \hat{\mu}_{m_{2}}) and N_{C, m_{1} m_{2}} = \sum_{c \in \mathsf{c}_{m_{1} m_{2}}} 1.
covar_rating_movie <- matrix(0,
                             nrow=data$N_movies,
                             ncol=data$N_movies)
movie_pair_counts <- matrix(0,
                            nrow=data$N_movies,
                            ncol=data$N_movies)

for (n1 in 1:data$N_ratings) {
  for (n2 in 1:data$N_ratings) {
    if (data$customer_idxs[n1] == data$customer_idxs[n2]) {
      m1 <- data$movie_idxs[n1]
      m2 <- data$movie_idxs[n2]
      y <- (data$ratings[n1] - mean_rating_movie[m1]) *
           (data$ratings[n2] - mean_rating_movie[m2])
      covar_rating_movie[m1, m2] <- covar_rating_movie[m1, m2] + y
      covar_rating_movie[m2, m1] <- covar_rating_movie[m2, m1] + y
      movie_pair_counts[m1, m2] <- movie_pair_counts[m1, m2] + 1
      movie_pair_counts[m2, m1] <- movie_pair_counts[m2, m1] + 1
    }
  }
}
Next we compute \hat{\rho}_{m_{1} m_{2}} = \frac{ \hat{\Sigma}_{m_{1} m_{2}} }{ N_{C, m_{1} m_{2}} - 1 } for each pair of movies where N_{C, m_{1} m_{2}} is larger than 7.
m_pairs <- list()
covar_rating_movie_filt <- c()

for (m1 in 2:data$N_movies) {
  for (m2 in 1:(m1 - 1)) {
    if (movie_pair_counts[m1, m2] > 7) {
      m_pairs[[length(m_pairs) + 1]] <- c(m1, m2)
      covar_rating_movie_filt <-
        c(covar_rating_movie_filt,
          covar_rating_movie[m1, m2] /
          (movie_pair_counts[m1, m2] - 1))
    }
  }
}
Finally we bin these values into a histogram that visualizes the range of these partial empirical covariance behaviors.
par(mfrow=c(1, 1), mar=c(5, 5, 2, 1))
util$plot_line_hist(covar_rating_movie_filt,
                    -4, 4, 0.25,
                    xlab="Movie-wise Rating Covariances")
In order to construct posterior retrodictive checks later on we will need to select the posterior predictive values for these same selected movie pairs. We might as well construct the appropriate variable names now and have them ready.
covar_rating_movie_filt_names <-
  sapply(m_pairs,
         function(p) paste0('covar_rating_movie_pred[',
                            p[1], ',', p[2], ']'))
All of this said, I think that there is still a lot of opportunity for better summary statistics in applications like these: for example summaries that are more compatible with the structure of an ordinal space and don’t require the assumption of an arbitrary metric, and summaries that better capture couplings between different strata.
3 Homogeneous Customer Model
Now that we’ve familiarized ourselves with the data we can make our first attempt at modeling the data generating process that, well, generated it. Our models will be built on a foundation of ordinal pairwise comparison modeling techniques.
Given a latent logistic probability density function we will use cut points to derive baseline ordinal probabilities for each star rating. This baseline will not be tied to any particular movie but rather a hypothetical default movie implied by the configuration of an induced Dirichlet prior model.
Next we will assume that movies systematically shift these baseline probabilities to lower or higher ratings depending on their quality. This is implemented with affinity parameters for each movie that shift the latent logistic probability density function, and hence the derived ordinal probabilities, based on customer preference. Initially we will assume that the rating behavior is homogeneous across customers so that we need only a single set of cut points to model all of the data.
Lastly assuming that our domain expertise about consumer preferences is exchangeable we will model the movie affinity parameters hierarchically. To avoid degeneracy in the customer-movie comparisons we will need to anchor the population location of this hierarchical model to zero, the same anchor location used in the induced Dirichlet prior model. Given the sparsity of observed ratings we’ll implement this hierarchical model with a monolithic non-centered parameterization.
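Before assembling the full Stan program it’s worth seeing the basic mechanism in isolation. The following R sketch, with hypothetical cut point and affinity values, shows how interior cut points partition a latent logistic density into baseline rating probabilities, and how a positive movie affinity shifts those probabilities towards higher ratings.

# Ordinal probabilities implied by interior cut points and an affinity;
# plogis is R's logistic cumulative distribution function
ordinal_probs <- function(gamma, cut_points) {
  Pi <- plogis(cut_points - gamma)  # Interior cumulative probabilities
  c(Pi, 1) - c(0, Pi)               # Probabilities for each rating
}

cut_points <- c(-2, -0.5, 0.5, 2)   # Hypothetical values

ordinal_probs(0, cut_points)        # Baseline for a neutral movie
ordinal_probs(1.5, cut_points)      # Shifted towards higher ratings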
model1.stan
functions {
  // Log probability density function over cut points
  // induced by a Dirichlet probability density function
  // over baseline probabilities and a latent logistic
  // density function.
  real induced_dirichlet_lpdf(vector c, vector alpha) {
    int K = num_elements(c) + 1;
    vector[K - 1] Pi = inv_logit(c);
    vector[K] p = append_row(Pi, [1]') - append_row([0]', Pi);

    // Log Jacobian correction
    real logJ = 0;
    for (k in 1:(K - 1)) {
      if (c[k] >= 0)
        logJ += -c[k] - 2 * log(1 + exp(-c[k]));
      else
        logJ += +c[k] - 2 * log(1 + exp(+c[k]));
    }

    return dirichlet_lpdf(p | alpha) + logJ;
  }
}

data {
  int<lower=1> N_ratings;
  array[N_ratings] int<lower=1, upper=5> ratings;

  int<lower=1> N_customers;
  array[N_ratings] int<lower=1, upper=N_customers> customer_idxs;

  int<lower=1> N_movies;
  array[N_ratings] int<lower=1, upper=N_movies> movie_idxs;
}

parameters {
  vector[N_movies] gamma_ncp; // Non-centered movie affinities
  real<lower=0> tau_gamma;    // Movie affinity population scale

  ordered[4] cut_points;      // Customer rating cut points
}

transformed parameters {
  // Centered movie affinities
  vector[N_movies] gamma = tau_gamma * gamma_ncp;
}

model {
  // Prior model
  gamma_ncp ~ normal(0, 1);
  tau_gamma ~ normal(0, 5 / 2.57);
  cut_points ~ induced_dirichlet(rep_vector(1, 5));

  // Observational model
  ratings ~ ordered_logistic(gamma[movie_idxs], cut_points);
}

generated quantities {
  array[N_ratings] int<lower=1, upper=5> rating_pred;

  array[N_customers] real mean_rating_customer_pred
    = rep_array(0, N_customers);
  array[N_customers] real var_rating_customer_pred
    = rep_array(0, N_customers);

  array[N_movies] real mean_rating_movie_pred = rep_array(0, N_movies);
  array[N_movies] real var_rating_movie_pred = rep_array(0, N_movies);

  matrix[N_movies, N_movies] covar_rating_movie_pred;

  {
    // Accumulate stratified means and variances of the posterior
    // predictive ratings with Welford's streaming algorithm
    array[N_customers] real C = rep_array(0, N_customers);
    array[N_movies] real M = rep_array(0, N_movies);

    for (n in 1:N_ratings) {
      real delta = 0;
      int c = customer_idxs[n];
      int m = movie_idxs[n];

      rating_pred[n] = ordered_logistic_rng(gamma[m], cut_points);

      C[c] += 1;
      delta = rating_pred[n] - mean_rating_customer_pred[c];
      mean_rating_customer_pred[c] += delta / C[c];
      var_rating_customer_pred[c]
        += delta * (rating_pred[n] - mean_rating_customer_pred[c]);

      M[m] += 1;
      delta = rating_pred[n] - mean_rating_movie_pred[m];
      mean_rating_movie_pred[m] += delta / M[m];
      var_rating_movie_pred[m]
        += delta * (rating_pred[n] - mean_rating_movie_pred[m]);
    }

    for (c in 1:N_customers) {
      if (C[c] > 1)
        var_rating_customer_pred[c] /= C[c] - 1;
      else
        var_rating_customer_pred[c] = 0;
    }

    for (m in 1:N_movies) {
      if (M[m] > 1)
        var_rating_movie_pred[m] /= M[m] - 1;
      else
        var_rating_movie_pred[m] = 0;
    }
  }

  {
    // Partial empirical covariances of the posterior predictive ratings
    matrix[N_movies, N_movies] counts;

    for (m1 in 1:N_movies) {
      for (m2 in 1:N_movies) {
        counts[m1, m2] = 0;
        covar_rating_movie_pred[m1, m2] = 0;
      }
    }

    for (n1 in 1:N_ratings) {
      for (n2 in 1:N_ratings) {
        if (customer_idxs[n1] == customer_idxs[n2]) {
          int m1 = movie_idxs[n1];
          int m2 = movie_idxs[n2];
          real y = (rating_pred[n1] - mean_rating_movie_pred[m1])
                   * (rating_pred[n2] - mean_rating_movie_pred[m2]);
          covar_rating_movie_pred[m1, m2] += y;
          covar_rating_movie_pred[m2, m1] += y;
          counts[m1, m2] += 1;
          counts[m2, m1] += 1;
        }
      }
    }

    for (m1 in 1:N_movies) {
      for (m2 in 1:N_movies) {
        if (counts[m1, m2] > 1)
          covar_rating_movie_pred[m1, m2] /= counts[m1, m2] - 1;
      }
    }
  }
}
fit <- stan(file="stan_programs/model1.stan",
            data=data, seed=8438338,
            warmup=1000, iter=2024, refresh=0)
Despite the initial model complexity there are no diagnostic issues indicating suspect computation.
diagnostics <- util$extract_hmc_diagnostics(fit)
util$check_all_hmc_diagnostics(diagnostics)
All Hamiltonian Monte Carlo diagnostics are consistent with reliable
Markov chain Monte Carlo.
samples1 <- util$extract_expectand_vals(fit)
base_samples <- util$filter_expectands(samples1,
                                       c('gamma_ncp',
                                         'tau_gamma',
                                         'cut_points'),
                                       check_arrays=TRUE)
util$check_all_expectand_diagnostics(base_samples)
All expectands checked appear to be behaving well enough for reliable
Markov chain Monte Carlo estimation.
Consequently we’re ready to investigate this model’s retrodictive performance.
The model appears to be flexible enough to capture the behavior of the aggregate ratings.
par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))
util$plot_hist_quantiles(samples1, 'rating_pred', -0.5, 6.5, 1,
                         baseline_values=data$ratings,
                         xlab="All Ratings")
On the other hand the retrodictive performance is much worse if we look at individual customer behaviors. In particular there is much more heterogeneity in the observed data than what the model can reproduce, which isn’t surprising given that we explicitly assumed homogeneous customer behavior.
par(mfrow=c(2, 3), mar=c(5, 5, 1, 1))
for (c in c(7, 23, 40, 70, 77, 100)) {
  names <- sapply(which(data$customer_idxs == c),
                  function(n) paste0('rating_pred[', n, ']'))
  filtered_samples <- util$filter_expectands(samples1, names)

  customer_ratings <- data$ratings[data$customer_idxs == c]
  util$plot_hist_quantiles(filtered_samples, 'rating_pred',
                           -0.5, 6.5, 1,
                           baseline_values=customer_ratings,
                           xlab="Ratings",
                           main=paste('Customer', c))
}
On the other hand the model doesn’t seem to have a problem capturing the heterogeneity in the observed ratings stratified across movies, at least for this quick spot check.
par(mfrow=c(2, 3), mar=c(5, 5, 1, 1))
for (m in c(33, 53, 61, 80, 117, 180)) {
  names <- sapply(which(data$movie_idxs == m),
                  function(n) paste0('rating_pred[', n, ']'))
  filtered_samples <- util$filter_expectands(samples1, names)

  movie_ratings <- data$ratings[data$movie_idxs == m]
  util$plot_hist_quantiles(filtered_samples, 'rating_pred',
                           -0.5, 6.5, 1,
                           baseline_values=movie_ratings,
                           xlab="Ratings",
                           main=paste('Movie', m))
}
To better investigate the variation in retrodictive performance, however, we need to look beyond just a few customers and movies. We can examine the behavior of all customers and movies at the same time with histograms of the stratified summary statistics we discussed above.
Here we see that the posterior predictive behavior of the customer-wise empirical means are more narrowly distributed than what we see in the observed data. At the same time the posterior predictive customer-wise empirical variances concentrate at larger values than the observed customer-wise empirical variances.
par(mfrow=c(2, 2), mar=c(5, 5, 1, 1))
util$plot_hist_quantiles(samples1, 'mean_rating_customer_pred',
                         0, 6, 0.5,
                         baseline_values=mean_rating_customer,
                         xlab="Customer-wise Average Ratings")

util$plot_hist_quantiles(samples1, 'mean_rating_movie_pred',
                         0, 6, 0.6,
                         baseline_values=mean_rating_movie,
                         xlab="Movie-wise Average Ratings")

util$plot_hist_quantiles(samples1, 'var_rating_customer_pred',
                         0, 7, 0.5,
                         baseline_values=var_rating_customer,
                         xlab="Customer-wise Rating Variances")
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 239 predictive values (0.1%) fell above the binning.
util$plot_hist_quantiles(samples1, 'var_rating_movie_pred',
                         0, 7, 0.5,
                         baseline_values=var_rating_movie,
                         xlab="Movie-wise Rating Variances")
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 1287 predictive values (0.2%) fell above the binning.
Finally the collection of selected movie empirical covariances appears to be more heavy-tailed in the observed data relative to the posterior predictions.
par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))
filtered_samples <-
  util$filter_expectands(samples1,
                         covar_rating_movie_filt_names)

util$plot_hist_quantiles(filtered_samples, 'covar_rating_movie_pred',
                         -4.25, 4.25, 0.25,
                         baseline_values=covar_rating_movie_filt,
                         xlab="Filtered Movie-wise Rating Covariances")
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 273 predictive values (0.0%) fell below the binning.
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 2329 predictive values (0.0%) fell above the binning.
4 Independent, Heterogeneous Customer Model
All of our retrodictive checks tell a consistent story – customers do not rate movies the same way as each other. Fortunately it’s straightforward to model each customer’s idiosyncratic behavior by endowing each of them with their own set of cut points, and hence their own baseline rating probabilities.
model2.stan
functions {
  // Log probability density function over cut points
  // induced by a Dirichlet probability density function
  // over baseline probabilities and a latent logistic
  // density function.
  real induced_dirichlet_lpdf(vector c, vector alpha) {
    int K = num_elements(c) + 1;
    vector[K - 1] Pi = inv_logit(c);
    vector[K] p = append_row(Pi, [1]') - append_row([0]', Pi);

    // Log Jacobian correction
    real logJ = 0;
    for (k in 1:(K - 1)) {
      if (c[k] >= 0)
        logJ += -c[k] - 2 * log(1 + exp(-c[k]));
      else
        logJ += +c[k] - 2 * log(1 + exp(+c[k]));
    }

    return dirichlet_lpdf(p | alpha) + logJ;
  }
}

data {
  int<lower=1> N_ratings;
  array[N_ratings] int<lower=1, upper=5> ratings;

  int<lower=1> N_customers;
  array[N_ratings] int<lower=1, upper=N_customers> customer_idxs;

  int<lower=1> N_movies;
  array[N_ratings] int<lower=1, upper=N_movies> movie_idxs;
}

parameters {
  vector[N_movies] gamma_ncp; // Non-centered movie affinities
  real<lower=0> tau_gamma;    // Movie affinity population scale

  array[N_customers] ordered[4] cut_points; // Customer rating cut points
}

transformed parameters {
  // Centered movie affinities
  vector[N_movies] gamma = tau_gamma * gamma_ncp;
}

model {
  // Prior model
  gamma_ncp ~ normal(0, 1);
  tau_gamma ~ normal(0, 5 / 2.57);

  for (c in 1:N_customers)
    cut_points[c] ~ induced_dirichlet(rep_vector(1, 5));

  // Observational model
  for (n in 1:N_ratings) {
    int c = customer_idxs[n];
    int m = movie_idxs[n];
    ratings[n] ~ ordered_logistic(gamma[m], cut_points[c]);
  }
}

generated quantities {
  array[N_ratings] int<lower=1, upper=5> rating_pred;

  array[N_customers] real mean_rating_customer_pred
    = rep_array(0, N_customers);
  array[N_customers] real var_rating_customer_pred
    = rep_array(0, N_customers);

  array[N_movies] real mean_rating_movie_pred = rep_array(0, N_movies);
  array[N_movies] real var_rating_movie_pred = rep_array(0, N_movies);

  matrix[N_movies, N_movies] covar_rating_movie_pred;

  {
    // Accumulate stratified means and variances of the posterior
    // predictive ratings with Welford's streaming algorithm
    array[N_customers] real C = rep_array(0, N_customers);
    array[N_movies] real M = rep_array(0, N_movies);

    for (n in 1:N_ratings) {
      real delta = 0;
      int c = customer_idxs[n];
      int m = movie_idxs[n];

      rating_pred[n] = ordered_logistic_rng(gamma[m], cut_points[c]);

      C[c] += 1;
      delta = rating_pred[n] - mean_rating_customer_pred[c];
      mean_rating_customer_pred[c] += delta / C[c];
      var_rating_customer_pred[c]
        += delta * (rating_pred[n] - mean_rating_customer_pred[c]);

      M[m] += 1;
      delta = rating_pred[n] - mean_rating_movie_pred[m];
      mean_rating_movie_pred[m] += delta / M[m];
      var_rating_movie_pred[m]
        += delta * (rating_pred[n] - mean_rating_movie_pred[m]);
    }

    for (c in 1:N_customers) {
      if (C[c] > 1)
        var_rating_customer_pred[c] /= C[c] - 1;
      else
        var_rating_customer_pred[c] = 0;
    }

    for (m in 1:N_movies) {
      if (M[m] > 1)
        var_rating_movie_pred[m] /= M[m] - 1;
      else
        var_rating_movie_pred[m] = 0;
    }
  }

  {
    // Partial empirical covariances of the posterior predictive ratings
    matrix[N_movies, N_movies] counts;

    for (m1 in 1:N_movies) {
      for (m2 in 1:N_movies) {
        counts[m1, m2] = 0;
        covar_rating_movie_pred[m1, m2] = 0;
      }
    }

    for (n1 in 1:N_ratings) {
      for (n2 in 1:N_ratings) {
        if (customer_idxs[n1] == customer_idxs[n2]) {
          int m1 = movie_idxs[n1];
          int m2 = movie_idxs[n2];
          real y = (rating_pred[n1] - mean_rating_movie_pred[m1])
                   * (rating_pred[n2] - mean_rating_movie_pred[m2]);
          covar_rating_movie_pred[m1, m2] += y;
          covar_rating_movie_pred[m2, m1] += y;
          counts[m1, m2] += 1;
          counts[m2, m1] += 1;
        }
      }
    }

    for (m1 in 1:N_movies) {
      for (m2 in 1:N_movies) {
        if (counts[m1, m2] > 1)
          covar_rating_movie_pred[m1, m2] /= counts[m1, m2] - 1;
      }
    }
  }
}
fit <- stan(file="stan_programs/model2.stan",
            data=data, seed=8438330,
            warmup=1000, iter=2024, refresh=0)
Frustratingly, the computation suffers from a few stray divergences.
diagnostics <- util$extract_hmc_diagnostics(fit)
util$check_all_hmc_diagnostics(diagnostics)
Chain 4: 2 of 1024 transitions (0.2%) diverged.
Divergent Hamiltonian transitions result from unstable numerical
trajectories. These instabilities are often due to degenerate target
geometry, especially "pinches". If there are only a small number of
divergences then running with adapt_delta larger than 0.801 may reduce
the instabilities at the cost of more expensive Hamiltonian
transitions.
samples2 <- util$extract_expectand_vals(fit)
base_samples <- util$filter_expectands(samples2,
                                       c('gamma_ncp',
                                         'tau_gamma',
                                         'cut_points'),
                                       check_arrays=TRUE)
util$check_all_expectand_diagnostics(base_samples)
All expectands checked appear to be behaving well enough for reliable
Markov chain Monte Carlo estimation.
One possibility is that the replicated cut points are messing with the hierarchical geometry of the movie affinities. That said the movie affinities most susceptible to degenerate behavior in a non-centered parameterization are those with the most ratings, and those do not exhibit any obvious geometric pathologies.
idxs <- as.numeric(names(tail(sort(table(data$movie_idxs)), 9)))
names <- sapply(idxs, function(m) paste0('gamma[', m, ']'))
util$plot_div_pairs(names, 'tau_gamma', samples2, diagnostics)
At this point we could spend some time investigating any degenerate behavior between the cut points that could be causing problems, or even between some cut points and some movie affinities. Given the small number of divergences, however, let’s just run again with a less aggressive step size adaptation and cope with the increased computational cost. We can always come back to this investigation later if this ends up being our final model.
fit <- stan(file="stan_programs/model2.stan",
            data=data, seed=8438330,
            warmup=1000, iter=2024, refresh=0,
            control=list('adapt_delta'=0.9))
Fortunately that seems to have done the trick and now our diagnostics are squeaky clean.
diagnostics <- util$extract_hmc_diagnostics(fit)
util$check_all_hmc_diagnostics(diagnostics)
All Hamiltonian Monte Carlo diagnostics are consistent with reliable
Markov chain Monte Carlo.
samples2 <- util$extract_expectand_vals(fit)
base_samples <- util$filter_expectands(samples2,
                                       c('gamma_ncp',
                                         'tau_gamma',
                                         'cut_points'),
                                       check_arrays=TRUE)
util$check_all_expectand_diagnostics(base_samples)
All expectands checked appear to be behaving well enough for reliable
Markov chain Monte Carlo estimation.
Has our retrodictive performance improved?
Interestingly the behavior of the aggregate ratings isn’t quite as consistent between the observed data and posterior predictions as it was before. That said the increased retrodictive tension isn’t necessarily large enough to be a concern yet.
par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))
util$plot_hist_quantiles(samples2, 'rating_pred', -0.5, 6.5, 1,
                         baseline_values=data$ratings,
                         xlab="All Ratings")
More importantly the retrodictive performance for the customers that we’ve spot checked is much better.
par(mfrow=c(2, 3), mar=c(5, 5, 1, 1))
for (c in c(7, 23, 40, 70, 77, 100)) {
  names <- sapply(which(data$customer_idxs == c),
                  function(n) paste0('rating_pred[', n, ']'))
  filtered_samples <- util$filter_expectands(samples2, names)

  customer_ratings <- data$ratings[data$customer_idxs == c]
  util$plot_hist_quantiles(filtered_samples, 'rating_pred',
                           -0.5, 6.5, 1,
                           baseline_values=customer_ratings,
                           xlab="Ratings",
                           main=paste('Customer', c))
}
This improvement in the customer-wise behaviors hasn’t come at any cost to the movie-wise retrodictive performance, at least for this limited spot check.
par(mfrow=c(2, 3), mar=c(5, 5, 1, 1))
for (m in c(33, 53, 61, 80, 117, 180)) {
  names <- sapply(which(data$movie_idxs == m),
                  function(n) paste0('rating_pred[', n, ']'))
  filtered_samples <- util$filter_expectands(samples2, names)

  movie_ratings <- data$ratings[data$movie_idxs == m]
  util$plot_hist_quantiles(filtered_samples, 'rating_pred',
                           -0.5, 6.5, 1,
                           baseline_values=movie_ratings,
                           xlab="Ratings",
                           main=paste('Movie', m))
}
Do the histograms of stratified summary statistics tell a similar story? The retrodictive tension in the customer-wise empirical means and empirical variances has decreased, although some remains in the empirical variances.
par(mfrow=c(2, 2), mar=c(5, 5, 1, 1))
util$plot_hist_quantiles(samples2, 'mean_rating_customer_pred',
                         0, 6, 0.5,
                         baseline_values=mean_rating_customer,
                         xlab="Customer-wise Average Ratings")

util$plot_hist_quantiles(samples2, 'mean_rating_movie_pred',
                         0, 6, 0.6,
                         baseline_values=mean_rating_movie,
                         xlab="Movie-wise Average Ratings")

util$plot_hist_quantiles(samples2, 'var_rating_customer_pred',
                         0, 7, 0.5,
                         baseline_values=var_rating_customer,
                         xlab="Customer-wise Rating Variances")
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 400 predictive values (0.1%) fell above the binning.
util$plot_hist_quantiles(samples2, 'var_rating_movie_pred',
                         0, 7, 0.5,
                         baseline_values=var_rating_movie,
                         xlab="Movie-wise Rating Variances")
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 2589 predictive values (0.3%) fell above the binning.
Moreover the retrodictive tension in the collection of selected movie empirical covariances remains.
par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))
filtered_samples <-
  util$filter_expectands(samples2,
                         covar_rating_movie_filt_names)

util$plot_hist_quantiles(filtered_samples, 'covar_rating_movie_pred',
                         -4.25, 4.25, 0.25,
                         baseline_values=covar_rating_movie_filt,
                         xlab="Filtered Movie-wise Rating Covariances")
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 370 predictive values (0.0%) fell below the binning.
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 248 predictive values (0.0%) fell above the binning.
If we were content with these mild retrodictive tensions then we would move on to exploring the posterior inferences themselves. For example we could visualize the marginal posterior inferences for the interior cut points of each customer.
par(mfrow=c(4, 1), mar=c(5, 5, 1, 1))
for (k in 1:4) {
  names <- sapply(1:data$N_customers,
                  function(r) paste0('cut_points[', r, ',', k, ']'))
  util$plot_disc_pushforward_quantiles(samples2, names,
                                       xlab="Customer",
                                       display_ylim=c(-6, 6),
                                       ylab=paste0('cut_points[', k, ']'))
}
The interior cut points are a bit more interpretable when visualized together, especially when comparing different customers. For example to compare Customer 23 and Customer 70 we might overlay the marginal posterior visualizations of the four interior cut points of each customer in adjacent plots.
par(mfrow=c(2, 1), mar=c(5, 5, 1, 1))
cols <- c(util$c_mid, util$c_mid_highlight,
          util$c_dark, util$c_dark_highlight)

c <- 23

k <- 1
name <- paste0('cut_points[', c, ',', k, ']')
util$plot_expectand_pushforward(samples2[[name]],
                                50, flim=c(-9, 9), ylim=c(0, 2),
                                col=cols[k],
                                display_name='Interior Cut Points',
                                main=paste('Customer', c))
for (k in 2:4) {
  name <- paste0('cut_points[', c, ',', k, ']')
  util$plot_expectand_pushforward(samples2[[name]],
                                  50, flim=c(-9, 9),
                                  col=cols[k], border="#BBBBBB88",
                                  add=TRUE)
}
text(0, 1.65, "cut_points[1]", col=util$c_mid)
text(2, 1.4, "cut_points[2]", col=util$c_mid_highlight)
text(4, 1.1, "cut_points[3]", col=util$c_dark)
text(6, 0.5, "cut_points[4]", col=util$c_dark_highlight)
c <- 70

k <- 1
name <- paste0('cut_points[', c, ',', k, ']')
util$plot_expectand_pushforward(samples2[[name]],
                                50, flim=c(-9, 9), ylim=c(0, 2),
                                col=cols[k],
                                display_name='Interior Cut Points',
                                main=paste('Customer', c))
Warning in util$plot_expectand_pushforward(samples2[[name]], 50, flim = c(-9, :
8 values (0.2%) fell below the histogram binning.
Warning in util$plot_expectand_pushforward(samples2[[name]], 50, flim = c(-9, :
0 values (0.0%) fell above the histogram binning.
for (k in 2:4) {
  name <- paste0('cut_points[', c, ',', k, ']')
  util$plot_expectand_pushforward(samples2[[name]],
                                  50, flim=c(-9, 9),
                                  col=cols[k], border="#BBBBBB88",
                                  add=TRUE)
}
text(-5.75, 0.35, "cut_points[1]", col=util$c_mid)
text(-3, 0.85, "cut_points[2]", col=util$c_mid_highlight)
text(-2, 1.1, "cut_points[3]", col=util$c_dark)
text(1.0, 1.25, "cut_points[4]", col=util$c_dark_highlight)
This allows us to see that Customer 23 is pretty stingy; a movie affinity needs to be pretty large in order for the probability of a large rating to become non-negligible. On the other hand Customer 70 is much more generous; they are likely to give even a mediocre movie a high rating.
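If we wanted to quantify this difference we could push each customer’s cut point inferences through to baseline rating probabilities, for example with the hypothetical ordinal_probs helper sketched earlier applied to posterior means; a rough summary, not a replacement for the full pushforward visualizations.

post_mean_cut_points <- function(samples, c)
  sapply(1:4, function(k)
    mean(samples[[paste0('cut_points[', c, ',', k, ']')]]))

round(ordinal_probs(0, post_mean_cut_points(samples2, 23)), 2) # Stingy
round(ordinal_probs(0, post_mean_cut_points(samples2, 70)), 2) # Generous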
At the same time we can investigate the inferred affinities for each movie.
par(mfrow=c(2, 1), mar=c(5, 5, 1, 1))
util$plot_expectand_pushforward(samples2[['tau_gamma']],
                                25, flim=c(0, 1),
                                display_name='tau_gamma')

names <- sapply(1:data$N_movies,
                function(m) paste0('gamma[', m, ']'))
util$plot_disc_pushforward_quantiles(samples2, names,
                                     xlab="Movie",
                                     ylab="Affinity")
While there is a lot of uncertainty, not surprising given the limited data, we can definitely see some trends. For example the concentration of the tau_gamma
marginal posterior distribution away from zero indicates substantial variation in the movie affinities. Moreover despite the uncertainties we can clearly differentiate the best movies from the worst movies. We’ll do even more with these movie affinity inferences in the next section.
5 Hierarchical Customer Model
At this point we need to address a stark asymmetry in the last model. Although we’re accounting for heterogeneity in both customer and movie behaviors, we’re modeling only the latter hierarchically. We don’t have any domain expertise that obstructs the exchangeability of the customers so why don’t we model the individual interior cut points hierarchically as well? All we need is an appropriate multivariate population model.
Conveniently the induced Dirichlet prior naturally composes with the hyper Dirichlet population model that I discussed in my die fairness case study. This makes for a natural interior cut point population model that pools each customer’s baseline ratings towards a common behavior.
Note that this is not the most general hierarchical model that we might consider. This model assumes that the heterogeneity in the interior cut points is independent of the heterogeneity in the movie affinities; more generally those heterogeneities could be coupled together. That said I think that it is pretty reasonable to assume that how each customer translates their preferences into a movie rating is independent of those particular preferences.
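To build some intuition for the population model, note that in the Stan program below the baseline simplex mu_q and scale tau_q define Dirichlet concentrations alpha = mu_q / tau_q + 1, so that smaller values of tau_q pool each customer’s baseline probabilities more strongly towards mu_q. A quick sketch with hypothetical values:

# Sample a simplex from a Dirichlet density via gamma variates
rdirichlet <- function(alpha) {
  g <- rgamma(length(alpha), alpha)
  g / sum(g)
}

mu_q <- c(0.10, 0.15, 0.20, 0.30, 0.25)

round(rdirichlet(mu_q / 0.01 + 1), 2) # Strong pooling; close to mu_q
round(rdirichlet(mu_q / 1.00 + 1), 2) # Weak pooling; much more dispersed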
model3.stan
functions {
  // Log probability density function over cut points
  // induced by a Dirichlet probability density function
  // over baseline probabilities and a latent logistic
  // density function.
  real induced_dirichlet_lpdf(vector c, vector alpha) {
    int K = num_elements(c) + 1;
    vector[K - 1] Pi = inv_logit(c);
    vector[K] p = append_row(Pi, [1]') - append_row([0]', Pi);

    // Log Jacobian correction
    real logJ = 0;
    for (k in 1:(K - 1)) {
      if (c[k] >= 0)
        logJ += -c[k] - 2 * log(1 + exp(-c[k]));
      else
        logJ += +c[k] - 2 * log(1 + exp(+c[k]));
    }

    return dirichlet_lpdf(p | alpha) + logJ;
  }
}

data {
  int<lower=1> N_ratings;
  array[N_ratings] int<lower=1, upper=5> ratings;

  int<lower=1> N_customers;
  array[N_ratings] int<lower=1, upper=N_customers> customer_idxs;

  int<lower=1> N_movies;
  array[N_ratings] int<lower=1, upper=N_movies> movie_idxs;
}

parameters {
  vector[N_movies] gamma_ncp; // Non-centered movie affinities
  real<lower=0> tau_gamma;    // Movie affinity population scale

  array[N_customers] ordered[4] cut_points; // Customer rating cut points

  simplex[5] mu_q;     // Rating simplex population location
  real<lower=0> tau_q; // Rating simplex population scale
}

transformed parameters {
  // Centered movie affinities
  vector[N_movies] gamma = tau_gamma * gamma_ncp;
}

model {
  vector[5] ones = rep_vector(1, 5);
  vector[5] alpha = mu_q / tau_q + ones;

  // Prior model
  gamma_ncp ~ normal(0, 1);
  tau_gamma ~ normal(0, 5 / 2.57);

  mu_q ~ dirichlet(5 * ones);
  tau_q ~ normal(0, 1);

  for (c in 1:N_customers)
    cut_points[c] ~ induced_dirichlet(alpha);

  // Observational model
  for (n in 1:N_ratings) {
    int c = customer_idxs[n];
    int m = movie_idxs[n];
    ratings[n] ~ ordered_logistic(gamma[m], cut_points[c]);
  }
}

generated quantities {
  array[N_ratings] int<lower=1, upper=5> rating_pred;

  array[N_customers] real mean_rating_customer_pred
    = rep_array(0, N_customers);
  array[N_customers] real var_rating_customer_pred
    = rep_array(0, N_customers);

  array[N_movies] real mean_rating_movie_pred = rep_array(0, N_movies);
  array[N_movies] real var_rating_movie_pred = rep_array(0, N_movies);

  matrix[N_movies, N_movies] covar_rating_movie_pred;

  {
    // Accumulate stratified means and variances of the posterior
    // predictive ratings with Welford's streaming algorithm
    array[N_customers] real C = rep_array(0, N_customers);
    array[N_movies] real M = rep_array(0, N_movies);

    for (n in 1:N_ratings) {
      real delta = 0;
      int c = customer_idxs[n];
      int m = movie_idxs[n];

      rating_pred[n] = ordered_logistic_rng(gamma[m], cut_points[c]);

      C[c] += 1;
      delta = rating_pred[n] - mean_rating_customer_pred[c];
      mean_rating_customer_pred[c] += delta / C[c];
      var_rating_customer_pred[c]
        += delta * (rating_pred[n] - mean_rating_customer_pred[c]);

      M[m] += 1;
      delta = rating_pred[n] - mean_rating_movie_pred[m];
      mean_rating_movie_pred[m] += delta / M[m];
      var_rating_movie_pred[m]
        += delta * (rating_pred[n] - mean_rating_movie_pred[m]);
    }

    for (c in 1:N_customers) {
      if (C[c] > 1)
        var_rating_customer_pred[c] /= C[c] - 1;
      else
        var_rating_customer_pred[c] = 0;
    }

    for (m in 1:N_movies) {
      if (M[m] > 1)
        var_rating_movie_pred[m] /= M[m] - 1;
      else
        var_rating_movie_pred[m] = 0;
    }
  }

  {
    // Partial empirical covariances of the posterior predictive ratings
    matrix[N_movies, N_movies] counts;

    for (m1 in 1:N_movies) {
      for (m2 in 1:N_movies) {
        counts[m1, m2] = 0;
        covar_rating_movie_pred[m1, m2] = 0;
      }
    }

    for (n1 in 1:N_ratings) {
      for (n2 in 1:N_ratings) {
        if (customer_idxs[n1] == customer_idxs[n2]) {
          int m1 = movie_idxs[n1];
          int m2 = movie_idxs[n2];
          real y = (rating_pred[n1] - mean_rating_movie_pred[m1])
                   * (rating_pred[n2] - mean_rating_movie_pred[m2]);
          covar_rating_movie_pred[m1, m2] += y;
          covar_rating_movie_pred[m2, m1] += y;
          counts[m1, m2] += 1;
          counts[m2, m1] += 1;
        }
      }
    }

    for (m1 in 1:N_movies) {
      for (m2 in 1:N_movies) {
        if (counts[m1, m2] > 1)
          covar_rating_movie_pred[m1, m2] /= counts[m1, m2] - 1;
      }
    }
  }
}
fit <- stan(file="stan_programs/model3.stan",
            data=data, seed=8438338,
            warmup=1000, iter=2024, refresh=0)
The hierarchical model over the interior cut points has already proved useful – the mild computational issues that we encountered with the last model fit have vanished.
diagnostics <- util$extract_hmc_diagnostics(fit)
util$check_all_hmc_diagnostics(diagnostics)
All Hamiltonian Monte Carlo diagnostics are consistent with reliable
Markov chain Monte Carlo.
samples3 <- util$extract_expectand_vals(fit)
base_samples <- util$filter_expectands(samples3,
                                       c('gamma_ncp',
                                         'tau_gamma',
                                         'cut_points',
                                         'mu_q', 'tau_q'),
                                       check_arrays=TRUE)
util$check_all_expectand_diagnostics(base_samples)
All expectands checked appear to be behaving well enough for reliable
Markov chain Monte Carlo estimation.
Overall the retrodictive performance is a little bit better than in the previous model. In particular the agreement in the aggregate ratings histogram and the stratified empirical covariances histogram has improved slightly. On the other hand the retrodictive tension in the stratified empirical variances stubbornly persists. We shouldn’t expect too much improvement, however, given that the hierarchical structure doesn’t add any flexibility to the customer behaviors beyond what we had already included.
par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))
util$plot_hist_quantiles(samples3, 'rating_pred', -0.5, 6.5, 1,
                         baseline_values=data$ratings,
                         xlab="All Ratings")
par(mfrow=c(2, 3), mar=c(5, 5, 1, 1))
for (c in c(7, 23, 40, 70, 77, 100)) {
  names <- sapply(which(data$customer_idxs == c),
                  function(n) paste0('rating_pred[', n, ']'))
  filtered_samples <- util$filter_expectands(samples3, names)

  customer_ratings <- data$ratings[data$customer_idxs == c]
  util$plot_hist_quantiles(filtered_samples, 'rating_pred',
                           -0.5, 6.5, 1,
                           baseline_values=customer_ratings,
                           xlab="Ratings",
                           main=paste('Customer', c))
}
par(mfrow=c(2, 3), mar=c(5, 5, 1, 1))
for (m in c(33, 53, 61, 80, 117, 180)) {
  names <- sapply(which(data$movie_idxs == m),
                  function(n) paste0('rating_pred[', n, ']'))
  filtered_samples <- util$filter_expectands(samples3, names)

  movie_ratings <- data$ratings[data$movie_idxs == m]
  util$plot_hist_quantiles(filtered_samples, 'rating_pred',
                           -0.5, 6.5, 1,
                           baseline_values=movie_ratings,
                           xlab="Ratings",
                           main=paste('Movie', m))
}
par(mfrow=c(2, 2), mar=c(5, 5, 1, 1))
util$plot_hist_quantiles(samples3, 'mean_rating_customer_pred',
                         0, 6, 0.5,
                         baseline_values=mean_rating_customer,
                         xlab="Customer-wise Average Ratings")

util$plot_hist_quantiles(samples3, 'mean_rating_movie_pred',
                         0, 6, 0.6,
                         baseline_values=mean_rating_movie,
                         xlab="Movie-wise Average Ratings")

util$plot_hist_quantiles(samples3, 'var_rating_customer_pred',
                         0, 7, 0.5,
                         baseline_values=var_rating_customer,
                         xlab="Customer-wise Rating Variances")
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 227 predictive values (0.1%) fell above the binning.
util$plot_hist_quantiles(samples3, 'var_rating_movie_pred',
                         0, 7, 0.5,
                         baseline_values=var_rating_movie,
                         xlab="Movie-wise Rating Variances")
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 2264 predictive values (0.3%) fell above the binning.
par(mfrow=c(1, 1), mar=c(5, 5, 1, 1))
filtered_samples <-
  util$filter_expectands(samples3,
                         covar_rating_movie_filt_names)

util$plot_hist_quantiles(filtered_samples, 'covar_rating_movie_pred',
                         -4.25, 4.25, 0.25,
                         baseline_values=covar_rating_movie_filt,
                         xlab="Filtered Movie-wise Rating Covariances")
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 355 predictive values (0.0%) fell below the binning.
Warning in check_bin_containment(bin_min, bin_max, collapsed_values,
"predictive value"): 432 predictive values (0.0%) fell above the binning.
Assuming that we are satisfied with this retrodictive performance there are many ways that we can explore and utilize the associated inferences.
For example with the new hierarchical structure we can investigate what we learned about not only each individual customer but also the population of customers. The marginal posterior distribution for tau_q
suggests a small but non-negligible heterogeneity in the interior cut points. Moreover the inferences for the baseline probabilities mu_q
suggest that customers are relatively optimistic, with four star ratings being more probable than three star ratings for a neutral movie with zero affinity.
par(mfrow=c(2, 1), mar=c(5, 5, 1, 1))
util$plot_expectand_pushforward(samples3[['tau_q']],
                                25, flim=c(0, 0.3),
                                display_name='tau_q')

names <- sapply(1:5, function(k) paste0('mu_q[', k, ']'))
util$plot_disc_pushforward_quantiles(samples3, names,
                                     xlab="Rating",
                                     ylab="Baseline Rating Probability")
The regularizing influence of the hierarchical model is strongest for those customers with the fewest ratings. For example Customer 42 has only three observed ratings, and the hierarchical inferences for their interior cut points shift and narrow pretty substantially relative to the inferences from the previous model.
c <- 42
cat(sprintf("Customer %s: %s ratings",
            c, table(data$customer_idxs)[c]))
Customer 42: 3 ratings
par(mfrow=c(2, 2), mar=c(5, 5, 1, 1))
lab2_xs <- c(1.5, 2, -4, -3)
lab2_ys <- c(0.15, 0.25, 0.35, 0.35)

lab3_xs <- c(-6, -5, 3, 3.5)
lab3_ys <- c(0.25, 0.4, 0.5, 0.6)

for (k in 1:4) {
  name <- paste0('cut_points[', c, ',', k, ']')
  util$plot_expectand_pushforward(samples2[[name]],
                                  50, flim=c(-10, 5), ylim=c(0, 1.0),
                                  col=util$c_light,
                                  display_name='Interior Cut Points',
                                  main=paste('Customer', c))
  name <- paste0('cut_points[', c, ',', k, ']')
  util$plot_expectand_pushforward(samples3[[name]],
                                  50, flim=c(-10, 5),
                                  col=util$c_dark, border="#BBBBBB88",
                                  add=TRUE)
text(lab2_xs[k], lab2_ys[k], "Model 2", col=util$c_light)
text(lab3_xs[k], lab3_ys[k], "Model 3", col=util$c_dark)
}
Warning in util$plot_expectand_pushforward(samples2[[name]], 50, flim = c(-10,
: 1 value (0.0%) fell below the histogram binning.
Warning in util$plot_expectand_pushforward(samples2[[name]], 50, flim = c(-10,
: 0 value (0.0%) fell above the histogram binning.
Warning in util$plot_expectand_pushforward(samples3[[name]], 50, flim = c(-10,
: 1 value (0.0%) fell below the histogram binning.
Warning in util$plot_expectand_pushforward(samples3[[name]], 50, flim = c(-10,
: 0 value (0.0%) fell above the histogram binning.
We can also see the impact of the hierarchical model if we examine the interior cut point inferences for the two models across all customers. The more uncertain customer inferences in the previous model, especially for the first and last cut points, are pulled towards the other customer inferences.
par(mfrow=c(4, 1), mar=c(5, 5, 1, 1))
for (k in 1:4) {
  names <- sapply(1:data$N_customers,
                  function(c) paste0('cut_points[', c, ',', k, ']'))
  yname <- paste0('cut_points[', k, ']')
  util$plot_disc_pushforward_quantiles(samples3, names,
                                       xlab="Customer",
                                       display_ylim=c(-6, 6),
                                       ylab=yname)
}
The hierarchical influence, however, doesn’t change the qualitative details. For example Customer 23 is still stingy with their ratings while Customer 70 is still generous.
par(mfrow=c(2, 1), mar=c(5, 5, 1, 1))
cols <- c(util$c_mid, util$c_mid_highlight,
          util$c_dark, util$c_dark_highlight)

c <- 23

k <- 1
name <- paste0('cut_points[', c, ',', k, ']')
util$plot_expectand_pushforward(samples3[[name]],
                                50, flim=c(-9, 9), ylim=c(0, 2),
                                col=cols[k],
                                display_name='Interior Cut Points',
                                main=paste('Customer', c))
for (k in 2:4) {
  name <- paste0('cut_points[', c, ',', k, ']')
  util$plot_expectand_pushforward(samples3[[name]],
                                  50, flim=c(-9, 9),
                                  col=cols[k], border="#BBBBBB88",
                                  add=TRUE)
}
text(-1, 1.65, "cut_points[1]", col=util$c_mid)
text(2, 1.55, "cut_points[2]", col=util$c_mid_highlight)
text(4.25, 1, "cut_points[3]", col=util$c_dark)
text(6.25, 0.5, "cut_points[4]", col=util$c_dark_highlight)
c <- 70

k <- 1
name <- paste0('cut_points[', c, ',', k, ']')
util$plot_expectand_pushforward(samples3[[name]],
                                50, flim=c(-9, 9), ylim=c(0, 2),
                                col=cols[k],
                                display_name='Interior Cut Points',
                                main=paste('Customer', c))
Warning in util$plot_expectand_pushforward(samples3[[name]], 50, flim = c(-9, :
11 values (0.3%) fell below the histogram binning.
Warning in util$plot_expectand_pushforward(samples3[[name]], 50, flim = c(-9, :
0 values (0.0%) fell above the histogram binning.
for (k in 2:4) {
  name <- paste0('cut_points[', c, ',', k, ']')
  util$plot_expectand_pushforward(samples3[[name]],
                                  50, flim=c(-9, 9),
                                  col=cols[k], border="#BBBBBB88",
                                  add=TRUE)
}
text(-5.75, 0.4, "cut_points[1]", col=util$c_mid)
text(-3, 0.9, "cut_points[2]", col=util$c_mid_highlight)
text(-2, 1.2, "cut_points[3]", col=util$c_dark)
text(1.0, 1.4, "cut_points[4]", col=util$c_dark_highlight)