The xkcd Neutrino Detector

Author

Michael Betancourt

Published

June 2026

If you enjoy this piece, then you might enjoy the other writing on probability theory, probabilistic modeling, and Bayesian inference on my website.

The 1132nd xkcd webcomic was one of the first to draw humor from statistics, or at least attempt to do so depending on whom you ask (Figure 1). Its comparison between a (seemingly) ridiculous hypothesis test and a (seemingly) more reasonable Bayesian decision quickly transcended internet humor, becoming a not-uncommon example in more serious statistical discussions.

Figure 1: xkcd 1132 is an early example of the tenuous interplay between internet humor and statistics.

That said, while the immediate conclusion is not wrong, the entire premise is pretty unfair to traditional frequentist techniques. In this short note, I’ll work out both frequentist and Bayesian analyses of the problem presented in this infamous webcomic to better understand what, if any, practical lessons it conveys.

1 The Observational Model

The comic establishes a noisy solar neutrino detector, which sometimes identifies the true state of the sun and sometimes its exact opposite.

More formally, the state of the sun is reduced to a binary categorization of having gone nova or not. I will denote these two states as \mathrm{Y} and \mathrm{N}, respectively.

The detector reports the true state of the sun unless the roll of two six-sided dice results in double sixes, in which case it reports the opposite state. In other words, we would observe the true state with probability 1 - q_{L}, and the alternative state with probability q_{L}. For fair dice, q_{L} = \left( \frac{1}{6} \right)^{2} = \frac{1}{36}. Here q_{L} refers to the probability of “lying”, following the anthropomorphized language in the comic.

We can incorporate the influence of the unobserved dice rolls with a mixture observational model. If the true state, \mathrm{T}, of the sun is nova, then the probability of each observed state, \mathrm{O}, would be \pi( \mathrm{O} \mid \mathrm{T} = \mathrm{Y}) = \left\{ \begin{array}{ll} q_{L}, & \mathrm{O} = \mathrm{N} \\ 1 - q_{L}, & \mathrm{O} = \mathrm{Y} \end{array} \right. . Similarly, if the true state of the sun is not nova, then the probability of observing each state would be \pi( \mathrm{O} \mid \mathrm{T} = \mathrm{N}) = \left\{ \begin{array}{ll} 1 - q_{L}, & \mathrm{O} = \mathrm{N} \\ q_{L}, & \mathrm{O} = \mathrm{Y} \end{array} \right. .
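The observational model above can be written out as a small lookup function. This is only a minimal sketch, assuming fair dice and the string labels "Y" and "N" for the two states; the function name `obs_prob` is my own and not part of the comic or any library.

```python
from fractions import Fraction

# Probability that the detector lies: both fair six-sided dice show a six.
q_L = Fraction(1, 6) ** 2


def obs_prob(observed: str, true_state: str, q: Fraction = q_L) -> Fraction:
    """Return pi(O = observed | T = true_state) for states labeled "Y" or "N"."""
    # The detector reports the truth with probability 1 - q and lies otherwise.
    return 1 - q if observed == true_state else q


# Tabulate all four conditional probabilities.
for t in ("Y", "N"):
    for o in ("Y", "N"):
        print(f"pi(O = {o} | T = {t}) = {obs_prob(o, t)}")
```

Working with exact fractions keeps the 1/36 and 35/36 values visible instead of burying them in floating point rounding.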

2 Null Hypothesis Significance Testing

Let’s start our review of potential analysis approaches with a classic null hypothesis significance test. We’ll define our null model as a nominal sun, \mathrm{T} = \mathrm{N}, and reject this null hypothesis if we observe \mathrm{O} = \mathrm{Y}.

This rejection procedure is a decision-making process. To verify that it behaves sufficiently well in practice, we have to quantify its potential performance. Following the classic null hypothesis significance testing methodology, we’ll consider the false positive rate, i.e. “significance”, and true positive rate, i.e. “power”.

The false positive rate is given by \begin{align*} \mathrm{FPR} &= \pi( \mathrm{reject} \mid \mathrm{T} = \mathrm{N}) \\ &= \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{N}) \\ &= q_{L}. \end{align*} For q_{L} = 1 / 36, the false positive rate is less than the conventional threshold of 0.05. Consequently, this would classically be considered a sufficiently sensitive test.

On the other hand, the true positive rate is given by \begin{align*} \mathrm{TPR} &= \pi( \mathrm{reject} \mid \mathrm{T} = \mathrm{Y}) \\ &= \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{Y}) \\ &= 1 - q_{L}. \end{align*} Assuming two fair dice, this evaluates to 1 - q_{L} = \frac{35}{36} \approx 0.97. This is an incredibly high-powered test!
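Both rates can be checked by enumerating all 36 equally likely dice outcomes and applying the rejection rule to each. The following is a sketch under the assumption of fair dice; the helper names `detector` and `reject_rate` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def detector(true_state, dice):
    """Report the true state unless both dice show six, in which case lie."""
    if dice != (6, 6):
        return true_state
    return "N" if true_state == "Y" else "Y"


def reject_rate(true_state):
    """Probability of observing O = Y, i.e. rejecting the null, given the true state."""
    rolls = list(product(range(1, 7), repeat=2))  # all 36 outcomes of two dice
    rejections = sum(detector(true_state, roll) == "Y" for roll in rolls)
    return Fraction(rejections, len(rolls))


fpr = reject_rate("N")  # false positive rate: reject when the sun is nominal
tpr = reject_rate("Y")  # true positive rate: reject when the sun has gone nova

print(f"FPR = {fpr}, TPR = {tpr}")
```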

All of this is to say that, from a classic null hypothesis significance testing perspective, we should be confident that the sun has exploded if the detector ever displays a warning, despite any other information that would suggest otherwise.

3 Likelihood Ratio Testing

Of course null hypothesis significance testing is not the entirety of frequentist statistics, and its awkward behavior alone does not imply that all of frequentist statistics is problematic. Let’s see how likelihood ratio testing performs.

The likelihood ratio test selects between two hypotheses by comparing a likelihood ratio statistic to zero. With only two point hypotheses, the likelihood ratio statistic reduces to \begin{align*} \lambda &= 2 \, \log \frac{ \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{N}) } \\ &= 2 \, \log \frac{ 1 - q_{L} }{ q_{L} } \\ &= 2 \, \mathrm{logit}( 1 - q_{L} ). \end{align*}

For our q_{L}, this is approximately \lambda \approx 2 \cdot 3.6 \approx 7. This is so far away from zero that the \mathrm{T} = \mathrm{N} hypothesis would be rejected for any reasonable decision threshold!
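As a quick numerical check of this value, here is a short sketch that evaluates the likelihood ratio statistic for fair dice. The `logit` helper is defined by hand rather than taken from any library.

```python
import math

q_L = 1 / 36  # probability that the detector lies, assuming fair dice


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


# Likelihood ratio statistic for O = Y, comparing T = Y against T = N.
lam = 2 * logit(1 - q_L)

print(f"logit(1 - q_L) = {logit(1 - q_L):.3f}")
print(f"lambda = {lam:.3f}")
```

Because (1 - q_{L}) / q_{L} = 35 for fair dice, the statistic is just 2 \log 35.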

It seems like the confidence in an exploding sun is not so much an artifact of null hypothesis significance testing, but rather a consequence of the observational model itself.

4 Bayesian Inference

Finally, let’s see if Bayesian inference is any more robust. The posterior probability that the sun is truly exploding is given by an application of Bayes’ Theorem, \begin{align*} \pi( \mathrm{T} &= \mathrm{Y} \mid \mathrm{O} = \mathrm{Y}) \\ &= \frac{ \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{Y}) \, \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{Y}) \, \pi( \mathrm{T} = \mathrm{Y}) + \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{N}) \, \pi( \mathrm{T} = \mathrm{N}) } \\ &= \frac{ 1 }{ 1 + \frac{ \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{N}) \, \pi( \mathrm{T} = \mathrm{N}) }{ \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{Y}) \, \pi( \mathrm{T} = \mathrm{Y}) } } \\ &= \frac{ 1 }{ 1 + \frac{ q_{L} }{ 1 - q_{L} } \, \frac{ \pi( \mathrm{T} = \mathrm{N}) }{ \pi( \mathrm{T} = \mathrm{Y}) } } \\ &= \frac{ 1 }{ 1 + \exp \left[ - \left( -\log \frac{ q_{L} }{ 1 - q_{L} } \, + \log \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } \right) \right] } \\ &= \mathrm{logistic} \left( -\log \frac{ q_{L} }{ 1 - q_{L} } + \log \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } \right) \\ &= \mathrm{logistic} \left( \mathrm{logit}( 1 - q_{L} ) + \log \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } \right) \end{align*}

When \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } = 1, we have \log \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } = 0. In this case, the posterior probability becomes \pi( \mathrm{T} = \mathrm{Y} \mid \mathrm{O} = \mathrm{Y}) = 1 - q_{L} = \frac{35}{36}, just as confident as what we saw in the previous sections.

In other words, the penchant for confidence in an exploding sun affects Bayesian analyses just as much as the two frequentist analyses that we considered! If we don’t have any domain expertise to motivate a prior model that distinguishes the two possible states of the sun, then a Bayesian analysis will perform just as poorly!

The posterior probabilities of the two sun states are equal if \begin{align*} \log \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } &= \log \frac{ q_{L} }{ 1 - q_{L} } \\ &= - \mathrm{logit} (1 - q_{L}), \end{align*} or \begin{align*} \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } &= \exp( - \mathrm{logit} (1 - q_{L}) ) \\ &= \frac{ q_{L} }{ 1 - q_{L} } \\ &\approx 0.03. \end{align*} This is a lot of domain expertise against the sun exploding in the last few minutes.

To confidently bet that the detector is wrong when it signals an exploding sun, we would need \pi( \mathrm{T} = \mathrm{Y} \mid \mathrm{O} = \mathrm{Y}) \ll 1. That requires an incredible amount of domain expertise disfavoring the explosion of the sun, \log \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } \ll - \mathrm{logit} (1 - q_{L}), or \frac{ \pi( \mathrm{T} = \mathrm{Y} ) }{ \pi( \mathrm{T} = \mathrm{N} ) } \ll 0.03.
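To see how the posterior probability responds to the prior model, we can evaluate the logistic form of Bayes’ Theorem for a few prior odds. This is a minimal sketch assuming fair dice; the function name `posterior_nova` and the particular prior odds in the loop are illustrative choices of mine.

```python
import math

q_L = 1 / 36  # probability that the detector lies, assuming fair dice


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def logistic(x):
    """Inverse of the logit function."""
    return 1 / (1 + math.exp(-x))


def posterior_nova(prior_odds, q=q_L):
    """pi(T = Y | O = Y) given prior odds pi(T = Y) / pi(T = N)."""
    return logistic(logit(1 - q) + math.log(prior_odds))


# Illustrative prior odds: balanced, break-even, and strongly against a nova.
for prior_odds in (1.0, 1 / 35, 1e-3, 1e-6):
    print(f"prior odds = {prior_odds:.2e}, posterior = {posterior_nova(prior_odds):.6f}")
```

Balanced prior odds recover 35/36, while prior odds of 1/35 bring the two posterior probabilities into balance.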

5 Bayesian Decisions

All of this said, the character in the webcomic doesn’t quote posterior inferences. Instead, they state confidence in a particular bet, and that requires using Bayesian decision theory to quantify the outcomes of different betting strategies. Fortunately, the Bayesian decision theory is not too difficult to work out in this case.

To simplify the analysis, let’s assume that we can make bets with no transaction costs. If we bet C units of currency that the sun has not yet gone nova and the sun is indeed still there, then we win double our bet, U(\mathrm{T} = \mathrm{N}) = 2 \, C - C = C. On the other hand, if it turns out that the sun has started to envelop the inner planets then, at betting odds of o to 1, we would lose o - 1 times our bet on top of the stake itself, U(\mathrm{T} = \mathrm{Y}) = -(o - 1) \, C - C = - o \, C.

The posterior expected utility of this bet when we observe the detector warning of an exploding sun is given by weighting these outcomes by the corresponding posterior probability, \begin{align*} \bar{U} &=\quad U(\mathrm{T} = \mathrm{Y}) \, \pi( \mathrm{T} = \mathrm{Y} \mid \mathrm{O} = \mathrm{Y}) \\ &\quad+ U(\mathrm{T} = \mathrm{N}) \, \pi( \mathrm{T} = \mathrm{N} \mid \mathrm{O} = \mathrm{Y}) \\ &= - o \, C \, \pi( \mathrm{T} = \mathrm{Y} \mid \mathrm{O} = \mathrm{Y}) \\ &\quad+ C \, \pi( \mathrm{T} = \mathrm{N} \mid \mathrm{O} = \mathrm{Y}). \end{align*}

We would expect a winning bet if \begin{align*} 0 &< \bar{U} \\ 0 &< - o \, C \, \pi( \mathrm{T} = \mathrm{Y} \mid \mathrm{O} = \mathrm{Y}) \\ &\quad+ C \, \pi( \mathrm{T} = \mathrm{N} \mid \mathrm{O} = \mathrm{Y}) \\ o \, C \, \pi( \mathrm{T} = \mathrm{Y} \mid \mathrm{O} = \mathrm{Y}) &< C \, \pi( \mathrm{T} = \mathrm{N} \mid \mathrm{O} = \mathrm{Y}) \\ \frac{ \pi( \mathrm{T} = \mathrm{Y} \mid \mathrm{O} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N} \mid \mathrm{O} = \mathrm{Y}) } &< \frac{1}{o}. \end{align*}
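The expected utility and the resulting condition on the posterior odds can be compared directly in code. This is a sketch only: the function names `expected_utility` and `is_favorable` and the example posterior probabilities are assumptions for illustration.

```python
def expected_utility(p_nova, odds, stake=1.0):
    """Posterior expected utility of betting `stake` that the sun has not gone nova.

    p_nova is the posterior probability pi(T = Y | O = Y).
    """
    u_nova = -odds * stake  # we lose o times the stake if the sun has exploded
    u_nominal = stake       # we win the stake if the sun is still there
    return u_nova * p_nova + u_nominal * (1 - p_nova)


def is_favorable(p_nova, odds):
    """Check the equivalent condition on the posterior odds."""
    return p_nova / (1 - p_nova) < 1 / odds


# Compare the two criteria at 50:1 odds for a few illustrative posteriors.
for p in (35 / 36, 0.5, 0.05, 0.01, 0.001):
    print(p, expected_utility(p, 50), is_favorable(p, 50))
```

The sign of the expected utility and the posterior odds condition agree for every posterior probability away from the break-even point.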

Now, the ratio of posterior probabilities is given by \begin{align*} \frac{ \pi( \mathrm{T} = \mathrm{Y} \mid \mathrm{O} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N} \mid \mathrm{O} = \mathrm{Y}) } &= \frac{ \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{Y}) \, \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{N}) \, \pi( \mathrm{T} = \mathrm{N}) } \\ &= \frac{ \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{O} = \mathrm{Y} \mid \mathrm{T} = \mathrm{N}) } \, \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } \\ &= \frac{ 1 - q_{L} }{ q_{L} } \, \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } \\ &= 35 \, \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) }. \end{align*} Again we see the strong preference for believing that the sun has exploded unless our domain expertise suggests otherwise.

Our beneficial-betting threshold then becomes \begin{align*} 35 \, \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } <& \frac{1}{o} \\ \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } <& \frac{1}{35 \, o}. \end{align*}

To rationalize 50:1 betting odds, for example, we would need \begin{align*} \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } <& \frac{1}{35 \cdot 50} \\ \frac{ \pi( \mathrm{T} = \mathrm{Y}) }{ \pi( \mathrm{T} = \mathrm{N}) } <& \frac{1}{1750}. \end{align*} This is only about twice as large as the odds of successfully navigating an asteroid field, at least according to the experts.
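The same threshold can be tabulated for other betting odds. The sketch below assumes fair dice and uses exact fractions; the function name `max_prior_odds` and the odds in the loop are illustrative and do not come from the comic.

```python
from fractions import Fraction

q_L = Fraction(1, 36)               # probability that the detector lies
likelihood_ratio = (1 - q_L) / q_L  # equals 35 for fair dice


def max_prior_odds(odds):
    """Largest prior odds pi(T = Y) / pi(T = N) for which the bet is favorable."""
    return 1 / (likelihood_ratio * odds)


# Thresholds for a few illustrative betting odds o:1.
for o in (1, 10, 50, 100):
    print(f"{o}:1 odds require prior odds below {max_prior_odds(o)}")
```

Even an even-money bet already requires prior odds below 1/35, and every increase in the betting odds tightens the requirement proportionally.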

In order for betting against the detector to be immediately obvious, we need extremely strong domain expertise on the stability of the sun. This is not unrealistic for an astrophysicist, or even general scientist, but it is not necessarily realistic for someone without much scientific background.

6 It Was Poor Dichotomization All Along!

Ultimately, this result is equivalent to the base rate problem when comparing populations. A particular observed state will poorly correlate with the true state of an individual if that state is rare. Here, a strong likelihood function can be misleading if it clashes with our domain expertise. Consequently, frequentist methods and Bayesian methods with only weakly informative prior models can perform poorly.

That said, it’s reasonable to be suspicious of the entire experimental setup if we need such strong domain expertise to discriminate between the two possible states of the sun and regularize the likelihood functions derived from the observational model. Ultimately, the problem is that we’ve partitioned a system into states of wildly unbalanced possibilities.

The sun going nova or not defines a binary state space. One state, however, is really an accumulation of many more elementary states than the other. An exploding sun accumulates states where the sun goes nova any time between the arrival of solar neutrinos to earth and the present. On the other hand, a nominal sun accumulates all states where the sun goes nova between the present and the end of the universe.

Any principled domain expertise about the mechanics of the sun will heavily favor the latter. With these unbalanced possibilities, any inferential method that treats the states equally will be fragile.
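A crude toy calculation can give a feel for how unbalanced these two states are. Every number below is an assumption made purely for illustration and not a result from astrophysics: I assume an eight minute window for the "already exploded" state, a five billion year horizon for the alternative, and that every window of equal length is a priori equally likely to contain the explosion.

```python
# All values here are illustrative assumptions, not physical estimates.
window_minutes = 8.0    # assumed width of the "already exploded" window
horizon_years = 5.0e9   # assumed horizon over which the explosion could occur
minutes_per_year = 365.25 * 24 * 60

# Number of equal-length windows in the assumed horizon.
n_windows = horizon_years * minutes_per_year / window_minutes

# One window corresponds to T = Y; all of the others correspond to T = N.
prior_odds = 1 / (n_windows - 1)

# A detector warning multiplies the odds by (1 - q_L) / q_L = 35 for fair dice.
posterior_odds = 35 * prior_odds

print(f"prior odds     = {prior_odds:.2e}")
print(f"posterior odds = {posterior_odds:.2e}")
```

Under these toy assumptions the prior odds sit many orders of magnitude below the 1/1750 needed to justify a 50:1 bet, so a single detector warning barely moves the posterior.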

Another circumstance where these problems arise is when someone argues that the probability of the world ending or a team winning is 50%, because there are only two possibilities. These binary states encapsulate very different elementary outcomes, and any inspection into those binary states should result in unbalanced prior probabilities.

7 Conclusion

At the most technical level, xkcd 1132 is absolutely correct. When the states of a system are unbalanced, the performance of any inferential method that treats the states equally will be fragile at best. From this perspective, the ability to easily incorporate domain expertise can make Bayesian inference far more robust.

That is, of course, if we actually take the time to elicit the needed domain expertise. When using a sloppy prior model, the performance of Bayesian inference will be just as disappointing as the frequentist methods.

At a more practical level, all inferential techniques will be more robust when we discretize the states of a system to be as balanced as possible. The design of an experiment, and the development of a corresponding observational model, is just as much an opportunity to include domain expertise as a Bayesian prior model!

To be clear, I’m still going with a Bayesian analysis every time. I just don’t want to make that analysis any more difficult by poorly characterizing the system at hand.

License

The text in this case study is copyrighted by Michael Betancourt and licensed under the CC BY-NC 4.0 license:

https://creativecommons.org/licenses/by-nc/4.0/