2 min read

A simple probability problem

Amy Hogan, a stats and maths teacher who blogs at A Little Stats, posted the following quiz on twitter:

(Assuming fair dice) which has the highest probability:

1 six from 6 dice

2 sixes from 12 dice

3 sixes from 18 dice

The calculations aren’t too hard even by hand, and we have pbinom() available (if we remember to check < vs conditions). In that sense the question is easy, but I was looking for an intuitive argument. 

Obviously, the probability of exactly n sixes from 6n dice is decreasing in n, because the distribution is becoming less discrete. On the other hand, the probability of more than n sixes is increasing towards 1/2, since the distribution is becoming more symmetric. It isn’t obvious to me which one wins. 

Although I’d never encountered this before, it turns out to be a real classic. Isaac Newton answered it for Samuel Pepys, and got the brute-force calculations right, but then came up with an incorrect heuristic argument. Stephen Stigler has a paper, Joe Blitzstein pointed me to it before I wasted too much time. 

The neatest relevant fact is that the difference between the median and mean of a Binomial distribution is strictly less than 1, and so when the mean is an integer the two are equal. That implies the sequence P[Bin(nk,1/k)n] will tend to decrease with increasing n for any k, but even that doesn’t quite prove the sequence is strictly monotone: we only know the probability is between 0.5 and 0.5+P[Bin(nk,1/k)=n]. Also, there’s apparently no simple intuition behind the bound on the difference between mean and median. 

In the end, it turns out to be true that P[Bin(nk,1/k)n] is decreasing in n for any integer k,  but (pretty obviously) Bin(nk,p)  doesn’t have to be decreasing with n for general p. Any valid intuition has to take advantage of p=1/k. Stigler seems to think that’s an important barrier; I’m not convinced. Perhaps more off-putting,  any valid intuitive argument would probably have to make it obvious that the mean and median were equal when the mean is an integer.