A simple probability problem

Amy Hogan, a stats and maths teacher who blogs at A Little Stats, posted the following quiz on twitter:

(Assuming fair dice) which has the highest probability:

1 six from 6 dice

2 sixes from 12 dice

3 sixes from 18 dice

The calculations aren’t too hard even by hand, and we have pbinom() available (if we remember to check $$<$$ vs $$\le$$ conditions). In that sense the question is easy, but I was looking for an intuitive argument.

Obviously, the probability of exactly $$n$$ sixes from $$6n$$ dice is decreasing in $$n$$, because the distribution is becoming less discrete. On the other hand, the probability of more than $$n$$ sixes is increasing towards 1/2, since the distribution is becoming more symmetric. It isn’t obvious to me which one wins.

Although I’d never encountered this before, it turns out to be a real classic. Isaac Newton answered it for Samuel Pepys, and got the brute-force calculations right, but then came up with an incorrect heuristic argument. Stephen Stigler has a paper, Joe Blitzstein pointed me to it before I wasted too much time.

The neatest relevant fact is that the difference between the median and mean of a Binomial distribution is strictly less than 1, and so when the mean is an integer the two are equal. That implies the sequence $$P[\mathrm{Bin}(nk,1/k)\ge n]$$ will tend to decrease with increasing $$n$$ for any $$k$$, but even that doesn’t quite prove the sequence is strictly monotone: we only know the probability is between $$0.5$$ and $$0.5+P[\mathrm{Bin}(nk,1/k)=n]$$. Also, there’s apparently no simple intuition behind the bound on the difference between mean and median.

In the end, it turns out to be true that $$P[\mathrm{Bin}(nk,1/k)\ge n]$$ is decreasing in $$n$$ for any integer $$k$$,  but (pretty obviously) $$\mathrm{Bin}(nk,p)$$  doesn’t have to be decreasing with $$n$$ for general $$p$$. Any valid intuition has to take advantage of $$p=1/k$$. Stigler seems to think that’s an important barrier; I’m not convinced. Perhaps more off-putting,  any valid intuitive argument would probably have to make it obvious that the mean and median were equal when the mean is an integer.