Another update on non-transitive dice - Biased and Inefficient

I’ve mentioned before that mathematician Tim Gowers had run a ‘polymath’ (massively collaborative maths research) project on non-transitive dice. There’s an arXiv preprint. There’s also a detailed write-up in Quanta, which is a magazine devoted to popular explanations of maths.

As I’ve said before, this is statistically interesting (as well as being just interesting) because any instance of non-transitive dice is also an instance of a non-transitive Wilcoxon/Mann-Whitney test. So what do we now know about the Wilcoxon test?

The mathematicians looked at dice with $n$ faces whose values were sampled (with replacement) from $1, \dots, n$. That is, they looked at a specific class of roughly uniform distributions. For these distributions, there were basically two cases:

If the means of three distributions were different, then with high probability the dice/Wilcoxon tests were ordered the same way as the means (i.e., the same way as the $t$-test).
If the means were all the same, there was almost as much non-transitivity as possible: knowing that $A$ beats $B$ and $B$ beats $C$ gave almost no information about whether $A$ beats $C$.
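
To make the two cases concrete, here is a minimal Python sketch of the dice model described above. The function names (`sample_die`, `margin`, `beats`, `is_cycle`, `cycle_rate`) are my own, not from the preprint. Two choices are assumptions for illustration only: "the means were all the same" is implemented by rejection sampling on a common face total of $n(n+1)/2$, and a tie counts as not beating.

```python
import random
from itertools import product


def sample_die(n, rng):
    """Draw n face values with replacement from 1..n."""
    return [rng.randint(1, n) for _ in range(n)]


def sample_die_with_sum(n, total, rng):
    """Rejection-sample a die whose faces add up to `total` (an assumption
    used here to force equal means; not necessarily the preprint's method)."""
    while True:
        die = sample_die(n, rng)
        if sum(die) == total:
            return die


def margin(a, b):
    """Face pairs where a wins minus face pairs where b wins."""
    return sum((x > y) - (x < y) for x, y in product(a, b))


def beats(a, b):
    """a beats b if it wins strictly more face pairs than it loses."""
    return margin(a, b) > 0


def is_cycle(a, b, c):
    """True if the three dice form a non-transitive cycle in either direction."""
    forward = beats(a, b) and beats(b, c) and beats(c, a)
    backward = beats(b, a) and beats(c, b) and beats(a, c)
    return forward or backward


def cycle_rate(n, trials, rng, equal_means=False):
    """Fraction of random triples of n-faced dice that form a cycle."""
    total = n * (n + 1) // 2
    if equal_means:
        draw = lambda: sample_die_with_sum(n, total, rng)
    else:
        draw = lambda: sample_die(n, rng)
    hits = sum(is_cycle(draw(), draw(), draw()) for _ in range(trials))
    return hits / trials


# A small hand-built example (rows of a 3x3 magic square, all with mean 5):
A, B, C = [2, 4, 9], [1, 6, 8], [3, 5, 7]
print(beats(A, B), beats(B, C), beats(C, A))
```

Calling `cycle_rate(n, trials, random.Random(0))` with and without `equal_means=True` gives a rough feel for how much more often cycles turn up when the means are forced to agree. The rejection step gets slow as `n` grows, so this is only suitable for small dice.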

The shape of the distributions is relevant because the distribution of ranks is uniform: exactly, for a single sample, and approximately, for a set of samples from the same distribution. So, another way of phrasing the statement that the Wilcoxon test is a comparison of the mean rank is to say that the Wilcoxon test is a test of the mean if the data have the sort of roughly-uniform distribution that ranks do under the null hypothesis that all the distributions are nearly the same.
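
The "comparison of the mean rank" description can be checked directly. The sketch below (helper names `midranks`, `mean_rank_difference` and `win_margin` are hypothetical) ranks the pooled faces of two dice, using average ranks for ties, and compares the difference in mean rank with the win/loss margin between the dice. For samples of sizes $n_A$ and $n_B$ with $N = n_A + n_B$, the two should agree up to the constant factor $N / (2 n_A n_B)$, so they always have the same sign.

```python
from fractions import Fraction


def midranks(values):
    """Ranks 1..N of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [None] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = Fraction((i + 1) + (j + 1), 2)
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mean_rank_difference(a, b):
    """Mean pooled rank of a minus mean pooled rank of b."""
    ranks = midranks(list(a) + list(b))
    ranks_a, ranks_b = ranks[:len(a)], ranks[len(a):]
    return sum(ranks_a) / len(a) - sum(ranks_b) / len(b)


def win_margin(a, b):
    """Face pairs where a wins minus face pairs where b wins."""
    return sum((x > y) - (x < y) for x in a for y in b)


a, b = [2, 4, 9], [1, 6, 8]
n_total = len(a) + len(b)
scale = Fraction(n_total, 2 * len(a) * len(b))
print(mean_rank_difference(a, b), win_margin(a, b) * scale)
```

Exact fractions are used so that the two printed quantities can be compared without rounding error.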

The disadvantage of this formulation is that it’s less precise; the advantage is that it is in terms of single-sample summary statistics rather than summaries of the combined samples.