I’ve mentioned before that mathematician Tim Gowers had run a ‘polymath’ (massively collaborative maths research) project on non-transitive dice. There’s an arXiv preprint. There’s also a detailed write-up in Quanta, which is a magazine devoted to popular explanations of maths.
As I’ve said before, this is statistically interesting (as well as being just interesting) because any instance of non-transitive dice is also an instance of a non-transitive Wilcoxon/Mann-Whitney test. So what do we now know about the Wilcoxon test?
The mathematicians looked at dice with \(n\) faces whose values were sampled (with replacement) from \(1,\dots,n\). That is, they looked at a specific class of roughly uniform distributions. For these distributions, there were basically two cases
- if the means of three distributions were different then the dice/Wilcoxon tests were ordered the same way as the means (ie, the \(t\)-test), with high probability
- if the means were all the same, there was almost as much non-transitivity as possible: \(A\) beats \(B\) and \(B\) beats \(C\) gave almost no information about whether \(A\) beats \(C\).
The shape of the distributions is relevant because the distribution of ranks is uniform: exactly, for a single sample, and approximately, for a set of samples from the same distribution. So, another way of phrasing the statement that the Wilcoxon test is a comparison of the mean rank is to say that the Wilcoxon test is a test of the mean if the data have the sort of roughly-uniform distribution that ranks do under the null hypothesis that all the distributions are nearly the same.
The disadvantage of this formulation is that it’s less precise; the advantage is that it is in terms of single-sample summary statistics rather than summaries of the combined samples.