
A Bayesian t-test, again

The term “t-test” is a bit of a troll here, since I don’t mean either a test or the Normal-theory small-sample inference developed by Gosset. I’m interested in two-sample comparisons of means that are effectively non-parametric in moderate to large samples.

A frequentist in an elementary course would do this by saying \(\bar X\) and \(\bar Y\) are each (roughly) Normal by the Central Limit Theorem, so \(\bar X-\bar Y\) is also (roughly) Normal, giving interval estimates and, if necessary, tail probabilities. Or, in a different elementary course, by bootstrapping \(\bar X\) and \(\bar Y\) and looking at the roughly bell-shaped resampling distribution.
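For concreteness, here’s a minimal sketch of both elementary analyses in Python. Everything in it is illustrative: the data are made up, and 1.96 is the usual large-sample Normal quantile.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical skewed data; any two samples would do.
x = rng.gamma(shape=2.0, scale=1.5, size=80)
y = rng.gamma(shape=2.0, scale=1.2, size=95)

# CLT version: Var(xbar - ybar) is estimated by s2_x/n_x + s2_y/n_y
diff = x.mean() - y.mean()
se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
print("CLT 95% CI:", diff - 1.96 * se, diff + 1.96 * se)

# Bootstrap version: resample each sample with replacement, recompute the means
boot = np.array([
    rng.choice(x, size=len(x)).mean() - rng.choice(y, size=len(y)).mean()
    for _ in range(10_000)
])
print("bootstrap 95% CI:", np.percentile(boot, [2.5, 97.5]))
```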

One approach for a Bayesian is to do exactly the same thing: conditional on \(\mu_x\) and \(\sigma^2_x\), which get weakly informative priors, \(\bar X\) is (roughly) Normal, and likewise \(\bar Y\). It’s a bit of a pain to do this in your favourite Bayesian probabilistic programming language: if you hand it the individual observations it will want a distribution for them, so instead you supply just the means, variances, and sample sizes and handle the sampling distributions yourself. That’s feasible, though. Or you could use some sort of conjugate prior.
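Here’s a sketch of the summary-statistic version in PyMC; the priors, names, and numbers are all mine. The sample means get their CLT Normal likelihoods, and for the sample variances I’ve used the Normal-theory result that \((n-1)s^2/\sigma^2 \sim \chi^2_{n-1}\), written as a Gamma likelihood, which is only as approximate as the rest of the argument.

```python
import numpy as np
import pymc as pm

# Summary statistics only (hypothetical values).
n_x, xbar, s2_x = 80, 3.1, 2.4
n_y, ybar, s2_y = 95, 2.6, 2.0

with pm.Model():
    mu_x = pm.Normal("mu_x", 0, 10)       # weakly informative priors
    mu_y = pm.Normal("mu_y", 0, 10)
    sigma_x = pm.HalfNormal("sigma_x", 5)
    sigma_y = pm.HalfNormal("sigma_y", 5)

    # CLT: the sample mean is (roughly) Normal(mu, sigma/sqrt(n))
    pm.Normal("xbar", mu=mu_x, sigma=sigma_x / np.sqrt(n_x), observed=xbar)
    pm.Normal("ybar", mu=mu_y, sigma=sigma_y / np.sqrt(n_y), observed=ybar)

    # (n-1) s^2 / sigma^2 ~ chi^2_{n-1}, i.e.
    # s^2 ~ Gamma(shape (n-1)/2, rate (n-1)/(2 sigma^2))
    pm.Gamma("s2x", alpha=(n_x - 1) / 2,
             beta=(n_x - 1) / (2 * sigma_x**2), observed=s2_x)
    pm.Gamma("s2y", alpha=(n_y - 1) / 2,
             beta=(n_y - 1) / (2 * sigma_y**2), observed=s2_y)

    delta = pm.Deterministic("delta", mu_x - mu_y)
    idata = pm.sample()
```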

Another approach, as I’ve written before, is to treat the observations as coming from unknown distributions and specify Bayesian nonparametric priors over these distributions – a Dirichlet process prior is convenient.
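In the limit as the Dirichlet process concentration goes to zero, the posterior is Rubin’s Bayesian bootstrap: Dirichlet weights on the observed points. A minimal sketch, again with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(2.0, 1.5, size=80)   # hypothetical data, as before
y = rng.gamma(2.0, 1.2, size=95)

B = 10_000
# Bayesian bootstrap: a posterior draw of each distribution is a
# Dirichlet-weighted mixture of point masses at the observed values,
# so a posterior draw of each mean is a Dirichlet-weighted average.
wx = rng.dirichlet(np.ones(len(x)), size=B)
wy = rng.dirichlet(np.ones(len(y)), size=B)
delta = wx @ x - wy @ y            # posterior draws of the difference in means
print("posterior 95% interval:", np.percentile(delta, [2.5, 97.5]))
```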

I’ve just come across a third approach, in an interesting preprint by Fong, Holmes, and Walker on the ‘martingale posterior’. They argue that Bayesian inference can be generalised as sampling the unobserved data in the population conditional on the observed data. Do this with the right sort of summary function and you get ordinary Bayesian parametric inference, but you can also go directly to posterior distributions of summary statistics without going via parameters. In particular, to compare means between two groups you can just resample the observed data and compute means, getting posterior distributions for the two means that aren’t based on any particular parametric model. The practical analysis turns out to be basically the same as the second approach.
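Here’s what that resampling looks like with the Dirichlet-process predictive (my choice for the sketch; the paper covers smoother and more general predictives). Each posterior draw runs a Pólya urn forward, imputing ‘future’ observations uniformly from everything seen so far, and then takes the mean of the completed sample:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(2.0, 1.5, size=80)   # hypothetical sample, as before

def martingale_draw(data, n_future=2_000):
    # Predictive resampling: each 'future' point is drawn uniformly from
    # everything seen so far (a Polya urn, the DP predictive with
    # concentration zero); the summary is computed on the completed sample.
    pool = list(data)
    for _ in range(n_future):
        pool.append(pool[rng.integers(len(pool))])
    return np.mean(pool)

posterior_mu_x = np.array([martingale_draw(x) for _ in range(2_000)])
print("95% interval for the mean:", np.percentile(posterior_mu_x, [2.5, 97.5]))
```

Doing this separately for each group and differencing gives the two-sample comparison; as the number of future draws grows this agrees with the Bayesian bootstrap above.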

Of course there will be assumptions – if your data come from a Cauchy distribution then God Herself can’t get you good estimates of the difference in means – but the basic idea and the sort of assumptions are very similar to the elementary frequentist analysis. It feels like our Bayesian sandwich estimator should have a derivation along these lines. And as you get to more multivariate data and more complicated summaries you’ll need some sort of smoothing in your resampling; most of the paper is about ways to do this.