
The sandwich and the t-test

As every schoolchild knows, you can derive the Student t-test as a linear regression with a single binary predictor. How about the Welch/Satterthwaite unequal-variance t-test?

We have a technique for handling linear regression with unequal variances in the responses, the ‘model-agnostic’1 or ‘model-robust’ sandwich estimator. You might wonder what happens if you use the sandwich estimator on a linear regression with a single binary predictor.

Let X be binary, coded so it has zero mean (so that it’s orthogonal to the intercept) and fit a linear model with Y as the outcome and X as the predictor: E[Y]=α+βX.

We know $\hat\beta$ is the difference in mean between the two groups. The sandwich variance estimator for $(\hat\alpha,\hat\beta)$ is
$$(X^TX)^{-1}\left(\sum_{i=1}^n x_ix_i^T(y_i-\hat\mu_i)^2\right)(X^TX)^{-1}.$$
First, note that the two outer matrices are diagonal, because of the centering of $X$, so that we need only consider the $(\beta,\beta)$ component.

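We can break the inner sum into sums over the two groups. Within each group, $x_ix_i^T$ is constant, so it can be taken out of the sum. Write $x_{(0)}$, $x_{(1)}$ for the two values of $X$; $n_0$, $n_1$ for the two sample sizes; and $S_0$, $S_1$ for the standard deviations of $Y$ in the two groups. The fitted value $\hat\mu_i$ is the mean of the group containing observation $i$, so the squared residuals in group $j$ sum to $(n_j-1)S_j^2$. Then the $(\beta,\beta)$ element of $\sum_{i=1}^n x_ix_i^T(y_i-\hat\mu_i)^2$ is
$$x_{(0)}^2(n_0-1)S_0^2+x_{(1)}^2(n_1-1)S_1^2.$$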

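Next, note that $x_{(0)}$ and $x_{(1)}$ can be determined from $n_0$ and $n_1$: with $n=n_0+n_1$, we have $x_{(1)}-x_{(0)}=1$ and $n_1x_{(1)}+n_0x_{(0)}=0$, giving $x_{(1)}=n_0/n$ and $x_{(0)}=-n_1/n$, so the middle term is
$$\frac{n_1^2(n_0-1)}{n^2}S_0^2+\frac{n_0^2(n_1-1)}{n^2}S_1^2.$$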
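If you would rather check the middle term numerically than trust the algebra, here is a minimal Python sketch. It uses only the standard library; the simulated data, group sizes and variable names are illustrative assumptions, not anything from the derivation itself. It computes the sum of squared x times squared residual directly and compares it with the closed form in terms of the group sizes and the usual sample variances.

```python
import math
import random
import statistics

random.seed(1)
# Hypothetical data: two groups with different sizes and spreads.
y0 = [random.gauss(0.0, 1.0) for _ in range(7)]
y1 = [random.gauss(0.5, 3.0) for _ in range(12)]
n0, n1 = len(y0), len(y1)
n = n0 + n1

# Centred coding: x1 - x0 = 1 and n1*x1 + n0*x0 = 0.
x0, x1 = -n1 / n, n0 / n

# Fitted values are the group means, so residuals are deviations from them.
m0, m1 = statistics.fmean(y0), statistics.fmean(y1)
meat_direct = (sum(x0**2 * (y - m0) ** 2 for y in y0)
               + sum(x1**2 * (y - m1) ** 2 for y in y1))

# Closed form using the usual (n_i - 1)-denominator sample variances.
s0sq, s1sq = statistics.variance(y0), statistics.variance(y1)
meat_closed = (n1**2 * (n0 - 1) * s0sq + n0**2 * (n1 - 1) * s1sq) / n**2

print(meat_direct, meat_closed, math.isclose(meat_direct, meat_closed))
```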

In the outside of the sandwich the $(\beta,\beta)$ element of $X^TX$ is just $\sum_i x_i^2$, which is $n_1n_0^2/n^2+n_0n_1^2/n^2=n_0n_1/n$. Putting these together, the variance is
$$\frac{n}{n_0n_1}\left(\frac{n_1^2(n_0-1)}{n^2}S_0^2+\frac{n_0^2(n_1-1)}{n^2}S_1^2\right)\frac{n}{n_0n_1}=\frac{1}{n_0}\left(\frac{n_0-1}{n_0}S_0^2\right)+\frac{1}{n_1}\left(\frac{n_1-1}{n_1}S_1^2\right).$$
This is almost exactly the variance for the Welch-Satterthwaite t-test, except that it uses $n_i$ rather than $n_i-1$ in the denominator of the individual group variances. Or, writing $\hat\sigma_i^2$ for the variance estimator in group $i$ using $n_i$ in the denominator, it’s just $\hat\sigma_0^2/n_0+\hat\sigma_1^2/n_1$.
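The whole calculation can also be checked by brute force. The sketch below builds the uncorrected sandwich estimator (often called HC0) from explicit 2×2 matrices in plain Python, then compares its slope variance with the two group-variance formulas. The helper name `sandwich_var_beta` and the simulated data are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics

def sandwich_var_beta(x, y):
    """Slope estimate and its HC0 sandwich variance for y ~ 1 + x."""
    n = len(y)
    # Bread: invert X^T X for design rows (1, x_i).
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # OLS coefficients: (X^T X)^{-1} X^T y.
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    alpha = inv[0][0] * sy + inv[0][1] * sxy
    beta = inv[1][0] * sy + inv[1][1] * sxy
    # Meat: sum over i of x_i x_i^T times the squared residual.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        r2 = (yi - alpha - beta * xi) ** 2
        row = (1.0, xi)
        for j in range(2):
            for k in range(2):
                meat[j][k] += row[j] * row[k] * r2
    # (beta, beta) element of inv @ meat @ inv.
    v = sum(inv[1][j] * meat[j][k] * inv[k][1]
            for j in range(2) for k in range(2))
    return beta, v

random.seed(2)
# Hypothetical data with clearly unequal variances.
y0 = [random.gauss(0.0, 1.0) for _ in range(8)]
y1 = [random.gauss(1.0, 4.0) for _ in range(15)]
n0, n1 = len(y0), len(y1)
n = n0 + n1
x = [-n1 / n] * n0 + [n0 / n] * n1   # centred binary predictor
y = y0 + y1

beta_hat, v_sandwich = sandwich_var_beta(x, y)
# Group variances with n_i in the denominator (pvariance) ...
v_ml = statistics.pvariance(y0) / n0 + statistics.pvariance(y1) / n1
# ... and with n_i - 1 in the denominator (variance), as in Welch's test.
v_welch = statistics.variance(y0) / n0 + statistics.variance(y1) / n1

print(beta_hat, statistics.fmean(y1) - statistics.fmean(y0))
print(v_sandwich, v_ml, math.isclose(v_sandwich, v_ml))
print(v_welch)
```

The sandwich value should agree with the `pvariance` version to rounding error, while the Welch version is slightly larger because of its smaller denominators.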

So, the Welch-Satterthwaite t-statistic is basically just a linear regression with a binary predictor and the sandwich variance estimator, just as Student’s t-test is a linear regression with a binary predictor and the Fisher-information variance estimator.
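For the Student half of that comparison, here is a similar hedged sketch. It assumes the model-based variance is the usual residual sum of squares over $n-2$, divided by $\sum_i x_i^2$; that particular denominator is my assumption about which version of the model-based estimator is meant. With that choice the regression t-statistic should match the pooled-variance two-sample t-statistic. The data are again simulated purely for illustration.

```python
import math
import random
import statistics

random.seed(3)
# Hypothetical data for two groups.
y0 = [random.gauss(0.0, 2.0) for _ in range(9)]
y1 = [random.gauss(1.0, 2.0) for _ in range(11)]
n0, n1 = len(y0), len(y1)
n = n0 + n1

# Regression side: centred binary x, OLS slope, model-based variance.
x = [-n1 / n] * n0 + [n0 / n] * n1
y = y0 + y1
sxx = sum(v * v for v in x)                        # n0*n1/n, since x has mean zero
beta_hat = sum(a * b for a, b in zip(x, y)) / sxx  # valid because x is centred
alpha_hat = statistics.fmean(y)                    # intercept is the overall mean
rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
t_regression = beta_hat / math.sqrt(rss / (n - 2) / sxx)

# Classical side: pooled-variance two-sample t statistic.
pooled = ((n0 - 1) * statistics.variance(y0)
          + (n1 - 1) * statistics.variance(y1)) / (n - 2)
mean_diff = statistics.fmean(y1) - statistics.fmean(y0)
t_student = mean_diff / math.sqrt(pooled * (1 / n0 + 1 / n1))

print(t_regression, t_student, math.isclose(t_regression, t_student))
```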

We don’t get the degrees of freedom that way. Improving on the Normal reference distribution for t-statistics with the sandwich estimator is a bit more complicated.


  1. Nils Lid Hjort’s term for them, which I really like