The sandwich and the t-test - Biased and Inefficient

As every schoolchild knows, you can derive the Student $t$-test as a linear regression with a single binary predictor. How about the Welch/Satterthwaite unequal-variance $t$-test?

We have a technique for handling linear regression with unequal variances in the responses, the ‘model-agnostic’¹ or ‘model-robust’ sandwich estimator. You might wonder what happens if you use the sandwich estimator on a linear regression with a single binary predictor.

Let $X$ be binary, coded so it has zero mean (so that it’s orthogonal to the intercept), and fit a linear model with $Y$ as the outcome and $X$ as the predictor: $E[Y] = α + β X.$

We know $\hat{β}$ is the difference in means between the two groups. The sandwich variance estimator for $(\hat{α}, \hat{β})$ is $(X^{T} X)^{- 1} (\sum_{i = 1}^{n} x_{i} x_{i}^{T} (y_{i} - {\hat{μ}}_{i})^{2}) (X^{T} X)^{- 1}.$ First, note that the two outer matrices are diagonal, because of the centering of $X$, so that we need only consider the $(β, β)$ component.
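As a quick sanity check of the setup so far, here is a minimal NumPy sketch. The group sizes, the simulated data and the variable names are all illustrative assumptions, not part of the derivation. It fits the regression with a centered binary predictor and looks at the two facts used above: the slope is the difference in group means, and $X^{T} X$ is diagonal.

```python
import numpy as np

# Illustrative data: group sizes, means and spreads are arbitrary choices.
rng = np.random.default_rng(0)
n0, n1 = 7, 12
y = np.concatenate([rng.normal(0.0, 1.0, n0), rng.normal(1.0, 3.0, n1)])
group = np.concatenate([np.zeros(n0), np.ones(n1)])

# Center the binary predictor so it has zero mean.
x = group - group.mean()

# Design matrix with an intercept column and the centered predictor.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares fit of E[Y] = alpha + beta * X.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat = coef

# Difference in group means, computed directly.
diff_means = y[group == 1].mean() - y[group == 0].mean()

# Centering makes the off-diagonal of X^T X vanish (up to rounding).
XtX = X.T @ X

print(beta_hat, diff_means)
print(XtX)
```

With this coding the intercept estimate is the overall mean of $Y$, which is why the two coefficients separate so cleanly.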

We can break the inner sum into sums over the two groups. Write $x_{(0)}$, $x_{(1)}$ for the two values of $X$; $n_{0}, n_{1}$ for the two sample sizes; and $S_{0}$, $S_{1}$ for the standard deviations of $Y$ in the two groups. Within each group, $x_{i} x_{i}^{T}$ is constant, so it can be taken out of the sum, and the fitted value ${\hat{μ}}_{i}$ is the group mean, so the squared residuals in group $j$ sum to $(n_{j} - 1) S_{j}^{2}$. Then the $(β, β)$ component of the middle matrix is $\sum_{i = 1}^{n} x_{i}^{2} (y_{i} - {\hat{μ}}_{i})^{2} = x_{(0)}^{2} (n_{0} - 1) S_{0}^{2} + x_{(1)}^{2} (n_{1} - 1) S_{1}^{2}.$

Next, note that $x_{(0)}$ and $x_{(1)}$ can be determined from $n_{0}$ and $n_{1}$ : we have $x_{(1)} - x_{(0)} = 1$ and $n_{1} x_{(1)} + n_{0} x_{(0)} = 0$ , giving $x_{(1)} = n_{0} / n$ and $x_{(0)} = - n_{1} / n$ , so the middle term is $\frac{n_{1}^{2} (n_{0} - 1)}{n^{2}} S_{0}^{2} + \frac{n_{0}^{2} (n_{1} - 1)}{n^{2}} S_{1}^{2} .$

In the outside of the sandwich the $(β, β)$ element of $X^{T} X$ is just $\sum_{i} x_{i}^{2}$, which is $n_{1} n_{0}^{2} / n^{2} + n_{0} n_{1}^{2} / n^{2} = n_{0} n_{1} / n$, so the corresponding element of the inverse is $n / (n_{0} n_{1})$. Putting these together, the variance is $\frac{n}{n_{0} n_{1}} (\frac{n_{1}^{2} (n_{0} - 1)}{n^{2}} S_{0}^{2} + \frac{n_{0}^{2} (n_{1} - 1)}{n^{2}} S_{1}^{2}) \frac{n}{n_{0} n_{1}} = \frac{1}{n_{0}} (\frac{n_{0} - 1}{n_{0}} S_{0}^{2}) + \frac{1}{n_{1}} (\frac{n_{1} - 1}{n_{1}} S_{1}^{2}).$ This is almost exactly the variance for the Welch-Satterthwaite $t$-test, except that it uses $n_{i}$ rather than $n_{i} - 1$ in the denominator of the individual group variances. Or, writing ${\hat{σ}}_{i}^{2}$ for the variance estimator in group $i$ using $n_{i}$ in the denominator, it’s just ${\hat{σ}}_{0}^{2} / n_{0} + {\hat{σ}}_{1}^{2} / n_{1}$.
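The algebra can be checked numerically. The sketch below uses plain NumPy and made-up data; the function names `sandwich_var_beta` and `welch_type_var` are hypothetical helpers introduced only for this illustration. It builds the sandwich estimator directly from its definition, with no small-sample correction, and compares its $(β, β)$ element with ${\hat{σ}}_{0}^{2} / n_{0} + {\hat{σ}}_{1}^{2} / n_{1}$.

```python
import numpy as np

def sandwich_var_beta(y, group):
    """(beta, beta) element of the uncorrected sandwich variance estimator."""
    y = np.asarray(y, dtype=float)
    group = np.asarray(group, dtype=float)
    x = group - group.mean()                   # centered binary predictor
    X = np.column_stack([np.ones_like(x), x])  # intercept + predictor
    bread = np.linalg.inv(X.T @ X)             # (X^T X)^{-1}
    resid = y - X @ (bread @ X.T @ y)          # y_i - mu_hat_i
    meat = X.T @ (X * resid[:, None] ** 2)     # sum_i x_i x_i^T r_i^2
    return (bread @ meat @ bread)[1, 1]

def welch_type_var(y, group, ddof):
    """Sum of per-group variances over group sizes; ddof sets the denominator."""
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    y0, y1 = y[group == 0], y[group == 1]
    return y0.var(ddof=ddof) / len(y0) + y1.var(ddof=ddof) / len(y1)

# Illustrative data with unequal group sizes and unequal variances.
rng = np.random.default_rng(1)
group = np.concatenate([np.zeros(8), np.ones(15)])
y = np.concatenate([rng.normal(0.0, 1.0, 8), rng.normal(0.5, 4.0, 15)])

v_sandwich = sandwich_var_beta(y, group)
v_n = welch_type_var(y, group, ddof=0)     # n_i in the denominator
v_welch = welch_type_var(y, group, ddof=1)  # n_i - 1, as in the Welch test

print(v_sandwich, v_n, v_welch)
```

The first two numbers should agree up to floating-point error, and the Welch variance should be slightly larger, since each group variance is inflated by $n_{i} / (n_{i} - 1)$.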

So, the Welch-Satterthwaite $t$-statistic is basically just a linear regression with a binary predictor and the sandwich variance estimator, just as Student’s $t$-test is a linear regression with a binary predictor and the Fisher-information variance estimator.

We don’t get the degrees of freedom that way. Improving on the Normal reference distribution for $t$-statistics with the sandwich estimator is a bit more complicated.

¹ Nils Lid Hjort’s term for them, which I really like.