
Computing the (simplest) sandwich estimator incrementally

The biglm package in R does {incremental, online, streaming} linear regression for data potentially larger than memory. This isn’t rocket science: accumulating $X^TX$ and $X^TY$ is trivial; the package just goes one step better than this by using Alan Miller’s incremental QR decomposition code to reduce rounding error in ill-conditioned problems.
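
To make that concrete, here is a minimal sketch of the naive crossproduct accumulation in plain R (simulated data standing in for chunks read off disk; biglm itself does the QR-updating version rather than this):

```r
## Naive chunk-at-a-time accumulation of X^T X and X^T y (not biglm's
## QR code); the simulated X and y stand in for data read from disk.
set.seed(1)
n <- 1000; p <- 4
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- drop(X %*% c(2, -1, 0.5, 3)) + rnorm(n)

xtx <- matrix(0, p, p)
xty <- numeric(p)
for (rows in split(seq_len(n), ceiling(seq_len(n) / 100))) {       # 100-row chunks
  xtx <- xtx + crossprod(X[rows, , drop = FALSE])                  # add X^T X piece
  xty <- xty + drop(crossprod(X[rows, , drop = FALSE], y[rows]))   # add X^T y piece
}
betahat <- solve(xtx, xty)
all.equal(betahat, unname(coef(lm(y ~ X - 1))))                    # agrees with lm()
```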

The code also computes the Huber/White heteroscedasticity-consistent variance estimator (sandwich estimator). Someone wants a reference for this. There isn’t one, because it’s too minor to publish, and I didn’t have a blog ten years ago.  But I do now. So:

The Huber/White variance estimator is $A^{-1}BA^{-1}$, where $A^{-1}=(X^TX)^{-1}$ and $B=\sum_{k=1}^N x_k^Tx_k\,(y_k-\hat\mu_k)^2$.

The $(i,j)$ element of $B$ is
$$\sum_{k=1}^N x_{ki}(y_k-x_k\hat\beta)\,x_{kj}(y_k-x_k\hat\beta)$$

Multiplying this out, we get
$$\sum_{k=1}^N x_{ki}x_{kj}y_k^2$$
and about $2p$ terms that look like
$$\sum_{k=1}^N x_{ki}x_{kj}x_{k\ell}\,y_k\hat\beta_\ell$$
and about $p^2$ terms that look like
$$\sum_{k=1}^N x_{ki}x_{kj}x_{k\ell}x_{km}\,\hat\beta_\ell\hat\beta_m$$

We can move the $\hat\beta$s outside the sums, so the second sort of terms look like $\hat\beta_\ell\left(\sum_{k=1}^N x_{ki}x_{kj}x_{k\ell}y_k\right)$ and the third sort look like $\hat\beta_\ell\left(\sum_{k=1}^N x_{ki}x_{kj}x_{k\ell}x_{km}\right)\hat\beta_m$.
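
To check that the bookkeeping above is right, here is a quick numerical test, continuing the simulated X, y and betahat from the sketch above: the expanded sums, recombined with $\hat\beta$, reproduce $B$ computed directly from the residuals.

```r
## The y^2, x*y and x*x sums recombine with betahat to give B, without
## ever touching the residuals observation by observation.
res      <- y - drop(X %*% betahat)
B_direct <- crossprod(X * res)                # sum_k x_k^T x_k (y_k - x_k betahat)^2

B_expanded <- matrix(0, p, p)
for (i in 1:p) for (j in 1:p) {
  xx <- X[, i] * X[, j]
  B_expanded[i, j] <- sum(xx * y^2) -
    2 * sum(betahat * colSums(xx * X * y)) +                 # the 2p cross terms
    drop(t(betahat) %*% crossprod(X, xx * X) %*% betahat)    # the p^2 beta-beta terms
}
all.equal(B_direct, B_expanded)
```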

Now if we define $Z$ to have columns $x_ix_j$ and $x_iy$ (for all $i,j$), the matrix $Z^TZ$ contains all the $x$ and $y$ pieces needed for $B$. The obvious thing to do is just to accumulate $Z^TZ$ in R code, one chunk at a time.
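
Here is what that might look like, continuing the simulated example; the column layout and the make_Z helper are my own bookkeeping for illustration, not anything exported by biglm.

```r
## Z gets a column x_i*x_j for every ordered pair (i,j) and a column x_i*y
## for every i; accumulate Z^T Z chunk by chunk, then reassemble B from it.
make_Z <- function(Xc, yc) {
  pairs <- expand.grid(j = 1:p, i = 1:p)       # column (i-1)*p + j holds x_i*x_j
  cbind(Xc[, pairs$i] * Xc[, pairs$j], Xc * yc)
}

ztz <- matrix(0, p^2 + p, p^2 + p)
for (rows in split(seq_len(n), ceiling(seq_len(n) / 100))) {
  ztz <- ztz + crossprod(make_Z(X[rows, , drop = FALSE], y[rows]))
}

B <- matrix(0, p, p)
for (i in 1:p) for (j in 1:p) {
  ij <- (i - 1) * p + j
  M  <- matrix(ztz[ij, 1:p^2], p, p)           # the sum x_i x_j x_l x_m pieces
  B[i, j] <- ztz[p^2 + i, p^2 + j] -           # the sum x_i x_j y^2 piece
    2 * sum(betahat * ztz[ij, p^2 + (1:p)]) +  # the sum x_i x_j x_l y pieces
    drop(t(betahat) %*% M %*% betahat)
}
V <- solve(xtx) %*% B %*% solve(xtx)           # the sandwich A^{-1} B A^{-1}
all.equal(B, B_direct)                         # matches the residual-based B above
```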

If you were too convinced of your own cleverness you might realise that $(X,Z)$ could be fed into the QR decomposition as if it were $X$, and that you’d get $Z^TZ$ For Free! Where ‘for free’ means at $O\left((p^2)^3\right)$ extra computing time plus the mental anguish of reconstructing $Z^TZ$ from the QR decomposition.  It’s not a big deal, since the computation is dominated by the $O(np)$ cost of reading the data, but it does look kinda stupid in retrospect.
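
For the record, the identity behind the ‘for free’ claim is just that the crossproduct of the R factor from a QR decomposition of $(X,Z)$ equals $(X,Z)^T(X,Z)$, whose lower-right block is $Z^TZ$. A rough check on the simulated example, using base R’s in-memory qr() rather than Miller’s updating code; note that the duplicate columns in $(X,Z)$ make qr() pivot, which is a small taste of the promised anguish.

```r
## crossprod(qr.R(qr(M))) recovers crossprod(M[, pivot]); undo the pivoting
## and Z^T Z is the lower-right block.
XZ  <- cbind(X, make_Z(X, y))        # make_Z from the sketch above
dec <- qr(XZ)
cp  <- crossprod(qr.R(dec))          # = crossprod(XZ[, dec$pivot])
o   <- order(dec$pivot)
cp  <- cp[o, o]                      # undo the column pivoting
all.equal(cp[-(1:p), -(1:p)], ztz)   # the Z^T Z block
```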

I suppose that means I’ve learned something in ten years.