3 min read

Trace estimators and impact factors

For a Secret Project™, I needed a quick estimator of the trace of a matrix. To be precise, I have a rectangular matrix A and I needed tr(B) and tr(B2) where B=ATA. That sounds easy, but A is big enough that you don’t want to compute ATA

The first one actually is easy: tr(B)=ij(Aij)2. The second one is harder.  I tried a sampling approach: estimating a sample of the entries of B and using tr(B2)=ij(Bij)2.  The performance was not great, even when I tried stratified sampling – getting all the diagonal entries of B and a random sample of the off-diagonal entries.

I should have Googled before thinking, rather than after thinking. There is a standard solution: Hutchinson’s randomised trace estimator. If z is a vector whose entries are independent, mean zero, unit variance, then E[zTBz]=tr(B). 
So, if you have k random vectors z1,,zk, a trace estimator is 
tr^(B2)=1kiziTB2zi=1kiziATAATAzi=1kiATAzi22

Hutchinson used Rademacher (random ±1) variables for the entries of z. You could also use standard Normals and it wouldn’t make much difference – it would affect what theoretical tools you could use to get tail bounds.  There’s a slight benefit in having the zi be uncorrelated, so in one application where it was convenient I used vectors Az produced by the subsampled randomised Hadamard transform, but when A is sparse and that isn’t convenient I just used standard Normals. You don’t get high accuracy – the central limit theorem says the error is of order k1/2 – but you get a reasonable estimator quickly.

The Hutchinson trace estimator has also been influential in thinking about degrees of freedom for semiparametric smoothers, as Grace Wahba mentions in her chapter of “Past, Present, Future of Statistical Science”.

Since I knew tr(B) I could improve on the Hutchinson estimator by ratio estimation from survey sampling. If you estimate tr(B) using the same z, a better estimator of tr(B2) is
tr^ratio(B2)=tr^(B2)tr(B)tr^(B)
The sampling errors in the two estimators are correlated, so the error in the ratio is smaller – by about 50% in my case. 

So, impact factors? Hutchinson’s paper “A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines” is in the journal ‘Communications in Statistics - Simulation and Computation’, well known for not being especially picky or well read. It still has quite a few gems, such as Hutchinson’s paper and Pepe & Anderson’s paper about conditioning on X at all times or just the current time in longitudinal data analysis.  “This paper is crap because it’s in a crap journal” has the same form as the classical argumentum ad hominem. It saves effort, but it’s still a fallacy.