4 min read

Two approaches to approximating sums of chi-squareds

The distribution of an $n\times n$ quadratic form in Gaussian variables, $X^TAX$, is a linear combination of $\chi^2_1$ variables: $Q=\sum_{i=1}^n \lambda_i Z_i^2$, where the $\lambda_i$ are the eigenvalues of $A\Xi$ and $\Xi$ is the covariance matrix of $X$. This is not in the tables at the back of the book.

For reasonable values of $n$ and reasonable tail probabilities, the Satterthwaite approximation is perfectly reasonable. It is of the form $Q\approx \tilde Q_0 = a\chi^2_d$, where $a=\sum_{i=1}^n\lambda_i^2\big/\sum_{i=1}^n\lambda_i$ is a sort of average $\lambda$ and $d=\left(\sum_{i=1}^n\lambda_i\right)^2\big/\sum_{i=1}^n\lambda_i^2$ is a sort of effective sample size. The Satterthwaite approximation matches the mean and variance of $Q$.
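As a concrete sketch (mine, not the bigQF implementation), the Satterthwaite tail probability is a few lines of R, assuming `lambda` holds the eigenvalues of $A\Xi$:

```r
## Satterthwaite approximation: match the mean and variance of Q
## with a scaled chi-squared, a * chisq(d)
p_satterthwaite <- function(q, lambda) {
  a <- sum(lambda^2) / sum(lambda)    # a sort of average lambda
  d <- sum(lambda)^2 / sum(lambda^2)  # a sort of effective sample size
  pchisq(q / a, df = d, lower.tail = FALSE)
}
```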

We were interested, though, in genomic applications where $n$ was unreasonably large and we wanted quite small tail probabilities. The Satterthwaite approximation is not good in the tails: the relative error grows exponentially as you move out into the tail.

When $n$ is large, it matters whether you are given $\lambda$, or $A\Xi$, or something even prior to $A\Xi$. If you just have $\lambda$, you can use a saddlepoint approximation, or there are various algorithms for computing the tail probabilities to arbitrary accuracy (if you have arbitrary precision to work with).
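For illustration, here is a minimal Lugannani–Rice saddlepoint sketch for $Q$ given just $\lambda$; this is the standard construction, not the bigQF code. The `nu` argument (degrees of freedom for each term, all 1 here) is included so the same function can be reused later.

```r
## Saddlepoint (Lugannani-Rice) upper tail for sum(lambda_i * chisq(nu_i)),
## with cumulant generating function K(t) = -0.5 * sum(nu * log(1 - 2*t*lambda)).
## Sketch only: assumes all lambda > 0 and q above the mean.
p_saddlepoint <- function(q, lambda, nu = rep(1, length(lambda))) {
  K  <- function(t) -0.5 * sum(nu * log(1 - 2 * t * lambda))
  K1 <- function(t) sum(nu * lambda / (1 - 2 * t * lambda))
  K2 <- function(t) 2 * sum(nu * lambda^2 / (1 - 2 * t * lambda)^2)
  ## the saddlepoint solves K'(t) = q; for q > E[Q] it lies in (0, 1/(2*max(lambda)))
  tmax <- 1 / (2 * max(lambda))
  that <- uniroot(function(t) K1(t) - q,
                  lower = 0, upper = tmax * (1 - 1e-10))$root
  w <- sqrt(2 * (that * q - K(that)))  # positive sign since q > mean
  u <- that * sqrt(K2(that))
  pnorm(w, lower.tail = FALSE) + dnorm(w) * (1 / u - 1 / w)
}
```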

There are two other approaches to approximation. The first is to match more moments than just two. For example, Liu et al use a scaled non-central $\chi^2$ to match the first three cumulants¹ and get the fourth either correct or close. The second approach is to do the first few terms of the sum exactly and use the Satterthwaite approximation for the remainder: $\tilde Q_k=\sum_{i=1}^k \lambda_i Z_i^2 + a_k\chi^2_{d_k}$, with $a_k=\sum_{i=k+1}^n\lambda_i^2\big/\sum_{i=k+1}^n\lambda_i$ and $d_k=\left(\sum_{i=k+1}^n\lambda_i\right)^2\big/\sum_{i=k+1}^n\lambda_i^2$.
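Continuing the sketch: $\tilde Q_k$ is again a weighted sum of chi-squareds, now with one non-unit degree of freedom $d_k$, so the saddlepoint function above can evaluate its tail.

```r
## Leading-term approximation: keep the k largest eigenvalues exactly,
## Satterthwaite for the remainder, saddlepoint for the combined tail
p_leading_term <- function(q, lambda, k) {
  lambda <- sort(lambda, decreasing = TRUE)
  lead <- lambda[seq_len(k)]
  rest <- lambda[-seq_len(k)]
  a_k <- sum(rest^2) / sum(rest)
  d_k <- sum(rest)^2 / sum(rest^2)
  p_saddlepoint(q, c(lead, a_k), nu = c(rep(1, k), d_k))
}
```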

I’m going to use an example that I’ve used before, a simulated genomic example from the bigQF package. There’s a 4028×4028 matrix with 3336 non-zero eigenvalues.
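I won't reproduce the bigQF example in code here, but a toy geometrically decaying spectrum (entirely made up, just to show how the pieces fit together) lets you compare the three sketches:

```r
lambda <- 0.9^(1:500)                           # a made-up decaying spectrum
q <- sum(lambda) + 6 * sqrt(2 * sum(lambda^2))  # about 6 sd above the mean
p_satterthwaite(q, lambda)
p_saddlepoint(q, lambda)
p_leading_term(q, lambda, k = 20)
```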

This picture shows the first few cumulants of the full sum, the Satterthwaite approximation, the Liu approximation, and the leading-term approximations, with the vertical axis indicating the relative error.

The Satterthwaite approximation underestimates the higher cumulants badly: a $\chi^2_d$ has much lighter tails than the weighted sum of $\chi^2_1$s, because the tail of the weighted sum is dominated by the terms with the largest $\lambda$. The Liu approximation is much better, but eventually underestimates for the same reason. The leading-term approximations have the opposite problem: they get the higher cumulants right, but underestimate the first few beyond the mean and variance: the skewness, the kurtosis, and so on.
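The cumulants themselves are easy to compare, since the $m$th cumulant of $\sum_i \lambda_i\chi^2_{\nu_i}$ is $2^{m-1}(m-1)!\sum_i \nu_i\lambda_i^m$. A small sketch of how a comparison like the one in the picture could be computed:

```r
## m-th cumulant of sum(lambda_i * chisq(nu_i)):
## kappa_m = 2^(m-1) * (m-1)! * sum(nu * lambda^m)
kappa <- function(m, lambda, nu = rep(1, length(lambda))) {
  2^(m - 1) * factorial(m - 1) * sum(nu * lambda^m)
}

## relative error of the m-th Satterthwaite cumulant
## (zero for m = 1, 2 by construction)
rel_err_satt <- function(m, lambda) {
  a <- sum(lambda^2) / sum(lambda)
  d <- sum(lambda)^2 / sum(lambda^2)
  kappa(m, a, d) / kappa(m, lambda) - 1
}
```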

How does this compare for the tail probabilities?

On the horizontal axis we have the number of standard deviations above the mean, and on the vertical axis the relative error of the approximations to the tail probability. The leading-term approximations are labelled with the number of terms in the right margin.

The Satterthwaite approximation is bad in the tails. The Liu approximation is actually very good – you need to go a long way out into the tail before it gets bad – but it shows the same qualitative pattern of getting worse as the tail probabilities get smaller. The leading-term approximations get broadly² better with more terms, as you'd expect, and they show the qualitative pattern of settling down towards a bounded relative error.

That is, we can think of the Liu approximation as improving the Satterthwaite approximation from the middle of the distribution out, and the leading-term approximation as improving it from the tails in. With 20 terms the leading-term approximation is better than the Liu approximation, but not with just a few terms.

Which should we prefer? Well, if we are just given $\lambda$, then Liu et al's approximation is much faster to compute than a good leading-term approximation, and also faster than the full saddlepoint approximation, so unless you care about the extremely extreme tails it looks like a good bet. On the other hand, if we are just given $A\Xi$, Liu's method requires the trace of $(A\Xi)^4$. Working this out is no faster for large $n$ than working out all the eigenvalues. Working out the first $k$ eigenvalues³ and the Satterthwaite approximation to the remainder takes only $O(n^2k)$ time, so (for suitable $k$) it's faster and more accurate.
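Starting from the matrix, the trace identities $\sum_i\lambda_i=\mathrm{tr}(M)$ and $\sum_i\lambda_i^2=\mathrm{tr}(M^2)$ give the Satterthwaite remainder without a full eigendecomposition. A sketch under two assumptions: `M` is the symmetrized form $\Xi^{1/2}A\Xi^{1/2}$ (same eigenvalues as $A\Xi$), and the RSpectra package supplies the Lanczos-type partial eigendecomposition (bigQF wraps similar machinery).

```r
## Leading-term approximation from the matrix in O(n^2 k) time:
## top-k eigenvalues by a Lanczos-type method, trace identities for the rest.
## Assumes M is symmetric with the same eigenvalues as A %*% Xi.
p_leading_term_matrix <- function(q, M, k) {
  lead <- RSpectra::eigs_sym(M, k, which = "LM")$values
  s1 <- sum(diag(M)) - sum(lead)     # remaining eigenvalue sum: tr(M) - leading
  s2 <- sum(M * t(M)) - sum(lead^2)  # remaining squares: tr(M^2) - leading
  a_k <- s2 / s1
  d_k <- s1^2 / s2
  p_saddlepoint(q, c(lead, a_k), nu = c(rep(1, k), d_k))
}
```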


  1. the $m$th cumulant, like the $m$th moment, is the expected value of an $m$th-order polynomial, and if you don't know which polynomial you probably don't care↩︎

  2. not uniformly better, which I think is due to rounding of the $\chi^2$ degrees of freedom to an integer somewhere down in the code↩︎

  3. with a Lanczos-type algorithm or stochastic SVD↩︎