
The fourth-root thing

This post exists partly because I think the result is interesting and partly to see if anyone can tell me the original reference.


Suppose we get $\hat\beta$ by solving $U(\beta;\alpha)=0$, where $\alpha$ is a nuisance parameter we plug into the equation. Assume that for any fixed $\alpha$, $E[U(\beta_0;\alpha)]=0$. Assume $U(\beta,\alpha)=\frac{1}{n}\sum_{i=1}^n U_i(\beta,\alpha)$ and that $U$ converges pointwise (and in mean, assuming finite moments) to its expected value. Also assume enough other regularity that this leads to $\sqrt{n}(\hat\beta-\beta_0)\stackrel{d}{\to} N(0,\sigma^2(\alpha))$.
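To make the setup concrete, here is a minimal sketch in Python of a toy example made up purely for illustration: heteroscedastic linear regression, where $\alpha$ indexes a working variance model and $U(\beta;\alpha)$ is a weighted score equation. For any fixed $\alpha$ the estimating function has mean zero at $\beta_0$, so the plugged-in $\alpha$ affects efficiency but not consistency.

```python
# A minimal sketch (a made-up toy example): scalar linear regression with
# heteroscedastic errors, estimated by solving
#   U(beta; alpha) = (1/n) * sum_i w_i(alpha) * x_i * (y_i - x_i * beta) = 0,
# where alpha parametrises a *working* variance model. For any fixed alpha,
# E[U(beta_0; alpha)] = 0 because the residual y_i - x_i*beta_0 has mean zero
# given x_i, so beta_hat is consistent whatever alpha we plug in; alpha only
# affects efficiency.
import numpy as np

rng = np.random.default_rng(1)
beta0 = 2.0

def simulate(n):
    x = rng.uniform(0.5, 2.0, size=n)
    y = beta0 * x + rng.normal(scale=x, size=n)   # true Var(eps_i) = x_i^2
    return x, y

def U(beta, alpha, x, y):
    """Estimating function: a mean of per-observation terms U_i(beta, alpha)."""
    w = 1.0 / (1.0 + alpha * x**2)                # working weights indexed by alpha
    return np.mean(w * x * (y - x * beta))

def solve_beta(alpha, x, y):
    """Solve U(beta; alpha) = 0 in beta (closed form because U is linear in beta)."""
    w = 1.0 / (1.0 + alpha * x**2)
    return np.sum(w * x * y) / np.sum(w * x**2)

x, y = simulate(10_000)
for alpha in (0.0, 1.0, 5.0):                     # plug in different nuisance values
    print(alpha, solve_beta(alpha, x, y))         # all close to beta0 = 2
```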

Examples include GEE, with $\alpha$ the working-correlation parameters; raking, with $\alpha$ the imputation-model and calibration parameters; and stabilised weights, with $\alpha$ the stabilising-model parameters.

Now, suppose we have an estimator $\hat\alpha$ whose limit in probability exists; we'll call it $\alpha^*$. With enough regularity to differentiate under the expectation,
$$\frac{\partial}{\partial\alpha}E[U(\beta_0;\alpha)]\Big|_{\alpha=\alpha^*}=0=E\left[\frac{\partial U}{\partial\alpha}(\beta_0;\alpha^*)\right].$$
As the derivative has zero mean, the law of large numbers says
$$\frac{\partial U}{\partial\alpha}(\beta_0;\alpha^*)=o_p(1)$$
and the central limit theorem says
$$\frac{\partial U}{\partial\alpha}(\beta_0;\alpha^*)=O_p(n^{-1/2}).$$
On the other hand, the derivative with respect to $\beta$ does not have mean zero, so it is $O_p(1)$. In a parametric model it would be (up to sign) the average per-observation observed Fisher information.
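Continuing the toy example above, a quick simulation sketch checks these rates numerically: the $\alpha$-derivative of $U$ at $(\beta_0,\alpha^*)$ has mean zero and standard deviation shrinking like $n^{-1/2}$, while the $\beta$-derivative settles down to a nonzero constant (the model and the value of $\alpha^*$ are again made up for illustration).

```python
# Continuing the toy example: check numerically that the alpha-derivative of U at
# (beta_0, alpha_star) is mean-zero and shrinks like n^{-1/2}, while the
# beta-derivative settles at a nonzero constant (so it is O_p(1)).
# alpha_star here is just some fixed value we pretend alpha_hat converges to.
import numpy as np

rng = np.random.default_rng(2)
beta0, alpha_star = 2.0, 1.0

def dU_dalpha(beta, alpha, x, y):
    # d/d alpha of (1/n) sum w_i(alpha) x_i (y_i - x_i beta), with w_i = 1/(1 + alpha x_i^2)
    dw = -x**2 / (1.0 + alpha * x**2) ** 2
    return np.mean(dw * x * (y - x * beta))

def dU_dbeta(beta, alpha, x, y):
    w = 1.0 / (1.0 + alpha * x**2)
    return -np.mean(w * x**2)

for n in (100, 400, 1600, 6400):
    da, db = [], []
    for _ in range(2000):
        x = rng.uniform(0.5, 2.0, size=n)
        y = beta0 * x + rng.normal(scale=x, size=n)
        da.append(dU_dalpha(beta0, alpha_star, x, y))
        db.append(dU_dbeta(beta0, alpha_star, x, y))
    # sqrt(n) * sd(dU/dalpha) should be roughly constant across n;
    # mean(dU/dbeta) should stay near a nonzero constant.
    print(n, np.sqrt(n) * np.std(da), np.mean(db))
```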

A Taylor series expansion about $(\beta_0,\alpha^*)$ gives
$$0=U(\hat\beta,\hat\alpha)=U(\beta_0,\alpha^*)+(\hat\alpha-\alpha^*)\frac{\partial U}{\partial\alpha}(\beta_0;\alpha^*)+(\hat\beta-\beta_0)\frac{\partial U}{\partial\beta}(\beta_0;\alpha^*)+O_p\left(\|\hat\alpha-\alpha^*\|_2^2\right)+O_p\left(\|\hat\beta-\beta_0\|_2^2\right).$$
If $\hat\alpha-\alpha^*=o_p(n^{-1/4})$ then the second, fourth, and fifth terms on the right-hand side are $o_p(n^{-1/2})$, so
$$0=U(\hat\beta,\hat\alpha)=U(\beta_0,\alpha^*)+(\hat\beta-\beta_0)\frac{\partial U}{\partial\beta}(\beta_0;\alpha^*)+o_p(n^{-1/2}).$$
Under the standard smoothness/moment assumptions we can rearrange to
$$\hat\beta-\beta_0=-\left[\frac{\partial U}{\partial\beta}(\beta_0;\alpha^*)\right]^{-1}U(\beta_0,\alpha^*)+o_p(n^{-1/2}),$$
so the distribution of $\hat\beta$ depends on $\hat\alpha$ only through $\alpha^*$. ◼️
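Here is a simulation sketch of the conclusion in the same toy model: plugging in an $\hat\alpha$ that converges at only the $n^{-0.3}$ rate (so $o_p(n^{-1/4})$ but nowhere near root-$n$) gives essentially the same sampling distribution for $\sqrt{n}(\hat\beta-\beta_0)$ as plugging in $\alpha^*$ itself. For simplicity $\hat\alpha$ is just $\alpha^*$ plus independent noise at that rate, which is enough to illustrate the rate argument.

```python
# Continuing the toy example: the claim is that the distribution of
# sqrt(n)*(beta_hat - beta_0) depends on alpha_hat only through its limit
# alpha_star, provided alpha_hat - alpha_star = o_p(n^{-1/4}).  Here I fake an
# alpha_hat converging at the slow rate n^{-0.3} (independent noise, for
# simplicity) and compare the Monte Carlo distribution of beta_hat with the
# one obtained by plugging in alpha_star itself.
import numpy as np

rng = np.random.default_rng(3)
beta0, alpha_star, n, reps = 2.0, 1.0, 4000, 5000

def solve_beta(alpha, x, y):
    w = 1.0 / (1.0 + alpha * x**2)
    return np.sum(w * x * y) / np.sum(w * x**2)

z_star, z_slow = [], []
for _ in range(reps):
    x = rng.uniform(0.5, 2.0, size=n)
    y = beta0 * x + rng.normal(scale=x, size=n)
    alpha_hat = alpha_star + n**(-0.3) * rng.normal()   # o_p(n^{-1/4}) but not O_p(n^{-1/2})
    z_star.append(np.sqrt(n) * (solve_beta(alpha_star, x, y) - beta0))
    z_slow.append(np.sqrt(n) * (solve_beta(alpha_hat, x, y) - beta0))

# The two sampling standard deviations should agree up to Monte Carlo error:
# plugging in the noisy alpha_hat costs nothing asymptotically.
print(np.std(z_star), np.std(z_slow))
```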

For most purposes the fourth-root condition doesn't really matter: if you have a fixed finite-dimensional parameter that you can estimate at all, you can probably estimate it at root-$n$ rate, and if your parameters are infinite-dimensional or growing with $n$ you need to worry about more than just powers of $n$ in remainders. However, if the argument needed root-$n$ convergence of $\hat\alpha$, you might worry that an inefficient $\hat\alpha$ would cause problems in sub-asymptotic settings; that worry largely goes away once you know fourth-root consistency is enough.

I worked this argument out for the GEE case, back when I was a PhD student, but I certainly wasn’t the first person to do so. I have been told that the first person to come up with the fourth-root part of it was Whitney Newey, which would make sense, but I don’t have a reference. If you know that reference or any early (mid 90s or earlier) reference, I’d like to hear about it.

The Biometrika GEE paper in 1986 has the essential idea, that $\frac{\partial}{\partial\alpha}E[U(\beta_0,\alpha)]=0$, but it assumes $n^{1/2}$-consistency for $\hat\alpha$. Also, some people at the time (and since) have been confused by its using 'consistency' both for the assumption that $\hat\beta$ converges to its true value $\beta_0$ and for the assumption that $\hat\alpha$ converges to something.