# The fourth-root thing

This post is partly because I think the result is interesting and partly to see if anyone will tell me an original reference.

Suppose we get $$\hat\beta$$ by solving $$U(\beta;\alpha)=0$$, where $$\alpha$$ is a nuisance parameter we plug into the equation. Assume that for any fixed $$\alpha$$, $$E[U(\beta_0;\alpha)]=0.$$ Assume $$U(\beta;\alpha)=\frac{1}{n}\sum_{i=1}^n U_i(\beta;\alpha)$$ and that $$U$$ converges pointwise (and in mean, given finite moments) to its expected value. Also assume enough other regularity that this leads to $$\sqrt{n}(\hat\beta-\beta_0)\stackrel{d}{\to} N(0,\sigma^2(\alpha)).$$

Examples include GEE, with $$\alpha$$ the working correlation parameters; raking, with $$\alpha$$ the imputation-model and calibration parameters; and stabilised weights, with $$\alpha$$ the stabilising-model parameters.
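As a concrete toy instance (my invention for illustration, not from any of these packages), take $$U(\beta;\alpha)=\frac{1}{n}\sum_i e^{\alpha z_i}(y_i-\beta)$$, a weighted mean with exponentially tilted weights. When $$z$$ is independent of $$y$$, $$E[U(\beta_0;\alpha)]=0$$ for every $$\alpha$$, so every choice of $$\alpha$$ gives a consistent estimator of $$\beta_0$$. A quick sketch in Python:

```python
import numpy as np

rng = np.random.default_rng(0)

def U(beta, alpha, y, z):
    """Toy estimating equation with exponentially tilted weights w_i = exp(alpha*z_i).
    Because z is independent of y, E[U(beta0; alpha)] = 0 for every alpha."""
    w = np.exp(alpha * z)
    return np.mean(w * (y - beta))

def beta_hat(alpha, y, z):
    """Solve U(beta; alpha) = 0 in closed form: a weighted mean."""
    w = np.exp(alpha * z)
    return np.sum(w * y) / np.sum(w)

beta0 = 2.0
n = 100_000
y = beta0 + rng.standard_normal(n)   # E[y] = beta0
z = rng.standard_normal(n)           # independent of y

for alpha in (0.0, 0.5, 1.0):
    # U(beta0, alpha) should be near zero and beta_hat near beta0 for every alpha
    print(alpha, U(beta0, alpha, y, z), beta_hat(alpha, y, z))
```

Different values of $$\alpha$$ give different weightings, and hence different values of $$\sigma^2(\alpha)$$, but the same limit $$\beta_0$$.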

Now, suppose we have an estimator $$\hat\alpha$$ whose limit in probability exists; we'll call it $$\alpha^*$$. With enough regularity to differentiate under the expectation, $$\frac{\partial}{\partial\alpha}\left.E[U(\beta_0;\alpha)]\right|_{\alpha^*}=0 = E\left[\left. \frac{\partial}{\partial\alpha}U(\beta_0;\alpha)\right|_{\alpha^*} \right].$$ As the derivative has zero mean, the law of large numbers says $$\left. \frac{\partial}{\partial\alpha}U(\beta_0;\alpha)\right|_{\alpha^*}=o_p(1)$$ and the central limit theorem says $$\left. \frac{\partial}{\partial\alpha}U(\beta_0;\alpha)\right|_{\alpha^*}=O_p(n^{-1/2}).$$ On the other hand, the derivative with respect to $$\beta$$ does not have mean zero, so it is $$O_p(1)$$. In a parametric model it would be (minus) the average per-observation observed Fisher information.
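The $$O_p(n^{-1/2})$$ rate for the $$\alpha$$-derivative can be checked numerically. Using a hypothetical toy estimating equation $$U(\beta;\alpha)=\frac{1}{n}\sum_i e^{\alpha z_i}(y_i-\beta)$$ with $$z$$ independent of $$y$$ (my example, not from the literature), the derivative $$\partial_\alpha U(\beta_0;\alpha)=\frac{1}{n}\sum_i z_i e^{\alpha z_i}(y_i-\beta_0)$$ has mean zero, so quadrupling $$n$$ should roughly halve its spread:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, alpha_star = 2.0, 0.5

def dU_dalpha(n):
    """One realisation of (d/dalpha) U(beta0; alpha) at alpha*, for toy
    weights w_i = exp(alpha*z_i): (1/n) sum z_i exp(alpha* z_i) (y_i - beta0).
    Mean zero because z is independent of y and E[y - beta0] = 0."""
    y = beta0 + rng.standard_normal(n)
    z = rng.standard_normal(n)
    return np.mean(z * np.exp(alpha_star * z) * (y - beta0))

reps = 2000
sd_n  = np.std([dU_dalpha(400)  for _ in range(reps)])
sd_4n = np.std([dU_dalpha(1600) for _ in range(reps)])
print(sd_n / sd_4n)   # roughly 2: quadrupling n halves the spread, i.e. O_p(n^{-1/2})
```

This is a simulation sketch of the rate, of course, not a proof.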

A Taylor series expansion about $$(\beta_0,\alpha^*)$$, together with $$U(\hat\beta;\hat\alpha)=0$$ (which is how $$\hat\beta$$ is defined), gives \begin{align}0=U(\hat\beta;\hat\alpha)=&U(\beta_0;\alpha^*)+ (\hat\alpha-\alpha^*)\frac{\partial}{\partial\alpha}U(\beta_0;\alpha^*)\\&+(\hat\beta-\beta_0)\frac{\partial}{\partial\beta}U(\beta_0;\alpha^*)\\&+O_p(\|\hat\alpha-\alpha^*\|^2_2)+O_p(\|\hat\beta-\beta_0\|^2_2)\end{align} If $$\hat\alpha-\alpha^*=o_p(n^{-1/4})$$ then the second, fourth, and fifth terms are $$o_p(n^{-1/2})$$ (the second because it is $$o_p(n^{-1/4})\times O_p(n^{-1/2})$$, the fourth because it is $$o_p(n^{-1/2})$$ directly, and the fifth because $$\hat\beta-\beta_0=O_p(n^{-1/2})$$), so $$0=U(\hat\beta;\hat\alpha)=U(\beta_0;\alpha^*)+ (\hat\beta-\beta_0)\frac{\partial}{\partial\beta}U(\beta_0;\alpha^*)+o_p(n^{-1/2}).$$ Under the standard smoothness/moment assumptions we can rearrange to $$\hat\beta-\beta_0= -\left[\frac{\partial}{\partial\beta}U(\beta_0;\alpha^*) \right]^{-1}U(\beta_0;\alpha^*)+o_p(n^{-1/2}),$$ so the distribution of $$\hat\beta$$ depends on $$\hat\alpha$$ only through $$\alpha^*$$. ◼️
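Here is a Monte Carlo sketch of the conclusion, using a hypothetical toy estimating equation $$U(\beta;\alpha)=\frac{1}{n}\sum_i e^{\alpha z_i}(y_i-\beta)$$ with $$z$$ independent of $$y$$ (my example, with an invented $$\hat\alpha$$): plugging in an $$\hat\alpha$$ that converges at only an $$n^{-1/3}$$ rate, slower than root-$$n$$ but comfortably faster than fourth-root, leaves the spread of $$\sqrt{n}(\hat\beta-\beta_0)$$ essentially unchanged compared with using $$\alpha^*$$ itself:

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, alpha_star = 2.0, 0.5
n, reps = 1000, 2000

def beta_hat(alpha, y, z):
    # Closed-form root of U(beta; alpha) = (1/n) sum exp(alpha*z_i)(y_i - beta)
    w = np.exp(alpha * z)
    return np.sum(w * y) / np.sum(w)

fixed, plugged = [], []
for _ in range(reps):
    y = beta0 + rng.standard_normal(n)   # E[y] = beta0
    z = rng.standard_normal(n)           # independent of y
    # Hypothetical alpha-hat: consistent at rate n^{-1/3} only --
    # slower than root-n, but faster than the fourth-root threshold
    alpha_hat = alpha_star + rng.standard_normal() * n ** (-1 / 3)
    fixed.append(np.sqrt(n) * (beta_hat(alpha_star, y, z) - beta0))
    plugged.append(np.sqrt(n) * (beta_hat(alpha_hat, y, z) - beta0))

print(np.std(fixed), np.std(plugged))  # the two spreads should be very close
```

Up to Monte Carlo error, the two standard deviations agree: the estimation noise in $$\hat\alpha$$ doesn't show up at the $$n^{-1/2}$$ scale.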

For most purposes the fourth-root condition doesn't really matter: if you have a fixed finite-dimensional parameter that you can estimate at all, you can probably estimate it at root-$$n$$ rate, and if your parameters are infinite-dimensional or growing in size with $$n$$ you need to worry about more than just powers of $$n$$ in remainders. However, if the theory required root-$$n$$ convergence of $$\hat\alpha$$, you might worry that an inefficient $$\hat\alpha$$ would cause problems in sub-asymptotic settings; that's less of a worry once you know fourth-root consistency is enough.

I worked this argument out for the GEE case, back when I was a PhD student, but I certainly wasn’t the first person to do so. I have been told that the first person to come up with the fourth-root part of it was Whitney Newey, which would make sense, but I don’t have a reference. If you know that reference or any early (mid 90s or earlier) reference, I’d like to hear about it.

The original GEE paper (Liang and Zeger, Biometrika, 1986) has the essential idea that $$\partial_\alpha E[U(\beta_0;\alpha)]=0$$, but it assumes $$n^{1/2}$$-consistency for $$\hat\alpha$$. Also, some people at the time (and since) have been confused by its using 'consistency' both for the assumption that $$\hat\beta$$ converges to its true value $$\beta_0$$ and for the assumption that $$\hat\alpha$$ converges to something.