We often have theory that says \[\sqrt{n}(\hat\theta_n-\theta)\stackrel{d}{\to}N(0,\sigma^2),\]

and then do simulations to see how well the asymptotic approximation applies. After doing so, we often present tables of the empirical mean and standard deviation of \(\hat\theta_n.\) This doesn’t make a lot of sense.

Knowing that \(\sqrt{n}(\hat\theta_n-\theta)\stackrel{d}{\to}N(0,\sigma^2)\) doesn’t tell us anything about the moments of \(\hat\theta_n\) for any finite \(n\). Convergence in distribution does not imply convergence in mean. For example, \(\hat\theta_n\) could be maximum likelihood estimates in a logistic regression model. These have no finite moments for any finite \(n\), because they are infinite with positive probability.

However, the asymptotic result does tell us that the quantiles of \(\hat\theta_n\), suitably scaled, should converge to those of the approximating Normal distribution. The median of \(\hat\theta_n\) should converge to \(\theta\); the MAD of \(\hat\theta_n\) (after scaling by \(\sqrt{n}\)) should converge to \(\sigma\), and the probability that \((\hat\theta_n-1.96\times\widehat{se}[\hat\theta_n],\,\hat\theta_n+1.96\times\widehat{se}[\hat\theta_n])\) includes \(\theta\) should converge to 95%.

Knowing how good the asymptotic approximation is for these quantile-based statistics is usually sufficient to tell us if the approximation is useful in practice. We usually don’t care about the **mean** of \(\hat\theta_n\) in any substantive way, since if we did, we’d be more worried when it didn’t have one.

I think the usefulness of ‘robust statistics’ gets oversold a lot, but this really is a case where it’s meaningful and true to say that the median of an approximately-Normal variable is a better summary of the location parameter than the mean is. We should be presenting robust (ie, weakly continuous) summaries of the results of simulations motivated by asymptotics unless there’s a specific reason to care about moments.