
Does svyglm use robust standard errors?

Yes

This question comes up from time to time on social media or StackExchange or email, often from reasonable people, so extra emphasis might be useful.

There are two parts to the answer:

Yes

and

If you think about it, what else could it be using?

Write $\hat\beta$ for the svyglm estimator. Theory says the estimator solves the weighted score equations

$$U(\beta)=\sum_{i=1}^N \frac{R_i}{\pi_i}U_i(\beta)=0$$

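where $N$ is the population size, $R_i$ is the sampling indicator, $\pi_i$ is the sampling probability, and $U_i(\beta)=\partial\ell_i(\beta)/\partial\beta$ is the score (the derivative of unit $i$'s log-likelihood contribution $\ell_i$). Doing a Taylor series expansion on this gives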

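$$U(\beta_0)=U(\hat\beta)+\frac{\partial U}{\partial\beta}(\beta_0-\hat\beta)+\text{remainder}$$

where $\beta_0$ is the value being estimated. Since $U(\hat\beta)=0$, this gives

$$\hat\beta-\beta_0\approx-\left[\frac{\partial U}{\partial\beta}\right]^{-1}U(\beta_0).$$

The large-sample variance approximation is then the ‘sandwich’ (the minus sign cancels because the inverse derivative appears on both sides)

$$\widehat{\mathrm{var}}[\hat\beta]=\left[\frac{\partial U}{\partial\beta}\right]^{-1}\widehat{\mathrm{var}}[U(\beta_0)]\left[\frac{\partial U}{\partial\beta}\right]^{-1}.$$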
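As a rough numerical check of the linearisation, here is a minimal Python sketch for the simplest possible case: a one-coefficient linear model through the origin, where the score contribution is $U_i(\beta)=x_i(y_i-x_i\beta)$. The data, the weights, and the value `beta0` are all made up for illustration, and none of this is code from the survey package. Because this score is linear in $\beta$, the remainder is zero and the approximation $\hat\beta-\beta_0\approx-[\partial U/\partial\beta]^{-1}U(\beta_0)$ holds exactly.

```python
# Toy sample (all values hypothetical): covariate x, outcome y,
# and sampling weights w_i = 1 / pi_i for the sampled units.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3]
w = [10.0, 10.0, 20.0, 20.0, 40.0]


def weighted_score(beta):
    """U(beta) = sum_i w_i * x_i * (y_i - x_i * beta) over sampled units."""
    return sum(wi * xi * (yi - xi * beta) for xi, yi, wi in zip(x, y, w))


# The score is linear in beta, so U(beta) = 0 has a closed-form solution.
sum_wxx = sum(wi * xi * xi for xi, wi in zip(x, w))
sum_wxy = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
beta_hat = sum_wxy / sum_wxx

# dU/dbeta does not depend on beta in this model.
dU_dbeta = -sum_wxx

beta0 = 1.0  # hypothetical target value, chosen only for the check
lhs = beta_hat - beta0
rhs = -weighted_score(beta0) / dU_dbeta

print(f"beta_hat - beta0          = {lhs:.6f}")
print(f"-[dU/dbeta]^-1 * U(beta0) = {rhs:.6f}")
```

For a nonlinear score (logistic regression, say) the two printed numbers would agree only approximately, with the gap shrinking as $\hat\beta$ gets close to $\beta_0$.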

This is all similar to, e.g., Huber's or White’s derivation of the sandwich estimator. The only difference is that the middle term¹ has to be estimated differently because of the survey design. That is, the svyglm variance estimator generalises the familiar sandwich estimators to allow for non-trivial sampling.

The middle term is the variance of an estimated population total, and is estimated the same way as for any other population total. This is literally true: all the population-total variance estimates go through the function svyrecvar.²

The middle term is

$$\sum_{i,j}\frac{R_{ij}}{\pi_{ij}}\frac{R_iU_i}{\pi_i}\frac{R_jU_j}{\pi_j}$$

where $\pi_{ij}$ are the pairwise sampling probabilities. If you had independent sampling of individual records, so $\mathrm{cov}[R_i,R_j]=0$, the middle term would reduce to

$$\sum_{i}\frac{R_i}{\pi_i}\left[\frac{R_iU_i}{\pi_i}\right]^2$$

and the whole thing simplifies to a standard sandwich estimator.

Model-based standard error estimates are based on simplifying the sandwich estimator by making stronger assumptions about the structure of the middle term. We can’t do this with survey data: we don’t necessarily assume anything about how the finite population was generated, so no simplifications are available.
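To make the contrast concrete, here is a small Python sketch for the same kind of toy problem: a one-coefficient weighted linear model through the origin. It is an illustration under stated assumptions, not what svyglm or svyrecvar actually computes. The robust version assumes independent sampling of records, so the middle term is taken to be the sum of squared weighted score contributions, with no finite-population correction. The model-based version adds the stronger assumption that $\mathrm{var}[y_i]=\sigma^2/w_i$ (weights treated as precision weights), under which the middle term collapses to $\sigma^2$ times the outer term. The function names and data are hypothetical.

```python
import math


def fit(x, y, w):
    """Solve the weighted score equation for y = beta * x; return pieces."""
    outer = sum(wi * xi * xi for xi, wi in zip(x, w))  # -dU/dbeta
    beta_hat = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w)) / outer
    resid = [yi - xi * beta_hat for xi, yi in zip(x, y)]
    return beta_hat, outer, resid


def robust_se(x, y, w):
    """Sandwich SE: middle term estimated from squared weighted scores."""
    _, outer, resid = fit(x, y, w)
    middle = sum((wi * xi * ri) ** 2 for xi, ri, wi in zip(x, resid, w))
    return math.sqrt(middle) / outer


def model_based_se(x, y, w):
    """SE assuming var[y_i] = sigma^2 / w_i, so middle = sigma^2 * outer."""
    _, outer, resid = fit(x, y, w)
    n = len(x)
    sigma2 = sum(wi * ri * ri for ri, wi in zip(resid, w)) / (n - 1)
    return math.sqrt(sigma2 / outer)


# Toy sample (hypothetical values); w_i = 1 / pi_i are sampling weights.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3]
w = [10.0, 10.0, 20.0, 20.0, 40.0]

print("robust SE:     ", robust_se(x, y, w))
print("model-based SE:", model_based_se(x, y, w))
```

The two numbers differ because sampling weights are not precision weights: the model-based formula is only justified if the variance assumption is true, whereas the sandwich form does not rely on it.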

So, yes, all the models in the survey and svyVGAM packages use model-robust standard errors.


  1. meat? cheese? falafel? avocado?

  2. or analogous functions such as ppsvar or twophasevar for other categories of design