New in the survey package - Biased and Inefficient

Version 4.0 of the survey package is on its way to CRAN.

There are two main updates, which improve the estimation of contrasts

First, a couple of improvements to the handling of replicates. When svycontrast is used on an object that includes replicate estimates, the estimates will now be transformed and then used to estimate a variance, rather than using the delta method. I think that’s the right thing to do, though you might also want to compute a confidence interval on the original scale and transform the interval. Also, there’s a function to estimate variances from a set of arbitrarily transformed replicates, eg correlation matrices

dstrat <- svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
bstrat<- as.svrepdesign(dstrat,type="subbootstrap")

v <- svyvar(~api00+api99, bstrat, return.replicates=TRUE)
vcor<-cov2cor(as.matrix(v))[2,1]
vreps<-v$replicates
correps<-apply(vreps,1, function(v) v[2]/sqrt(v[1]*v[4]))

vcov(bstrat,correps, centre=vcor)

## [1] 1.486932e-05
## attr(,"means")
## [1] 0.9751789

Second, some functions (so far, svymean, svytotal, svyratio, svykappa,svyglm, svymle) have the option to return influence functions as an attribute of the estimate. This allows svyby to estimate a covariance matrix including between-domain covariances, so that confidence intervals and standard errors are available for between-domain contrasts. Previously, they have only been available for replicate-weight designs (though for totals and means you could get them with an appropriate regression model). Returning influence functions is not the default, because they can take up a lot of space.

For example

dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

totals<-svyby(~enroll, ~stype, svytotal,
                 design=dclus1, covmat=TRUE)

svycontrast(totals, quote(E/H))

##          nlcon     SE
## contrast 3.939 1.4319

svycontrast(totals, quote(E-H))

##            nlcon     SE
## contrast 1574122 543747

The only important function where this does not look feasible is svyquantile, because of the lack of smoothness (though there are tests for equality of quantiles between groups available with svyranktest.)

Other changes

There is a function svynls() to do non-linear least-squares with probability weights. It does most of its work by calling nls, so if you know how to fit your model with nls you’ll be ok.

There are also a couple of significant documentation notes

Lots of people have asked how to get ‘robust’ standard errors for quasipoisson models fitted with svyglm, so there is a note saying that there isn’t any choice: the standard error estimates are generalisations of the ‘sandwich’ standard errors
I had an interesting exchange about standard errors in predict.svyglm. I implemented these the obvious way: $v a r [x \hat{β}] = x^{T} v a r [\hat{β}] x .$ For the special case where the new value of $x$ is the true population mean, most people use a different standard error estimator that’s a rescaling of the weighted residual mean square (especially in simple random samples in textbooks, but also in real life). If I’m reading Chapter 2 of Fuller’s Sampling Statistics correctly, my estimator is noisier, but is valid under weaker assumptions.