New in the survey package

Version 4.4-1 of the survey package for R is percolating through CRAN. There are some important additions, visible and invisible.

The main invisible addition is from Ben Schneider, who has written a set of C++ routines that do the multistage stratified variance calculations previously done by svyrecvar. The compiled versions are the default; use options(survey.use_rcpp=FALSE) to disable them. The C++ code is faster; perhaps more important is that it gives the same answers independently and so is a check on the central routine of the package.
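One way to see the cross-check for yourself is to compute the same standard error with the option switched on and then off. This is a minimal sketch, assuming survey 4.4-1 or later is installed. It uses the two-stage apiclus2 example data that ships with the package, and the choice of svymean on api00 is only for illustration.

```r
library(survey)
data(api)

# Two-stage cluster sample from the package's example data
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

# Standard error with the compiled routines
old <- options(survey.use_rcpp = TRUE)
se_cpp <- SE(svymean(~api00, dclus2))

# Standard error with the original R implementation
options(survey.use_rcpp = FALSE)
se_r <- SE(svymean(~api00, dclus2))

# Restore whatever setting was in place before
options(old)

# The two should agree up to floating-point error
print(all.equal(se_cpp, se_r))
```

On a design this small the speed difference won't be noticeable; the point is only that two independent implementations give the same number.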

The most important visible additions are the functions svysmoothUnit and svysmoothArea for small-area estimation. These are just an interface to the SUMMER package, but they make a wider range of analyses available. I’ll write a separate post about svysmoothArea, which fits Bayesian versions of the Fay-Herriot model to smooth the direct survey estimates for small areas. The svysmoothUnit function doesn’t use sampling weights; it assumes that sampling is ignorable given a set of unit-level covariates and fits generalised linear models with area random effects. There’s more background here.
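For readers who haven't met it, the basic area-level Fay-Herriot model can be written as below. This is the textbook form, offered as a sketch: the notation is chosen for illustration, and the priors and any spatial terms that SUMMER actually uses aren't specified here.

```latex
\begin{aligned}
\hat\theta_i \mid \theta_i &\sim N(\theta_i,\, V_i)
  && \text{(sampling model: direct estimate for area } i \text{ with design-based variance } V_i\text{)} \\
\theta_i &= x_i^\top \beta + u_i, \qquad u_i \sim N(0,\, \sigma_u^2)
  && \text{(linking model: area-level covariates plus an area random effect)}
\end{aligned}
```

The smoothing comes from the second line: areas with noisy direct estimates (large V_i) are pulled more strongly towards the regression prediction.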

If you want to use the small-area estimation functions you need to install the SUMMER package (which is suggested by survey) and also install INLA (which is needed for the SUMMER models). The small-area estimation vignette describes how to do this. The INLA system isn’t an explicit dependency of survey because many users won’t need it and the fact that it doesn’t live on CRAN might make some institutions more reluctant to install it.
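A quick way to check whether both pieces are in place before trying the new functions is sketched below. It only reports what is missing and installs nothing. The INLA repository URL in the comment is the one from the INLA project's own installation instructions as best I know them, so treat it as an assumption and defer to the vignette if the two differ.

```r
# Optional packages needed for the small-area estimation functions
sae_pkgs <- c("SUMMER", "INLA")

# TRUE/FALSE for each package, without attaching anything
have <- vapply(sae_pkgs, requireNamespace, logical(1), quietly = TRUE)
print(have)

if (!have[["SUMMER"]]) {
  message('SUMMER is on CRAN: install.packages("SUMMER")')
}
if (!have[["INLA"]]) {
  # INLA is not on CRAN. Assumed install command (check the vignette):
  # install.packages("INLA",
  #   repos = c(getOption("repos"),
  #             INLA = "https://inla.r-inla-download.org/R/stable"),
  #   dep = TRUE)
  message("INLA is not on CRAN; see the small-area estimation vignette")
}
```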

There are also other changes: it’s now possible to have arbitrary designs at phase two of a twophase object by specifying a matrix of pairwise sampling probabilities or sampling covariances. The primary motivation for this was to allow Poisson sampling at phase two as a model for non-response, but it will have other uses. There are also some fixes to standard error estimation for some raked two-phase design objects. And there’s a miscellany of smaller bug fixes: for example, confint would sometimes fail to find a profile confidence interval for generalised linear model objects with replicate weights because it was using bad values for the search limits.
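To make the Poisson-sampling case concrete: under Poisson sampling, units are selected independently, so the pairwise probability for two different units is the product of their individual probabilities, and the sampling covariances are zero off the diagonal. Here is a small base-R sketch with made-up probabilities. How the matrix is then handed to twophase isn't shown, since the argument names aren't given here.

```r
# Hypothetical phase-two inclusion (e.g. response) probabilities for four units
p <- c(0.2, 0.5, 0.8, 0.9)

# Pairwise sampling probabilities:
# pi_ij = pi_i * pi_j for i != j, and pi_ii = pi_i on the diagonal
pairwise <- outer(p, p)
diag(pairwise) <- p

# Sampling covariances of the inclusion indicators: pi_ij - pi_i * pi_j
# Off-diagonal entries are zero; the diagonal is pi_i * (1 - pi_i)
covmat <- pairwise - outer(p, p)

print(pairwise)
print(covmat)
```

Either matrix carries the same information about the design, which is why a specification in terms of probabilities or covariances can be equivalent.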