Surveys (especially educational surveys) have a thing called ‘plausible values’, which are a form of multiple imputation, only by design rather than because of non-response. To reduce effort, not everyone answers every question. Often, there are a lot of variables that don’t need imputing, and a few that do.
The data example I showed in the last post, for mixed models, has five plausible values for the maths score. I only used PV1MATH
. This time, I’ve written code to handle the plausible values in a reasonably general way. The code is currently in a GitHub gist, but will make its way to the survey package in due course.
The idea of the code is that you specify mappings that tell R which variables in the dataset are the same actual variable, and what you want to call it in your analysis. You also specify your analysis as a function or quoted expression, and the code goes off to construct appropriate survey design objects behind the scenes.
For a simple example, we’ll use the same data set on maths achievment as in the previous post
library(svylme)
library(mitools)
source("https://gist.githubusercontent.com/tslumley/9ebe5e16bc001ba22978c2d4edb057ac/raw/7136ee5a7f9dc98784faf6cceecbd9257b04dc9c/withPV.R")
data(nzmaths)
des<-svydesign(id=~SCHOOLID+STIDSTD, strata=~STRATUM, nest=TRUE,
weights=~W_FSCHWT+condwt, data=nzmaths)
oo<-options(survey.lonely.psu="remove")
Now, we want to fit a linear mixed model to each plausible value for the maths score. We specify the mapping maths~PV1MATH+PV2MATH+PV3MATH+PV4MATH+PV5MATH
to say that each of these variables in turn will be used as maths
, and in the action
argument the variable maths
is used. [As with detecting ‘attached’ in email messages, it might be useful to add a warning when the original variable appears in action
instead of the mapped variable]
options(width=80)
results_list<-withPV(
mapping = maths~PV1MATH+PV2MATH+PV3MATH+PV4MATH+PV5MATH,
design = des,
action = function(d) {svy2lme(maths~ (1+ ST04Q01 |SCHOOLID)+ ST04Q01*(PCGIRLS+SMRATIO)+MATHEFF+OPENPS, design=d)}
)
summary(MIcombine(results_list))
## Multiple imputation results:
## withPV.survey.design(mapping = maths ~ PV1MATH + PV2MATH + PV3MATH +
## PV4MATH + PV5MATH, design = des, action = function(d) {
## svy2lme(maths ~ (1 + ST04Q01 | SCHOOLID) + ST04Q01 * (PCGIRLS +
## SMRATIO) + MATHEFF + OPENPS, design = d)
## })
## MIcombine.default(results_list)
## results se (lower upper) missInfo
## (Intercept) 495.42036072 18.14189679 459.8558021 530.9849194 3 %
## ST04Q01Male 62.99564416 21.98507932 19.7946822 106.1966061 10 %
## PCGIRLS 48.79661788 17.33506727 14.7248092 82.8684266 10 %
## SMRATIO -0.04053017 0.09886503 -0.2343792 0.1533188 4 %
## MATHEFF 47.04499873 2.43224141 42.2666121 51.8233854 9 %
## OPENPS 13.26229509 3.31646573 6.6483729 19.8762173 26 %
## ST04Q01Male:PCGIRLS -121.97729327 35.18342318 -191.0941443 -52.8604423 9 %
## ST04Q01Male:SMRATIO -0.05421422 0.11669297 -0.2830117 0.1745833 4 %
You can do simpler analyses, of course
the_means <- withPV(maths~PV1MATH+PV2MATH+PV3MATH+PV4MATH+PV5MATH,
des,
function(d) svymean(~maths,d))
summary(MIcombine(the_means))
## Multiple imputation results:
## withPV.survey.design(maths ~ PV1MATH + PV2MATH + PV3MATH + PV4MATH +
## PV5MATH, des, function(d) svymean(~maths, d))
## MIcombine.default(the_means)
## results se (lower upper) missInfo
## maths 499.7499 4.203189 491.511 507.9888 2 %
The returned list will usually contain references to the environments where all the individual survey designs were created, so you don’t want to keep it around too long if those design objects are big.