The most important part of this post

`library(svylme)`

It exists!

Currently you have to get it from github

`remotes::install_github("tslumley/svylme")`

Let’s look at an example. This is data from the New Zealand component of the PISA educational survey

`data(nzmaths)`

There is only one school in one of the strata, so we’ll combine two strata:

```
nzmaths$cSTRATUM<- nzmaths$STRATUM
nzmaths$cSTRATUM[nzmaths$cSTRATUM=="NZL0102"]<-"NZL0202"
```

We have weights both for the student and the school; the `condwt`

variable is the implied weights for the student stage of sampling. We get a two-stage design with weights at both stages. We need weights at both stages in order to work out the pairwise weights

```
des<-svydesign(id=~SCHOOLID+STIDSTD, strata=~cSTRATUM, nest=TRUE,
weights=~W_FSCHWT+condwt, data=nzmaths)
```

The variable `PCGIRLS`

is percentage of girls at the school, and it’s nearly 0, nearly 100%, or roughly in the middle, so I will centre it. We’re also going to want jackknife weights later, so we can make them now

```
des<-update(des, centPCGIRLS=PCGIRLS-0.5)
jkdes<-as.svrepdesign(des)
```

And finally, a model

```
m1<-svy2lme(PV1MATH~ (1+ ST04Q01 |SCHOOLID)+ST04Q01*(centPCGIRLS+SMRATIO)+MATHEFF+OPENPS, design=des, return.devfun=TRUE)
m1
```

```
## Linear mixed model fitted by pairwise pseudolikelihood
## Formula: PV1MATH ~ (1 + ST04Q01 | SCHOOLID) + ST04Q01 * (centPCGIRLS +
## SMRATIO) + MATHEFF + OPENPS
## Random effects:
## Std.Dev.
## SCHOOLID1:(Intercept) 22.9365
## SCHOOLID2:ST04Q01Male 10.8911
## Residual: 69.4969
## Fixed effects:
## beta SE t p
## (Intercept) 517.93076 12.64958 40.945 < 2e-16
## ST04Q01Male 4.93730 15.49920 0.319 0.750067
## centPCGIRLS 54.28155 16.21814 3.347 0.000817
## SMRATIO -0.02874 0.08868 -0.324 0.745820
## MATHEFF 46.49445 1.97221 23.575 < 2e-16
## OPENPS 14.01873 2.64160 5.307 1.12e-07
## ST04Q01Male:centPCGIRLS -128.97960 32.71461 -3.943 8.06e-05
## ST04Q01Male:SMRATIO -0.07085 0.11432 -0.620 0.535419
```

The model has student gender, proportion of girls at the school, their interaction, student/teacher ratio in mathematics, and the two attitude scores. The scores measure openness to problem solving (‘I can solve problems’) and mathematics self-efficacy (‘I can learn maths’).

The interpretation of the individual/school gender interaction may not be immediately obvious: it says that both boys and girls had higher average scores at single-sex than coeducational schools. We should be cautious about interpreting this causally, as single-sex schools in New Zealand differ from coeducational schools in so many other ways as well.

To get standard errors for the variance parameters we need to use the jackknife or bootstrap. This is a bit slow (there’s a progress bar for interactive use, which I will turn off). Using a faster-than-default BLAS for linear algebra will help.

```
m1var<-boot2lme(m1,jkdes,verbose=FALSE)
SE(m1var,"beta")
```

```
## [1] 13.6283187 18.1704453 17.5141901 0.0963626 2.0416978 2.6849496 36.4502191
## [8] 0.1367522
```

`SE(m1var, "theta")`

`## [1] 0.08366869 0.09340268 0.05727806`

The jackknife SEs for \(\beta\) are slightly larger than the sandwich ones, which shouldn’t surprise anyone. The SEs for \(\theta\) are a bit hard to interpret, because \(\theta\) are a bit hard to interpret – they are components of the Cholesky square root of the relative variance matrix.

More usefully, here are the standard errors for the random-effect standard deviations

`SE(m1var,"SD")`

`## [1] 5.820113 6.754290`

### Issues?

**What if you don’t have weights at each stage?** The answer is currently unclear. More research is needed on ways to approximate the pairwise probabilities when they are not known

**How does this compare to using -gllamm- and its successors, like sensible people do?** It’s pretty similar, but this approach can handle a much wider range of designs and models – you don’t need the model clusters and design clusters to be the same. More systematic comparisons here: there’s some loss of efficiency with pairwise likelihood, but the estimates stay reasonable under more extreme sampling designs.

**What about generalised linear mixed models?** That sounds like a good PhD project!