Someone asked on StackOverflow how to compute the kurtosis and its standard error using the survey package. They were thinking of using svyrecvar
, the variance estimation function that underlies most things in the package. That’s not actually the easiest approach: svyrecvar
works with estimating questions, but for the kurtosis we don’t have estimating functions and instead have an explicit definition in terms of totals:
\[\kappa = \frac{E[(X-\mu)^4]}{E[(X-\mu)^2]^2}.\]
It’s likely to be easier to work with svycontrast
and represent \(\hat\kappa\) as a function of estimated moments.
Let’s set up an example
One way to do this is to represent the second and fourth central moments, which go into the kurtosis formula, in terms of the first four uncentered moments. You could work this out, but it’s easier to look it up: \[E[(X-\mu)^4]= -3E[X]^4+6E[X]^2E[X^2]-4E[X]E[X^3]+E[X^4].\]
First, the raw moments
moments<-svymean(~enroll+I(enroll^2)+I(enroll^3)+I(enroll^4), dstrat)
moments
## mean SE
## enroll 5.9528e+02 1.8509e+01
## I(enroll^2) 5.4895e+05 4.0250e+04
## I(enroll^3) 7.5137e+08 1.0066e+08
## I(enroll^4) 1.3348e+12 2.7683e+11
Now, the central moments
central_moments<-svycontrast(moments, list(mu4=quote(-3*enroll^4+6*enroll^2*`I(enroll^2)`-4*enroll*`I(enroll^3)`+`I(enroll^4)`),
sigma2=quote(`I(enroll^2)`-enroll^2)))
central_moments
## nlcon SE
## mu4 3.3618e+11 1.0458e+11
## sigma2 1.9459e+05 2.3116e+04
And finally, the kurtosis
svycontrast(central_moments, quote(mu4/(sigma2*sigma2)))
## nlcon SE
## contrast 8.8783 1.5678
The standard errors are computed at each step using the delta method, and this all works as long as the transformations are ones that deriv
knows how to differentiate. We’re just using basic arithmetic operations so that’s no problem.
And now you have two problems
The next problem is how to turn this into a function. I would normally think of using bquote
to substitute things into expressions, but quoting things that already contain backticks gets messy.
So, the first solution I thought of: rename the user’s variable to something I can control, so I can write explicit formulas. I could rename it to enroll
and use the code I already have, but that would be silly; the user might have a variable called enroll
. Instead, let’s use a variable they probably don’t have:
svykurt<-function(formula, design){
var_name <- formula[[2]]
if (".x." %in% colnames(design)) stop("🙄")
design<-eval(bquote(update(design, .x.=.(var_name))))
moments<-svymean(~.x.+I(.x.^2)+I(.x.^3)+I(.x.^4), design)
central_moments<-svycontrast(moments,
list(mu4=quote(-3*.x.^4+6*.x.^2*`I(.x.^2)`-4*.x.*`I(.x.^3)`+`I(.x.^4)`),
sigma2=quote(`I(.x.^2)`-`.x.`^2)
))
svycontrast(central_moments, quote(mu4/(sigma2*sigma2)))
}
svykurt(~enroll, dstrat)
## nlcon SE
## contrast 8.8783 1.5678
This works, but it’s a bit icky. The user might have a variable called .x.
, and while you could argue people like that deserve everything they get, the variable might have been converted automatically from a name on some other system that isn’t legal in R.
A second approach is to rename the elements of moments
, so you don’t have to do all the messy backtick stuff in the computation of central_moments
:
svykurt1<-function(formula, design){
var_name <- as.character(formula[[2]])
moments_formula <- as.formula(paste0("~",var_name," + I(",var_name,"^2) + I(",
var_name,"^3) + I(", var_name,"^4)"))
moments<-svymean(moments_formula, design)
attr(moments,"names")<-c("one","two","three","four")
dimnames(attr(moments,"var"))<-list(c("one","two","three","four"),c("one","two","three","four"))
central_moments<-svycontrast(moments,
list(mu4=quote(-3*one^4+6*one^2*two-4*one*three+four),
sigma2=quote(two-one^2)
))
svycontrast(central_moments, quote(mu4/(sigma2*sigma2)))
}
svykurt1(~enroll, dstrat)
## nlcon SE
## contrast 8.8783 1.5678
Gaze in awe/horror at the doubly-nested complex assignment that changes the names on the variance matrix!
This is much better, but it relies on knowing the internal structure of the svystat
object returned by svycontrast
, and so is in a state of sin.
It’s already dead
Another option for substitution is to use strings rather than quoted names. Character strings are already dead and so don’t need to be quoted. That’s usually a disadvantage, but it’s helpful here.
svykurt2<-function(formula, design){
var_name <- as.character(formula[[2]])
moments_char<-paste0("I(",var_name,"^",1:4,")")
moments_formula <- make.formula(moments_char)
moments<-svymean(moments_formula, design)
mu4expr<-substitute(-3*one^4+6*one^2*two-4*one*three+four,
list(one=as.name(moments_char[1]),two=as.name(moments_char[2]),
three=as.name(moments_char[3]),four=as.name(moments_char[4])))
sigma2expr<-substitute(two-one^2,
list(one=as.name(moments_char[1]),two=as.name(moments_char[2])))
central_moments<-svycontrast(moments,
list(mu4=mu4expr,
sigma2=sigma2expr
))
svycontrast(central_moments, quote(mu4/(sigma2*sigma2)))
}
svykurt2(~enroll, dstrat)
## nlcon SE
## contrast 8.8783 1.5678
The substitution could be done with bquote
but it’s a bit unnatural – the variables one
, two
, three
, four
have no reason to be littering the local environment. It makes more sense to keep them in a list, use substitute
to put them into the expressions, and use ordinary evaluation to pass the expressions to svycontrast
.