2 min read

Wald, score, LRT: the picture

One issue in teaching generalised linear models (or likelihood theory) is the relationship between the Wald, score, and likelihood ratio tests. I have a picture.

Let’s make up a score function U(θ), in this case for a trivial binomial model, and draw it.

logit <-function(p) log(p/(1-p))
expit <-function(x) exp(x)/(1+exp(x))
U<-function(theta) 11/12-expit(theta)
thetahat<-logit(11/12)

curve( U(x),from=0, to =3, xlab=expression(theta),ylab=expression(U(theta)))
abline(h=0,lty=2)
abline(v=0,lty=2)

The likelihood ratio statistic is twice the area under the curve 2((θ^)(0))=20θ^U(θ)dθ

curve( U(x),from=0, to =3, xlab=expression(theta),ylab=expression(U(theta)))
abline(h=0,lty=2)
abline(v=0,lty=2)
polygon(c(seq(0, thetahat,length=51),0), c(U(seq(0, thetahat,length=51)),0),col="#00000040")

We can approximate this area by either of two triangles. One approximating triangle is tangent to the curve at θ=0

Udot<-function(theta,d=0.001) (U(theta+d)-U(theta))/d

curve( U(x),from=0, to =3, xlab=expression(theta),ylab=expression(U(theta)))
abline(h=0,lty=2)
abline(v=0,lty=1,col="red")
abline(U(0),Udot(0),col="red")
polygon(x=c(0,0,-U(0)/Udot(0)),y=c(0,U(0),0), border=NA, col="#FF000060")

The red area is half the score test statistic: half of U(0)×(U(0)/U(0))

The other approximating triangle is tangent to the curve at θ^

curve( U(x),from=0, to =3, xlab=expression(theta),ylab=expression(U(theta)))
abline(h=0,lty=1,col="royalblue")
abline(v=0,lty=2)
abline(-Udot(thetahat)*thetahat,Udot(thetahat),col="royalblue")
polygon(x=c(0,0,thetahat),y=c(0,-Udot(thetahat)*thetahat,0), border=NA, col="#0000FF60")

The blue area is half the Wald test statistic: half of (θ^U(θ^))×θ^

Here are all three of them together

curve( U(x),from=0, to =3, xlab=expression(theta),ylab=expression(U(theta)))
abline(h=0,lty=2)
abline(v=0,lty=2)
abline(U(0),Udot(0),col="red")
abline(-Udot(thetahat)*thetahat,Udot(thetahat),col="royalblue")
polygon(x=c(0,0,-U(0)/Udot(0)),y=c(0,U(0),0), border=NA, col="#FF000040")
polygon(x=c(0,0,thetahat),y=c(0,-Udot(thetahat)*thetahat,0), border=NA, col="#0000FF40")
polygon(c(seq(0, thetahat,length=51),0), c(U(seq(0, thetahat,length=51)),0),col="#00000040")

Under local alternatives, when θ^ approaches θ0 the curve U(θ) will straighten out (by Taylor’s Theorem) and the three areas will be closer together

U<-function(theta) 8/14-expit(theta)
thetahat<-logit(8/14)
curve( U(x),from=0, to =0.3, xlab=expression(theta),ylab=expression(U(theta)))
abline(h=0,lty=2)
abline(v=0,lty=2)
abline(U(0),Udot(0),col="red")
abline(-Udot(thetahat)*thetahat,Udot(thetahat),col="royalblue")
polygon(x=c(0,0,-U(0)/Udot(0)),y=c(0,U(0),0), border=NA, col="#FF000040")
polygon(x=c(0,0,thetahat),y=c(0,-Udot(thetahat)*thetahat,0), border=NA, col="#0000FF40")
polygon(c(seq(0, thetahat,length=51),0), c(U(seq(0, thetahat,length=51)),0),col="#00000040")

But under fixed alternatives this isn’t guaranteed even at large sample size:

U<-function(theta) 110/120-expit(theta)
thetahat<-logit(110/120)
curve( U(x),from=0, to =3, xlab=expression(theta),ylab=expression(U(theta)),ylim=c(0,.4))
abline(h=0,lty=2)
abline(v=0,lty=2)
abline(U(0),Udot(0),col="red")
abline(-Udot(thetahat)*thetahat,Udot(thetahat),col="royalblue")
polygon(x=c(0,0,-U(0)/Udot(0)),y=c(0,U(0),0), border=NA, col="#FF000040")
polygon(x=c(0,0,thetahat),y=c(0,-Udot(thetahat)*thetahat,0), border=NA, col="#0000FF40")
polygon(c(seq(0, thetahat,length=51),0), c(U(seq(0, thetahat,length=51)),0),col="#00000040")