Attention Conservation Notice: someone asked about this on social media, and I probably should have it written down for the advanced regression course, but you probably don’t need to know it. Also, if you’re an econometrician then I hate your notation and so you will probably hate mine.
In two-stage least squares you have an outcome $Y$ and want to fit a causal linear regression $E[Y]=X\beta$, but have problems with, say, confounding. If you just did the fit naively you would get an estimate that is biased for the causal $\beta$ and instead estimates the conditional expectation in the current population.
You have some instrumental variables $Z$ that have all the right properties to be instrumental variables, in particular that the effect of $Z$ on $Y$ goes entirely through $X$.
Let $\gamma$ be the coefficients for regressing $Y$ on $Z$ and $\alpha$ be the coefficients for regressing $X$ on $Z$. The effect of $Z$ on $Y$ is $\gamma$, and since this is all mediated by $X$ the effect of $X$ on $Y$ is $\beta=\gamma/\alpha$, which can be estimated by $\hat\gamma/\hat\alpha$. Alternatively, we can take the fitted values $\hat X=Z\hat\alpha$ and regress $Y$ on $\hat X$ to give $\hat\beta$ directly. These give the same coefficients for $X$ (there’s a little simulation checking this after the two derivations):
- The ratio: $\hat\beta=\hat\gamma/\hat\alpha=\dfrac{\widehat{\mathrm{cov}}(Z,Y)/\widehat{\mathrm{var}}(Z)}{\widehat{\mathrm{cov}}(Z,X)/\widehat{\mathrm{var}}(Z)}$, where the hats are the usual empirical estimates, which simplifies to $\widehat{\mathrm{cov}}(Z,Y)/\widehat{\mathrm{cov}}(Z,X)$
- The two-stage: Noting that $\hat X=Z(Z^TZ)^{-1}Z^TX=P_ZX$, where $P_Z$ is the projection on to the space spanned by $Z$, this is $\hat\beta=(\hat X^T\hat X)^{-1}\hat X^TY$. The denominator is $\hat X^T\hat X=X^TP_ZX$ and the numerator is $\hat X^TY=X^TP_ZY$, and again everything except the last bit cancels, leaving $\widehat{\mathrm{cov}}(Z,Y)/\widehat{\mathrm{cov}}(Z,X)$.
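As a quick sanity check that the two routes agree, here’s a toy simulation (mine, not from the AER examples below) with a single instrument, a single exposure, and an unmeasured confounder:
set.seed(2023)
n <- 10000
z <- rnorm(n)                  # instrument
u <- rnorm(n)                  # unmeasured confounder
x <- z + u + rnorm(n)          # exposure
y <- 2*x + 3*u + rnorm(n)      # true causal coefficient is 2
cov(z, y)/cov(z, x)            # ratio estimator
hatx <- fitted(lm(x ~ z))
coef(lm(y ~ hatx))["hatx"]     # two-stage estimator: identical
coef(lm(y ~ x))["x"]           # naive fit: biased (about 3 here)
The two instrumental-variable estimates agree exactly, and the naive regression does not recover the causal coefficient.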
More generally, you can have multiple $X$s and at least that many $Z$s and do the two-stage thing. Again $\hat X=P_ZX$, and the estimator is $\hat\beta=(X^TP_ZX)^{-1}X^TP_ZY$.
Now for the key question. What is the (asymptotic?) variance of $\hat\beta$? Is it just what you’d get from the second-stage model? No, but nearly.
We could write $\hat\beta$ in terms of the inputs and then just use bilinearity of variances: $\mathrm{var}[\hat\beta]=(X^TP_ZX)^{-1}X^TP_Z\,\mathrm{var}[Y]\,P_ZX(X^TP_ZX)^{-1}$
This looks very much like it will simplify to just the ordinary regression variance! There’s one trap: estimating $\mathrm{var}[Y]=\sigma^2I$. Our model says $Y=X\beta+\epsilon$, for (say) iid Normal $\epsilon$. It doesn’t say $Y=\hat X\beta+\epsilon$.
If we replace $\mathrm{var}[Y]$ by the estimator $\hat\sigma^2I$, where $I$ is the identity, we can’t use $\hat\sigma^2\propto\|Y-\hat X\hat\beta\|^2$ but we can use $\hat\sigma^2\propto\|Y-X\hat\beta\|^2$ to get a consistent estimator of the variance, for some suitable degrees-of-freedom correction. The sandwich variance estimator then simplifies in the usual way as the middle and right-hand chunks cancel each other out to give $\widehat{\mathrm{var}}[\hat\beta]=\hat\sigma^2(X^TP_ZX)^{-1}$
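Written out directly in matrix form (a sketch, reusing z, x, y, and n from the toy simulation above, with intercept columns included):
Zm <- cbind(1, z)                                         # instrument design matrix
Xm <- cbind(1, x)                                         # exposure design matrix
hatXm <- Zm %*% solve(crossprod(Zm), crossprod(Zm, Xm))   # P_Z X
beta <- solve(crossprod(hatXm, Xm), crossprod(hatXm, y))  # (X'P_Z X)^{-1} X'P_Z y
sigma2 <- sum((y - Xm %*% beta)^2)/(n - ncol(Xm))         # residuals from X, not hat X
V <- sigma2 * solve(crossprod(Xm, hatXm))                 # sigma^2 (X'P_Z X)^{-1}
sqrt(diag(V))                                             # standard errors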
Now let’s do an example from AER::ivreg.
suppressMessages(library(AER))
data("CigarettesSW", package = "AER")
CigarettesSW <- transform(CigarettesSW,
rprice = price/cpi,
rincome = income/population/cpi,
tdiff = (taxs - tax)/cpi
)
fm2 <- ivreg(log(packs) ~ log(rprice) | tdiff, data = CigarettesSW, subset = year == "1995")
Stage 1:
stage1 <- lm(log(rprice) ~ tdiff, data = CigarettesSW, subset = year == "1995")
hatX <- fitted(stage1)
Y <- with(subset(CigarettesSW, year == "1995"), log(packs))
stage2 <- lm(Y ~ hatX)
The unscaled covariance matrices are the same:
summary(stage2)$cov.unscaled
## (Intercept) hatX
## (Intercept) 63.26849 -13.227909
## hatX -13.22791 2.766546
fm2$cov.unscaled
## (Intercept) log(rprice)
## (Intercept) 63.26849 -13.227909
## log(rprice) -13.22791 2.766546
The residual variance in stage2 is not correct:
sd(resid(stage2))
## [1] 0.2240255
fm2$sigma
## [1] 0.1903539
But we can compute the right one using $X$ instead of $\hat X$:
df <- stage2$df.residual
X <- with(subset(CigarettesSW, year == "1995"), log(rprice))
sqrt(sum((Y - cbind(1, X) %*% coef(stage2))^2)/df)
## [1] 0.1903539
The other example on the help page has univariate $X$, bivariate $Z$, and a univariate adjustment variable $A$, which we handle by pasting $A$ on to the right of both $X$ and $Z$:
fm <- ivreg(log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff + I(tax/cpi),
data = CigarettesSW, subset = year == "1995")
Doing the two stages by hand, we get:
X <- with(subset(CigarettesSW, year == "1995"), log(rprice))
Z <- with(subset(CigarettesSW, year == "1995"), cbind(tdiff, I(tax/cpi)))
A <- with(subset(CigarettesSW, year == "1995"), log(rincome))
stage1 <- lm(cbind(X, A) ~ Z + A)
hatX <- fitted(stage1)
stage2 <- lm(Y ~ hatX + A)
Compare the coefficients and unscaled covariance matrices:
coef(fm)
## (Intercept) log(rprice) log(rincome)
## 9.8949555 -1.2774241 0.2804048
coef(stage2)
## (Intercept) hatXX hatXA A
## 9.8949555 -1.2774241 0.2804048 NA
fm$cov.unscaled
## (Intercept) log(rprice) log(rincome)
## (Intercept) 31.7527079 -6.7990694 0.2898522
## log(rprice) -6.7990694 1.9629850 -0.9648723
## log(rincome) 0.2898522 -0.9648723 1.6127420
summary(stage2)$cov.unscaled
## (Intercept) hatXX hatXA
## (Intercept) 31.7527079 -6.7990694 0.2898522
## hatXX -6.7990694 1.9629850 -0.9648723
## hatXA 0.2898522 -0.9648723 1.6127420
and again we can’t use the residual variance from stage2, but we can fix it:
fm$sigma
## [1] 0.187856
summary(stage2)$sigma
## [1] 0.2025322
df <- stage2$df.residual
theta <- na.omit(coef(stage2))
sqrt(sum((Y - cbind(1, X, A) %*% theta)^2)/df)
## [1] 0.187856
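As a final check (my addition, on the assumption that ivreg’s vcov is $\hat\sigma^2$ times the unscaled covariance matrix), the corrected $\hat\sigma$ should reproduce the standard errors that ivreg itself reports:
sigma.fixed <- sqrt(sum((Y - cbind(1, X, A) %*% theta)^2)/df)
sqrt(diag(sigma.fixed^2 * fm$cov.unscaled))  # hand-computed standard errors
sqrt(diag(vcov(fm)))                         # should match what ivreg reports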