
Included-variable bias

If you have two regression models E[Y|X] = β0 + β1X and E[Y|X,Z] = γ0 + γ1X + γ2Z, then typically γ1 ≠ β1, because they are different things1
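A quick simulation makes the difference concrete (the coefficients and correlation below are made-up numbers, not from the post): when Z is correlated with X, the slope on X in the short model is not the slope on X in the long model.

```python
import numpy as np

# Simulated example (hypothetical numbers): Z correlated with X,
# so the two models give different coefficients on X.
rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)                   # Z correlated with X
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)   # gamma1 = 2, gamma2 = 3

# Short model: E[Y|X] = beta0 + beta1*X
beta = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]

# Long model: E[Y|X,Z] = gamma0 + gamma1*X + gamma2*Z
gamma = np.linalg.lstsq(np.column_stack([np.ones(n), x, z]), y, rcond=None)[0]

print(beta[1])   # close to 2 + 3*0.8 = 4.4
print(gamma[1])  # close to 2
```

Both fits are unbiased for their own estimand; they just have different estimands.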

A common name for this phenomenon is omitted-variable bias. That’s an unfortunate name, because it implies a direction in a situation that’s completely symmetric. Yes, β̂1 is biased for γ1, but γ̂1 is equally biased for β1.

The idea that γ1 is somehow natural and β1 is wrong comes from the gold-standard2 way of thinking about regression model choice: that there is a true model, defined by having all its coefficients non-zero, and that your job is to find it. From this point of view, either γ2 = 0, so β1 is preferred but β1 = γ1 anyway, or γ2 ≠ 0, so γ1 is preferred.

If you want β1 then γ̂1 has included-variable bias. If you want γ1 then β̂1 has omitted-variable bias. Or you can stop trying to think of the β and γ as being estimates of the same things and just talk about which one you actually want to estimate.
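The two estimands are linked by standard least-squares algebra (this identity is not from the post itself): the short-model slope satisfies β̂1 = γ̂1 + γ̂2·δ̂, where δ̂ is the fitted slope of Z on X. A sketch with simulated data:

```python
import numpy as np

# Check the exact in-sample OLS identity beta1_hat = gamma1_hat + gamma2_hat * delta_hat,
# where delta_hat is the fitted slope of Z on X. (Hypothetical numbers.)
rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

def ols(design, resp):
    """OLS coefficients of resp on the columns of design."""
    return np.linalg.lstsq(design, resp, rcond=None)[0]

ones = np.ones(n)
beta1 = ols(np.column_stack([ones, x]), y)[1]
_, gamma1, gamma2 = ols(np.column_stack([ones, x, z]), y)
delta1 = ols(np.column_stack([ones, x]), z)[1]

# Holds exactly for the fitted coefficients, not just on average
print(beta1, gamma1 + gamma2 * delta1)
```

So neither coefficient is a noisy version of the other; they differ by a deterministic function of how Z tracks X.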


  1. Also γ2 ≠ β1, but that doesn’t tend to cause as much confusion.↩︎

  2. ie, old and wrong↩︎