Attention Conservation Notice: Long. Really long. No, longer than that. Here: read the original instead.
The Net Reclassification Index (NRI) is a summary of improvement in prediction when new information is added, and an intuitively plausible one. Suppose that we're trying to predict $Y=1$ vs $Y=0$, and that for person $i$ we have an old predicted probability $p^{\text{old}}_i$ and a new predicted probability $p^{\text{new}}_i$. We'd hope that the probabilities for cases ($Y=1$) go up and the probabilities for controls ($Y=0$) go down when more information is used.
Suppose the test set has $n_1$ cases and $n_0$ controls. The NRI is defined by
$$\mathrm{NRI} = \frac{\#\{\text{cases with } p^{\text{new}} > p^{\text{old}}\} - \#\{\text{cases with } p^{\text{new}} < p^{\text{old}}\}}{n_1} + \frac{\#\{\text{controls with } p^{\text{new}} < p^{\text{old}}\} - \#\{\text{controls with } p^{\text{new}} > p^{\text{old}}\}}{n_0}.$$
The definition avoids evaluating tradeoffs about how much the probabilities go up, and is standardised to be (at least apparently) comparable between data sets with different case:control ratios.
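To make the definition concrete, here's a minimal sketch of computing it (my own code, not from any of the papers; the function and argument names are made up):

```python
import numpy as np

def nri(y, p_old, p_new):
    """Net Reclassification Index from 0/1 outcomes and two sets of predicted probabilities."""
    y, p_old, p_new = np.asarray(y), np.asarray(p_old), np.asarray(p_new)
    up, down = p_new > p_old, p_new < p_old
    cases, controls = y == 1, y == 0
    # proportion of cases reclassified up minus down, plus proportion of controls down minus up
    return (up[cases].mean() - down[cases].mean()) + (down[controls].mean() - up[controls].mean())
```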
As any schoolchild knows, evaluating the NRI on the same data used to estimate the probabilities will make it biased upwards: you’re basically asking “Do the probabilities in these data change in the same ways as the probabilities in the data where the estimation was done?” This isn’t hard. They do.
Basically everyone except Margaret Pepe assumed that using an independent test dataset would make this bias go away, as it does for other measures of predictiveness, good or bad. That’s not what happens. [The paper by Pepe and co-workers is behind a paywall, but their working paper is available.] After hearing talks about the bias I still didn’t understand why it happened. This post is an attempt to explain. My conclusion for what’s actually going on is a bit different from theirs, but the implications are similar.
First, looking at a silly example shows that NRI can behave badly. Suppose $X$ is also binary and is predictive of $Y$, and that $p^{\text{old}}_i = P(Y_i = 1 \mid X_i)$. The prediction rule divides people into 'high risk' and 'low risk'. Now define $p^{\text{new}}_i$ to be larger than $p^{\text{old}}_i$ for 'high-risk' people and to be smaller than $p^{\text{old}}_i$ for 'low-risk' people. You can do this any way you like.
Since high-risk people are more likely to be cases than low-risk people, a greater proportion of cases than controls will have their probabilities go up. Conversely, a greater proportion of controls than cases will have their probabilities go down. The NRI will be positive, even though the old prediction rule is the best possible one based on $X$ and the new rule is strictly worse.
Since this is a silly example, it doesn't necessarily mean there is a problem with NRI, but it isn't encouraging. Under the same definitions of $X$ and $p^{\text{old}}$, if we defined
$$p^{\text{new}}_i = p^{\text{old}}_i + \epsilon_i,$$
with $\epsilon$ having zero mean, independent of $X$ and $Y$, NRI would be zero. That's still not ideal, since the predictions are worse rather than the same, but it's certainly better than NRI being positive.
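To put numbers on the silly example, here's a small sketch (my own code, reusing the `nri` function above; the risk levels 0.4 and 0.1 and the size of the nudge are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.binomial(1, 0.5, n)                     # binary predictor X
p_old = np.where(x == 1, 0.4, 0.1)              # P(Y=1 | X): the best possible rule based on X
y = rng.binomial(1, p_old)

# nudge 'high-risk' people up and 'low-risk' people down: strictly worse predictions
p_new = np.where(x == 1, p_old + 0.05, p_old - 0.05)

print(nri(y, p_old, p_new))                     # clearly positive
```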
Can we get NRI to be positive (on average) without doing something silly? Yes, in fact. Pepe and co-workers looked at a very simple continuous case, where Normal $X$ predicts (binary) $Y$, and $Z$ (Normal) is independent of $X$ and $Y$. If $p^{\text{old}}$ is based on logistic regression with $X$, and $p^{\text{new}}$ on logistic regression with $X$ and $Z$, their simulations showed the NRI will be positive (on average) even though the predictions are slightly worse using $Z$. I've put an example up as a GitHub gist.
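Here's a rough sketch of that set-up (not the gist itself; the sample sizes, the coefficient on $X$, and the number of replications are arbitrary choices): fit the two logistic regressions on a training set and evaluate the NRI on an independent test set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

def make_data(n, beta=1.0):
    """Normal X predicts binary Y; Normal Z is pure noise, independent of both."""
    x = rng.normal(size=n)
    z = rng.normal(size=n)
    y = rng.binomial(1, 1 / (1 + np.exp(-beta * x)))
    return x, z, y

def one_rep(n_train=500, n_test=500):
    x, z, y = make_data(n_train)
    xt, zt, yt = make_data(n_test)
    # a large C makes sklearn's fit essentially the unpenalised MLE
    old = LogisticRegression(C=1e6).fit(x[:, None], y)
    new = LogisticRegression(C=1e6).fit(np.column_stack([x, z]), y)
    p_old = old.predict_proba(xt[:, None])[:, 1]
    p_new = new.predict_proba(np.column_stack([xt, zt]))[:, 1]
    return nri(yt, p_old, p_new)

print(np.mean([one_rep() for _ in range(200)]))  # positive on average
```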
The simulation shows that NRI is weird, but it still doesn't explain why. When confusing things happen with logistic regression, a useful trick is to try the same problem with linear regression. Either the same confusing things will happen but be easier to analyse, or they won't happen, meaning that the non-linearity is important.
In a linear version of the simple problem with Normal predictors, the NRI averages very close to zero. That’s still probably not right – it should be negative – but it is different from logistic regression. Non-linearity is important.
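A linear version of the same sketch, reusing `make_data` and `nri` from above, with ordinary least squares in place of the logistic fits:

```python
from sklearn.linear_model import LinearRegression

def one_rep_linear(n_train=500, n_test=500):
    # same data-generating process, ordinary least squares for both fits
    x, z, y = make_data(n_train)
    xt, zt, yt = make_data(n_test)
    old = LinearRegression().fit(x[:, None], y)
    new = LinearRegression().fit(np.column_stack([x, z]), y)
    return nri(yt, old.predict(xt[:, None]), new.predict(np.column_stack([xt, zt])))

print(np.mean([one_rep_linear() for _ in range(200)]))  # averages very close to zero
```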
Because logistic regression is an exponential-family model, the maximum likelihood estimators are moment estimators. We have $E[p^{\text{new}}] = E[p^{\text{old}}] = E[Y]$ both overall and conditional on $X$, so the change $p^{\text{new}} - p^{\text{old}}$ has mean (approximately) zero given $X$. Since $Z$ has a symmetric distribution, $p^{\text{new}}$ will have an asymmetric, skewed distribution around $p^{\text{old}}$. Specifically, it will be positively skewed when $p^{\text{old}}$ is small, symmetric when $p^{\text{old}} = 1/2$, and negatively skewed when $p^{\text{old}}$ is large; that's the only way to force it into $(0,1)$.
A positively-skewed, mean-zero distribution (typically) has a negative median, and a negatively-skewed, mean-zero distribution (typically) has a positive median; the 'typical' behaviour holds for these logit-Normal distributions. The change $p^{\text{new}} - p^{\text{old}}$ will be positively skewed and have negative median when $p^{\text{old}}$ is small; it will be negatively skewed and have positive median when $p^{\text{old}}$ is large. Since $X$ is predictive, $p^{\text{old}}$ is larger for cases than for controls, so $P(p^{\text{new}} > p^{\text{old}})$ is greater than 1/2 for cases and less than 1/2 for controls, and the NRI will be positive on average.
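Here's one way to see that numerically (again a sketch with arbitrary numbers, not anything from the paper): fix a baseline probability, add symmetric Normal noise on the logit scale, recentre the intercept so the mean probability is unchanged (mimicking the moment condition), and look at the median change.

```python
import numpy as np
from scipy.special import expit, logit
from scipy.optimize import brentq

rng = np.random.default_rng(3)
z = rng.normal(size=200_000)
gamma = 0.5                                    # spurious coefficient on the noise variable

for p_old in (0.1, 0.5, 0.9):
    # recentre so the new probabilities keep the same mean as p_old,
    # mimicking the moment condition satisfied by the logistic MLE
    c = brentq(lambda c: expit(logit(p_old) + gamma * z + c).mean() - p_old, -5, 5)
    p_new = expit(logit(p_old) + gamma * z + c)
    print(p_old, np.median(p_new - p_old))     # negative for small p_old, positive for large
```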
The silly example shows that NRI can behave very badly for arbitrary prediction rules. For rules that are well calibrated in the sense of means, the non-linearity of the data-to-probability transformation and the NRI's use of ordering rather than differences make it tend to be positive when useless variables are added. Even with a linear model, though, the NRI doesn't pick up the degradation of performance from irrelevant variables.
[tl;dr: NRI? Just say “No, thank you.”]