Blank-cheque inheritance and statistical methods objects

One of the problems with object-oriented programming for statistical methods is that inheritance is backwards. Everything is fine for data structures, and Bioconductor has many examples of broad (often abstract) base classes for biological information or annotation that are specialised by inheritance to more detailed and useful classes. Statistical methods go the other way: the more general method tends to inherit from the more specific one.

In base R, glm for generalised linear models is a generalisation of lm for linear models, but the glm class inherits from the lm class, not the reverse. One example of the problem is plot.lm, which produces plots designed for linear regression. These plots are inherited by glm objects, but they don’t all make sense – notably, the Normal quantile-quantile plot of residuals isn’t much use.
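A quick way to see this in a session is to inspect the class vector of a fitted model. This is only a sketch: the logistic regression on the built-in mtcars data is an arbitrary choice of mine, picked for illustration.

```r
# An arbitrary logistic regression on a built-in dataset (illustration only)
fit <- glm(am ~ wt, data = mtcars, family = binomial())

class(fit)            # "glm" "lm"
inherits(fit, "lm")   # TRUE

# Look for a plot method registered specifically for class "glm".
# In a fresh session with only the base packages loaded this is NULL,
# so plot(fit) falls through to plot.lm.
plot_method <- getS3method("plot", "glm", optional = TRUE)
is.null(plot_method)
```

Calling plot(fit) therefore runs without complaint and draws the linear-regression diagnostics, including the Normal Q-Q plot.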

Since lm and glm were designed at more or less the same time (they appear in the White Book), they could have been designed to inherit the other way around, but that’s not a sustainable strategy. Future packages can’t jack up the object hierarchy and stick a new ground floor underneath the previous one. Alternatively, one might have argued that the relationship between lm and glm isn’t inheritance: a glm object has an associated lm object from the IWLS algorithm for Fisher scoring, but we shouldn’t generally expect methods designed for lm objects to make sense for that associated lm. For example, we might want to derive diagnostics for glm from those for lm, but we still need to think about whether deviance residuals or Pearson residuals or working residuals are the appropriate way to generalise.
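The choice between residual types can be made concrete with a small sketch. The models and data here are again arbitrary illustrations of mine: for a binomial glm the three kinds of residual differ, while for a Gaussian model with identity link they coincide, which is why the question never came up for lm.

```r
# Logistic model: the three residual types are genuinely different
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial())
res_logit <- cbind(
  deviance = residuals(fit_logit, type = "deviance"),
  pearson  = residuals(fit_logit, type = "pearson"),
  working  = residuals(fit_logit, type = "working")
)
head(res_logit)

# Gaussian model with identity link: all three reduce to y - fitted
fit_gauss <- glm(mpg ~ wt, data = mtcars, family = gaussian())
res_gauss <- cbind(
  deviance = residuals(fit_gauss, type = "deviance"),
  pearson  = residuals(fit_gauss, type = "pearson"),
  working  = residuals(fit_gauss, type = "working")
)
same_for_gaussian <-
  isTRUE(all.equal(res_gauss[, "deviance"], res_gauss[, "pearson"])) &&
  isTRUE(all.equal(res_gauss[, "pearson"], res_gauss[, "working"]))
same_for_gaussian

# The default for a glm is deviance residuals
identical(residuals(fit_logit), res_logit[, "deviance"])
```

Any diagnostic written for lm that calls residuals() picks up the glm default, whether or not deviance residuals are the right generalisation for that diagnostic.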

Combining inheritance-by-generalisation with dynamic loading of packages makes things worse. Alice can write package A that defines an object, Boris can write package B that defines a method for it, and Clarice can write package C with a generalisation inheriting from the object. The generalisation in package C will inherit the method in package B, unless someone deliberately overrides the inheritance – but there may be no-one in a position to do so. Clarice and Boris need not know of each other’s existence; there isn’t any central register of object hierarchies even limited to CRAN packages.
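Here is a toy version of the three-package scenario using S3, squeezed into one script. Every name in it (fit_mean, resample, fit_wmean and the classes meanfit and wmeanfit) is hypothetical and invented for this sketch; none of them comes from a real package.

```r
# "Package A" (Alice): a simple model object, the unweighted mean
fit_mean <- function(y) {
  structure(list(estimate = mean(y), y = y), class = "meanfit")
}

# "Package B" (Boris): a generic and a method written with only meanfit in mind
resample <- function(object, ...) UseMethod("resample")
resample.meanfit <- function(object, ...) {
  # Assumes the estimate is an unweighted mean of y
  mean(sample(object$y, replace = TRUE))
}

# "Package C" (Clarice): a generalisation (weighted mean) inheriting from meanfit
fit_wmean <- function(y, w) {
  structure(
    list(estimate = weighted.mean(y, w), y = y, w = w),
    class = c("wmeanfit", "meanfit")
  )
}

obj <- fit_wmean(c(1, 2, 10), w = c(5, 5, 1))

# Dispatch finds no resample.wmeanfit and falls back to Boris's method,
# which runs without error or warning and silently ignores the weights
out <- resample(obj)
out
```

Nothing in the output hints that the weights were dropped. Clarice would have to know that Boris's package exists in order to write a resample.wmeanfit method, and Boris would have to know about hers to refuse the object.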

I don’t have a solution to this. Perhaps someone curious about category theory will invent a useful notion of co-inheritance for backwards hierarchies of generalisations. Perhaps it’s already in Haskell. But I do think it’s useful to recognise there’s a conceptual problem to work around.

This post was brought to you by lme4::simulate, which (as far as I can tell) doesn’t work correctly on pedigreemm::pedigreemm objects. There’s no reason it should, but there’s also no way for it to tell that it doesn’t, and the failure isn’t completely obvious to the user.

I have heard the merMods calling, each to each
I do not think that they will call for me