The S3 method system only lets you dispatch methods on one argument of the generic. Most people use the first argument, and it’s not unheard of for people to claim that only the first argument is allowed. Actually, other arguments can be used! What’s more, if you write functions using the old-school formula/data structure, there’s a genuine reason to dispatch on the second argument.
Let’s look at the survey package and the simplest estimation function of all, svytotal:
suppressMessages(library(survey))
svytotal
## function (x, design, na.rm = FALSE, ...)
## {
## .svycheck(design)
## UseMethod("svytotal", design)
## }
## <bytecode: 0x128067f88>
## <environment: namespace:survey>
The function svytotal is an S3 generic. The call to UseMethod specifies the name of the generic ("svytotal") and names design, the second argument to the function, as the object whose class selects the method.
methods("svytotal")
## [1] svytotal.DBIsvydesign* svytotal.multiframe* svytotal.pps*
## [4] svytotal.survey.design* svytotal.survey.design2* svytotal.svyrep.design*
## [7] svytotal.twophase* svytotal.twophase2* svytotal.xdesign*
## see '?methods' for accessing help and source code
The methods for svytotal are chosen based on the class of design, so there is one method for each type of data object. At the time of writing:
DBIsvydesign: a database-backed object
multiframe: samples taken from overlapping population lists
pps: samples with unequal-probability sampling without replacement
survey.design: the original data class
survey.design2: allowing for multi-stage cluster sampling
svyrep.design: surveys with resampling instead of design meta-data
twophase, twophase2: subsampling from existing cohorts
xdesign: crossed clustering
There is much more interesting variation in the design class than in the class of x, so it makes sense to dispatch on the design argument. I learned about this from S Programming by Venables and Ripley; not many other sources make it clear that dispatch on arguments other than the first is supported.
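To make the mechanism concrete outside the survey package, here is a minimal sketch of a generic that dispatches on its second argument. The generic total, the weighted_data class, and the example data are all invented for illustration; none of them come from survey.

```r
# Hypothetical toy generic; 'total' and 'weighted_data' are illustrative names.
total <- function(x, data, ...) {
  UseMethod("total", data)  # dispatch on the second argument, not on x
}

# Method for ordinary data frames: x is a one-sided formula naming columns.
total.data.frame <- function(x, data, ...) {
  colSums(data[, all.vars(x), drop = FALSE])
}

# Method for a made-up class that carries a vector of weights.
total.weighted_data <- function(x, data, ...) {
  vals <- as.matrix(data$values[, all.vars(x), drop = FALSE])
  colSums(vals * data$weights)  # weights recycle down each column
}

df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
wd <- structure(list(values = df, weights = c(1, 2, 3)),
                class = "weighted_data")

total(~a + b, df)  # uses total.data.frame
total(~a + b, wd)  # uses total.weighted_data
```

In both calls x is a formula, so dispatching on the first argument would have sent them to the same method. The class of data is what separates them.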
Here are a couple of the shortest methods:
survey:::svytotal.DBIsvydesign
## function (x, design, na.rm = FALSE, ...)
## {
## design$variables <- getvars(x, design$db$connection, design$db$tablename,
## updates = design$updates, subset = design$subset)
## NextMethod("svytotal", design)
## }
## <bytecode: 0x12fc34430>
## <environment: namespace:survey>
survey:::svytotal.multiframe
## function (x, design, na.rm = FALSE, ...)
## {
## if (inherits(x, "formula"))
## x <- multiframe_getdata(x, design$designs)
## else x <- as.matrix(x)
## if (na.rm) {
## x[is.na(x)] <- 0
## design$weights[!complete.cases(x)] <- 0
## }
## total <- colSums(x * design$frame_weights * design$design_weights)
## V <- multiframevar(x * design$frame_weights * design$design_weights,
## design$dchecks)
## attr(total, "var") <- V
## class(total) <- "svystat"
## attr(total, "statistic") <- "total"
## total
## }
## <bytecode: 0x12fbd7d90>
## <environment: namespace:survey>
These look just like any S3 methods. There’s a technical distinction in that whichever argument you use for dispatch has to be evaluated in the generic and so has already been evaluated when you get to the method, but that will rarely matter.
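If you want to see the evaluation difference, the following sketch shows it with a throwaway generic (peek is a hypothetical name, not part of any package). The first argument stays an unevaluated promise as long as the method never touches it, while the dispatch argument is forced inside the generic.

```r
# Hypothetical generic, used only to illustrate evaluation order.
peek <- function(x, design, ...) UseMethod("peek", design)

# This method never touches x, so the promise for x is never forced.
peek.default <- function(x, design, ...) "dispatched without evaluating x"

# x would throw an error if it were evaluated; the call still succeeds.
ok <- peek(stop("x was evaluated"), design = 1)

# The dispatch argument has to be evaluated to find its class,
# so this fails in the generic before any method runs.
res <- try(peek(1, design = stop("design was evaluated")), silent = TRUE)
inherits(res, "try-error")
```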
In fact you can go further: the second argument to UseMethod is used only for its class and doesn’t have to actually be an argument to the generic. I’m not entirely convinced of the utility of this extension.
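As a sketch of what that looks like, the hypothetical generic below builds a throwaway object inside the generic and dispatches on its class. The names area, circle and rectangle are made up for this example.

```r
# Hypothetical generic: the dispatch object is computed inside the generic
# and is not one of its arguments.
area <- function(dims, ...) {
  shape <- structure(list(),
                     class = if (length(dims) == 1) "circle" else "rectangle")
  UseMethod("area", shape)
}

# Methods receive the generic's original arguments, not the shape object.
area.circle <- function(dims, ...) pi * dims^2
area.rectangle <- function(dims, ...) prod(dims)

area(1)        # one number: treated as a radius
area(c(2, 3))  # two numbers: treated as side lengths
```

The methods only see dims, so anything they need has to be recoverable from the original arguments.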
Another way of doing essentially the same thing would be to reverse the order of the arguments so that the data came first and the selection of variables was second. That’s what the tidyverse did – for example, to have dplyr work with tibbles in similar ways to how dbplyr works with database connections – but it did require changing the basic layout of function arguments. The approach in survey was to keep the traditional layout but still dispatch on the data type.
It would be convenient if functions such as lm and glm dispatched on their data argument, but unfortunately the S versions already used the first argument and missed the opportunity. It’s not possible to have data be a database or a time series or a longitudinal-data object or anything like that.
If you’re curious, you can see what the survey package might have been like with the tidyverse argument ordering by looking at the srvyr package. Since the tidyverse didn’t exist at the time, that’s not a realistic counterfactual – I don’t know what the survey package would have looked like if dispatch on the second argument wasn’t possible. There might well have been big ugly switch statements to give the effect of method dispatch.
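For a rough idea of that alternative, here is a sketch of a function that fakes dispatch on its second argument with switch. The function total_switch and the weighted_data class are invented for illustration and are not taken from survey.

```r
# Hypothetical function that imitates method dispatch by hand.
total_switch <- function(x, data, ...) {
  vars <- all.vars(x)
  # Only the first class is consulted, so there is no inheritance.
  switch(class(data)[1],
    data.frame = colSums(data[, vars, drop = FALSE]),
    weighted_data = colSums(
      as.matrix(data$values[, vars, drop = FALSE]) * data$weights
    ),
    stop("no branch for class ", class(data)[1])
  )
}

df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
wd <- structure(list(values = df, weights = c(1, 2, 3)),
                class = "weighted_data")

total_switch(~a + b, df)
total_switch(~a + b, wd)
```

Every new data type means editing this one function, and there is no equivalent of NextMethod. Real S3 dispatch on the second argument avoids both problems.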