# 13 S3

```r
library(sloop)
```

## 13.1 Basics

**Q**: Describe the difference between `t.test()` and `t.data.frame()`. When is each function called?

**A**: Because of S3's `generic.class()` naming scheme, both functions may initially look similar, while they are in fact unrelated.

`t.test()` is a *generic* function that performs a t-test. `t.data.frame()` is a *method* that gets called by the generic `t()` to transpose data frame input.
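Both roles can be checked with base R alone. The body of `t.test()` consists of nothing but a call to `UseMethod()`, which makes it a generic. `utils::getS3method()` finds `t.data.frame()` as the data frame method of `t()`. A minimal sketch:

```r
# t.test() is a generic: its whole body is a call to UseMethod()
body(stats::t.test)

# t.data.frame() is the method that the generic t() uses for data frames
t_df <- utils::getS3method("t", "data.frame")
identical(t_df, base::t.data.frame)
```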

Due to R's S3 dispatch rules, `t.test()` would also get called when `t()` is applied to an object of class "test".

**Q**: Make a list of commonly used base R functions that contain `.` in their name but are not S3 methods.

**A**: In recent years, "snake_case"-style has become increasingly common when naming functions (and variables) in R. But many functions in base R will continue to be "point.separated", which is why some inconsistency in your R code most likely cannot be avoided.

```r
# Some base R functions with point.separated names
install.packages()
read.csv()
list.files()
download.file()
data.frame()
as.character()
Sys.Date()
all.equal()
do.call()
on.exit()
```

For some of these functions, "tidyverse" replacements may exist, such as `readr::read_csv()` or `rlang::as_character()`, which you could use at the cost of an extra dependency.

**Q**: What does the `as.data.frame.data.frame()` method do? Why is it confusing? How could you avoid this confusion in your own code?

**A**: The function `as.data.frame.data.frame()` implements the data frame *method* for the `as.data.frame()` *generic*, which coerces objects to data frames.

The name is confusing because it does not clearly communicate the type of the function, which could be a regular function, a generic or a method. Even if we assume a method, the number of `.`s makes it difficult to separate the generic part from the class part of the name.

We could avoid this confusion by applying a different naming convention (e.g. "snake_case") for our class and function names.

**Q**: Describe the difference in behaviour in these two calls.

```r
set.seed(1014)
some_days <- as.Date("2017-01-31") + sample(10, 5)

mean(some_days)
#> [1] "2017-02-06"

mean(unclass(some_days))
#> [1] 17203
```

**A**: `mean()` is a generic function, which selects the appropriate method based on the class of the input. `some_days` has the class "Date", so `mean.Date(some_days)` is used.

After `unclass()` has removed the class attribute, method dispatch chooses the default method. `mean.default(unclass(some_days))` then calculates the mean of the underlying double.

**Q**: What class of object does the following code return? What base type is it built on? What attributes does it use?

```r
x <- ecdf(rpois(100, 10))
x
#> Empirical CDF
#> Call: ecdf(rpois(100, 10))
#> x[1:18] = 2, 3, 4, ..., 2e+01, 2e+01
```

**A**: This code returns an object of the class "ecdf", which contains an empirical cumulative distribution function of its input. The object is built on the base type "closure" (a function). Apart from the "class" attribute (`c("ecdf", "stepfun", "function")`), it uses the "call" attribute, which stores the call used to create it (`ecdf(rpois(100, 10))`).

**Q**: What class of object does the following code return? What base type is it built on? What attributes does it use?

```r
x <- table(rpois(100, 5))
x
#>
#> 1 2 3 4 5 6 7 8 9 10
#> 7 5 18 14 15 15 14 4 5 3
```

**A**: This code returns a "table" object, which is built upon the base type "integer". Apart from the "class" attribute, it uses the "dim" attribute (a one-dimensional table is an array with a single dimension) and the "dimnames" attribute, which names the elements of the integer vector.
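The answers to the last two questions can be checked by inspecting both objects directly with `class()`, `typeof()` and `attributes()`. The order in which `attributes()` lists the attributes is not essential here, so the sketch only compares the sets of attribute names:

```r
e <- ecdf(rpois(100, 10))
class(e)               # "ecdf", "stepfun", "function"
typeof(e)              # "closure"
names(attributes(e))   # "class" and "call"
attr(e, "call")        # the full call, ecdf(rpois(100, 10))

tab <- table(rpois(100, 5))
class(tab)             # "table"
typeof(tab)            # "integer"
names(attributes(tab)) # "dim", "dimnames" and "class"
```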

## 13.2 Classes

**Q**: Write a constructor for `data.frame` objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?

**A**: Data frames are built on (named) lists, with the additional requirement that all elements must have the same length. Apart from "names" and "class", their only attribute is "row.names". Row names must be unique, must have the same length as each list element and must be of integer or character type.

There are no additional restrictions on column names apart from those of lists, so one could use special characters and surround the names with backticks (which is not recommended).

A very good constructor implementing these criteria used to be part of the sloop package. It is no longer part of the package, but the source can still be found online (https://github.com/r-lib/sloop/blob/be7ce8a6be660536df4bdd3a31fa54f0d627f2d6/R/data.frame.R#L11).

```r
# Copied from older version of the sloop package
new_data.frame <- function(x, row.names = NULL) {
  stopifnot(is.list(x))

  n <- if (length(x) == 0) 0 else length(x[[1]])
  lengths <- vapply(x, length, integer(1))
  stopifnot(all(lengths == n))

  if (is.null(row.names)) {
    row.names <- .set_row_names(n)
  } else {
    stopifnot(
      is.character(row.names) || is.numeric(row.names)
    )
    stopifnot(
      length(row.names) == n || length(row.names) == 2
    )
  }

  structure(
    x,
    class = "data.frame",
    row.names = row.names
  )
}

# Test
x <- list(a = 1, b = 2)
new_data.frame(x, row.names = "l1")
#>    a b
#> l1 1 2

new_data.frame(x, row.names = 1)
#> Error in attributes(.Data) <- c(attributes(.Data), attrib): row names must
#> be 'character' or 'integer', not 'double'
```
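For comparison, the attributes of a data frame created by `base::data.frame()` can be listed directly. In this short sketch the column values are arbitrary. Automatically generated row names come back from `attr()` as an integer vector:

```r
df <- data.frame(a = 1:2, b = c("x", "y"))
typeof(df)                # "list"
names(attributes(df))     # "names", "class" and "row.names"
attr(df, "row.names")     # automatic row names are integers
```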

**Q**: Enhance my `factor()` helper to have better behaviour when one or more `values` is not found in `levels`. What does `base::factor()` do in this situation?

**A**: `base::factor()` converts these values (silently) into `NA`s. To improve our `factor()` helper, we choose to return an informative error message instead.

```r
# Uses new_factor() and validate_factor() as defined in the textbook
factor <- function(x, levels = unique(x)) {
  new_levels <- match(x, levels)

  # Return error if unseen levels are passed
  if (any(is.na(new_levels))) {
    stop(
      "The following values do not occur ",
      "in the levels of x: ",
      paste(setdiff(x, levels), collapse = ", "),
      ".",
      call. = FALSE
    )
  }

  validate_factor(new_factor(new_levels, levels))
}
```
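To see the silent conversion that the improved helper guards against, we can call `base::factor()` directly with a value that is missing from `levels`. The example values are made up:

```r
# "c" is not among the supplied levels
f <- base::factor(c("a", "b", "c"), levels = c("a", "b"))
levels(f)
# base::factor() silently turned "c" into NA
is.na(f)
```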

**Q**: Carefully read the source code of `factor()`. What does it do that our constructor does not?

**A**: The original implementation allows a more flexible specification of the input for `x`. The input is coerced to character, or replaced by `character(0)` in case of `NULL`. It also ensures that the factor levels are unique. This is achieved by setting the levels via `` `levels<-` ``, which fails when duplicate values are supplied.

**Q**: Factors have an optional "contrasts" attribute. Read the help for `C()`, and briefly describe the purpose of the attribute. What type should it have? Rewrite the `new_factor()` constructor to include this attribute.

**A**: When factor variables (representing nominal or ordinal information) are used in statistical models, they are typically encoded as dummy variables, and by default each level is compared with the first factor level. However, many different encodings ("contrasts") are possible: https://en.wikipedia.org/wiki/Contrast_(statistics)

Within R's formula interface, you can wrap a factor in `C()` and specify the contrast of your choice. Alternatively, you can set the "contrasts" attribute of your factor variable, which accepts matrix input (see `?contr.helmert` or similar for details).

```r
# Updated factor constructor
new_factor <- function(x = integer(),
                       levels = character(),
                       contrasts = NULL) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))
  # The contrasts attribute is optional; if supplied, it must be a matrix
  stopifnot(is.null(contrasts) || is.matrix(contrasts))

  structure(
    x,
    levels = levels,
    class = "factor",
    contrasts = contrasts
  )
}
```
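In everyday use, the attribute is usually set with the replacement function `contrasts<-` from the stats package rather than by hand. Here is a small sketch with Helmert contrasts for a made-up three-level factor. It uses `base::factor()` explicitly, because the helper defined above masks `factor()`:

```r
f <- base::factor(c("low", "mid", "high"), levels = c("low", "mid", "high"))
# contrasts<- stores a matrix with one row per level in the "contrasts" attribute
contrasts(f) <- contr.helmert(3)
cm <- attr(f, "contrasts")
dim(cm)
```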

**Q**: Read the documentation for `utils::as.roman()`. How would you write a constructor for this class? Does it need a validator? What would a helper look like?

**A**: This function transforms numeric input into Roman numerals (how cool is this!). This class is built on the "integer" type, which results in the following constructor.

```r
new_roman <- function(x = integer()) {
  stopifnot(is.integer(x))
  structure(x, class = "roman")
}
```

The documentation tells us that only values between 1 and 3899 are uniquely represented, which we then include in our validation function.

```r
validate_roman <- function(x) {
  values <- unclass(x)

  # Skip NA values, so that the range check returns TRUE or FALSE
  if (any(values < 1 | values > 3899, na.rm = TRUE)) {
    stop(
      "Roman numbers are only defined between ",
      "1 and 3899.",
      call. = FALSE
    )
  }

  x
}
```

For convenience, we allow the user to also pass real values to a helper function.

```r
roman <- function(x = integer()) {
  x <- as.integer(x)

  validate_roman(new_roman(x))
}

# Test
roman(c(1, 753, 2019))
#> [1] I DCCLIII MMXIX
roman(0)
#> Error: Roman numbers are only defined between 1 and 3899.
```
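As a cross-check of the chosen base type, the objects created by `utils::as.roman()` itself are integer vectors with the class `"roman"`:

```r
r <- utils::as.roman(c(1, 753, 2019))
class(r)          # "roman"
typeof(r)         # "integer"
as.character(r)   # the numerals as strings
```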

## 13.3 Generics and methods

**Q**: Read the source code for `t()` and `t.test()` and confirm that `t.test()` is an S3 generic and not an S3 method. What happens if you create an object with class `test` and call `t()` with it? Why?

```r
x <- structure(1:10, class = "test")
t(x)
```

**A**: We can see that `t.test()` is a generic, because it calls `UseMethod()`:

```r
t.test
#> function (x, ...)
#> UseMethod("t.test")
#> <bytecode: 0x23aa9c0>
#> <environment: namespace:stats>

# or simply call
sloop::ftype(t.test)
#> [1] "S3" "generic"
```

`sloop::ftype()` confirms via a call to `sloop:::is_s3_generic()` (which then uses `codetools::findGlobals()`) that `t.test()` contains a call to `UseMethod()`.

Interestingly, R also provides helpers that list functions which look like methods but in fact are not:

```r
tools::nonS3methods("stats")
#> [1] "anova.lmlist" "expand.model.frame" "fitted.values"
#> [4] "influence.measures" "lag.plot" "t.test"
#> [7] "plot.spec.phase" "plot.spec.coherency"
```

When we create an object with class `test`, `t()` will dispatch to `t.test()`. This happens because `UseMethod()` simply searches for functions named `paste0("generic", ".", c(class(x), "default"))`.

Consequently, `t.test()` is erroneously treated as a method of `t()`. Because `t.test()` is a generic itself and doesn't find a method called `t.test.test()`, it dispatches to `t.test.default()`.

By defining `t.test.test()`, we demonstrate that this is really what is happening internally.

```r
x <- structure(1:10, class = "test")
t(x)
#>
#> One Sample t-test
#>
#> data: x
#> t = 6, df = 9, p-value = 3e-04
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#> 3.33 7.67
#> sample estimates:
#> mean of x
#> 5.5

t.test.test <- function(x) t.default(x)
t(x)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 1 2 3 4 5 6 7 8 9 10
#> attr(,"class")
#> [1] "test"
```
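The lookup can be mimicked in simplified form. This sketch ignores implicit classes and the S3 method registry, both of which `UseMethod()` also takes into account. It builds the candidate method names and checks which of them exist:

```r
x <- structure(1:10, class = "test")
# Candidate method names that UseMethod("t") tries, in order
candidates <- paste0("t", ".", c(class(x), "default"))
candidates
# The first existing candidate wins: here stats::t.test()
vapply(candidates, exists, logical(1))
```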

**Q**: What generics does the `table` class have methods for?

**A**: We find methods specific to the `table` class by searching for functions whose names end in ".table". Note that this simple pattern also matches functions that are not methods for `table`, such as `as.table()`, `is.table()`, `margin.table()` and `prop.table()`.

```r
library(methods)
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)
# Escape the dot so that it is matched literally
Filter(function(x) grepl("\\.table$", x), names(funs))
#> [1] "[.table" "aperm.table" "as.data.frame.table"
#> [4] "as.table" "is.table" "margin.table"
#> [7] "print.summary.table" "print.table" "prop.table"
#> [10] "summary.table"
```
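A more targeted alternative is to ask `utils::methods()` for all methods registered for the class. Unlike the name search, this also finds methods defined outside the base package and ignores regular functions such as `prop.table()`. This is a sketch, and the exact result depends on the attached packages:

```r
m <- utils::methods(class = "table")
# The generic of each method is stored in the "info" attribute
sort(unique(attr(m, "info")$generic))
```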

**Q**: What generics does the `ecdf` class have methods for?

**A**: We use the same approach as above. When this is not successful, we repeat the search for the superclasses. Apparently, the classes `ecdf` and `stepfun` exist, even though no specific methods for them are implemented in the base package, which is the only package searched here.

```r
class(ecdf(1:2))
#> [1] "ecdf" "stepfun" "function"
Filter(function(x) grepl("\\.ecdf$", x), names(funs))
#> character(0)
Filter(function(x) grepl("\\.stepfun$", x), names(funs))
#> character(0)
Filter(function(x) grepl("\\.function$", x), names(funs))
#> [1] "as.function" "as.list.function" "is.function"
#> [4] "print.function" "sys.function"
```
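Searching only `package:base` misses methods that other attached packages register. `ecdf()` itself lives in the stats package. The sketch below loops `utils::methods()` over the whole class vector and lists the generics for each class in the inheritance chain. Its output depends on the attached packages:

```r
cls <- class(ecdf(1:2))
# For each class in the chain, collect the generics with a registered method
gens <- lapply(setNames(cls, cls), function(k) {
  sort(unique(attr(utils::methods(class = k), "info")$generic))
})
gens[c("ecdf", "stepfun")]
```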

**Q**: Which base generic has the greatest number of defined methods?

**A**: The generic `print()` clearly has the most defined methods.

```r
generics <- Filter(
  function(x) "generic" %in% sloop::ftype(x),
  funs
)

methods_per_generic <- sapply(
  names(generics),
  function(x) methods(x),
  USE.NAMES = TRUE
)

tail(sort(lengths(methods_per_generic)), 3)
#> [ format print
#> 35 70 209
```
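For a quick check on a handful of generics, `utils::methods()` can count their methods directly. The selection of generics below is arbitrary, and the exact counts depend on which packages are loaded:

```r
some_generics <- c("print", "format", "summary", "[")
n_methods <- vapply(
  some_generics,
  function(g) length(utils::methods(g)),
  integer(1)
)
sort(n_methods, decreasing = TRUE)
```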

**Q**: Carefully read the documentation for `UseMethod()` and explain why the following code returns the results that it does. What two usual rules of function evaluation does `UseMethod()` violate?

```r
g <- function(x) {
  x <- 10
  y <- 10
  UseMethod("g")
}
g.default <- function(x) c(x = x, y = y)

x <- 1
y <- 1
g(x)
#> x y
#> 1 10
```

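**A**: R looks for the `x` argument in `g()`'s calling environment (the global environment), in which `x` is bound to 1. `g()` then dispatches to `g.default()`. `UseMethod()` passes on the arguments as they were supplied to the generic, so `g.default()` receives `x = 1`, although `x` was reassigned to 10 inside `g()`. As `y` is not defined within `g.default()`'s function environment, `y`'s value is taken from the environment where `UseMethod()` created the call, i.e. `g()`'s execution environment. There, `y` is defined as 10.

This shows the two usual rules that `UseMethod()` violates. First, modifications of the arguments inside the generic are not passed on to the method. Second, the method can see variables defined in the generic, although lexical scoping would normally only give it access to its own enclosing (here, the global) environment.

When we invoke `g.default()` explicitly instead of using `UseMethod()`, the argument `x` is evaluated in `g.default()`'s calling environment (`g()`'s execution environment), where it is `10`. Further variables like `y` are looked up via lexical scoping in the enclosing (global) environment, where `y` is `1`.

```r
g <- function(x) {
  x <- 10
  y <- 10
  g.default(x)
}
g(x)
#> x y
#> 10 1
```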

**Q**: What are the arguments to `[`? Why is this a hard question to answer?

**A**: The subsetting operator `[` is a primitive and generic function, as can be inspected via `ftype()`.

```r
ftype(`[`)
#> [1] "primitive" "generic"
```

Therefore, ``formals(`[`)`` returns `NULL`. One possible way to figure out `[`'s arguments would be to inspect the underlying C source code, which can be found online via `pryr::show_c_source(.Primitive("["))`. However, given the differing arguments of `[`'s methods, it seems most probable that `[`'s arguments are `x` and `...`.

```r
names(formals(`[.Date`))
#> [1] "x" "..." "drop"
names(formals(`[.table`))
#> [1] "x" "i" "j" "..." "drop"
names(formals(`[.AsIs`))
#> [1] "x" "i" "..."
```
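Two further base methods make the same point. The data frame method has no `...` at all, while the factor method has no explicit `i` or `j`:

```r
names(formals(`[.data.frame`))  # "x" "i" "j" "drop"
names(formals(`[.factor`))      # "x" "..." "drop"
```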

## 13.4 Object styles

**Q**: Categorise the objects returned by `lm()`, `factor()`, `table()`, `as.Date()`, `ecdf()`, `ordered()`, `I()` into the styles described above.

**A**: The returned objects correspond to the following object styles:

- Vector: `factor()`, `table()`, `as.Date()`, `ordered()`
- Record: (none)
- Scalar: `lm()`, `ecdf()`
- Other: `I()`
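A rough way to tell the styles apart is to compare `length()` with the amount of data an object represents. This is a heuristic, not a formal test. A vector-style object has one element per observation. A scalar-style object, such as an `lm` fit, is a single list of components that describe one model. The dates below are made up:

```r
d <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-03"))
length(d)            # one element per date

fit <- lm(mpg ~ wt, data = mtcars)
typeof(fit)          # "list"
names(fit)           # the components of one model
# The data live inside the components, not in the length of the object
length(fitted(fit)) == nrow(mtcars)
```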

**Q**: What would a constructor function for `lm` objects, `new_lm()`, look like? Use `?lm` and experimentation to figure out the required fields and their types.

**A**: The constructor needs to populate the fields of an `lm` object (the elements of the underlying list) and check their types for correctness.

```r
# Learn about lm-attributes
?lm
attributes(lm(cyl ~ ., data = mtcars))
#> $names
#> [1] "coefficients" "residuals" "effects" "rank"
#> [5] "fitted.values" "assign" "qr" "df.residual"
#> [9] "xlevels" "call" "terms" "model"
#>
#> $class
#> [1] "lm"

# Define constructor
new_lm <- function(coefficients, residuals, effects, rank, fitted.values,
                   assign, qr, df.residual, xlevels, call, terms, model) {
  stopifnot(
    is.double(coefficients),
    is.double(residuals),
    is.double(effects),
    is.integer(rank),
    is.double(fitted.values),
    is.integer(assign),
    is.list(qr),
    is.integer(df.residual),
    is.list(xlevels),
    is.language(call),
    is.language(terms),
    is.list(model)
  )

  structure(
    list(
      coefficients = coefficients,
      residuals = residuals,
      effects = effects,
      rank = rank,
      fitted.values = fitted.values,
      assign = assign,
      qr = qr,
      df.residual = df.residual,
      xlevels = xlevels,
      call = call,
      terms = terms,
      model = model
    ),
    class = "lm"
  )
}
```

## 13.5 Inheritance

**Q**: How does `[.Date` support subclasses? How does it fail to support subclasses?

**A**: We start by inspecting the method, how it is used, and what `oldClass()` returns:

```r
# inspect function
`[.Date`

# see how it's used
x <- Sys.Date()
s3_dispatch(x[1])

# attempt to find out what oldClass() does
oldClass(x)
```
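A quick experiment shows what survives subsetting. It uses a hypothetical subclass `my_date` that carries a hypothetical extra attribute `note`:

```r
x <- structure(
  as.Date("2020-01-01") + 0:2,
  class = c("my_date", "Date"),  # hypothetical subclass of Date
  note = "extra information"     # hypothetical extra attribute
)
y <- x[1:2]      # there is no `[.my_date`, so `[.Date` is used
class(y)         # the full class vector is kept
attr(y, "note")  # the extra attribute is gone (NULL)
```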

`[.Date` calls the internal `[` via `NextMethod()` and then restores the class of the result with `oldClass(x)`. As `oldClass(x)` returns the complete class attribute, the result keeps any subclass in front of `"Date"`. This is how subclasses are supported. However, the internal `[` drops most other attributes, and `[.Date` only restores the class. Any additional attributes that a subclass relies on are therefore lost. This is how it fails to support subclasses.

**Q**: R has two classes for representing date time data, `POSIXct` and `POSIXlt`, which both inherit from `POSIXt`. Which generics have different behaviours for the two classes? Which generics share the same behaviour?

**A**: To answer this question, we have to get the respective generics:

```r
# define helper
get_generics <- function(x) {
  attr(methods(class = x), "info")[["generic"]]
}

# get generics
generics_t <- get_generics("POSIXt")
generics_ct <- get_generics("POSIXct")
generics_lt <- get_generics("POSIXlt")
```

The generics in `generics_t` with a method for the superclass `POSIXt` potentially share the same behaviour for both subclasses. However, if a generic has a specific method for one of the subclasses, it has to be subtracted:

```r
# These generics provide subclass-specific methods
union(generics_ct, generics_lt)
#> [1] "[" "[[" "[<-" "as.data.frame"
#> [5] "as.Date" "as.list" "as.POSIXlt" "c"
#> [9] "coerce" "format" "initialize" "length<-"
#> [13] "mean" "print" "rep" "show"
#> [17] "slotsFromS3" "split" "summary" "Summary"
#> [21] "weighted.mean" "xtfrm" "[[<-" "anyNA"
#> [25] "as.double" "as.matrix" "as.POSIXct" "duplicated"
#> [29] "is.na" "length" "names" "names<-"
#> [33] "sort" "unique"

# These generics share (inherited) methods for both subclasses
setdiff(generics_t, union(generics_ct, generics_lt))
#> [1] "-" "+" "all.equal" "as.character"
#> [5] "Axis" "cut" "diff" "hist"
#> [9] "is.numeric" "julian" "Math" "months"
#> [13] "Ops" "pretty" "quantile" "quarters"
#> [17] "round" "seq" "str" "trunc"
#> [21] "weekdays"
```
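The need for separate methods reflects the different base types of the two classes. `POSIXct` is a double vector, while `POSIXlt` is a list. Yet a generic such as `weekdays()`, which has a `POSIXt` method but no `POSIXct` or `POSIXlt` method, gives the same answer for both. A small check with an arbitrary time stamp:

```r
ct <- as.POSIXct("2020-01-01 12:00:00", tz = "UTC")
lt <- as.POSIXlt(ct)
typeof(ct)   # "double"
typeof(lt)   # "list"
# Both classes inherit weekdays() from POSIXt
identical(weekdays(ct), weekdays(lt))
```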

**Q**: What do you expect this code to return? What does it actually return? Why?

```r
generic2 <- function(x) UseMethod("generic2")
generic2.a1 <- function(x) "a1"
generic2.a2 <- function(x) "a2"
generic2.b <- function(x) {
  class(x) <- "a1"
  NextMethod()
}

generic2(structure(list(), class = c("b", "a2")))
```

**A**: When we execute the code above, this is what happens:

- we pass an object of classes `b` and `a2` to `generic2()`, which prompts R to look for a method `generic2.b()`
- the method `generic2.b()` then changes the class to `a1` and calls `NextMethod()`

One would think that this will lead R to call `generic2.a1()`. In fact, as mentioned in the textbook, `NextMethod()` doesn't actually work with the class attribute of the object. Instead, it uses a special global variable (`.Class`) to keep track of which method to call next.

This is why `generic2.a2()` is called instead, and the code returns `"a2"`.
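We can make this visible with a variant of `generic2.b()` that changes the class and then returns both the new class attribute and `.Class`. The name `generic3` is made up for this sketch:

```r
generic3 <- function(x) UseMethod("generic3")
generic3.b <- function(x) {
  class(x) <- "a1"
  # class(x) has changed, but .Class still holds the dispatch chain
  list(class = class(x), dispatch = .Class)
}

res <- generic3(structure(list(), class = c("b", "a2")))
res$class      # "a1"
res$dispatch   # still "b", "a2"
```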

## 13.6 Dispatch details

**Q**: Explain the differences in dispatch below:

```r
x1 <- 1:5
class(x1)
#> [1] "integer"
s3_dispatch(x1[1])
#> [.integer
#> [.numeric
#> [.default
#> => [ (internal)

x2 <- structure(x1, class = "integer")
class(x2)
#> [1] "integer"
s3_dispatch(x2[1])
#> [.integer
#> [.default
#> => [ (internal)
```

**A**: `class()` returns `"integer"` for both `x1` and `x2`. However, they are not identical. While `x2` has the attribute "class" with the value `"integer"`, `x1` doesn't have a class attribute. Instead, `x1` has the implicit class `c("integer", "numeric")` (see `?class`). This is why `[.numeric` is tried for `x1`, but not for `x2`.

**Q**: What classes have a method for the `Math` group generic in base R? Read the source code. How do the methods work?

**A**: The following functions belong to this group (see `?Math`):

- `abs`, `sign`, `sqrt`, `floor`, `ceiling`, `trunc`, `round`, `signif`
- `exp`, `log`, `expm1`, `log1p`, `cos`, `sin`, `tan`, `cospi`, `sinpi`, `tanpi`, `acos`, `asin`, `atan`, `cosh`, `sinh`, `tanh`, `acosh`, `asinh`, `atanh`
- `lgamma`, `gamma`, `digamma`, `trigamma`
- `cumsum`, `cumprod`, `cummax`, `cummin`

The following classes have a method for this group generic:

```r
methods("Math")
#> [1] Math,nonStructure-method Math,structure-method
#> [3] Math.data.frame Math.Date
#> [5] Math.difftime Math.factor
#> [7] Math.POSIXt Math.quosure*
#> see '?methods' for accessing help and source code
```

To read the source code of the S3 methods, we can simply enter the name of the method into the console. To get the source code of the S4 methods, we can use `getMethod()`, e.g. `getMethod("Math", "nonStructure")`.

To explain the basic idea, we simply overwrite the data frame method:

```r
Math.data.frame <- function(x) {
  "hello"
}
```

Now all functions from the `Math` group generic will return `"hello"`:

```r
abs(iris)
#> [1] "hello"
exp(iris)
#> [1] "hello"
lgamma(iris)
#> [1] "hello"
```

Of course, different functions should perform different calculations. Here `.Generic` comes into play, which provides us with the name of the calling generic as a string:

```r
Math.data.frame <- function(x, ...) {
  .Generic
}

abs(iris)
#> [1] "abs"
exp(iris)
#> [1] "exp"
lgamma(iris)
#> [1] "lgamma"

rm(Math.data.frame)
```
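The same mechanism works for a class of our own. In the following sketch, `logged` is a hypothetical class, and `new_logged()` is a hypothetical constructor for it. The `Math` method looks up the requested function by the name stored in `.Generic`, applies it to the underlying data, and records which functions have been applied:

```r
# Hypothetical constructor for a "logged" numeric vector
new_logged <- function(x, applied = character()) {
  structure(x, applied = applied, class = "logged")
}

Math.logged <- function(x, ...) {
  # .Generic is the name of the Math function that was called, e.g. "sqrt"
  fun <- get(.Generic, mode = "function")
  new_logged(fun(unclass(x), ...), applied = c(attr(x, "applied"), .Generic))
}

y <- sqrt(new_logged(c(4, 9, 16)))
attr(y, "applied")    # "sqrt"

z <- round(log(y), digits = 2)
attr(z, "applied")    # "sqrt", "log", "round"
```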

The original source code of `Math.data.frame()` is a good example of how to turn the string returned by `.Generic` into a call of the corresponding function. `Math.factor()` is a good example of a method that is simply defined to give better error messages.

**Q**: `Math.difftime()` is more complicated than I described. Why?

**A**: `Math.difftime()` also excludes all cases apart from `abs`, `sign`, `floor`, `ceiling`, `trunc`, `round` and `signif`, and needs to return a fitting error message.
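A short check of both branches of `Math.difftime()`, using a made-up time difference:

```r
d <- as.difftime(-90, units = "mins")

a <- abs(d)
class(a)   # still "difftime"
units(a)   # the units are kept

# Functions outside the allowed set raise an error
msg <- tryCatch(sqrt(d), error = conditionMessage)
msg
```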