# 13 S3

## 13.1 Basics

**Q**: Describe the difference between `t.test()` and `t.data.frame()`. When is each function called?

**A**: Because of S3’s `generic.class()` naming scheme, both functions may initially look similar, while they are in fact unrelated. `t.test()` is a *generic* function that performs a t-test. `t.data.frame()` is a *method* that gets called by the generic `t()` to transpose data frame input.

Due to R’s S3 dispatch rules, `t.test()` would also get called when `t()` is applied to an object of class “test”.

**Q**: Make a list of commonly used base R functions that contain `.` in their name but are not S3 methods.

**A**: In recent years “snake_case”-style has become increasingly common when naming functions (and variables) in R. But many functions in base R will continue to be “point.separated”, which is why some inconsistency in your R code most likely cannot be avoided.

```r
# Some base R functions with point.separated names
install.packages()
read.csv()
list.files()
download.file()
data.frame()
as.character()
Sys.Date()
all.equal()
do.call()
on.exit()
```

For some of these functions “tidyverse”-replacements may exist such as `readr::read_csv()` or `rlang::as_character()`, which you could use at the cost of an extra dependency.
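Whether a point.separated name denotes a generic, a method, or neither can also be checked programmatically. The following is a minimal sketch using base R only; `calls_use_method()` is a hypothetical helper introduced here for illustration, and it relies on a rough textual heuristic rather than a complete test.

```r
# Hypothetical helper: TRUE if a function's body contains a call to UseMethod().
# This is a textual heuristic only; internal generics such as `[` dispatch
# in C and would not be detected.
calls_use_method <- function(f) {
  stopifnot(is.function(f))
  any(grepl("UseMethod", deparse(body(f)), fixed = TRUE))
}

calls_use_method(stats::t.test)       # t.test() is a generic
calls_use_method(base::t)             # so is t()
calls_use_method(base::t.data.frame)  # a method of t(), not a generic

# getS3method() looks a method up by generic name and class name
is.function(utils::getS3method("t", "data.frame"))
```

Passing the generic and the class as separate strings, as `getS3method()` does, avoids having to guess where the name should be split.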

**Q**: What does the `as.data.frame.data.frame()` method do? Why is it confusing? How could you avoid this confusion in your own code?

**A**: The function `as.data.frame.data.frame()` implements the `data.frame` *method* for the `as.data.frame()` *generic*, which coerces objects to data frames.

The name is confusing, because it does not clearly communicate the type of the function, which could be a regular function, a generic or a method. Even if we assume a method, the number of `.`s makes it difficult to separate the generic- and the class-part of the name: Is it the `data.frame.data.frame` method for the `as` generic? Is it the `frame.data.frame` method for the `as.data` generic?

We could avoid this confusion by applying a different naming convention (e.g. “snake_case”) for our class and function names.
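The ambiguity only exists in the concatenated name. As a small sketch, `utils::getS3method()` takes the generic and the class as two separate strings, so the intended split can be stated explicitly; the alternative split used below is only there to show that it does not resolve to a method.

```r
# The generic and class parts are passed separately, so there is no ambiguity
m <- utils::getS3method("as.data.frame", "data.frame")
is.function(m)

# A different way of splitting the name does not correspond to a method;
# optional = TRUE returns NULL instead of signalling an error
utils::getS3method("as.data", "frame.data.frame", optional = TRUE)
```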

**Q**: Describe the difference in behaviour in these two calls.

```r
set.seed(1014)
some_days <- as.Date("2017-01-31") + sample(10, 5)

mean(some_days)
#> [1] "2017-02-06"
mean(unclass(some_days))
#> [1] 17203
```

**A**: `mean()` is a generic function, which will select the appropriate method based on the class of the input. `some_days` has the class “Date” and `mean.Date(some_days)` will be used.

After `unclass()` has removed the class attribute, the default method is chosen by the method dispatch: `mean.default(unclass(some_days))` calculates the mean of the underlying double.

**Q**: What class of object does the following code return? What base type is it built on? What attributes does it use?

```r
x <- ecdf(rpois(100, 10))
x
#> Empirical CDF
#> Call: ecdf(rpois(100, 10))
#>  x[1:18] =      2,      3,      4,  ...,  2e+01,  2e+01
```

**A**: This code returns an object of the class “ecdf”, which contains an empirical cumulative distribution function of its input. The object is built on the base type “closure” (a function), and the call that was used to create it (`ecdf(rpois(100, 10))`) is stored in the `call` attribute.

**Q**: What class of object does the following code return? What base type is it built on? What attributes does it use?

**A**: This code returns a “table” object, which is built upon the “integer” type. The attribute “dimnames” is used to name the elements of the integer vector.
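The answers to the last two questions can be checked by inspecting the objects directly. This is a sketch only: the seed is arbitrary, and because the code for the table question is not shown here, `table(rpois(100, 5))` is an assumed stand-in.

```r
set.seed(1)

# ecdf objects: a function carrying class and call attributes
x <- ecdf(rpois(100, 10))
typeof(x)              # base type
class(x)               # S3 class vector
names(attributes(x))   # attributes in use

# Assumed example for the table question (the original call is not shown)
tab <- table(rpois(100, 5))
typeof(tab)
names(attributes(tab))
```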

## 13.2 Classes

**Q**: Write a constructor for `data.frame` objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?

**A**: Data frames are built on named lists of vectors, where every element has the same length. Besides `names` and `class`, their only further attribute is “row.names”, which must be a character vector of the same length as the other elements. We need to provide the number of rows as an input to make it possible to create data frames with 0 columns but multiple rows.

This leads to the following constructor:

```r
new_data.frame <- function(x, n, row.names = NULL) {
  stopifnot(is.list(x))

  # Check all inputs are the same length
  stopifnot(all(lengths(x) == n))

  if (is.null(row.names)) {
    # Use special row names helper
    row.names <- .set_row_names(n)
  } else {
    # Otherwise check that they're a character vector with the
    # correct length
    stopifnot(is.character(row.names), length(row.names) == n)
  }

  structure(
    x,
    class = "data.frame",
    row.names = row.names
  )
}

# Test
x <- list(a = 1, b = 2)
new_data.frame(x, n = 1)
#>   a b
#> 1 1 2
new_data.frame(x, n = 1, row.names = "l1")
#>    a b
#> l1 1 2

# Create a data frame with 0 columns and 2 rows
new_data.frame(list(), n = 2)
#> data frame with 0 columns and 2 rows
```

There are two additional restrictions we could implement if we were being very strict: both the row names and column names should be unique.
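The two uniqueness restrictions could live in a separate validator. The following is a minimal sketch; `validate_data.frame()` is a hypothetical name introduced here, not a function from base R or from the text.

```r
# Hypothetical validator for the two uniqueness restrictions.
# Automatic (integer) row names are unique by construction, so only
# character row names are checked.
validate_data.frame <- function(x) {
  stopifnot(inherits(x, "data.frame"))

  if (anyDuplicated(names(x)) > 0) {
    stop("Column names must be unique.", call. = FALSE)
  }

  rn <- attr(x, "row.names")
  if (is.character(rn) && anyDuplicated(rn) > 0) {
    stop("Row names must be unique.", call. = FALSE)
  }

  x
}

ok <- structure(list(a = 1, b = 2), class = "data.frame", row.names = "r1")
validate_data.frame(ok)

dup <- structure(list(a = 1, a = 2), class = "data.frame", row.names = "r1")
try(validate_data.frame(dup))
```

Keeping these checks out of the constructor follows the usual split between a cheap constructor and a more expensive validator.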

**Q**: Enhance my `factor()` helper to have better behaviour when one or more `values` is not found in `levels`. What does `base::factor()` do in this situation?

**A**: `base::factor()` converts these values (silently) into `NA`s. To improve our `factor()` helper we choose to return an informative error message instead.

```r
factor2 <- function(x, levels = unique(x)) {
  new_levels <- match(x, levels)

  # Error if levels don't include all values
  missing <- unique(setdiff(x, levels))
  if (length(missing) > 0) {
    stop(
      "The following values do not occur in the levels of x: ",
      paste0("'", missing, "'", collapse = ", "), ".",
      call. = FALSE
    )
  }

  validate_factor(new_factor(new_levels, levels))
}

factor2(c("a", "b", "c"), levels = c("a", "b"))
#> Error: The following values do not occur in the levels of x: 'c'.
```
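For comparison, the silent conversion performed by `base::factor()` can be observed directly. This short sketch uses the same input as the `factor2()` call.

```r
# "c" is not among the levels, so base::factor() turns it into NA
f <- factor(c("a", "b", "c"), levels = c("a", "b"))
f
is.na(f)
levels(f)
```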

**Q**: Carefully read the source code of `factor()`. What does it do that our constructor does not?

**A**: The original implementation allows a more flexible specification of input for `x`. The input is coerced to character or replaced by `character(0)` (in case of `NULL`). It also ensures that the factor levels are unique. This is achieved by setting the levels via `base::levels<-`, which fails when duplicate values are supplied.

**Q**: Factors have an optional “contrasts” attribute. Read the help for `C()`, and briefly describe the purpose of the attribute. What type should it have? Rewrite the `new_factor()` constructor to include this attribute.

**A**: When factor variables (representing nominal or ordinal information) are used in statistical models, they are typically encoded as dummy variables and by default each level is compared with the first factor level. However, many different encodings (“contrasts”) are possible: https://en.wikipedia.org/wiki/Contrast_(statistics)

Within R’s formula interface you can wrap a factor in `C()` and specify the contrast of your choice. Alternatively you can set the “contrasts” attribute of your factor variable, which accepts matrix input (see `?contr.helmert` or similar for details).

```r
# Updated factor constructor
new_factor <- function(
  x = integer(),
  levels = character(),
  contrasts = NULL
) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))

  if (!is.null(contrasts)) {
    # if supplied should be a numeric matrix
    stopifnot(is.matrix(contrasts) && is.numeric(contrasts))
  }

  structure(
    x,
    levels = levels,
    class = "factor",
    contrasts = contrasts
  )
}
```
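To see what the attribute looks like on a real factor, one can assign contrasts with base R and inspect the result. This is a small sketch; Helmert contrasts are picked only as an example.

```r
f <- factor(c("a", "b", "c"))

# Assign Helmert contrasts for a three-level factor
contrasts(f) <- contr.helmert(3)

# The attribute is stored under the name "contrasts"
cm <- attr(f, "contrasts")
is.matrix(cm)
is.numeric(cm)
dim(cm)
```

The numeric-matrix check in the constructor mirrors what is stored here.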

**Q**: Read the documentation for `utils::as.roman()`. How would you write a constructor for this class? Does it need a validator? What would a helper look like?

**A**: This function transforms numeric input into Roman numbers (how cool is this!). This class is built on the “integer” type, so a constructor only needs to check for integer input and set the class.

The documentation tells us that only values between 1 and 3899 are uniquely represented, which we then include in our validation function.

```r
validate_roman <- function(x) {
  values <- unclass(x)

  if (any(values < 1 | values > 3899)) {
    stop(
      "Roman numbers must fall between 1 and 3899.",
      call. = FALSE
    )
  }
  x
}
```

For convenience, a helper function can also accept real (double) values and coerce them to integer.
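A constructor and helper along these lines might look as follows. This is a sketch under the assumptions stated in the answer (integer base type, class `"roman"`); the names `new_roman()` and `roman()` are chosen here for illustration, and the validator is repeated only to keep the block self-contained.

```r
# Low-level constructor: integer input, class "roman"
new_roman <- function(x = integer()) {
  stopifnot(is.integer(x))
  structure(x, class = "roman")
}

# Validator as defined above, repeated so the block runs on its own
validate_roman <- function(x) {
  values <- unclass(x)
  if (any(values < 1 | values > 3899)) {
    stop("Roman numbers must fall between 1 and 3899.", call. = FALSE)
  }
  x
}

# Helper: coerce doubles to integer, then construct and validate
roman <- function(x = integer()) {
  x <- as.integer(x)
  validate_roman(new_roman(x))
}

r <- roman(c(1, 4, 2019))
class(r)
unclass(r)
try(roman(0))
```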

## 13.3 Generics and methods

**Q**: Read the source code for `t()` and `t.test()` and confirm that `t.test()` is an S3 generic and not an S3 method. What happens if you create an object with class `test` and call `t()` with it? Why?

**A**: We can see that `t.test()` is a generic, because it calls `UseMethod()`:

```r
t.test
#> function (x, ...)
#> UseMethod("t.test")
#> <bytecode: 0xa60ab0>
#> <environment: namespace:stats>

# or simply call
sloop::ftype(t.test)
#> [1] "S3"      "generic"
```

Interestingly, R also provides helpers that list functions which look like methods, but in fact are not:

```r
tools::nonS3methods("stats")
#> [1] "anova.lmlist"        "expand.model.frame"  "fitted.values"
#> [4] "influence.measures"  "lag.plot"            "t.test"
#> [7] "plot.spec.phase"     "plot.spec.coherency"
```

When we create an object with class `test`, `t()` will dispatch to `t.test()`. This happens because `UseMethod()` simply searches for functions named `paste0("generic", ".", c(class(x), "default"))`.

Consequently `t.test()` is erroneously treated as a method of `t()`. Because `t.test()` is a generic itself and doesn’t find a method called `t.test.test()`, it dispatches to `t.test.default()`. We can define `t.test.test()` to demonstrate that this is really what is happening internally.

```r
x <- structure(1:10, class = "test")
t(x)
#>
#>  One Sample t-test
#>
#> data:  x
#> t = 6, df = 9, p-value = 3e-04
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  3.33 7.67
#> sample estimates:
#> mean of x
#>       5.5

t.test.test <- function(x) "Hi!"
t(x)
#> [1] "Hi!"
```
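The same name-based lookup can be reproduced with a self-defined generic, which keeps the experiment independent of `stats::t.test()`. In this sketch, `speak()`, `speak.loudly()` and the class `"loudly"` are hypothetical names.

```r
# Hypothetical generic with a default method
speak <- function(x, ...) UseMethod("speak")
speak.default <- function(x, ...) "default"

# Intended as an ordinary function, but its name has the form generic.class
speak.loudly <- function(x, ...) "LOUD"

obj <- structure(list(), class = "loudly")
speak(obj)     # dispatches to speak.loudly() purely because of its name
speak(list())  # no matching class, falls back to speak.default()
```

The `t(x)` output shown above reflects the R version used in the text. R 4.0.0 changed S3 method lookup to skip the search path between the global and base environments by default, so newer versions may not reach `stats::t.test()` this way. Functions defined in the calling environment, as in this sketch, are still found.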

**Q**: What generics does the `table` class have methods for?

**A**: This is a simple application of `sloop::s3_methods_class()`:

```r
s3_methods_class("table")
#> # A tibble: 11 x 4
#>    generic       class visible source
#>    <chr>         <chr> <lgl>   <chr>
#>  1 [             table TRUE    base
#>  2 aperm         table TRUE    base
#>  3 as.data.frame table TRUE    base
#>  4 Axis          table FALSE   registered S3method
#>  5 head          table FALSE   registered S3method
#>  6 lines         table FALSE   registered S3method
#>  7 plot          table FALSE   registered S3method
#>  8 points        table FALSE   registered S3method
#>  9 print         table TRUE    base
#> 10 summary       table TRUE    base
#> 11 tail          table FALSE   registered S3method
```

Interestingly, the `table` class has a number of methods designed to help plotting with base graphics.

**Q**: What generics does the `ecdf` class have methods for?

**A**: We use the same approach as above:

```r
s3_methods_class("ecdf")
#> # A tibble: 4 x 4
#>   generic  class visible source
#>   <chr>    <chr> <lgl>   <chr>
#> 1 plot     ecdf  TRUE    stats
#> 2 print    ecdf  FALSE   registered S3method
#> 3 quantile ecdf  FALSE   registered S3method
#> 4 summary  ecdf  FALSE   registered S3method
```

The methods are primarily designed for display (`plot()`, `print()`, `summary()`), but you can also extract quantiles with `quantile()`.

**Q**: Which base generic has the greatest number of defined methods?

**A**: A little experimentation (and thinking about the most popular functions) suggests that the `print()` generic has the most defined methods.

**Q**: Carefully read the documentation for `UseMethod()` and explain why the following code returns the results that it does. What two usual rules of function evaluation does `UseMethod()` violate?

```r
g <- function(x) {
  x <- 10
  y <- 10
  UseMethod("g")
}
g.default <- function(x) c(x = x, y = y)

x <- 1
y <- 1
g(x)
```

**A**: Let’s take this step by step. If you call `g.default()` directly you get `c(1, 1)` as you might expect. The value bound to `x` comes from the argument, the value of `y` comes from the global environment.

But when we call `g()` we get `c(1, 10)`.

This is seemingly inconsistent: why does `y` come from the value defined inside of `g()`, while `x` keeps the value that was passed in from the global environment? It’s because `UseMethod()` calls `g.default()` in a special way so that variables defined inside the generic are available to methods. The exception is the arguments to the function: they are passed on as is, and cannot be affected by code inside the generic. Note that this describes the behaviour at the time of writing; more recent R releases may no longer forward local variables from the generic, in which case `g()` returns `c(1, 1)` as well.

**Q**: What are the arguments to `[`? Why is this a hard question to answer?

**A**: The subsetting operator `[` is a primitive and generic function, as can be inspected via `ftype()`.

Therefore, ``formals(`[`)`` returns `NULL`, and one possible way to figure out `[`’s arguments would be to inspect the underlying C source code, which can be found online via `pryr::show_c_source(.Primitive("["))`. However, given the differing arguments of `[`’s methods, it seems most probable that `[`’s arguments are `x` and `...`.
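The point about `[` can be checked with base R alone. The sketch below compares the primitive with a few of its methods; the three methods were chosen arbitrarily.

```r
# `[` is a primitive, so it has no R-level formals
is.primitive(`[`)
formals(`[`)

# Its methods are ordinary closures with differing argument lists
names(formals(`[.data.frame`))
names(formals(`[.factor`))
names(formals(`[.Date`))
```

All three methods take `x` first but differ in what follows, which is consistent with the generic itself only committing to `x` and `...`.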

## 13.4 Object styles

**Q**: Categorise the objects returned by `lm()`, `factor()`, `table()`, `as.Date()`, `ecdf()`, `ordered()`, `I()` into the styles described above.

**A**: The returned objects correspond to the following object styles:

- Vector: `factor()`, `table()`, `as.Date()`, `ordered()`
- Record: (none)
- Scalar: `lm()`, `ecdf()`
- Other: `I()`

**Q**: What would a constructor function for `lm` objects, `new_lm()`, look like? Use `?lm` and experimentation to figure out the required fields and their types.

**A**: The constructor needs to populate the fields of an `lm` object and check their types for correctness.

```r
# Learn about lm-attributes
?lm
attributes(lm(cyl ~ ., data = mtcars))
#> $names
#>  [1] "coefficients"  "residuals"     "effects"       "rank"
#>  [5] "fitted.values" "assign"        "qr"            "df.residual"
#>  [9] "xlevels"       "call"          "terms"         "model"
#>
#> $class
#> [1] "lm"

# Define constructor
new_lm <- function(
  coefficients, residuals, effects, rank, fitted.values, assign,
  qr, df.residual, xlevels, call, terms, model
) {
  stopifnot(
    is.double(coefficients), is.double(residuals),
    is.double(effects), is.integer(rank), is.double(fitted.values),
    is.integer(assign), is.list(qr), is.integer(df.residual),
    is.list(xlevels), is.language(call), is.language(terms),
    is.list(model)
  )

  structure(
    list(
      coefficients = coefficients,
      residuals = residuals,
      effects = effects,
      rank = rank,
      fitted.values = fitted.values,
      assign = assign,
      qr = qr,
      df.residual = df.residual,
      xlevels = xlevels,
      call = call,
      terms = terms,
      model = model
    ),
    class = "lm"
  )
}
```
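The type checks in `new_lm()` can be compared against a real fit. The sketch below uses a small model on `mtcars`; the formula is an arbitrary choice, and the field types of other models may differ in detail.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Base type of every field in the fitted object
vapply(unclass(fit), typeof, character(1))

# A few of the checks from new_lm(), applied to the real fit
is.double(fit$coefficients)
is.integer(fit$rank)
is.integer(fit$df.residual)
is.list(fit$qr)
is.language(fit$terms)
```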

## 13.5 Inheritance

**Q**: How does `[.Date` support subclasses? How does it fail to support subclasses?

**A**: `[.Date` calls `.Date` with the result of calling `[` on the parent class, along with `oldClass()`:

```r
# inspect function
`[.Date`
#> function (x, ..., drop = TRUE)
#> {
#>     .Date(NextMethod("["), oldClass(x))
#> }
#> <bytecode: 0x4a3a400>
#> <environment: namespace:base>
```

`.Date` is kind of like a constructor for date classes, although it doesn’t check that the input is the correct type.

So what does `oldClass()` do? It’s implemented in C so we can’t easily see what it does, and the documentation refers to S-PLUS:

> Functions oldClass and oldClass<- behave in the same way as functions of those names in S-PLUS 5/6, but in R UseMethod dispatches on the class as returned by class (with some interpolated classes: see the link) rather than oldClass. However, group generics dispatch on the oldClass for efficiency, and internal generics only dispatch on objects for which is.object is true.

Instead, let’s just try it out:

```r
oldClass(Sys.Date())
#> [1] "Date"
oldClass(numeric())
#> NULL
oldClass(data.frame())
#> [1] "data.frame"
oldClass(integer())
#> NULL
```

It seems similar to `class()`, but it returns `NULL` for base types. Together this means that `[.Date` effectively calls `[` on the underlying numeric data, then resets the class of the result to that of the input. This ignores the fact that a subclass might have additional attributes.
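Both halves of the answer can be observed on a toy subclass. In this sketch the class `"my_date"` and the attribute `note` are hypothetical names.

```r
# Hypothetical subclass of Date with one extra attribute
x <- structure(
  as.numeric(as.Date("2020-01-01")) + 0:2,
  class = c("my_date", "Date"),
  note = "extra attribute"
)

y <- x[1]

class(y)         # the class vector of the input is restored
attr(y, "note")  # the extra attribute does not survive subsetting
```

A subclass that needs to keep such attributes would have to provide its own `[` method.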

**Q**: R has two classes for representing date time data, `POSIXct` and `POSIXlt`, which both inherit from `POSIXt`. Which generics have different behaviours for the two classes? Which generics share the same behaviour?

**A**: To answer this question, we have to get the respective generics:

```r
generics_t  <- s3_methods_class("POSIXt")$generic
generics_ct <- s3_methods_class("POSIXct")$generic
generics_lt <- s3_methods_class("POSIXlt")$generic
```

The generics in `generics_t` with a method for the superclass `POSIXt` potentially share the same behaviour for both subclasses. However, if a generic has a specific method for one of the subclasses, it has to be subtracted:

```r
# These generics provide subclass-specific methods
union(generics_ct, generics_lt)
#>  [1] "["             "[["            "[<-"           "as.data.frame"
#>  [5] "as.Date"       "as.list"       "as.POSIXlt"    "c"
#>  [9] "format"        "length<-"      "mean"          "print"
#> [13] "rep"           "split"         "summary"       "Summary"
#> [17] "weighted.mean" "xtfrm"         "[[<-"          "anyNA"
#> [21] "as.double"     "as.matrix"     "as.POSIXct"    "duplicated"
#> [25] "is.na"         "length"        "names"         "names<-"
#> [29] "sort"          "unique"

# These generics share (inherited) methods for both subclasses
setdiff(generics_t, union(generics_ct, generics_lt))
#>  [1] "-"            "+"            "all.equal"    "as.character"
#>  [5] "Axis"         "cut"          "diff"         "hist"
#>  [9] "is.numeric"   "julian"       "Math"         "months"
#> [13] "Ops"          "pretty"       "quantile"     "quarters"
#> [17] "round"        "seq"          "str"          "trunc"
#> [21] "weekdays"
```
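If sloop is not available, a similar comparison can be sketched with `utils::methods()`. `generics_for()` is a hypothetical helper, and the sketch assumes that the `"info"` attribute of the result carries a `generic` column. The exact lists depend on the R version and the attached packages, so no output is shown.

```r
# Hypothetical helper: generics with a method for `cls`, using
# utils::methods() in place of sloop::s3_methods_class()
generics_for <- function(cls) {
  info <- attr(utils::methods(class = cls), "info")
  unique(info$generic)
}

generics_t  <- generics_for("POSIXt")
generics_ct <- generics_for("POSIXct")
generics_lt <- generics_for("POSIXlt")

# Generics served only by the shared POSIXt methods
shared <- setdiff(generics_t, union(generics_ct, generics_lt))
shared
```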

**Q**: What do you expect this code to return? What does it actually return? Why?

```r
generic2 <- function(x) UseMethod("generic2")
generic2.a1 <- function(x) "a1"
generic2.a2 <- function(x) "a2"
generic2.b <- function(x) {
  class(x) <- "a1"
  NextMethod()
}

generic2(structure(list(), class = c("b", "a2")))
```

**A**: When we execute the code above, this is what is happening:

- we pass an object of classes `b` and `a2` to `generic2()`, which prompts R to look for a method `generic2.b()`
- the method `generic2.b()` then changes the class to `a1` and calls `NextMethod()`

One would think that this will lead R to call `generic2.a1()`, but in fact, as mentioned in the textbook, `NextMethod()` doesn’t actually work with the class attribute of the object, but instead uses a special global variable (`.Class`) to keep track of which method to call next.

This is why `generic2.a2()` is called instead.
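The contrast becomes clearer next to a variant that dispatches again on the modified object. In this sketch `generic2.c()` is a hypothetical addition; the other definitions are repeated from the question so the block runs on its own.

```r
generic2 <- function(x) UseMethod("generic2")
generic2.a1 <- function(x) "a1"
generic2.a2 <- function(x) "a2"
generic2.b <- function(x) {
  class(x) <- "a1"
  NextMethod()
}

# Hypothetical variant: call the generic again instead of NextMethod()
generic2.c <- function(x) {
  class(x) <- "a1"
  generic2(x)
}

# NextMethod() follows the class vector recorded at dispatch time
res_next <- generic2(structure(list(), class = c("b", "a2")))

# A fresh call to the generic dispatches on the modified class attribute
res_redispatch <- generic2(structure(list(), class = c("c", "a2")))

res_next
res_redispatch
```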

## 13.6 Dispatch details

**Q**: Explain the differences in dispatch below:

```r
x1 <- 1:5
class(x1)
#> [1] "integer"
s3_dispatch(x1[1])
#>    [.integer
#>    [.numeric
#>    [.default
#> => [ (internal)

x2 <- structure(x1, class = "integer")
class(x2)
#> [1] "integer"
s3_dispatch(x2[1])
#>    [.integer
#>    [.default
#> => [ (internal)
```

**A**: `class()` returns `"integer"` for `x1` and `x2`, but the class of `x1` is implicit, while the class of `x2` is explicit. This is important because `[` is an internal generic, so when the class is explicitly set, the “implicit” parent class `numeric` is not considered.

**Q**: What classes have a method for the `Math` group generic in base R? Read the source code. How do the methods work?

**A**: The following functions belong to this group (see `?Math`):

- `abs`, `sign`, `sqrt`, `floor`, `ceiling`, `trunc`, `round`, `signif`
- `exp`, `log`, `expm1`, `log1p`, `cos`, `sin`, `tan`, `cospi`, `sinpi`, `tanpi`, `acos`, `asin`, `atan`, `cosh`, `sinh`, `tanh`, `acosh`, `asinh`, `atanh`
- `lgamma`, `gamma`, `digamma`, `trigamma`
- `cumsum`, `cumprod`, `cummax`, `cummin`

The following classes have a method for this group generic:

```r
s3_methods_generic("Math")
#> # A tibble: 8 x 4
#>   generic class      visible source
#>   <chr>   <chr>      <lgl>   <chr>
#> 1 Math    data.frame TRUE    base
#> 2 Math    Date       TRUE    base
#> 3 Math    difftime   TRUE    base
#> 4 Math    factor     TRUE    base
#> 5 Math    POSIXt     TRUE    base
#> 6 Math    quosure    FALSE   registered S3method
#> 7 Math    vctrs_sclr FALSE   registered S3method
#> 8 Math    vctrs_vctr FALSE   registered S3method
```

To explain the basic idea, we can overwrite the data frame method with one that always returns `"hello"`. All functions from the Math group generic will then return `"hello"` when called on a data frame.

Of course, different functions should perform different calculations. Here `.Generic` comes into play, which provides us with the calling generic as a string:

```r
Math.data.frame <- function(x, ...) {
  .Generic
}

abs(iris)
#> [1] "abs"
exp(iris)
#> [1] "exp"
lgamma(iris)
#> [1] "lgamma"

rm(Math.data.frame)
```

The original source code of `Math.data.frame()` is a good example of how to turn the string returned by `.Generic` into a call to the specific function. `Math.factor()` is a good example of a method that is defined simply to give better error messages.

**Q**: `Math.difftime()` is more complicated than I described. Why?

**A**: `Math.difftime()` also has to reject all members of the group apart from `abs`, `sign`, `floor`, `ceiling`, `trunc`, `round` and `signif`, and needs to return a fitting error message for them.
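The pattern described for `Math.difftime()` can be sketched for a toy class. The class `"meters"` and the constructor `new_meters()` are hypothetical, and the error message is made up; this is not the implementation used in base R.

```r
# Hypothetical class "meters" built on a double vector
new_meters <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "meters")
}

# Group method: .Generic holds the name of the function that was called
Math.meters <- function(x, ...) {
  allowed <- c("abs", "sign", "floor", "ceiling", "trunc", "round", "signif")
  if (!.Generic %in% allowed) {
    stop("'", .Generic, "' is not defined for \"meters\" objects", call. = FALSE)
  }
  # Delegate to the internal implementation, then restore the class
  new_meters(unclass(NextMethod()))
}

m <- new_meters(c(-1.5, 2.25))
abs(m)
round(m)
try(exp(m))
```

A single method thus covers the whole group, and `.Generic` is what lets it accept some members while rejecting others.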