11 S3


To interact with S3 objects, we will mainly use the sloop package.

library(sloop)

11.1 Basics

Q1: Describe the difference between t.test() and t.data.frame(). When is each function called?

A: Because of S3’s generic.class() naming scheme, both functions may initially look similar, while they are in fact unrelated.

  • t.test() is a generic function that performs a t-test.
  • t.data.frame() is a method that gets called by the generic t() to transpose data frame input.

Due to R’s S3 dispatch rules, t.test() would also get called in R versions before 4.0.0 when t() is applied to an object of class test. Since R 4.0.0, t() dispatches to t.default() in this case.
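The distinction can be checked with base R alone. The following is a minimal sketch: calls_use_method() is a hypothetical helper introduced here, and searching a function body for UseMethod is only a rough heuristic, not how sloop::ftype() works.

```r
# Hypothetical helper: TRUE if the function body mentions UseMethod().
# This is a rough heuristic, not a complete test for generics.
calls_use_method <- function(f) {
  any(grepl("UseMethod", deparse(body(f)), fixed = TRUE))
}

calls_use_method(stats::t.test)  # t.test() is a generic
calls_use_method(t)              # so is t()

# t.data.frame() is a method of the generic t()
t_df <- getS3method("t", "data.frame")
calls_use_method(t_df)

# Calling the generic on a data frame returns a transposed matrix
df <- data.frame(a = 1:2, b = 3:4)
is.matrix(t(df))
```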

Q2: Make a list of commonly used base R functions that contain . in their name but are not S3 methods.

A: In recent years “snake_case”-style has become increasingly common when naming functions and variables in R. But many functions in base R will continue to be “point.separated,” which is why some inconsistency in your R code most likely cannot be avoided.

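# Some base R functions with point.separated names
install.packages()
read.csv()
list.files()
download.file()
data.frame()
as.character()
Sys.Date()
all.equal()
do.call()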

Q3: What does the as.data.frame.data.frame() method do? Why is it confusing? How could you avoid this confusion in your own code?

A: The function as.data.frame.data.frame() implements the data.frame method for the as.data.frame() generic, which coerces objects to data frames.

The name is confusing, because it does not clearly communicate the type of the function, which could be a regular function, a generic or a method. Even if we assume a method, the number of dots makes it difficult to separate the generic part from the class part of the name. Is it the data.frame.data.frame() method for the as() generic? Is it the frame.data.frame() method for the as.data() generic?

We could avoid this confusion by applying a different naming convention (e.g. “snake_case”) for our class and function names.

Q4: Describe the difference in behaviour in these two calls.

some_days <- as.Date("2017-01-31") + sample(10, 5)

mean(some_days)
#> [1] "2017-02-06"
mean(unclass(some_days))
#> [1] 17203

A: mean() is a generic function, which will select the appropriate method based on the class of the input. some_days has the class Date and mean.Date(some_days) will be used to calculate the mean date of some_days.

After unclass() has removed the class attribute from some_days, the default method is chosen. mean.default(unclass(some_days)) then calculates the mean of the underlying double.
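To make the two code paths reproducible, here is a small sketch that uses fixed offsets instead of a random sample. The variable names mean_date and mean_days are chosen for this illustration only.

```r
# Fixed dates instead of a random sample, so the result is reproducible
some_days <- as.Date("2017-01-31") + c(1, 3, 5)

mean_date <- mean(some_days)           # dispatches to mean.Date()
mean_days <- mean(unclass(some_days))  # dispatches to mean.default()

class(mean_date)
#> [1] "Date"
mean_days
#> [1] 17200

# Both results describe the same day
as.Date(mean_days, origin = "1970-01-01") == mean_date
#> [1] TRUE
```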

Q5: What class of object does the following code return? What base type is it built on? What attributes does it use?

x <- ecdf(rpois(100, 10))
x
#> Empirical CDF 
#> Call: ecdf(rpois(100, 10))
#>  x[1:18] =  2,  3,  4,  ..., 2e+01, 2e+01

A: It returns an object of the class ecdf (empirical cumulative distribution function) with the superclasses stepfun and function. The ecdf object is built on the base type closure (a function). The call that was used to create it (ecdf(rpois(100, 10))) is stored in the call attribute.

typeof(x)
#> [1] "closure"

attributes(x)
#> $class
#> [1] "ecdf"     "stepfun"  "function"
#> $call
#> ecdf(rpois(100, 10))

Q6: What class of object does the following code return? What base type is it built on? What attributes does it use?

x <- table(rpois(100, 5))
x
#>  1  2  3  4  5  6  7  8  9 10 
#>  7  5 18 14 15 15 14  4  5  3

A: This code returns a table object, which is built upon the integer type. Besides the class, it uses the attributes dim and dimnames; dimnames is used to name the elements of the integer vector.

typeof(x)
#> [1] "integer"

attributes(x)
#> $dim
#> [1] 10
#> $dimnames
#> $dimnames[[1]]
#>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
#> $class
#> [1] "table"
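The same three questions (class, base type, attributes) come up in both exercises, so they can be bundled in a small helper. This is only a sketch; describe_object() is a hypothetical name and not part of base R or sloop.

```r
# Hypothetical helper that collects class, base type and attribute names
describe_object <- function(x) {
  list(
    class = class(x),
    type = typeof(x),
    attributes = names(attributes(x))
  )
}

tab_info <- describe_object(table(c(1, 1, 2)))
ecdf_info <- describe_object(stats::ecdf(c(1, 2, 3)))

tab_info$type
#> [1] "integer"
ecdf_info$type
#> [1] "closure"
```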

11.2 Classes

Q1: Write a constructor for data.frame objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?

A: Data frames are built on named lists of vectors, which all have the same length. Besides the class and the column names (names), the row.names are their only further attribute. This must be a character vector with the same length as the other vectors.

We need to provide the number of rows as an input to make it possible to create data frames with 0 columns but multiple rows.

This leads to the following constructor:

new_data.frame <- function(x, n, row.names = NULL) {
  # Check if the underlying object is a list
  stopifnot(is.list(x))

  # Check all inputs are the same length
  # (This check also allows that x has length 0)
  stopifnot(all(lengths(x) == n))

  if (is.null(row.names)) {
    # Use special row names helper from base R
    row.names <- .set_row_names(n)
  } else {
    # Otherwise check that they're a character vector with the 
    # correct length
    stopifnot(is.character(row.names), length(row.names) == n)
  }

  structure(
    x,
    class = "data.frame",
    row.names = row.names
  )
}

# Test
x <- list(a = 1, b = 2)
new_data.frame(x, n = 1)
#>   a b
#> 1 1 2
new_data.frame(x, n = 1, row.names = "l1")
#>    a b
#> l1 1 2

# Create a data frame with 0 columns and 2 rows
new_data.frame(list(), n = 2)
#> data frame with 0 columns and 2 rows

There are two additional restrictions we could implement if we were being very strict: both the row names and column names should be unique.
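These two stricter checks could live in a separate validator. The following is a minimal sketch under that assumption; validate_data_frame_names() is a hypothetical name and the error messages are made up for this example.

```r
# Hypothetical validator for the two stricter checks; not part of base R
validate_data_frame_names <- function(x) {
  stopifnot(is.data.frame(x))

  if (anyDuplicated(names(x)) > 0) {
    stop("Column names must be unique.", call. = FALSE)
  }
  if (anyDuplicated(attr(x, "row.names")) > 0) {
    stop("Row names must be unique.", call. = FALSE)
  }

  x
}

df <- data.frame(a = 1, b = 2)
validate_data_frame_names(df)

# names<- does not check for duplicates, so the validator has to
names(df) <- c("a", "a")
try(validate_data_frame_names(df))
```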

Q2: Enhance my factor() helper to have better behaviour when one or more values is not found in levels. What does base::factor() do in this situation?

A: base::factor() converts these values (silently) into NAs:

factor(c("a", "b", "c"), levels = c("a", "b"))
#> [1] a    b    <NA>
#> Levels: a b

The factor() helper including the constructor (new_factor()) and its validator (validate_factor()) were given in Advanced R. However, as the goal of this question is to throw an early error within the helper, we only repeat the code for the helper:

# Simplified version of the factor() helper, as defined in Advanced R
factor <- function(x = character(), levels = unique(x)) {
  ind <- match(x, levels)
  validate_factor(new_factor(ind, levels))
}

To improve the factor() helper we choose to return an informative error message instead.

factor2 <- function(x, levels = unique(x)) {
  new_levels <- match(x, levels)

  # Error if levels don't include all values
  missing <- unique(setdiff(x, levels))
  if (length(missing) > 0) {
    stop(
      "The following values do not occur in the levels of x: ",
      paste0("'", missing, "'", collapse = ", "), ".", 
      call. = FALSE
    )
  }

  validate_factor(new_factor(new_levels, levels))
}

# Test
factor2(c("a", "b", "c"), levels = c("a", "b"))
#> Error: The following values do not occur in the levels of x: 'c'.

Q3: Carefully read the source code of factor(). What does it do that our constructor does not?

A: The original implementation (base::factor()) allows more flexible input for x. It coerces x to character or replaces it with character(0) (in case of NULL). It also ensures that the levels are unique. This is achieved by setting them via base::levels<-, which fails when duplicate values are supplied.
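The three behaviours described here can be observed directly. This is a short sketch using only base::factor(); the variable name result is introduced for the illustration.

```r
# NULL input is replaced by character(0)
length(base::factor(NULL))
#> [1] 0

# Non-character input is coerced, so the levels are character
levels(base::factor(c(2, 1, 2)))
#> [1] "1" "2"

# Duplicated levels are rejected with an error
result <- tryCatch(
  base::factor(c("a", "b"), levels = c("a", "a")),
  error = function(e) "error"
)
result
#> [1] "error"
```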

Q4: Factors have an optional “contrasts” attribute. Read the help for C(), and briefly describe the purpose of the attribute. What type should it have? Rewrite the new_factor() constructor to include this attribute.

A: When factor variables (representing nominal or ordinal information) are used in statistical models, they are typically encoded as dummy variables and by default each level is compared with the first factor level. However, many different encodings (“contrasts”) are possible, see https://en.wikipedia.org/wiki/Contrast_(statistics).

Within R’s formula interface you can wrap a factor in stats::C() and specify the contrast of your choice. Alternatively, you can set the contrasts attribute of your factor variable, which accepts matrix input. (See ?contr.helmert or similar for details).
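As a quick illustration of the attribute itself, the sketch below sets Helmert contrasts on a three-level factor and inspects what is stored. The variable names f and contr are arbitrary.

```r
f <- factor(c("a", "b", "c"))

# Without an explicit setting, the attribute is absent
is.null(attr(f, "contrasts"))

# Assign Helmert contrasts for the three levels
contrasts(f) <- contr.helmert(3)

# The attribute now holds a numeric matrix with one row per level
contr <- attr(f, "contrasts")
is.matrix(contr) && is.numeric(contr)
dim(contr)
```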

The new_factor() constructor was given in Advanced R as:

# new_factor() constructor from Advanced R
new_factor <- function(x = integer(), levels = character()) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))
  structure(
    x,
    levels = levels,
    class = "factor"
  )
}

Our updated new_factor() constructor gets a contrasts argument, which accepts a numeric matrix or NULL (default).

# Updated new_factor() constructor
new_factor <- function(
  x = integer(),
  levels = character(),
  contrasts = NULL
) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))

  if (!is.null(contrasts)) {
    stopifnot(is.matrix(contrasts) && is.numeric(contrasts))
  }

  structure(
    x,
    levels = levels,
    class = "factor",
    contrasts = contrasts
  )
}

Q5: Read the documentation for utils::as.roman(). How would you write a constructor for this class? Does it need a validator? What might a helper do?

A: This function transforms numeric input into Roman numbers. It is built on the integer type, which results in the following constructor.

new_roman <- function(x = integer()) {
  stopifnot(is.integer(x))
  structure(x, class = "roman")
}

The documentation tells us that only values between 1 and 3899 are uniquely represented, which we then include in our validation function.

validate_roman <- function(x) {
  values <- unclass(x)
  if (any(values < 1 | values > 3899)) {
    stop(
      "Roman numbers must fall between 1 and 3899.",
      call. = FALSE
    )
  }
  x
}

For convenience, we allow the user to also pass real values to a helper function.

roman <- function(x = integer()) {
  x <- as.integer(x)
  validate_roman(new_roman(x))
}

# Test
roman(c(1, 753, 2019))
#> [1] I       DCCLIII MMXIX
roman(0)
#> Error: Roman numbers must fall between 1 and 3899.

11.3 Generics and methods

Q1: Read the source code for t() and t.test() and confirm that t.test() is an S3 generic and not an S3 method. What happens if you create an object with class test and call t() with it? Why?

x <- structure(1:10, class = "test")
t(x)

A: We can see that t.test() is a generic because it calls UseMethod():

t.test
#> function (x, ...) 
#> UseMethod("t.test")
#> <bytecode: 0x7fce09d181a8>
#> <environment: namespace:stats>

# or simply call
ftype(t.test)
#> [1] "S3"      "generic"

# The same holds for t()
t
#> function (x) 
#> UseMethod("t")
#> <bytecode: 0x7fce0ef52548>
#> <environment: namespace:base>

Interestingly, R also provides helpers that list functions which look like methods but in fact are not:

tools:::nonS3methods("stats")
#> [1] "anova.lmlist"        "expand.model.frame"  "fitted.values"      
#> [4] "influence.measures"  "lag.plot"            "t.test"             
#> [7] "plot.spec.phase"     "plot.spec.coherency"

When we create an object with class test, t() dispatches to the t.default() method. This happens, because UseMethod() simply searches for functions named paste0("generic", ".", c(class(x), "default")).
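The name lookup described above can be written out as a tiny helper. This is a sketch only: candidate_methods() is a hypothetical name, and it covers objects with an explicit class attribute (implicit classes follow additional rules).

```r
# Hypothetical helper: the method names UseMethod() considers, in order,
# for an object with an explicit class attribute
candidate_methods <- function(generic, x) {
  paste0(generic, ".", c(class(x), "default"))
}

x <- structure(1:10, class = "test")
candidate_methods("t", x)
#> [1] "t.test"    "t.default"

# utils::isS3method() reports whether a name refers to an S3 method
isS3method("t.test")
isS3method("t.default")
```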

x <- structure(1:10, class = "test")

t(x)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,]    1    2    3    4    5    6    7    8    9    10
#> attr(,"class")
#> [1] "test"

However, in older versions of R (pre R 4.0.0; when Advanced R was written) this behaviour was slightly different. Instead of dispatching to the t.default() method, the t.test() generic was erroneously treated as a method of t() which then dispatched to t.test.default() or (when defined) to t.test.test().

# Output in R version 3.6.2
x <- structure(1:10, class = "test")
t(x)
#>  One Sample t-test
#> data:  x
#> t = 5.7446, df = 9, p-value = 0.0002782
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  3.334149 7.665851
#> sample estimates:
#> mean of x 
#>       5.5 

t.test.test <- function(x) "Hi!"
t(x)
#> [1] "Hi!"

Q2: What generics does the table class have methods for?

A: This is a simple application of sloop::s3_methods_class():

s3_methods_class("table")
#> # A tibble: 10 x 4
#>    generic       class visible source             
#>    <chr>         <chr> <lgl>   <chr>              
#>  1 [             table TRUE    base               
#>  2 aperm         table TRUE    base               
#>  3 as.data.frame table TRUE    base               
#>  4 Axis          table FALSE   registered S3method
#>  5 lines         table FALSE   registered S3method
#>  6 plot          table FALSE   registered S3method
#>  7 points        table FALSE   registered S3method
#>  8 print         table TRUE    base               
#>  9 summary       table TRUE    base               
#> 10 tail          table FALSE   registered S3method

Interestingly, the table class has a number of methods designed to help plotting with base graphics.

x <- rpois(100, 5)
plot(table(x))

Q3: What generics does the ecdf class have methods for?

A: We use the same approach as above:

s3_methods_class("ecdf")
#> # A tibble: 4 x 4
#>   generic  class visible source             
#>   <chr>    <chr> <lgl>   <chr>              
#> 1 plot     ecdf  TRUE    stats              
#> 2 print    ecdf  FALSE   registered S3method
#> 3 quantile ecdf  FALSE   registered S3method
#> 4 summary  ecdf  FALSE   registered S3method

The methods are primarily designed for display (plot(), print(), summary()), but you can also extract quantiles with quantile().
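If sloop is not available, a similar overview can be obtained with utils::methods(). The sketch below relies on the "info" attribute of the returned object, which (as far as we know) holds a data frame with a generic column; treat that detail as an assumption to verify in your R version.

```r
# List the methods known for the ecdf class using base tooling
ecdf_methods <- utils::methods(class = "ecdf")

# Assumption: the "info" attribute is a data frame with a `generic` column
info <- attr(ecdf_methods, "info")
info$generic

"quantile" %in% info$generic
```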

Q4: Which base generic has the greatest number of defined methods?

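A: A little experimentation (and thinking about the most popular functions) suggests that the print() generic has the most defined methods.

nrow(s3_methods_generic("print"))
#> [1] 274
nrow(s3_methods_generic("summary"))
#> [1] 38
nrow(s3_methods_generic("plot"))
#> [1] 34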

Let’s verify this programmatically with the tools we have learned in this and the previous chapters.


library(purrr)

ls(all.names = TRUE, envir = baseenv()) %>% 
  mget(envir = baseenv()) %>% 
  keep(is_function) %>% 
  names() %>% 
  keep(is_s3_generic) %>% 
  map(~ set_names(nrow(s3_methods_generic(.x)), .x)) %>% 
  flatten_int() %>% 
  sort(decreasing = TRUE) %>% 
  head()
#>        print       format            [ as.character      summary         plot 
#>          274          107           57           39           38           34

Q5: Carefully read the documentation for UseMethod() and explain why the following code returns the results that it does. What two usual rules of function evaluation does UseMethod() violate?

g <- function(x) {
  x <- 10
  y <- 10
  UseMethod("g")
}
g.default <- function(x) c(x = x, y = y)

x <- 1
y <- 1
g(x)
#>  x  y 
#>  1 10

A: Let’s take this step by step. If you call g.default(x) directly you get c(1, 1) as you might expect.

The value bound to x comes from the argument, the value of y comes from the global environment.

g.default(x)
#> x y 
#> 1 1

But when we call g(x) we get c(1, 10):

g(x)
#>  x  y 
#>  1 10

This is seemingly inconsistent: why does y come from the value defined inside of g(), while x still comes from the value supplied in the global environment? It’s because UseMethod() calls g.default() in a special way so that variables defined inside the generic are available to methods. The exception is arguments supplied to the function: they are passed on as is and cannot be affected by code inside the generic.
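The second point, that arguments are passed on unchanged, can be isolated in a minimal sketch. The generic h() and its default method are hypothetical names used only for this illustration.

```r
# Hypothetical generic that reassigns its argument before dispatching
h <- function(x) {
  x <- 10          # reassigning the argument inside the generic ...
  UseMethod("h")
}
h.default <- function(x) x

h(1)               # ... does not change what the method receives
#> [1] 1
```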

Q6: What are the arguments to [? Why is this a hard question to answer?

A: The subsetting operator [ is a primitive and a generic function, which can be confirmed via ftype().

ftype(`[`)
#> [1] "primitive" "generic"

For primitive functions formals(`[`) returns NULL, so we need to find another way to determine the function’s arguments. One possible way to figure out [’s arguments would be to inspect the underlying C source code, which can be searched for via pryr::show_c_source(.Primitive("[")).
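The lack of R-level formals is easy to confirm with base R. A brief sketch:

```r
# `[` is implemented in C, so there is no R-level argument list or body
is.primitive(`[`)
#> [1] TRUE
is.null(formals(`[`))
#> [1] TRUE
is.null(body(`[`))
#> [1] TRUE
```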

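When we inspect the arguments of some of [’s methods, we see that the arguments vary with the class of x.

names(formals(`[.data.frame`))
#> [1] "x"    "i"    "j"    "drop"
names(formals(`[.table`))
#> [1] "x"    "i"    "j"    "..."  "drop"
names(formals(`[.Date`))
#> [1] "x"    "..."  "drop"
names(formals(`[.AsIs`))
#> [1] "x"   "i"   "..."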

To finally get a better overview, we have to put in a little more effort and also use s3_methods_generic() again.


library(dplyr)

s3_methods_generic("[") %>%
  filter(visible) %>%
  mutate(
    method = paste0("[.", class),
    argnames = purrr::map(method, ~ names(formals(.x))),
    args = purrr::map(method, ~ formals(.x)),
    args = purrr::map2(
      argnames, args,
      ~ paste(.x, .y, sep = " = ")
    ),
    args = purrr::set_names(args, method)
  ) %>%
  pull(args) %>%
  head()
#> $`[.AsIs`
#> [1] "x = "   "i = "   "... = "
#> $`[.data.frame`
#> [1] "x = "                                              
#> [2] "i = "                                              
#> [3] "j = "                                              
#> [4] "drop = if (missing(i)) TRUE else length(cols) == 1"
#> $`[.Date`
#> [1] "x = "        "... = "      "drop = TRUE"
#> $`[.difftime`
#> [1] "x = "        "... = "      "drop = TRUE"
#> $`[.Dlist`
#> [1] "x = "   "i = "   "... = "
#> $`[.DLLInfoList`
#> [1] "x = "   "... = "

11.4 Object styles

Q1: Categorise the objects returned by lm(), factor(), table(), as.Date(), as.POSIXct(), ecdf(), ordered(), I() into the styles described above.

A: We can categorise the return values into the various object styles by observing how the number of observations is calculated: For vector style classes, length(x) represents the number of observations. Record style objects use a list of equal length elements to represent individual components. For data frames and matrices, the observations are represented by the rows. Scalar style objects use a list to represent a single thing.

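This leads us to:

  • lm(): scalar
  • factor(): vector
  • table(): vector
  • as.Date(): vector
  • as.POSIXct(): vector
  • ecdf(): scalar
  • ordered(): vector
  • I(): depends on the input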

The object style of I() depends on the input since this function returns a “copy of the object with class AsIs prepended to the class(es).”
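The criterion "what does length() count?" can be tried out directly. The sketch below uses a small date vector and a simple model on mtcars; both are arbitrary choices for illustration.

```r
# Vector style: length() is the number of observations
dates <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-03"))
length(dates)
#> [1] 3

# Scalar style: length() counts list components, not observations
mod <- lm(mpg ~ wt, data = mtcars)
length(mod) == nrow(mtcars)
#> [1] FALSE

# I() only prepends a class, so the style follows the input
class(I(dates))
#> [1] "AsIs" "Date"
```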

Q2: What would a constructor function for lm objects, new_lm(), look like? Use ?lm and experimentation to figure out the required fields and their types.

A: The constructor needs to populate the attributes of an lm object and check their types for correctness. Let’s start by creating a simple lm object and explore its underlying base type and attributes:

mod <- lm(cyl ~ ., data = mtcars)

typeof(mod)
#> [1] "list"

attributes(mod)
#> $names
#>  [1] "coefficients"  "residuals"     "effects"       "rank"         
#>  [5] "fitted.values" "assign"        "qr"            "df.residual"  
#>  [9] "xlevels"       "call"          "terms"         "model"        
#> $class
#> [1] "lm"

As mod is built upon a list, we can simply use map_chr(mod, typeof) to find out the base types of its elements. (Additionally, we inspect ?lm to learn more about the individual elements.)

map_chr(mod, typeof)
#>  coefficients     residuals       effects          rank fitted.values 
#>      "double"      "double"      "double"     "integer"      "double" 
#>        assign            qr   df.residual       xlevels          call 
#>     "integer"        "list"     "integer"        "list"    "language" 
#>         terms         model 
#>    "language"        "list"

Now we should have enough information to write a constructor for new lm objects.

new_lm <- function(
  coefficients, residuals, effects, rank, fitted.values, assign,
  qr, df.residual, xlevels, call, terms, model
) {
  stopifnot(
    is.double(coefficients), is.double(residuals), 
    is.double(effects), is.integer(rank), is.double(fitted.values),
    is.integer(assign), is.list(qr), is.integer(df.residual),
    is.list(xlevels), is.language(call), is.language(terms),
    is.list(model)
  )

  structure(
    list(
      coefficients = coefficients,
      residuals = residuals,
      effects = effects,
      rank = rank, 
      fitted.values = fitted.values,
      assign = assign,
      qr = qr,
      df.residual = df.residual,
      xlevels = xlevels,
      call = call,
      terms = terms, 
      model = model
    ),
    class = "lm"
  )
}

11.5 Inheritance

Q1: How does [.Date support subclasses? How does it fail to support subclasses?

A: [.Date calls .Date with the result of calling [ on the parent class, along with oldClass():

`[.Date`
#> function (x, ..., drop = TRUE) 
#> {
#>     .Date(NextMethod("["), oldClass(x))
#> }
#> <bytecode: 0x7fce0ef9b070>
#> <environment: namespace:base>

.Date is kind of like a constructor for date classes, although it doesn’t check that the input is of the correct type:

.Date
#> function (xx, cl = "Date") 
#> `class<-`(xx, cl)
#> <bytecode: 0x7fce0cdaf338>
#> <environment: namespace:base>

oldClass() is basically the same as class(), except that it doesn’t return implicit classes, i.e. it’s basically attr(x, "class") (looking at the C code that’s exactly what it does, except that it also handles S4 objects).

As oldClass() is “basically” class(), we can rewrite [.Date to make the implementation clearer:

`[.Date` <- function(x, ..., drop = TRUE) {
  out <- NextMethod("[")
  class(out) <- class(x)
  out
}

So, [.Date ensures that the output has the same class as the input. But what about other attributes that a subclass might possess? They get lost:

x <- structure(1:4, test = "test", class = c("myDate", "Date"))
attributes(x[1])
#> $class
#> [1] "myDate" "Date"
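One way a subclass could work around this is to provide its own [ method that restores the extra attribute. The following is a sketch under that assumption, not something base R provides; `[.myDate` is a hypothetical method for the myDate class used above.

```r
# Hypothetical method for the myDate subclass
`[.myDate` <- function(x, ...) {
  out <- NextMethod("[")                 # falls through to `[.Date`
  attr(out, "test") <- attr(x, "test")   # restore the subclass attribute
  out
}

x <- structure(1:4, test = "test", class = c("myDate", "Date"))
y <- x[1]
attributes(y)
```

The class is still restored by `[.Date`; the subclass method only adds back what the parent method does not know about.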

Q2: R has two classes for representing date time data, POSIXct and POSIXlt, which both inherit from POSIXt. Which generics have different behaviours for the two classes? Which generics share the same behaviour?

A: To answer this question, we have to get the respective generics:

generics_t  <- s3_methods_class("POSIXt")$generic
generics_ct <- s3_methods_class("POSIXct")$generic
generics_lt <- s3_methods_class("POSIXlt")$generic

The generics in generics_t with a method for the superclass POSIXt potentially share the same behaviour for both subclasses. However, if a generic has a specific method for one of the subclasses, it has to be subtracted:

# These generics provide subclass-specific methods
union(generics_ct, generics_lt)
#>  [1] "["             "[["            "[<-"           "as.data.frame"
#>  [5] "as.Date"       "as.list"       "as.POSIXlt"    "c"            
#>  [9] "format"        "length<-"      "mean"          "print"        
#> [13] "rep"           "split"         "summary"       "Summary"      
#> [17] "weighted.mean" "xtfrm"         "[[<-"          "anyNA"        
#> [21] "as.double"     "as.matrix"     "as.POSIXct"    "duplicated"   
#> [25] "is.na"         "length"        "names"         "names<-"      
#> [29] "sort"          "unique"

# These generics share (inherited) methods for both subclasses
setdiff(generics_t, union(generics_ct, generics_lt))
#>  [1] "-"            "+"            "all.equal"    "as.character" "Axis"        
#>  [6] "cut"          "diff"         "hist"         "is.numeric"   "julian"      
#> [11] "Math"         "months"       "Ops"          "pretty"       "quantile"    
#> [16] "quarters"     "round"        "seq"          "str"          "trunc"       
#> [21] "weekdays"

Q3: What do you expect this code to return? What does it actually return? Why?

generic2 <- function(x) UseMethod("generic2")
generic2.a1 <- function(x) "a1"
generic2.a2 <- function(x) "a2"
generic2.b <- function(x) {
  class(x) <- "a1"
  NextMethod()
}

generic2(structure(list(), class = c("b", "a2")))

A: When we execute the code above, this is what is happening:

  • We pass an object of classes b and a2 to generic2(), which prompts R to look for a method generic2.b().

  • The method generic2.b() then changes the class to a1 and calls NextMethod().

  • One would think that this will lead R to call generic2.a1(), but in fact, as mentioned in Advanced R, NextMethod() doesn’t actually work with the class attribute of the object, but instead uses a special global variable (.Class) to keep track of which method to call next. This is why generic2.a2() is called instead.

    generic2(structure(list(), class = c("b", "a2")))
    #> [1] "a2"

Let’s just double check the statement above and evaluate .Class explicitly within the generic2.b() method.

generic2.b <- function(x) {
  class(x) <- "a1"
  print(.Class)
  NextMethod()
}

generic2(structure(list(), class = c("b", "a2")))
#> [1] "b"  "a2"
#> [1] "a2"

11.6 Dispatch details

Q1: Explain the differences in dispatch below:

length.integer <- function(x) 10

x1 <- 1:5
class(x1)
#> [1] "integer"
s3_dispatch(length(x1))
#>  * length.integer
#>    length.numeric
#>    length.default
#> => length (internal)

x2 <- structure(x1, class = "integer")
class(x2)
#> [1] "integer"
s3_dispatch(length(x2))
#> => length.integer
#>    length.default
#>  * length (internal)

A: class() returns integer in both cases. However, while the class of x1 is created implicitly and inherits from the numeric class, the class of x2 is set explicitly. This is important because length() is an internal generic and internal generics only dispatch to methods when the class attribute has been set, i.e. internal generics do not use implicit classes.
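The practical consequence can be seen without sloop by simply calling length(). This sketch stores the two results in n1 and n2 (names chosen here) and removes the demo method afterwards.

```r
length.integer <- function(x) 10

x1 <- 1:5                                # implicit class only
x2 <- structure(x1, class = "integer")   # explicit class attribute

n1 <- length(x1)   # the internal implementation is used
n2 <- length(x2)   # length.integer() is used
c(n1, n2)
#> [1]  5 10

rm(length.integer)  # remove the demo method again
```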

An object has no explicit class if attr(x, "class") returns NULL:

attr(x1, "class")
#> NULL
attr(x2, "class")
#> [1] "integer"

To see the relevant classes for the S3 dispatch, one can use sloop::s3_class():

s3_class(x1)  # implicit
#> [1] "integer" "numeric"

s3_class(x2)  # explicit
#> [1] "integer"

For a better understanding of s3_dispatch()’s output we quote from ?s3_dispatch:

  • => method exists and is found by UseMethod().
  • -> method exists and is used by NextMethod().
  • * method exists but is not used.
  • Nothing (and greyed out in console): method does not exist.

Q2: What classes have a method for the Math() group generic in base R? Read the source code. How do the methods work?

A: The following functions belong to this group (see ?Math):

  • abs, sign, sqrt, floor, ceiling, trunc, round, signif
  • exp, log, expm1, log1p, cos, sin, tan, cospi, sinpi, tanpi, acos, asin, atan, cosh, sinh, tanh, acosh, asinh, atanh
  • lgamma, gamma, digamma, trigamma
  • cumsum, cumprod, cummax, cummin

The following classes have a method for this group generic:

s3_methods_generic("Math")
#> # A tibble: 8 x 4
#>   generic class      visible source             
#>   <chr>   <chr>      <lgl>   <chr>              
#> 1 Math    data.frame TRUE    base               
#> 2 Math    Date       TRUE    base               
#> 3 Math    difftime   TRUE    base               
#> 4 Math    factor     TRUE    base               
#> 5 Math    POSIXt     TRUE    base               
#> 6 Math    quosure    FALSE   registered S3method
#> 7 Math    vctrs_sclr FALSE   registered S3method
#> 8 Math    vctrs_vctr FALSE   registered S3method

To explain the basic idea, we just overwrite the data frame method:

Math.data.frame <- function(x) "hello"

Now all functions from the Math group generic will return "hello":

abs(mtcars)
#> [1] "hello"
exp(mtcars)
#> [1] "hello"
lgamma(mtcars)
#> [1] "hello"

Of course, different functions should perform different calculations. Here .Generic comes into play, which provides us with the calling generic as a string:

Math.data.frame <- function(x, ...) {
  .Generic
}

abs(mtcars)
#> [1] "abs"
exp(mtcars)
#> [1] "exp"
lgamma(mtcars)
#> [1] "lgamma"

rm(Math.data.frame)

The original source code of Math.data.frame() is a good example of how to use the string stored in .Generic to call the matching function. Math.factor() is a good example of a method that is defined simply to provide better error messages.
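To illustrate the idea on a small scale, here is a sketch of a group method for a made-up class. The class name percent and the approach of looking up the function with get() are assumptions for this example; base R's own methods differ in detail.

```r
# Hypothetical "percent" class, used only for illustration
Math.percent <- function(x, ...) {
  f <- get(.Generic, mode = "function")   # e.g. "abs" -> abs()
  structure(f(unclass(x), ...), class = "percent")
}

p <- structure(c(-0.25, 0.5), class = "percent")

unclass(abs(p))
#> [1] 0.25 0.50
unclass(floor(p))
#> [1] -1  0
```

A single method therefore serves every member of the Math group while keeping the class of the result intact.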

Q3: Math.difftime() is more complicated than I described. Why?

A: The Math.difftime() method in base R only supports abs, sign, floor, ceiling, trunc, round and signif. For every other member of the Math group it has to return a fitting error message.
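Both branches can be observed with a small difftime object. In this sketch the variable names d and result are arbitrary, and the exact wording of the error message may depend on the locale.

```r
d <- as.difftime(c(-5, 2.5), units = "mins")

# Supported function: the result is still a difftime with the same units
abs_d <- abs(d)
units(abs_d)
#> [1] "mins"

# Unsupported members of the Math group raise an error
result <- tryCatch(sqrt(d), error = function(e) conditionMessage(e))
result
```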

For comparison: Math.difftime() as defined in Advanced R:

Math.difftime <- function(x, ...) {
  new_difftime(NextMethod(), units = attr(x, "units"))
}

Math.difftime() as defined in the {base} package:

Math.difftime
#> function (x, ...) 
#> {
#>     switch(.Generic, abs = , sign = , floor = , ceiling = , trunc = , 
#>         round = , signif = {
#>             units <- attr(x, "units")
#>             .difftime(NextMethod(), units)
#>         }, stop(gettextf("'%s' not defined for \"difftime\" objects", 
#>             .Generic), domain = NA))
#> }
#> <bytecode: 0x7fce0f21fd28>
#> <environment: namespace:base>