13 S3

library(sloop)

13.1 Basics

  1. Q: Describe the difference between t.test() and t.data.frame()? When is each function called?

    A: Because of S3’s generic.class() naming scheme, both functions may initially look similar, while they are in fact unrelated.

    • t.test() is a generic function that performs a t-test.
    • t.data.frame() is a method that gets called by the generic t() to transpose data frame input.

    Due to R’s S3 dispatch rules, t.test() would also get called when t() is a applied to an object of class “test”.

  2. Q: Make a list of commonly used base R functions that contain . in their name but are not S3 methods.

    A: In the recent years “snake_case”-style has become increasingly common when naming functions (and variables) in R. But many functions in base R will continue to be “point.separated”, which is why some inconsistency in your R code most likely cannot be avoided.

    # Some base R functions with point.separated names
    install.packages()
    read.csv()
    
    list.files()
    download.file()
    
    data.frame()
    as.character()
    Sys.Date()
    
    all.equal()
    
    do.call()
    on.exit()

    For some of these functions “tidyverse”-replacements may exist such as readr::read_csv() or rlang::as_character(), which you could use at the cost of an extra dependency.

  1. Q: What does the as.data.frame.data.frame() method do? Why is it confusing? How could you avoid this confusion in your own code?

    A: The function as.data.frame.data.frame() implements the data frame method for the as.data.frame() generic, which coerces objects to data frames.

    The name is confusing, because it does not clearly communicate the type of the function, which could be a regular function, a generic or a method. Even if we assume a method, the amount of .’s makes it difficult to separate the generic- and the class-part of the name.

    We could avoid this confusion by applying a different naming convention (e.g. “snake_case”) for our class and function names.

  2. Q: Describe the difference in behaviour in these two calls.

    set.seed(1014)
    some_days <- as.Date("2017-01-31") + sample(10, 5)
    
    mean(some_days)
    #> [1] "2017-02-06"
    mean(unclass(some_days))
    #> [1] 17203

    A: mean() is a generic function, which will select the appropriate method based on the class of the input. some_days has the class “Date” and mean.Date(some_days) will be used.

    After unclass() has removed the class attribute the default method is chosen by the method dispatch. (mean.default(unclass(some_days))) calculates the mean of the underlying double.

  3. Q: What class of object does the following code return? What base type is it built on? What attributes does it use?

    x <- ecdf(rpois(100, 10))
    x
    #> Empirical CDF 
    #> Call: ecdf(rpois(100, 10))
    #>  x[1:18] =  2,  3,  4,  ..., 2e+01, 2e+01

    A: This code returns an object of the class “ecdf” and contains an empirical cumulative distribution function of its input. The object is built on the base type “closure” and the expression, which was used to create it (rpois(100, 10)) is stored in in the call attribute.

  4. Q: What class of object does the following code return? What base type is it built on? What attributes does it use?

    x <- table(rpois(100, 5))
    x
    #> 
    #>  1  2  3  4  5  6  7  8  9 10 
    #>  7  5 18 14 15 15 14  4  5  3

    A: This code returns a “table” object, which is build upon the base type “integer”. The attribute “dimnames” are used to name the elements of the integer vector.

13.2 Classes

  1. Q: Write a constructor for data.frame objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?

    A: Data frames are built on (named) lists with the additional requirement that all elements must have the same length. Their only attribute is “row.names”. These must be unique, have the same length as each list element and also must be of integer or character type.

    There are no additional restrictions to column names apart to those of lists, so one could use special characters and surround the names with backticks (which is not recommended).

    A very good constructor implementing these criteria used to be part of the sloop package. It is no longer part of the package, but the source can still be found online (https://github.com/r-lib/sloop/blob/be7ce8a6be660536df4bdd3a31fa54f0d627f2d6/R/data.frame.R#L11).

    # Copied from older version of the sloop package
    new_data.frame <- function(x, row.names = NULL) {
      stopifnot(is.list(x))
    
      n <- if (length(x) == 0) 0 else length(x[[1]])
      lengths <- vapply(x, length, integer(1))
      stopifnot(all(lengths == n))
    
      if (is.null(row.names)) {
        row.names <- .set_row_names(n)
      } else {
        stopifnot(
          is.character(row.names) || 
            is.numeric(row.names)
          )
        stopifnot(
          length(row.names) == n || 
            length(row.names) == 2
          )
      }
    
      structure(
        x,
        class = "data.frame",
        row.names = row.names
      )
    }
    
    # Test
    x <- list(a = 1, b = 2)
    new_data.frame(x, row.names = "l1")
    #>    a b
    #> l1 1 2
    new_data.frame(x, row.names = 1)
    #> Error in attributes(.Data) <- c(attributes(.Data), attrib): row names must
    #> be 'character' or 'integer', not 'double'
  2. Q: Enhance my factor() helper to have better behaviour when one or more values is not found in levels. What does base::factor() do in this situation?

    A: base::factor() converts these values (silently) into NA’s. To improve our factor() helper we choose to return an informative error message instead.

    factor <- function(x, levels = unique(x)) {
      new_levels <- match(x, levels)
    
      # Return error if unseen levels are passed
      if(any(is.na(new_levels))){
        stop("The following values do not occur ",
             "in the levels of x: ",
             paste(setdiff(x, levels), collapse = ", ")
             ".", 
             call. = FALSE)
        }
    
      validate_factor(new_factor(new_levels, levels))
    }
  3. Q: Carefully read the source code of factor(). What does it do that our constructor does not?

    A: The original implementation allows a more flexible specification of input for x. The input is coerced to character or replaced by character(0) (in case of NULL). It also ensures that the factor levels are unique. This is achieved by setting the levels via base::levels<-, which fails when duplicate values are supplied.

  4. Q: Factors have an optional “contrasts” attribute. Read the help for C(), and briefly describe the purpose of the attribute. What type should it have? Rewrite the new_factor() constructor to include this attribute.

    A: When factor variables (representing nominal or ordinal information) are used in statistical models, they are typically encoded as dummy variables and by default each level is compared with the first factor level. However, many different encodings (“contrasts”) are possible: https://en.wikipedia.org/wiki/Contrast_(statistics)

    Within R’s formula interface you can wrap a factor in C and specify the contrast of your choice. Alternatively you can set the “contrast” attribute of you factor variable, which accepts matrix input. (see ?contr.helmert or similar for details)

    # Updated factor constructor
    new_factor <- function(
      x = integer(),
      levels = character(),
      contrast = NULL
    ) {
      stopifnot(is.integer(x))
      stopifnot(is.character(levels))
      stopifnot(is.matrix(contrast) | is.null(contrast))
    
      structure(
        x,
        levels = levels,
        class = "factor",
        contrast = contrast
      )
    }
  5. Q: Read the documentation for utils::as.roman(). How would you write a constructor for this class? Does it need a validator? What would a helper look like?

    A: This function transforms numeric input into Roman numbers (how cool is this!). This class is built on the “integer” type, which results in the following constructor.

    new_roman <- function(x = integer()){
      stopifnot(is.integer(x))
      structure(x, class = "roman")
    }

    The documentation tells us, that only values between 1 and 3899 are uniquely represented, which we then include in our validation function.

    validate_roman <- function(x) {
      values <- unclass(x)
    
      if(any(values < 1 | values > 3899)) {
        stop(
          "Roman numbers are only defined between ",
          "1 and 3899.",
          call. = FALSE
        )
      }
    
      x
    }

    For convenience, we allow the user to also pass real values to a helper function.

    roman <- function(x = integer()) {
      x <- as.integer(x)
    
      validate_roman(new_roman(x))
    }
    
    # Test
    roman(c(1, 753, 2019))
    #> [1] I       DCCLIII MMXIX
    roman(0)
    #> Error: Roman numbers are only defined between 1 and 3899.

13.3 Generics and methods

  1. Q: Read the source code for t() and t.test() and confirm that t.test() is an S3 generic and not an S3 method. What happens if you create an object with class test and call t() with it? Why?

    x <- structure(1:10, class = "test")
    t(x)

    A: We can see that t.test() is a generic, because it calls UseMethod()

    t.test
    #> function (x, ...) 
    #> UseMethod("t.test")
    #> <bytecode: 0x23aa9c0>
    #> <environment: namespace:stats>
    
    # or simply call
    sloop::ftype(t.test)
    #> [1] "S3"      "generic"

    sloop::ftype() confirms via a call to sloop:::is_s3_generic (which then uses codetools::findGlobals()) that t.test() contains a call to UseMethod().

    Interestingly R also provides helpers, which list functions that look like methods, but in fact are not:

    tools::nonS3methods("stats")
    #> [1] "anova.lmlist"        "expand.model.frame"  "fitted.values"      
    #> [4] "influence.measures"  "lag.plot"            "t.test"             
    #> [7] "plot.spec.phase"     "plot.spec.coherency"

    When we create an object with class test, t(), will dispatch to t.test(). This happens, because UseMethod() simply searches for functions named paste0("generic", ".", c(class(x), "default")).

    Consequently t.test() is erroneously treated as a method of t(). Because t.test() is a generic itself and doesn’t find a method called t.test.test(), it dispatches to t.test.default().

    By defining t.test.test(), we demonstrate, that this is really what is happening internally.

    x <- structure(1:10, class = "test")
    t(x)
    #> 
    #>  One Sample t-test
    #> 
    #> data:  x
    #> t = 6, df = 9, p-value = 3e-04
    #> alternative hypothesis: true mean is not equal to 0
    #> 95 percent confidence interval:
    #>  3.33 7.67
    #> sample estimates:
    #> mean of x 
    #>       5.5
    
    t.test.test <- function(x) t.default(x)
    t(x)
    #>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    #> [1,]    1    2    3    4    5    6    7    8    9    10
    #> attr(,"class")
    #> [1] "test"
  2. Q: What generics does the table class have methods for?

    A: We find methods specific for the table class, by searching for functions that end on “.table”.

    library(methods)
    objs <- mget(ls("package:base"), inherits = TRUE)
    funs <- Filter(is.function, objs)
    
    Filter(function(x) grepl(".table$", x), names(funs))
    #>  [1] "[.table"             "aperm.table"         "as.data.frame.table"
    #>  [4] "as.table"            "is.table"            "margin.table"       
    #>  [7] "print.summary.table" "print.table"         "prop.table"         
    #> [10] "summary.table"
  3. Q: What generics does the ecdf class have methods for?

    A: We use the same approach as above. When this is not successful, we repeat using the superclass. Apparently the classes ecdf and stepfun exist, even though no specific methods are currently implemented (in base R).

    class(ecdf(1:2))
    #> [1] "ecdf"     "stepfun"  "function"
    
    Filter(function(x) grepl(".ecdf$", x), names(funs))
    #> character(0)
    Filter(function(x) grepl(".stepfun$", x), names(funs))
    #> character(0)
    Filter(function(x) grepl(".function$", x), names(funs))
    #> [1] "as.function"      "as.list.function" "is.function"     
    #> [4] "print.function"   "sys.function"
  4. Q: Which base generic has the greatest number of defined methods?

    A: The generic print() clearly has the most defined methods.

    generics <- Filter(
      function(x) "generic" %in% sloop::ftype(x),
      funs
    )
    
    methods_per_generic <- sapply(
      names(generics), function(x) methods(x),
      USE.NAMES = TRUE
    )
    
    tail(sort(lengths(methods_per_generic)), 3)
    #>      [ format  print 
    #>     35     70    209
  5. Q: Carefully read the documentation for UseMethod() and explain why the following code returns the results that it does. What two usual rules of function evaluation does UseMethod() violate?

    g <- function(x) {
      x <- 10
      y <- 10
      UseMethod("g")
    }
    g.default <- function(x) c(x = x, y = y)
    
    x <- 1
    y <- 1
    g(x)
    #>  x  y 
    #>  1 10

    A: R looks for the x argument in g()’s calling environment (the global environment), in which x is bound to 1.

    g() then dispatches to g.default(). The x argument is passed to g.default(). As y is not defined with g.default’s function environment, y’s value will be taken from the environment where UseMethod() created the call. There y is defined as 10.

    When invoking g.default() explicitly, instead of using UseMethod(), the default argument, x, is evaluated in g.default’s calling environment, where it is 10 and further global variables like y are looked up via lexical scoping in the enclosing (global) environment, where y is 1.

    g <- function(x) {
      x <- 10
      y <- 10
      g.default(x)
    }
    g(x)
    #>  x  y 
    #> 10  1
  6. Q: What are the arguments to [? Why is this a hard question to answer?

    A: The subsetting operator [ is a primitive and generic function as can be inspected via ftype().

    ftype(`[`)
    #> [1] "primitive" "generic"

    Therefore, formals([) returns NULL and one possible way to figure out [’s arguments would be to inspect the underlying C source code, which can be found online via pryr::show_c_source(.Primitive("[")). However, regarding the differing arguments of [’s methods, it seems most probable, that [’s arguemts are x and ....

    names(formals(`[.Date`))
    #> [1] "x"    "..."  "drop"
    names(formals(`[.table`))
    #> [1] "x"    "i"    "j"    "..."  "drop"
    names(formals(`[.AsIs`))
    #> [1] "x"   "i"   "..."

13.4 Object styles

  1. Q: Categorise the objects returned by lm(), factor(), table(), as.Date(), ecdf(), ordered(), I() into the styles described above.

    A: The returned objects correspond to the following object styles:

    Vector: factor(), table(), as.Date(), ordered()
    Record:
    Scalar: lm(), ecdf()
    Other: I()

  2. Q: What would a constructor function for lm objects, new_lm(), look like? Use ?lm and experimentation to figure out the required fields and their types.

    A: The constructor needs to populate the attributes of an lm object and check their type for correctness.

    # Learn about lm-attributes
    ?lm
    attributes(lm(cyl ~ ., data = mtcars))
    #> $names
    #>  [1] "coefficients"  "residuals"     "effects"       "rank"         
    #>  [5] "fitted.values" "assign"        "qr"            "df.residual"  
    #>  [9] "xlevels"       "call"          "terms"         "model"        
    #> 
    #> $class
    #> [1] "lm"
    
    # Define constructor
    new_lm <- function(
      coefficiets, residuals, effects, rank, fitted.values, assign,
      qr, df.residual, xlevels, call, terms, model
    ) {
    
      stopifnot(
        is.double(coefficients), is.double(residuals), 
        is.double(effects), is.integer(rank), is.double(fitted.values),
        is.integer(assign), is.list(qr), is.integer(df.residual),
        is.list(xlevels), is.language(call), is.language(terms),
        is.list(model)
      )
    
      structure(
        list(
          coefficients = coefficients,
          residuals = residuals,
          effects = effects,
          rank = rank, 
          fitted.values = fitted.values,
          assign = assign,
          qr = qr,
          df.residual = df.residual,
          xlevels = xlevels,
          call = call,
          terms = terms, 
          model = model
        ),
        class = "lm"
      )
    }

13.5 Inheritance

  1. Q: How does [.Date support subclasses? How does it fail to support subclasses?

    A:

    # inspect function
    `[.Date`
    
    # see how it's used
    x <- Sys.Date()
    s3_dispatch(x[1])
    
    # attempt to find out, what oldclass does
    oldClass()

    Maybe one would have to create a subclass to Date and see what s3_dispatch returns, when it is called on this subclass. I suspect, the delegation to the internal [ to be related to the issue here.

  2. Q: R has two classes for representing date time data, POSIXct and POSIXlt, which both inherit from POSIXt. Which generics have different behaviours for the two classes? Which generics share the same behaviour?

    A: To answer this question, we have to get the respective generics

    # define helper
    get_generics <- function(x) {
      attr(methods(class = x), "info")[["generic"]]
    }
    
    # get generics
    generics_t  <- get_generics("POSIXt")
    generics_ct <- get_generics("POSIXct")
    generics_lt <- get_generics("POSIXlt")

    The generics in generics_t with a method for the superclass POSIXt potentially share the same behaviour for both subclasses. However, if a generic has a specific method for one of the subclasses, it has to be subtracted:

    # These generics provide subclass-specific methods
    union(generics_ct, generics_lt)
    #>  [1] "["             "[["            "[<-"           "as.data.frame"
    #>  [5] "as.Date"       "as.list"       "as.POSIXlt"    "c"            
    #>  [9] "coerce"        "format"        "initialize"    "length<-"     
    #> [13] "mean"          "print"         "rep"           "show"         
    #> [17] "slotsFromS3"   "split"         "summary"       "Summary"      
    #> [21] "weighted.mean" "xtfrm"         "[[<-"          "anyNA"        
    #> [25] "as.double"     "as.matrix"     "as.POSIXct"    "duplicated"   
    #> [29] "is.na"         "length"        "names"         "names<-"      
    #> [33] "sort"          "unique"
    
    # These generics share (inherited) methods for both subclasses
    setdiff(generics_t, union(generics_ct, generics_lt))
    #>  [1] "-"            "+"            "all.equal"    "as.character"
    #>  [5] "Axis"         "cut"          "diff"         "hist"        
    #>  [9] "is.numeric"   "julian"       "Math"         "months"      
    #> [13] "Ops"          "pretty"       "quantile"     "quarters"    
    #> [17] "round"        "seq"          "str"          "trunc"       
    #> [21] "weekdays"
  3. Q: What do you expect this code to return? What does it actually return? Why?

    generic2 <- function(x) UseMethod("generic2")
    generic2.a1 <- function(x) "a1"
    generic2.a2 <- function(x) "a2"
    generic2.b <- function(x) {
      class(x) <- "a1"
      NextMethod()
    }
    
    generic2(structure(list(), class = c("b", "a2")))

    A: When we execute the code above, this is what is happening:

    • we pass an object of classes b and a2 to generic2(), which prompts R to look for a methodgeneric2.b()
    • the method generic2.b() then changes the class to a1 and calls NextMethod()
    • One would think that this will lead R to call generic2.a1(), but in fact, as mentioned in the textbook, NextMethod()

      doesn’t actually work with the class attribute of the object, but instead uses a special global variable (.Class) to keep track of which method to call next.

    This is why generic2.a2() is called instead.

13.6 Dispatch details

  1. Q: Explain the differences in dispatch below:

    x1 <- 1:5
    class(x1)
    #> [1] "integer"
    s3_dispatch(x1[1])
    #>    [.integer
    #>    [.numeric
    #>    [.default
    #> => [ (internal)
    
    x2 <- structure(x1, class = "integer")
    class(x2)
    #> [1] "integer"
    s3_dispatch(x2[1])
    #>    [.integer
    #>    [.default
    #> => [ (internal)

    A: class() returns "integer" for x1 and x2. However, they are not identical. While x2 has the attribute “class” with the value "integer", x1 doesn’t have a class attribute. Instead, x1 has the implicit class "numeric", see ?class.

  2. Q: What classes have a method for the Math group generic in base R? Read the source code. How do the methods work?

    A: The following functions belong to this group (see ?Math):

    • abs, sign, sqrt, floor, ceiling, trunc, round, signif
    • exp, log, expm1, log1p, cos, sin, tan, cospi, sinpi, tanpi, acos, asin, atan, cosh, sinh, tanh, acosh, asinh, atanh
    • lgamma, gamma, digamma, trigamma
    • cumsum, cumprod, cummax, cummin

    The following classes have a method for this group generic:

    methods("Math")
    #> [1] Math,nonStructure-method Math,structure-method   
    #> [3] Math.data.frame          Math.Date               
    #> [5] Math.difftime            Math.factor             
    #> [7] Math.POSIXt              Math.quosure*           
    #> see '?methods' for accessing help and source code

    To read the source code of the S3 classes, we can just enter the name of the method into the console. To get the source code of the S4 classes, we can use getMethod(), i. e. getMethod("Math", "nonStructure").

    To explain the basic idea, we just overwrite the data frame method:

    Math.data.frame <- function(x){"hello"}

    Now all functions from the math generic group, will return "hello"

    abs(iris)
    #> [1] "hello"
    exp(iris)
    #> [1] "hello"
    lgamma(iris)
    #> [1] "hello"

    Of course different functions should perform different calculations. Here .Generic comes into play, which provides us with the calling generic as a string

    Math.data.frame <- function(x, ...){
      .Generic
    }
    
    abs(iris)
    #> [1] "abs"
    exp(iris)
    #> [1] "exp"
    lgamma(iris)
    #> [1] "lgamma"
    
    rm(Math.data.frame)

    The original source code of Math.data.frame() is a good example on how to invoke the string returned by .Generic into a specific method. Math.factor() is a good example of a method, which is simply defined for better error messages.

  3. Q: Math.difftime() is more complicated than I described. Why?

    A: Math.difftime() also excludes cases apart from abs, sign, floor, ceiling, trunc, round and signif and needs to return a fitting error message.