13 S3

13.1 Basics

  1. Q: Describe the difference between t.test() and t.data.frame()? When is each function called?

    A: Because of S3’s generic.class() naming scheme, both functions may initially look similar, while they are in fact unrelated.

    • t.test() is a generic function that performs a t-test.
    • t.data.frame() is a method that gets called by the generic t() to transpose data frame input.

    Due to R’s S3 dispatch rules, t.test() would also get called when t() is applied to an object of class “test”.

  2. Q: Make a list of commonly used base R functions that contain . in their name but are not S3 methods.

    A: In recent years, “snake_case” style has become increasingly common when naming functions (and variables) in R. But many functions in base R will continue to be “point.separated”, which is why some inconsistency in your R code most likely cannot be avoided.

    For some of these functions “tidyverse”-replacements may exist such as readr::read_csv() or rlang::as_character(), which you could use at the cost of an extra dependency.
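
    As a small illustration, the following sketch collects a few well-known base R functions whose names contain a dot although they are not S3 methods. The selection is only an example and far from complete; the variable names are made up for this sketch.

```r
# Base R functions with a dot in their name that are not S3 methods
dotted <- c(
  "install.packages", "read.csv", "list.files", "download.file",
  "data.frame", "as.character", "Sys.Date", "all.equal", "do.call", "on.exit"
)

# Every entry resolves to a function on the search path
all_functions <- vapply(dotted, function(f) is.function(get(f)), logical(1))
all_functions
```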

  1. Q: What does the as.data.frame.data.frame() method do? Why is it confusing? How could you avoid this confusion in your own code?

    A: The function as.data.frame.data.frame() implements the data.frame method for the as.data.frame() generic, which coerces objects to data frames.

    The name is confusing, because it does not clearly communicate the type of the function, which could be a regular function, a generic or a method. Even if we assume a method, the number of .’s makes it difficult to separate the generic and the class part of the name. Is it the data.frame.data.frame method for the as generic? Is it the frame.data.frame method for the as.data generic?

    We could avoid this confusion by applying a different naming convention (e.g. “snake_case”) for our class and function names.

  2. Q: Describe the difference in behaviour in these two calls.

    A: mean() is a generic function, which will select the appropriate method based on the class of the input. some_days has the class “Date” and mean.Date(some_days) will be used.

    After unclass() has removed the class attribute, method dispatch falls back to the default method: mean.default(unclass(some_days)) calculates the mean of the underlying double vector.
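
    The two calls from the exercise are not reproduced here, so the following is only a sketch with assumed input: some_days is defined as five consecutive dates rather than the values used in the textbook.

```r
# Assumed example data: five consecutive dates
some_days <- as.Date("2017-01-31") + 0:4

# Dispatches to mean.Date(), so the result is a Date again
mean(some_days)
#> [1] "2017-02-02"

# Without the class attribute, mean.default() sees a plain double vector
# (days since 1970-01-01)
mean(unclass(some_days))
#> [1] 17199
```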

  3. Q: What class of object does the following code return? What base type is it built on? What attributes does it use?

    A: This code returns an object of the class “ecdf”, which contains an empirical cumulative distribution function of its input. The object is built on the base type “closure” (a function). Besides the class attribute, it stores the expression that was used to create it (rpois(100, 10)) in the call attribute.
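
    These claims can be checked interactively. The sketch below assumes the code from the exercise was ecdf(rpois(100, 10)), as the answer suggests.

```r
x <- ecdf(rpois(100, 10))

class(x)              # the S3 class vector, starting with "ecdf"
typeof(x)             # the base type
names(attributes(x))  # the attributes the object carries
attr(x, "call")       # the call that created the object
```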

  4. Q: What class of object does the following code return? What base type is it built on? What attributes does it use?

    A: This code returns a “table” object, which is built upon the “integer” type. The attribute “dimnames” is used to name the elements of the integer vector; a “dim” attribute stores its dimensions.
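
    A quick way to verify this is sketched below. The input table(rpois(100, 5)) is an assumption, since the exercise code is not shown here.

```r
x <- table(rpois(100, 5))

class(x)              # "table"
typeof(x)             # "integer"
names(attributes(x))  # includes "dim", "dimnames" and "class"
```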

13.2 Classes

  1. Q: Write a constructor for data.frame objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?

    A: Data frames are built on named lists of vectors, where every element has the same length. Besides “class” and “names” (the column names), their only other attribute is “row.names”, which must be a character (or integer) vector of the same length as the other elements. We need to provide the number of rows as an input to make it possible to create data frames with 0 columns but multiple rows.

    This leads to the following constructor:
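
    The constructor itself is missing from this text. A minimal sketch that follows the description above could look like this; the name new_data.frame() and its arguments are assumptions made for illustration.

```r
# Hypothetical low-level constructor for data frames
new_data.frame <- function(x, n, row.names = NULL) {
  # The underlying object must be a list whose elements all have length n
  stopifnot(is.list(x))
  stopifnot(all(lengths(x) == n))

  if (is.null(row.names)) {
    # Compact internal representation of the row names 1, ..., n
    row.names <- .set_row_names(n)
  } else {
    stopifnot(is.character(row.names), length(row.names) == n)
  }

  structure(x, class = "data.frame", row.names = row.names)
}

# A regular data frame with two columns and three rows
df <- new_data.frame(list(a = 1:3, b = c("x", "y", "z")), n = 3L)

# A data frame with zero columns but two rows
empty <- new_data.frame(list(), n = 2L)
```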

    There are two additional restrictions we could implement if we were being very strict: both the row names and column names should be unique.

  2. Q: Enhance my factor() helper to have better behaviour when one or more values is not found in levels. What does base::factor() do in this situation?

    A: base::factor() converts these values (silently) into NA’s. To improve our factor() helper we choose to return an informative error message instead.
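
    One possible version of such a helper is sketched below. It is self-contained, so it repeats a simplified new_factor() constructor; the name factor2() and the wording of the error message are assumptions.

```r
# Simplified constructor, repeated here to keep the sketch self-contained
new_factor <- function(x = integer(), levels = character()) {
  stopifnot(is.integer(x), is.character(levels))
  structure(x, levels = levels, class = "factor")
}

# Hypothetical helper that refuses values missing from `levels`
factor2 <- function(x, levels = unique(x)) {
  missing <- unique(setdiff(x, levels))
  if (length(missing) > 0) {
    stop(
      "The following values do not occur in `levels`: ",
      paste0("'", missing, "'", collapse = ", "),
      call. = FALSE
    )
  }
  new_factor(match(x, levels), as.character(levels))
}

f <- factor2(c("a", "b", "a"))

# base::factor() silently returns NA for the unknown value "c"
base_result <- base::factor(c("a", "c"), levels = c("a", "b"))

# factor2() signals an error instead
err <- tryCatch(
  factor2(c("a", "c"), levels = c("a", "b")),
  error = function(e) conditionMessage(e)
)
```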

  3. Q: Carefully read the source code of factor(). What does it do that our constructor does not?

    A: The original implementation allows a more flexible specification of input for x. The input is coerced to character or replaced by character(0) (in case of NULL). It also ensures that the factor levels are unique. This is achieved by setting the levels via base::levels<-, which fails when duplicate values are supplied.

  4. Q: Factors have an optional “contrasts” attribute. Read the help for C(), and briefly describe the purpose of the attribute. What type should it have? Rewrite the new_factor() constructor to include this attribute.

    A: When factor variables (representing nominal or ordinal information) are used in statistical models, they are typically encoded as dummy variables and by default each level is compared with the first factor level. However, many different encodings (“contrasts”) are possible: https://en.wikipedia.org/wiki/Contrast_(statistics)

    Within R’s formula interface you can wrap a factor in C() and specify the contrast of your choice. Alternatively, you can set the “contrasts” attribute of your factor variable, which accepts matrix input (see ?contr.helmert or similar for details).
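
    The rewritten constructor is not included in this text. The following sketch shows one way to add the attribute, assuming that a numeric matrix is the only accepted type for contrasts.

```r
# Sketch of a new_factor() constructor with an optional contrasts attribute
new_factor <- function(x = integer(),
                       levels = character(),
                       contrasts = NULL) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))

  if (!is.null(contrasts)) {
    # Assumption: contrasts are supplied as a numeric matrix
    stopifnot(is.matrix(contrasts), is.numeric(contrasts))
  }

  # A NULL value simply leaves the attribute unset
  structure(x, levels = levels, class = "factor", contrasts = contrasts)
}

f_plain <- new_factor(c(1L, 2L, 3L), c("a", "b", "c"))
f_helmert <- new_factor(
  c(1L, 2L, 3L), c("a", "b", "c"),
  contrasts = contr.helmert(3)
)
```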

  5. Q: Read the documentation for utils::as.roman(). How would you write a constructor for this class? Does it need a validator? What would a helper look like?

    A: This function transforms numeric input into Roman numerals (how cool is this!). This class is built on the “integer” type, which results in the following constructor.

    The documentation tells us that only values between 1 and 3899 are uniquely represented, which we then include in our validation function.

    For convenience, we allow the user to also pass real values to a helper function.
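
    The three functions are not shown in this text, so here is a sketch of how they might fit together. The names new_roman(), validate_roman() and roman() are assumptions, and the upper bound of 3899 is taken from the answer above rather than checked against the current documentation.

```r
# Constructor: only checks the base type
new_roman <- function(x = integer()) {
  stopifnot(is.integer(x))
  structure(x, class = "roman")
}

# Validator: checks the range stated in the answer above
validate_roman <- function(x) {
  values <- unclass(x)
  if (any(values < 1 | values > 3899, na.rm = TRUE)) {
    stop("Roman numerals must fall between 1 and 3899.", call. = FALSE)
  }
  x
}

# Helper: also accepts doubles and coerces them to integer
roman <- function(x = integer()) {
  x <- as.integer(x)
  validate_roman(new_roman(x))
}

r <- roman(c(1, 753, 2019))
```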

13.3 Generics and methods

  1. Q: Read the source code for t() and t.test() and confirm that t.test() is an S3 generic and not an S3 method. What happens if you create an object with class test and call t() with it? Why?

    A: We can see that t.test() is a generic, because it calls UseMethod().

    Interestingly R also provides helpers, which list functions that look like methods, but in fact are not:

    When we create an object with class test, t() will dispatch to t.test(). This happens because UseMethod() simply searches for functions named paste0("generic", ".", c(class(x), "default")).

    Consequently t.test() is erroneously treated as a method of t(). Because t.test() is a generic itself and doesn’t find a method called t.test.test(), it dispatches to t.test.default(). We can define t.test.test() to demonstrate that this is really what is happening internally.
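
    The behaviour described above can be reproduced with a few lines. This sketch assumes that the stats package is attached, as it is in a default R session.

```r
# t.test() is a generic: its body is just a call to UseMethod()
body(t.test)

# An object whose class happens to be called "test"
x <- structure(1:10, class = "test")

# t() looks for a function named t.test, finds stats::t.test() and calls it,
# so the result is a t-test rather than a transposed vector
res <- t(x)
class(res)
```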

  2. Q: What generics does the table class have methods for?

    A: This is a simple application of sloop::s3_methods_class():

    Interestingly, the table class has a number of methods designed to help plotting with base graphics.

  3. Q: What generics does the ecdf class have methods for?

    A: We use the same approach as above:

    The methods are primarily designed for display (plot(), print(), summary()), but you can also extract quantiles with quantile().

  4. Q: Which base generic has the greatest number of defined methods?

    A: A little experimentation (and thinking about the most popular functions) suggests that the print() generic has the most defined methods.
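
    One way to experiment is to count the methods that utils::methods() reports for a handful of candidate generics. The candidates below are an arbitrary selection, and the counts depend on the R version and on which packages are loaded, so no output is shown.

```r
# Hypothetical helper: number of methods visible for a generic
count_methods <- function(generic) length(methods(generic))

candidates <- c("print", "format", "summary", "plot")
counts <- vapply(candidates, count_methods, integer(1))

# Largest counts first
sort(counts, decreasing = TRUE)
```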

  5. Q: Carefully read the documentation for UseMethod() and explain why the following code returns the results that it does. What two usual rules of function evaluation does UseMethod() violate?

    A: Let’s take this step by step. If you call g.default() directly you get c(1, 1) as you might expect. The value bound to x comes from the argument, the value bound to y comes from the global environment.

    But when we call g() we get c(1, 10):

    This is seemingly inconsistent: why does y come from the value defined inside of g(), while x keeps the value that was passed as an argument? It’s because UseMethod() calls g.default() in a special way so that variables defined inside the generic are available to methods. The exception is the arguments to the function: they are passed on as is and cannot be affected by code inside the generic.
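
    The code from the exercise is not reproduced in this text. The sketch below reconstructs it from the description, so the exact definitions are an assumption. The value of y also depends on the R version: it is 10 when UseMethod() forwards the generic's local variables as described above, while newer releases (R 4.4.0 and later, as far as the release notes indicate) no longer do so and return 1.

```r
g <- function(x) {
  x <- 10
  y <- 10
  UseMethod("g")
}
g.default <- function(x) c(x = x, y = y)

x <- 1
y <- 1

# Called directly: x comes from the argument, y from the global environment
direct <- g.default(x)

# Called via the generic: x is still 1, because arguments are passed on
# unchanged; y is 10 only if local variables of g() are forwarded
via_generic <- g(x)
via_generic
```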

  6. Q: What are the arguments to [? Why is this a hard question to answer?

    A: The subsetting operator [ is a primitive and generic function as can be inspected via ftype().

    Therefore, formals(`[`) returns NULL, and one possible way to figure out [’s arguments would be to inspect the underlying C source code, which can be found online via pryr::show_c_source(.Primitive("[")). However, considering the differing arguments of [’s methods, it seems most probable that [’s arguments are x and ....
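
    Without leaving base R, we can at least confirm that [ is a primitive and compare the arguments of some of its methods. The three methods inspected here are an arbitrary selection.

```r
# `[` is a primitive, so it has no formals
is.primitive(`[`)
formals(`[`)

# The methods disagree about everything except the first argument
method_args <- list(
  data.frame = names(formals(`[.data.frame`)),
  Date       = names(formals(`[.Date`)),
  factor     = names(formals(`[.factor`))
)
method_args
```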

13.5 Inheritance

  1. Q: How does [.Date support subclasses? How does it fail to support subclasses?

    A: [.Date calls .Date with the result of calling [ on the parent class, along with oldClass():

    .Date is kind of like a constructor for date classes, although it doesn’t check the input is the correct type:

So what does oldClass() do? It’s implemented in C so we can’t easily see what it does, and the documentation refers to S-PLUS:

Functions oldClass and oldClass<- behave in the same way as functions of those names in S-PLUS 5/6, but in R UseMethod dispatches on the class as returned by class (with some interpolated classes: see the link) rather than oldClass. However, group generics dispatch on the oldClass for efficiency, and internal generics only dispatch on objects for which is.object is true.

Instead, let's just try it out:

```r
oldClass(Sys.Date())
#> [1] "Date"
oldClass(numeric())
#> NULL
oldClass(data.frame())
#> [1] "data.frame"
oldClass(integer())
#> NULL
```

It seems similar to `class()`, but it returns `NULL` for base types. Together this means that `[.Date` effectively calls `[` on the underlying numeric data, then resets the class of the result to that of the input. This ignores the fact that a subclass might have additional attributes.

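
Both sides of this behaviour show up in a small experiment. The subclass `myDate` and its `note` attribute are made up for this sketch.

```r
# A hypothetical subclass of Date with one extra attribute
x <- structure(
  as.Date("2020-01-01") + 0:2,
  class = c("myDate", "Date"),
  note = "extra information"
)

y <- x[2:3]

# The full class vector, including the subclass, is restored ...
class(y)
#> [1] "myDate" "Date"

# ... but the additional attribute is lost
attr(y, "note")
#> NULL
```
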
  1. Q: R has two classes for representing date time data, POSIXct and POSIXlt, which both inherit from POSIXt. Which generics have different behaviours for the two classes? Which generics share the same behaviour?

    A: To answer this question, we have to get the respective generics:

    The generics in generics_t with a method for the superclass POSIXt potentially share the same behaviour for both subclasses. However, if a generic has a specific method for one of the subclasses, it has to be subtracted:
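
    The code for this step is missing here. The following base R sketch follows the same idea; it relies on the "info" attribute that utils::methods() attaches to its result, and the helper generics_for() is a made-up name. The exact sets depend on the R version and the attached packages.

```r
# Hypothetical helper: generics that have a method for a given class
generics_for <- function(cls) {
  unique(attr(methods(class = cls), "info")$generic)
}

generics_t  <- generics_for("POSIXt")
generics_ct <- generics_for("POSIXct")
generics_lt <- generics_for("POSIXlt")

# Generics with a method for at least one subclass may behave differently
different <- union(generics_ct, generics_lt)

# Generics that only have a POSIXt method share their behaviour
shared <- setdiff(generics_t, different)
```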

  2. Q: What do you expect this code to return? What does it actually return? Why?

    A: When we execute the code above, this is what is happening:

    • We pass an object of classes b and a2 to generic2(), which prompts R to look for a method generic2.b().
    • The method generic2.b() then changes the class to a1 and calls NextMethod().
    • One would think that this will lead R to call generic2.a1(), but in fact, as mentioned in the textbook, NextMethod() “doesn’t actually work with the class attribute of the object, but instead uses a special global variable (.Class) to keep track of which method to call next.”

    This is why generic2.a2() is called instead.
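
    The exercise code is not shown in this text. The sketch below reconstructs it from the steps described above, so the exact definitions are an assumption.

```r
generic2 <- function(x) UseMethod("generic2")
generic2.a1 <- function(x) "a1"
generic2.a2 <- function(x) "a2"
generic2.b <- function(x) {
  # Changing the class attribute does not influence NextMethod()
  class(x) <- "a1"
  NextMethod()
}

result <- generic2(structure(list(), class = c("b", "a2")))
result
#> [1] "a2"
```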

13.6 Dispatch details

  1. Q: Explain the differences in dispatch below:

    A: class() returns "integer" for x1 and x2, but the class of x1 is implicit, while the class of x2 is explicit. This is important because length() is an internal generic, and internal generics only dispatch to methods when the class attribute has been set explicitly. Implicit classes, such as the implicit parent class numeric, are not considered.
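
    The exercise code is not shown in this text. The sketch below assumes the textbook example, which defines a length() method for integers and compares an implicit with an explicit class.

```r
# A method for the internal generic length()
length.integer <- function(x) 10

x1 <- 1:5                               # implicit class
x2 <- structure(x1, class = "integer")  # explicit class attribute

# class() cannot tell the two objects apart, oldClass() can
class(x1)
class(x2)
oldClass(x1)
oldClass(x2)

# Only the object with an explicit class attribute triggers dispatch
length(x1)
#> [1] 5
length(x2)
#> [1] 10
```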

  2. Q: What classes have a method for the Math group generic in base R? Read the source code. How do the methods work?

    A: The following functions belong to this group (see ?Math):

    • abs, sign, sqrt, floor, ceiling, trunc, round, signif
    • exp, log, expm1, log1p, cos, sin, tan, cospi, sinpi, tanpi, acos, asin, atan, cosh, sinh, tanh, acosh, asinh, atanh
    • lgamma, gamma, digamma, trigamma
    • cumsum, cumprod, cummax, cummin

    The following classes have a method for this group generic:

    To explain the basic idea, we just overwrite the data frame method:

    Now all functions from the Math group generic will return "hello".

    Of course, different functions should perform different calculations. Here .Generic comes into play, which provides us with the name of the calling generic as a string.

    The original source code of Math.data.frame() is a good example of how to turn the string returned by .Generic into a call to the corresponding function. Math.factor() is a good example of a method that is defined only to provide a better error message.
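
    To avoid overwriting methods of existing classes, the following sketch illustrates the same ideas with two made-up classes, demo and demo2. It is not the implementation used by base R.

```r
# A Math method that only reports which generic was called
Math.demo <- function(x, ...) .Generic

x <- structure(c(-1, 4), class = "demo")
abs(x)
#> [1] "abs"
sqrt(x)
#> [1] "sqrt"

# A Math method that uses .Generic to call the matching function on the
# underlying data and then restores the class
Math.demo2 <- function(x, ...) {
  out <- get(.Generic)(unclass(x), ...)
  structure(out, class = class(x))
}

y <- structure(c(-1.26, 4.51), class = "demo2")
abs_y <- abs(y)
round_y <- round(y, digits = 1)
```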

  3. Q: Math.difftime() is more complicated than I described. Why?

    A: Math.difftime() only supports abs, sign, floor, ceiling, trunc, round and signif. For every other function of the Math group it has to return a fitting error message, so the method needs to distinguish between the calling generics.
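
    A short experiment shows both branches. The input values are arbitrary, and the exact wording of the error message is not reproduced here.

```r
d <- as.difftime(c(-5, 2.5), units = "mins")

# Supported members of the Math group keep the class and the units
abs_d <- abs(d)
floor_d <- floor(d)

# Unsupported members signal an error
sqrt_msg <- tryCatch(sqrt(d), error = function(e) conditionMessage(e))
sqrt_msg
```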