# 13 S3

## 13.1 Basics

**Q**: The most important S3 objects in base R are factors, data frames, difftimes, and date/times (Dates, POSIXct, POSIXlt). You’ve already seen the attributes and base type that factors are built on. What base types and attributes are the others built on?**A**: TODO: Add answer for difftime.**data frame:**Data frames are build up on (named) lists. Together with the`row.names`

attribute and after setting the class to “data.frame”, we get a classical data frame`df_build <- structure(list(1:2, 3:4), names = c("a", "b"), row.names = 1:2, class = "data.frame") df_classic <- data.frame(a = 1:2, b = 3:4) identical(df_build, df_classic) #> [1] TRUE`

**date/times (Dates, POSIXct, POSIXlt):**Date is just a double with the class attribute set to “Date”`date_build <- structure(0, class = "Date") date_classic <- as.Date("1970-01-01") identical(date_build, date_classic) #> [1] TRUE`

POSIXct is a class for date/times that inherits from POSIXt and is built on doubles as well. The only attribute is tz (for timezone)

`POSIXct_build <- structure(1, class = c("POSIXct", "POSIXt"), tzone = "CET") POSIXct_classic <- .POSIXct(1, tz = "CET") # note that tz's default is NULL identical(POSIXct_build, POSIXct_classic) #> [1] TRUE`

POSIXlt is another date/time class that inherits from POSIXt. It is built on top of a named list and a tzone attribute. Differences between POSIXct and POSIXlt are described in

`?DateTimeClasses`

.`POSIXlt_build <- structure(list(sec = 30, min = 30L, hour = 14L, mday = 1L, mon = 0L, year = 70L, wday = 4L, yday = 0L, isdst = 0L, zone = "CET", gmtoff = 3600L), tzone = c("", "CET", "CEST"), class = c("POSIXlt", "POSIXt")) POSIXlt_classic <- as.POSIXlt(.POSIXct(13.5 * 3600 + 30)) identical(POSIXlt_build, POSIXlt_classic) #> [1] FALSE`

**Q**: Describe the difference between t.test() and t.data.frame()?**A**:**Q**: Describe the difference in behaviour in these two calls.`set.seed(1014) some_days <- as.Date("2017-01-31") + sample(10, 5) mean(some_days) #> [1] "2017-02-05" mean(unclass(some_days)) #> [1] 17202.2`

**A**: Since`mean()`

is a generic and`some_days`

is an object of class Date, the first call results in`mean.Date(some_days)`

.In the second call

`unclass()`

removes the class attribute from`some_days`

, which means that`unclass(some_days)`

is not an OO object and the call results in`mean.default(unclass(some_days))`

, which calculates the mean of the underlying double.When you look into the source code of

`mean.Date()`

(one line), you will see that the difference in the resulting objects is only the class attribute.**Q**: What does the`as.data.frame.data.frame()`

method do? Why is it confusing? How should you avoid this confusion in your own code?**A**: The name is confusing, because it is not clear, from the name, if it is a normal function, a generic or a method. Even if we know, that it is a method, the name doesn’t tell us, which part is the name of the generic and which part is the class name.

We can easily avoid this confusion, by forgoing to use period separated class and function names.To reveal the solution:

`as.data.frame.data.frame()`

is the data frame method of the`as.data.frame()`

generic. Methods of this generic generally coerce objects to data frames.

This specific method strips all class attributes preceding the “data.frame” class.

If row names are supplied, this method will check their length and then set these as new “row.names” attribute.**Q**: What does the following code return? What base type is it built on? What attributes does it use?`x <- ecdf(rpois(100, 10)) x #> Empirical CDF #> Call: ecdf(rpois(100, 10)) #> x[1:18] = 2, 3, 4, ..., 18, 19`

**A**: It returns the Empirical Cumulative Distribution Function of`rpois(100, 10)`

. It is built on the base type “closure” and it saves the expression, which was used to create it, in its`call`

attribute.

## 13.2 Classes

**Q**: Write a constructor for`data.frame`

objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?**A**: Data frames are built on (named) lists and their only attribute is row.names. Row names must be unique and have the same length as observations within the data. They must be of type integer or character. Also all elements must have the same length. Technically there are no restrictions to column names apart to those of lists, so one could surround special names via backticks at creation (of course this is not recommended). A very good constructor regarding these criteria is already implemented within the sloop package.**Q**: Enhance our`factor()`

helper to have better behaviour when one or more`values`

is not found in`levels`

. What does`base::factor()`

do in this situation?**A**:`base::factor()`

converts these values (silently) into`NA`

’s. To improve our`factor()`

helper, we write a more informative error message.`factor <- function(x, levels = unique(x)) { ind <- match(x, levels) # error when values occur, which are not in the levels if(any(is.na(ind))){ stop("The following values do not occur in the levels: ", paste(setdiff(x,levels), collapse = ", "), ".", call. = FALSE) } validate_factor(new_factor(ind, levels)) }`

**Q**: Carefully read the source code of`factor()`

. What does it do that our constructor does not?**A**: It allows more general values as`x`

input and converts them to character or replaces them by`character(0)`

(in case of`NULL`

). It also ensures that the levels are unique. This is done by setting the levels via the`base::levels<-`

function, which fails when one tries to supply duplicated level values.**Q**: Factors have an optional “contrasts” attribute. Read the help for`C()`

, briefly describe it’s purpose. Rewrite our`new_factor()`

constructor to include this attribute.**A**:**Q**: Read the documentation for`utils::as.roman()`

. How would you write a constructor? Does it need a validator? What would a helper look like?**A**:

## 13.3 Generics and methods

**Q**: Read the source code for`t()`

and`t.test()`

and confirm that`t.test()`

is an S3 generic and not an S3 method. What happens if you create an object with class`test`

and call`t()`

with it? Why?`x <- structure(1:10, class = "test") t(x) #> #> One Sample t-test #> #> data: x #> t = 5.7446, df = 9, p-value = 0.0002782 #> alternative hypothesis: true mean is not equal to 0 #> 95 percent confidence interval: #> 3.334149 7.665851 #> sample estimates: #> mean of x #> 5.5`

**A**: We can see that`t.test()`

is a generic, because it calls`UseMethod()`

`t.test #> function (x, ...) #> UseMethod("t.test") #> <bytecode: 0x1025958> #> <environment: namespace:stats>`

There are also other (programmatic) possibilities to check this like

`pryr::ftype()`

, which checks via its internal`pryr:::is_s3_generic`

, which uses`codetools::findGlobals()`

that`t.test()`

contains a call to`UseMethod()`

.Interestingly, while digging a bit around in the pryr package, one can also find some dependencies to the tools package, where a list of functions exists, which tells you about some functions that look like methods, but are not.

`tools::nonS3methods("stats") #> [1] "anova.lmlist" "expand.model.frame" "fitted.values" #> [4] "influence.measures" "lag.plot" "t.test" #> [7] "plot.spec.phase" "plot.spec.coherency"`

However, if we create an object with class

`test`

,`t()`

, will dispatch to`t.test()`

. This simply happens, because`UseMethod()`

just looks for functions named`paste0("generic", ".", c(class(x), "default"))`

. So`t.test()`

is erroneously treated like a method of`t()`

. Since`t.test()`

is a generic itself and doesn’t find a method called`t.test.test()`

, it dispatches to`t.test.default()`

. Just to proof the latter quickly, we specify a method`t.test.test()`

and see what happens:**Q**: Which base generic has the greatest number of defined methods?**A**:`print()`

has clearly the most`library(methods) objs <- mget(ls("package:base"), inherits = TRUE) funs <- Filter(is.function, objs) generics <- Filter(function(x) ("generic" %in% pryr::ftype(x)), funs) sort( lengths(sapply(names(generics), function(x) methods(x), USE.NAMES = TRUE)), decreasing = TRUE )[1:3] #> print format summary #> 206 75 34`

**Q**: The table() function produces output with class “table”. What attributes does this class possess? What generics does it have methods for?**A**:**Q**: Carefully read the documentation for`UseMethod()`

and explain why the following code returns the results that it does. What two usual rules of function evaluation does`UseMethod()`

violate?`g <- function(x) { x <- 10 y <- 10 UseMethod("g") } g.default <- function(x) c(x = x, y = y) x <- 1 y <- 1 g(x) #> x y #> 1 10`

**A**: R looks for the`x`

argument in`g()`

’s calling environment (the global environment), where`x`

is defined as 1. Then`g()`

dispatches to`g.default()`

. The`x`

argument is given to`g.default()`

.`y`

is not defined inside it, so`g.default()`

takes`y`

’s value from the environment where`UseMethod()`

created the call. There`y`

is defined as 10.`UseMethod()`

behaves special in many ways, two of them are:- it never “returns” (any statement after
`UseMethod`

won’t be evaluated) - the argument matching can become a bit tricky (generic and methods should have the same order of arguments or you must name them)

- it never “returns” (any statement after
**Q**: What are the arguments to`[`

? Why is this a hard question to answer?**A**:

## 13.4 Object styles

**Q**: Categorise the objects returned by`lm()`

,`factor()`

,`table()`

,`as.Date()`

,`ecdf()`

,`ordered()`

,`I()`

into “vector”, “scalar”, and “other”.**A**:vector:

`factor()`

,`table()`

,`as.Date()`

,`ordered()`

scalar:`lm()`

other:`ecdf()`

,`I()`

**Q**: What would a constructor function for`lm`

objects,`new_lm()`

, look like? Why is a constructor function less useful for linear models? (Think about what functions would call`new_lm()`

.)**A**: TODO: 2nd part of the question.`new_lm <- function(coefficiets, residuals, effects, rank, fitted.values, assign, qr, df.residual, xlevels, call, terms, model) { stopifnot(is.double(coefficients), is.double(residuals), is.double(effects), is.integer(rank), is.double(fitted.values), is.integer(assign), is.list(qr), is.integer(df.residual), is.list(xlevels), is.language(call), is.language(terms), is.list(model)) structure( list( coefficients = coefficients, residuals = residuals, effects = effects, rank = rank, fitted.values = fitted.values, assign = assign, qr = qr, df.residual = df.residual, xlevels = xlevels, call = call, terms = terms, model = model ), class = "lm" ) }`

## 13.5 Inheritance

**Q**: The`ordered`

class is a subclass of`factor`

, but it’s implemented in a very ad hoc way in base R. Implement it in a principled way by building a constructor and providing`vec_restore()`

method.`f1 <- factor("a", c("a", "b")) as.factor(f1) #> [1] a #> Levels: a b as.ordered(f1) # loses levels #> [1] a #> Levels: a`

**A**: TODO: the olad exercise text ended on “an`as_ordered`

generic”. Check the answer if it needs to be updated.ordered is a subclass of factor, so we need to do the following

- for factors: add a subclass argument to the constructor and helper
- for ordered: add a constructor
- write an
`as_ordered()`

generic with methods ordered, factor and default

We use the

**factor**constructor from the textbook and add the subclass argument`new_factor <- function(x, levels, ..., subclass = NULL) { stopifnot(is.integer(x)) stopifnot(is.character(levels)) structure( x, levels = levels, class = c(subclass, "factor") ) }`

We also use the validator for factors from the textbook

`validate_factor <- function(x) { values <- unclass(x) levels <- attr(x, "levels") if (!all(!is.na(values) & values > 0)) { stop( "All `x` values must be non-missing and greater than zero", call. = FALSE ) } if (length(levels) < max(values)) { stop( "There must at least as many `levels` as possible values in `x`", call. = FALSE ) } x }`

And we add the subclass argument for the helper from the textbook and the exercises

`factor <- function(x, levels = unique(x), ... , subclass = NULL) { ind <- match(x, levels) # error when values occur, which are not in the levels if(any(is.na(ind))){ stop("The following values do not occur in the levels: ", paste(setdiff(x,levels), collapse = ", "), ".", call. = FALSE) } validate_factor(new_factor(ind, levels, subclass = subclass)) }`

A constructor for ordered is already implemented in the sloop package:

`new_ordered <- function (x, levels) { stopifnot(is.integer(x)) stopifnot(is.character(levels)) structure(x, levels = levels, class = c("ordered", "factor")) }`

The implementation of the

**generic**and the first two methods is straight forward`as_ordered <- function(x, ...) { UseMethod("as_ordered") } as_ordered.ordered <- function(x, ...) x as_ordered.default <- function(x, ...) { stop( "Don't know how to coerce object of class ", paste(class(x), collapse = "/"), " into an ordered factor", call. = FALSE ) }`

For the factor method of

`as_ordered()`

we use the factor helper, since it saves us some typing:Finally, our new method preserves all levels:

For a real scenario, we might want to add an

`as_factor.ordered()`

method to the`as_factor()`

generic from the textbook.**Q**: How does`[.Date`

support subclasses? How does it fail to support subclasses?**A**:**Q**: R has two classes for representing date time data,`POSIXct`

and`POSIXlt`

, which both inherit from`POSIXt`

. Which generics have different behaviours for the two classes? Which generics share the same behaviour?**A**: To answer this question, we have to get the regarding generics`# generics of POSIXt generics_t <- attr(methods(class = "POSIXt"), "info")[["generic"]] # Generics of POSIXct generics_ct <- attr(methods(class = "POSIXct"), "info")[["generic"]] # generics of POSIXlt generics_lt <- attr(methods(class = "POSIXlt"), "info")[["generic"]]`

Those generics that have a method for POSIXt are potentially sharing the same behaviour (

`generics_t`

). However, those generics that have a specific method for one of the sublcases have to be subtractet:`generics_same <- setdiff(generics_t, union(generics_ct, generics_lt)) generics_same #> [1] "-" "+" "all.equal" "as.character" #> [5] "Axis" "cut" "diff" "hist" #> [9] "is.numeric" "julian" "Math" "months" #> [13] "Ops" "pretty" "quantile" "quarters" #> [17] "round" "seq" "str" "trunc" #> [21] "weekdays"`

The rest is different

`generics_different <- union(generics_ct, generics_lt) generics_different #> [1] "[" "[[" "[<-" "as.data.frame" #> [5] "as.Date" "as.list" "as.POSIXlt" "c" #> [9] "coerce" "format" "initialize" "length<-" #> [13] "mean" "print" "rep" "show" #> [17] "slotsFromS3" "split" "summary" "Summary" #> [21] "weighted.mean" "xtfrm" "anyNA" "as.double" #> [25] "as.matrix" "as.POSIXct" "duplicated" "is.na" #> [29] "length" "names" "names<-" "sort" #> [33] "unique"`

**Q**: What do you expect this code to return? What does it actually return? Why?`generic2 <- function(x) UseMethod("generic2") generic2.a1 <- function(x) "a1" generic2.a2 <- function(x) "a2" generic2.b <- function(x) { class(x) <- "a1" NextMethod() } generic2(sloop::new_s3_scalar(class = c("b", "a2")))`

**A**: TODO: Question text changed from “Explain what is happening in the following code.”…So check, if the answer needs to be updated accordingly. Also set eval to true in the code chunk and check why new_s3_scalar is not found (maybe wrong sloop version?).- We supply an object of classes
`b`

and`a2`

to`generic2()`

, so R will look for a method`generic2.b()`

`generic2.b()`

changes the class to`a1`

and then calls`NextMethod()`

One could think, that R calls now

`generic2.a1()`

. But in fact, as mentioned in the textbook,`NextMethod()`

doesn’t actually work with the class attribute of the object, but instead uses a special global variable (.Class) to keep track of which method to call next.

We can easily verify (for example via

`print()`

) that`.Class`

is still`c("b", "a2")`

and so`generic2.a2()`

gets called.

- We supply an object of classes

## 13.6 Dispatch details

**Q**: Explain the differences in dispatch below:`x1 <- 1:5 class(x1) #> [1] "integer" sloop::s3_dispatch(x1[1]) #> [.integer #> [.numeric #> [.default #> -> [ x2 <- structure(x1, class = "integer") class(x2) #> [1] "integer" sloop::s3_dispatch(x2[1]) #> [.integer #> [.default #> -> [`

**A**:**Q**: What classes have a method for the`Math`

group generic in base R? Read the source code. How do the methods work?

**A**: The following functions belong to this group (see ?`Math`

):

- abs, sign, sqrt, floor, ceiling, trunc, round, signif
- exp, log, expm1, log1p, cos, sin, tan, cospi, sinpi, tanpi, acos, asin, atan, cosh, sinh, tanh, acosh, asinh, atanh
- lgamma, gamma, digamma, trigamma
- cumsum, cumprod, cummax, cummin

```
The following classes have a method for this group generic:
```r
methods("Math")
#> [1] Math,nonStructure-method Math,structure-method
#> [3] Math.data.frame Math.Date
#> [5] Math.difftime Math.factor
#> [7] Math.POSIXt Math.quosure*
#> see '?methods' for accessing help and source code
```
To read the source code of the S3 classes, we can just enter the name of the method into the console. To get the source code of the S4 classes, we can use `getMethod()`, i. e. `getMethod("Math", "nonStructure")`.
To explain the basic idea, we just overwrite the data frame method:
```r
Math.data.frame <- function(x){"hello"}
```
Now all functions from the math generic group, will return `"hello"`
```r
abs(iris)
#> [1] "hello"
exp(iris)
#> [1] "hello"
lgamma(iris)
#> [1] "hello"
```
So, I hope the idea is clear. However, of course different functions should perform different calculations. Here `.Generic` comes into play, which provides us the calling generic as a string
```r
Math.data.frame <- function(x, ...){
.Generic
}
abs(iris)
#> [1] "abs"
exp(iris)
#> [1] "exp"
lgamma(iris)
#> [1] "lgamma"
rm(Math.data.frame)
```
`Math.data.frame()` is a good example, how to invoke `.Generic` to build sth. constructive up on `.Generic`. `Math.factor()` is a good example of a method, which is simply defined for better error messages.
```

**Q**:`Math.difftime()`

is more complicated than I described. Why?**A**:`Math.difftime()`

needs to exclude other cases than abs, sign, floor, ceiling, trunc, round and signif and supply an according error message.

## 13.7 Old exercises

**Q**: Draw a Venn diagram illustrating the relationships between functions, generics, and methods.**A**: Funtions don’t have to be generics or methods, but both the latter are functions. It is also possible that a function is both, a method and a generic, at the same time, which seems to be relatively awkward, so that also the author of the textbook doesn’t recommend it, see`?pryr::ftype`

This function figures out whether the input function is a regular/primitive/internal function, a internal/S3/S4 generic, or a S3/S4/RC method. This is function is slightly simplified as it’s possible for a method from one class to be a generic for another class, but that seems like such a bad idea that hopefully no one has done it.

**Q**: Write a constructor for`difftime`

objects. What base type are they built on? What attributes do they use? You’ll need to consult the documentation, read some code, and perform some experiments.**A**: Our constructor should be named`new_class_name`

, have one argument for its base type and each attribute and check the base types of these arguments as well.`new_difftime <- function(x, units = "auto") { stopifnot(is.double(x), is.character(units)) structure(x, units = units, class = "difftime") }`

However, since the following result prints awkward

we get a little bit more “inspiration” by the original

`difftime()`

function and make the regarding changes. Basically we need to implement logic for the units attribute, in case it is set to`"auto"`

and convert the value of the underlying double from seconds to the regarding unit, as commented in the following`new_difftime <- function(x, units = "auto") { stopifnot(is.double(x), is.character(units)) # case units == "auto": if (units == "auto") # when all time differences are NA, units should be "secs" units <- if (all(is.na(x))){ "secs" } else { # otherwise set the units regarding to the minimal time difference x_min <- min(abs(x), na.rm = TRUE) if (!is.finite(x_min) || x_min < 60) { "secs" } else if (x_min < 3600) { "mins" } else if (x_min < 86400) { "hours" } else { "days" } } # we rescale the underlying double, according to the units x <- switch(units, secs = x, mins = x/60, hours = x/3600, days = x/86400, weeks = x/(7 * 86400)) structure(x, units = units, class = "difftime") } # test new_difftime(c(NA, -3600, 86400)) #> Time differences in hours #> [1] NA -1 24`