26 Deprecated

26.1 Conditions

  1. Q: What does options(error = recover) do? Why might you use it?

    A: With options(error = recover) set, utils::recover() is called (without arguments) whenever an error occurs. It prints the list of calls that led to the error and lets the user pick one of the corresponding frames to inspect with browser(), which makes for a practical interactive debugging workflow.
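
    For illustration, a minimal sketch (to be run interactively; the function f below is made up):

    options(error = utils::recover)
    f <- function(x) stop("an error")
    # f(1)  # interactively, this lists the frames and lets you pick one to browse
    options(error = NULL)  # restore the default behaviour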

  2. Q: What does options(error = quote(dump.frames(to.file = TRUE))) do? Why might you use it?

    A: This option writes a dump of the evaluation frames at the time of an error into a file ending in .rda (last.dump.rda by default). With this option set, R continues to run after the first error. To stop R at the first error, use quote({dump.frames(to.file = TRUE); q()}) instead. These options are especially useful for debugging non-interactive R scripts after the fact (“post-mortem debugging”).
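
    A minimal sketch of that workflow (last.dump.rda is the default file name used by dump.frames()):

    # in the non-interactive script:
    # options(error = quote({dump.frames(to.file = TRUE); q(status = 1)}))

    # later, in an interactive session:
    # load("last.dump.rda")
    # debugger(last.dump)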

26.2 Expressions (new)

  1. Q: base::alist() is useful for creating pairlists to be used for function arguments:

    foo <- function() {}
    formals(foo) <- alist(x = , y = 1)
    foo
    #> function (x, y = 1) 
    #> {
    #> }

    What makes alist() special compared to list()?

    A: From ?alist:

    alist handles its arguments as if they described function arguments. So the values are not evaluated, and tagged arguments with no value are allowed whereas list simply ignores them. alist is most often used in conjunction with formals.
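
    In particular, the arguments are left unevaluated:

    alist(x = stop("not evaluated"))
    #> $x
    #> stop("not evaluated")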

26.3 Functionals

26.3.1 My first functional: lapply()

  1. Q: Why are the following two invocations of lapply() equivalent?

    trims <- c(0, 0.1, 0.2, 0.5)
    x <- rcauchy(100)
    
    lapply(trims, function(trim) mean(x, trim = trim))
    lapply(trims, mean, x = x)

    A: In the first call, each element of trims is supplied explicitly to mean()’s trim argument inside the anonymous function. In the second call this happens via positional matching: since mean()’s first argument x is supplied by name through lapply()’s ... argument, each element of trims is matched to the next remaining formal, trim.

  2. Q: The function below scales a vector so it falls in the range [0, 1]. How would you apply it to every column of a data frame? How would you apply it to every numeric column in a data frame?

    scale01 <- function(x) {
      rng <- range(x, na.rm = TRUE)
      (x - rng[1]) / (rng[2] - rng[1])
    }

    A: Applying scale01() to every column directly only works when all columns are numeric. For a mixed data frame we can guard the call with an if() condition and return non-numeric columns unchanged from the else branch:

    data.frame(lapply(iris, function(x) if (is.numeric(x)) scale01(x) else x))
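
    For a data frame whose columns are all numeric (e.g. mtcars), scale01() can be applied to every column directly:

    data.frame(lapply(mtcars, scale01))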
  3. Q: Use both for loops and lapply() to fit linear models to the mtcars using the formulas stored in this list:

    formulas <- list(
      mpg ~ disp,
      mpg ~ I(1 / disp),
      mpg ~ disp + wt,
      mpg ~ I(1 / disp) + wt
    )

    A: As in the first exercise, we can write two lapply() versions:

    # lapply (2 versions)
    la1 <- lapply(formulas, lm, data = mtcars)
    la2 <- lapply(formulas, function(x) lm(formula = x, data = mtcars))
    
    # for loop
    lf1 <- vector("list", length(formulas))
    for (i in seq_along(formulas)){
      lf1[[i]] <- lm(formulas[[i]], data = mtcars)
    }

    Note that all versions return the same content, but they won’t be identical, since the value of the “call” element differs between the versions.

  4. Q: Fit the model mpg ~ disp to each of the bootstrap replicates of mtcars in the list below by using a for loop and lapply(). Can you do it without an anonymous function?

    bootstraps <- lapply(1:10, function(i) {
      rows <- sample(1:nrow(mtcars), rep = TRUE)
      mtcars[rows, ]
    })

    A:

    # lapply without anonymous function
    la <- lapply(bootstraps, lm, formula = mpg ~ disp)
    
    # for loop
    lf <- vector("list", length(bootstraps))
    for (i in seq_along(bootstraps)){
      lf[[i]] <- lm(mpg ~ disp, data = bootstraps[[i]])
    }
  5. Q: For each model in the previous two exercises, extract \(R^2\) using the function below.

    rsq <- function(mod) summary(mod)$r.squared

    A: For the models in exercise 3:

    sapply(la1, rsq)
    #> [1] 0.718 0.860 0.781 0.884
    sapply(la2, rsq)
    #> [1] 0.718 0.860 0.781 0.884
    sapply(lf1, rsq)
    #> [1] 0.718 0.860 0.781 0.884

    And the models in exercise 4:

    sapply(la, rsq)
    #>  [1] 0.628 0.656 0.668 0.668 0.677 0.753 0.683 0.810 0.668 0.730
    sapply(lf, rsq)
    #>  [1] 0.628 0.656 0.668 0.668 0.677 0.753 0.683 0.810 0.668 0.730

26.3.2 For loop functionals: friends of lapply()

  1. Q: Use vapply() to:

    1. Compute the standard deviation of every column in a numeric data frame.

    2. Compute the standard deviation of every numeric column in a mixed data frame. (Hint: you’ll need to use vapply() twice.)

    A: As a numeric data.frame we choose cars:

    vapply(cars, sd, numeric(1))

    And as a mixed data.frame we choose iris:

    vapply(iris[vapply(iris, is.numeric, logical(1))],
           sd, 
           numeric(1))
  2. Q: Why is using sapply() to get the class() of each element in a data frame dangerous?

    A: Columns of a data frame can have more than one class, so the type of sapply()’s output can change silently depending on the input. If …

    • all columns have one class: sapply() returns a character vector
    • one column has more classes than the others: sapply() returns a list
    • all columns have the same number of classes, which is more than one: sapply() returns a matrix

    For example:

    a <- letters[1:3]
    class(a) <- c("class1", "class2")
    df <- data.frame(a = character(3))
    df$a <- a
    df$b <- a
    class(sapply(df, class))
    #> [1] "matrix"

    Note that this case often arises when working with the POSIXt types POSIXct and POSIXlt.
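
    For instance, a data frame with one POSIXct column (two classes) and one numeric column (one class) already triggers the list case:

    df2 <- data.frame(x = Sys.time(), y = 1)
    sapply(df2, class)  # returns a list, since the columns differ in their number of classes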

  3. Q: The following code simulates the performance of a t-test for non-normal data. Use sapply() and an anonymous function to extract the p-value from every trial.

    trials <- replicate(
      100, 
      t.test(rpois(10, 10), rpois(7, 10)),
      simplify = FALSE
    )

    Extra challenge: get rid of the anonymous function by using [[ directly.

    A:

    # anonymous function:
    sapply(trials, function(x) x[["p.value"]])
    # without anonymous function:
    sapply(trials, "[[", "p.value")
  4. Q: What does replicate() do? What sort of for loop does it eliminate? Why do its arguments differ from lapply() and friends?

    A: As stated in ?replicate:

    replicate is a wrapper for the common use of sapply for repeated evaluation of an expression (which will usually involve random number generation).

    We can see this clearly in the source code:

    #> function (n, expr, simplify = "array") 
    #> sapply(integer(n), eval.parent(substitute(function(...) expr)), 
    #>     simplify = simplify)
    #> <bytecode: 0x52e31f8>
    #> <environment: namespace:base>

    Like sapply(), replicate() eliminates a for loop. As explained for Map() in the textbook, every replicate() call could also be written with lapply(), but replicate() is more concise and more clearly indicates what you’re trying to do. Its arguments differ from those of lapply() and friends because it takes an (unevaluated) expression rather than a function: as the source code shows, expr is captured via substitute() and re-evaluated n times.
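
    For illustration, the following two calls are equivalent in spirit (both return a list of three random vectors):

    replicate(3, rnorm(2), simplify = FALSE)
    lapply(1:3, function(i) rnorm(2))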

  5. Q: Implement a version of lapply() that supplies FUN with both the name and the value of each component.

    A:

    lapply_nms <- function(X, FUN, ...){
      Map(FUN, X, names(X), ...)
    }
    lapply_nms(iris, function(x, y) c(class(x), y))
    #> $Sepal.Length
    #> [1] "numeric"      "Sepal.Length"
    #> 
    #> $Sepal.Width
    #> [1] "numeric"     "Sepal.Width"
    #> 
    #> $Petal.Length
    #> [1] "numeric"      "Petal.Length"
    #> 
    #> $Petal.Width
    #> [1] "numeric"     "Petal.Width"
    #> 
    #> $Species
    #> [1] "factor"  "Species"
  6. Q: Implement a combination of Map() and vapply() to create an lapply() variant that iterates in parallel over all of its inputs and stores its outputs in a vector (or a matrix). What arguments should the function take?

    A: As we understand this exercise, it is about working with a list of list-like objects (such as data frames), as in the following example:

    testlist <- list(iris, mtcars, cars)
    lapply(testlist, function(x) vapply(x, mean, numeric(1)))
    #> Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
    #> returning NA
    #> [[1]]
    #> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
    #>         5.84         3.06         3.76         1.20           NA 
    #> 
    #> [[2]]
    #>     mpg     cyl    disp      hp    drat      wt    qsec      vs      am 
    #>  20.091   6.188 230.722 146.688   3.597   3.217  17.849   0.438   0.406 
    #>    gear    carb 
    #>   3.688   2.812 
    #> 
    #> [[3]]
    #> speed  dist 
    #>  15.4  43.0

    So we can get the same result with a more specialized function:

    lmapply <- function(X, FUN, FUN.VALUE, simplify = FALSE){
      out <- Map(function(x) vapply(x, FUN, FUN.VALUE), X)
      if(simplify == TRUE){return(simplify2array(out))}
      out
    }
    
    lmapply(testlist, mean, numeric(1))
    #> Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
    #> returning NA
    #> [[1]]
    #> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
    #>         5.84         3.06         3.76         1.20           NA 
    #> 
    #> [[2]]
    #>     mpg     cyl    disp      hp    drat      wt    qsec      vs      am 
    #>  20.091   6.188 230.722 146.688   3.597   3.217  17.849   0.438   0.406 
    #>    gear    carb 
    #>   3.688   2.812 
    #> 
    #> [[3]]
    #> speed  dist 
    #>  15.4  43.0
  7. Q: Implement mcsapply(), a multi-core version of sapply(). Can you implement mcvapply(), a parallel version of vapply()? Why or why not?
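
    A: A possible sketch for mcsapply(), assuming the parallel package, is to simplify the result of a parallel lapply():

    mcsapply <- function(X, FUN, ..., mc.cores = 2L) {
      # parallel::mclapply() does the parallel work (it relies on forking,
      # so it is of limited use on Windows); simplify2array() mimics
      # sapply()'s simplification step
      simplify2array(parallel::mclapply(X, FUN, ..., mc.cores = mc.cores))
    }

    Implementing mcvapply() would presumably be much harder, since vapply() checks and fills its typed output while the individual results are computed, a step that cannot simply be handed over to parallel workers.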

26.3.3 Manipulating matrices and data frames

  1. Q: How does apply() arrange the output? Read the documentation and perform some experiments.

    A:

    apply() arranges its output columns (or list elements) according to the order of the margin. The rows are ordered by the remaining dimensions, starting with the “last” dimension of the input object. What this means becomes clear when looking at the three- and four-dimensional cases of the following example:

    # for two dimensional cases everything is sorted by the other dimension
    arr2 <- array(1:9, dim = c(3, 3), dimnames = list(paste0("row", 1:3),
                                                  paste0("col", 1:3)))
    arr2
    apply(arr2, 1, head, 1) # Margin is row
    apply(arr2, 1, head, 9) # sorts by col
    
    apply(arr2, 2, head, 1) # Margin is col
    apply(arr2, 2, head, 9) # sorts by row
    
    # 3 dimensional
    arr3 <- array(1:27, dim = c(3,3,3), dimnames = list(paste0("row", 1:3),
                                                    paste0("col", 1:3),
                                                    paste0("time", 1:3)))
    arr3
    apply(arr3, 1, head, 1) # Margin is row
    apply(arr3, 1, head, 27) # sorts by time and col
    
    apply(arr3, 2, head, 1) # Margin is col
    apply(arr3, 2, head, 27) # sorts by time and row
    
    apply(arr3, 3, head, 1) # Margin is time
    apply(arr3, 3, head, 27) # sorts by col and row
    
    # 4 dimensional
    arr4 <- array(1:81, dim = c(3,3,3,3), dimnames = list(paste0("row", 1:3),
                                                      paste0("col", 1:3),
                                                      paste0("time", 1:3),
                                                      paste0("var", 1:3)))
    arr4
    
    apply(arr4, 1, head, 1) # Margin is row
    apply(arr4, 1, head, 81) # sorts by var, time, col
    
    apply(arr4, 2, head, 1) # Margin is col
    apply(arr4, 2, head, 81) # sorts by var, time, row
    
    apply(arr4, 3, head, 1) # Margin is time
    apply(arr4, 3, head, 81) # sorts by var, col, row
    
    apply(arr4, 4, head, 1) # Margin is var
    apply(arr4, 4, head, 81) # sorts by time, col, row
  2. Q: There’s no equivalent to split() + vapply(). Should there be? When would it be useful? Implement one yourself.

    A: We can modify the tapply2() approach from the book, where split() and sapply() were combined:

    v_tapply <- function(x, group, f, FUN.VALUE, ..., USE.NAMES = TRUE) {
      pieces <- split(x, group)
      vapply(pieces, f, FUN.VALUE, ..., USE.NAMES = TRUE)
    }

    tapply() has a simplify argument. When it is set to FALSE, tapply() always returns a list, and it is easy to create cases where the length and the types/classes of the list elements vary depending on the input. The vapply() version is useful when you want to control the structure of the output (and get an error as soon as it does not match the template), or when you need type-stable output to build other functions on top of it.
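
    A small usage example (group means of mtcars that are guaranteed to be a single number per group):

    v_tapply(mtcars$mpg, mtcars$cyl, mean, numeric(1))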

  3. Q: Implement a pure R version of split(). (Hint: use unique() and subsetting.) Can you do it without a for loop?

    A:

    split2 <- function(x, f, drop = FALSE, ...){
      # there are three relevant cases for f. f is a character, f is a factor and all
      # levels occur, f is a factor and some levels don't occur.
    
      # first we check if f is a factor
      fact <- is.factor(f)
    
      # if drop is set to TRUE, we drop the non-occurring levels.
      # (If f is a character, this has no effect.)
      if(drop){f <- f[, drop = TRUE]}
    
      # now we want all unique elements/levels of f
      levs <- if (fact) {unique(levels(f))} else {as.character(unique(f))}
    
      # we use these levels to subset x and supply names for the resulting output.
      setNames(lapply(levs, function(lv) x[f == lv, , drop = FALSE]), levs)
    }
  4. Q: What other types of input and output are missing? Brainstorm before you look up some answers in the plyr paper.

    A: From the suggested plyr paper we can extract many possible input/output combinations and arrange them in a table. Sean C. Anderson has already done this, based on a presentation by Hadley Wickham, and provided the following result here.

    object type          array       data frame   list        nothing
    array                apply       .            .           .
    data frame           .           aggregate    by          .
    list                 sapply      .            lapply      .
    n replicates         replicate   .            replicate   .
    function arguments   mapply      .            mapply      .

    Note the column “nothing”, which is specifically for use cases where side effects such as plotting or writing data are intended.

26.3.4 Manipulating lists

  1. Q: Why isn’t is.na() a predicate function? What base R function is closest to being a predicate version of is.na()?

    A: Because a predicate function always returns a single TRUE or FALSE, whereas is.na() is vectorised and returns a logical vector of the same length as its input (is.na(NULL) even returns logical(0)). The closest thing to a predicate version of is.na() in base R is anyNA().
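
    The difference is easy to see:

    is.na(c(1, NA))   # vectorised: one result per element
    #> [1] FALSE  TRUE
    anyNA(c(1, NA))   # predicate: a single TRUE or FALSE
    #> [1] TRUE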

  2. Q: Use Filter() and vapply() to create a function that applies a summary statistic to every numeric column in a data frame.

    A:

    vapply_num <- function(X, FUN, FUN.VALUE){
      vapply(Filter(is.numeric, X), FUN, FUN.VALUE)
    }
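
    For example (using iris again):

    vapply_num(iris, mean, numeric(1))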
  3. Q: What’s the relationship between which() and Position()? What’s the relationship between where() and Filter()?

    A: which() returns the indices of all TRUE entries of a logical vector. Position() returns the index of the first (by default) or last element of a vector for which a predicate function returns TRUE. So the default relation is Position(f, x) <=> min(which(vapply(x, f, logical(1)))).

    where(), defined in the book as:

    where <- function(f, x) {
      vapply(x, f, logical(1))
    } 

    returns a logical vector that indicates for which elements of a list or data frame the predicate f holds. Filter(f, x) returns all elements of a list or data frame for which the supplied predicate function returns TRUE. So the relation is Filter(f, x) <=> x[where(f, x)].
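
    We can check both relations on a small example:

    x <- list(1, "a", 3)

    Position(is.character, x)
    #> [1] 2
    min(which(vapply(x, is.character, logical(1))))
    #> [1] 2

    identical(Filter(is.numeric, x), x[where(is.numeric, x)])
    #> [1] TRUE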

  4. Q: Implement Any(), a function that takes a list and a predicate function, and returns TRUE if the predicate function returns TRUE for any of the inputs. Implement All() similarly.

    A: Any():

    Any <- function(l, pred){
      stopifnot(is.list(l))
    
      for (i in seq_along(l)){
        if (pred(l[[i]])) return(TRUE)
      }
    
      return(FALSE)
    }

    All():

    All <- function(l, pred){
      stopifnot(is.list(l))
    
      for (i in seq_along(l)){
        if (!pred(l[[i]])) return(FALSE)
      }
    
      return(TRUE)
    }
  5. Q: Implement the span() function from Haskell: given a list x and a predicate function f, span returns the location of the longest sequential run of elements where the predicate is true. (Hint: you might find rle() helpful.)

    A: Our span_r() function returns the first index of the longest sequential run of elements where the predicate is true. In case of ties (more than one longest run), the first index of each of these runs is returned.

    span_r <- function(l, pred){
      # We test if l is a list
      stopifnot(is.list(l))
    
      # we preallocate a logical vector and save the result
      # of the predicate function applied to each element of the list
      test <- vector("logical", length(l))
      for (i in seq_along(l)){
        test[i] <- (pred(l[[i]]))
      }
      # we return NA, if the output of pred is always FALSE
      if(!any(test)) return(NA_integer_)
    
      # Otherwise we look at the length encoding of TRUE and FALSE values.
      rle_test <- rle(test)
      # Since more than one maximal run of TRUEs might appear, we need some extra
      # logic, which is easier to write if we save the rle output in a
      # data.frame
      rle_test <- data.frame(lengths = rle_test[["lengths"]],
                              values = rle_test[["values"]],
                              cumsum = cumsum(rle_test[["lengths"]]))
      rle_test[["first_index"]] <- rle_test[["cumsum"]] - rle_test[["lengths"]] + 1
      # In the last line we calculated the first index in the original list for every encoding
      # In the next line we calculate a column, which gives the maximum 
      # encoding length among all encodings with the value TRUE
      rle_test[["max"]] <-  max(rle_test[rle_test[, "values"] == TRUE, ][,"lengths"])
      # Now we just have to subset for maximum length among all TRUE values and return the
      # according "first index":
      rle_test[rle_test$lengths == rle_test$max & rle_test$values == TRUE, ]$first_index
    }
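
    A quick check: the longest run of TRUEs in the following list starts at position 2:

    span_r(list(FALSE, TRUE, TRUE, FALSE), isTRUE)
    #> [1] 2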

26.3.5 List of functions

  1. Q: Implement a summary function that works like base::summary(), but uses a list of functions. Modify the function so it returns a closure, making it possible to use it as a function factory.

  2. Q: Which of the following commands is equivalent to with(x, f(z))?

    1. x$f(x$z).
    2. f(x$z).
    3. x$f(z).
    4. f(z).
    5. It depends.

26.3.6 Mathematical functionals

  1. Q: Implement arg_max(). It should take a function and a vector of inputs, and return the elements of the input where the function returns the highest value. For example, arg_max(-10:5, function(x) x ^ 2) should return -10. arg_max(-5:5, function(x) x ^ 2) should return c(-5, 5). Also implement the matching arg_min() function.

    A: arg_max():

    arg_max <- function(x, f){
      x[f(x) == max(f(x))]
    }

    arg_min():

    arg_min <- function(x, f){
      x[f(x) == min(f(x))]
    }
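
    A quick check against the examples from the question:

    arg_max(-10:5, function(x) x ^ 2)
    #> [1] -10
    arg_max(-5:5, function(x) x ^ 2)
    #> [1] -5  5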
  2. Q: Challenge: read about the fixed point algorithm. Complete the exercises using R.

26.3.7 A family of functions

  1. Q: Implement smaller and larger functions that, given two inputs, return either the smaller or the larger value. Implement na.rm = TRUE: what should the identity be? (Hint: smaller(x, smaller(NA, NA, na.rm = TRUE), na.rm = TRUE) must be x, so smaller(NA, NA, na.rm = TRUE) must be bigger than any other value of x.) Use smaller and larger to implement equivalents of min(), max(), pmin(), pmax(), and new functions row_min() and row_max().

    A: We can do almost everything as shown in the case study in the textbook. First we define the functions smaller_() and larger_(). We use the underscore suffix so that we can build the non-suffixed versions, which include the na.rm parameter, on top of them. In contrast to the add() example from the book, we change two things at this step: we don’t include error checking, since this is done later at the top level, and we return NA_integer_ if any of the arguments is NA (this matters when na.rm is set to FALSE; it wasn’t needed in the add() example, since + already returns NA in that case).

    smaller_ <- function(x, y){
      if(anyNA(c(x, y))){return(NA_integer_)}
      out <- x
      if(y < x) {out <- y}
      out
    }
    
    larger_ <- function(x, y){
      if(anyNA(c(x, y))){return(NA_integer_)}
      out <- x
      if(y > x) {out <- y}
      out
    }

    We can take rm_na() from the book:

    rm_na <- function(x, y, identity) {
      if (is.na(x) && is.na(y)) {
        identity
        } else if (is.na(x)) {
          y
          } else {
            x
          }
    }

    To find the identity value, we can apply the same argument as in the textbook, since our functions are also associative; so the following equation should hold:

    3 = smaller(smaller(3, NA), NA) = smaller(3, smaller(NA, NA)) = 3

    So the identity has to be greater than 3. When we generalise from 3 to any real number, the identity has to be greater than any number, which leads us to infinity. Hence the identity has to be Inf for smaller() (and -Inf for larger()), which we implement next:

    smaller <- function(x, y, na.rm = FALSE) {
      stopifnot(length(x) == 1, length(y) == 1, is.numeric(x) | is.logical(x),
                is.numeric(y) | is.logical(y))
      if (na.rm && (is.na(x) || is.na(y))) rm_na(x, y, Inf) else smaller_(x,y)
    }
    
    larger <- function(x, y, na.rm = FALSE) {
      stopifnot(length(x) == 1, length(y) == 1, is.numeric(x) | is.logical(x),
                is.numeric(y) | is.logical(y))
      if (na.rm && (is.na(x) || is.na(y))) rm_na(x, y, -Inf) else larger_(x,y)
    }

    Just as min() and max() act on vectors, we can easily implement this for our new functions. As shown in the book, we also have to set the init parameter to the identity value.

    r_smaller <- function(xs, na.rm = TRUE) {
      Reduce(function(x, y) smaller(x, y, na.rm = na.rm), xs, init = Inf)
    }
    # some tests
    r_smaller(c(1:3, 4:(-1)))
    #> [1] -1
    r_smaller(NA, na.rm = TRUE)
    #> [1] Inf
    r_smaller(numeric())
    #> [1] Inf
    
    r_larger <- function(xs, na.rm = TRUE) {
      Reduce(function(x, y) larger(x, y, na.rm = na.rm), xs, init = -Inf)
    }
    # some tests
    r_larger(c(1:3), c(4:1))
    #> [1] 3
    r_larger(NA, na.rm = TRUE)
    #> [1] -Inf
    r_larger(numeric())
    #> [1] -Inf

    We can also create vectorised versions as shown in the book. We only show the smaller() case to avoid becoming too verbose.

    v_smaller1 <- function(x, y, na.rm = FALSE){
      stopifnot(length(x) == length(y), is.numeric(x) | is.logical(x), 
                is.numeric(y) | is.logical(y))
      if (length(x) == 0) return(numeric())
      simplify2array(
        Map(function(x, y) smaller(x, y, na.rm = na.rm), x, y)
      )
    }
    
    v_smaller2 <- function(x, y, na.rm = FALSE) {
      stopifnot(length(x) == length(y), is.numeric(x) | is.logical(x), 
                is.numeric(y) | is.logical(y))
      vapply(seq_along(x), function(i) smaller(x[i], y[i], na.rm = na.rm),
             numeric(1))
    }
    
    # Both versions give the same results
    v_smaller1(1:10, c(2,1,4,3,6,5,8,7,10,9))
    #>  [1] 1 1 3 3 5 5 7 7 9 9
    v_smaller2(1:10, c(2,1,4,3,6,5,8,7,10,9))
    #>  [1] 1 1 3 3 5 5 7 7 9 9
    
    v_smaller1(numeric(), numeric())
    #> numeric(0)
    v_smaller2(numeric(), numeric())
    #> numeric(0)
    
    v_smaller1(c(1, NA), c(1, NA), na.rm = FALSE)
    #> [1]  1 NA
    v_smaller2(c(1, NA), c(1, NA), na.rm = FALSE)
    #> [1]  1 NA
    
    v_smaller1(NA,NA)
    #> [1] NA
    v_smaller2(NA,NA)
    #> [1] NA

    Of course, we can also copy and paste the rest from the textbook to solve the last part of the exercise:

    row_min <- function(x, na.rm = FALSE) {
      apply(x, 1, r_smaller, na.rm = na.rm)
    }
    col_min <- function(x, na.rm = FALSE) {
      apply(x, 2, r_smaller, na.rm = na.rm)
    }
    arr_min <- function(x, dim, na.rm = FALSE) {
      apply(x, dim, r_smaller, na.rm = na.rm)
    }
  2. Q: Create a table that has and, or, add, multiply, smaller, and larger in the columns and binary operator, reducing variant, vectorised variant, and array variants in the rows.

    1. Fill in the cells with the names of base R functions that perform each of the roles.

    2. Compare the names and arguments of the existing R functions. How consistent are they? How could you improve them?

    3. Complete the matrix by implementing any missing functions.

    A: The following table lists the requested base R functions that we are aware of:

                 and    or     add    multiply   smaller   larger
    binary       &&     ||
    reducing     all    any    sum    prod       min       max
    vectorised   &      |      +      *          pmin      pmax
    array

    Notice that we were relatively strict about the binary row. Since the vectorised and reducing versions are more general than the binary versions, we could have listed them twice. However, this doesn’t seem to be the intention of this exercise.

    The last part of this exercise can be solved by copying from the book and from the previous exercise for the binary row, and by combining apply() with the reducing versions for the array row. We think the array functions just need a dimension and an na.rm argument. We don’t know how we would name them, but something like sum_array(1, na.rm = TRUE) could be okay.

    The second part of the exercise is hard to answer completely, but in our opinion two points are important: the behaviour for special inputs like NA, NaN, NULL and zero-length atomics should be consistent, and all versions should have an na.rm argument for which the functions also behave consistently. The following table shows the output of `f`(x, 1), where f is the function in the first column and x is the special input in the header (the named functions also have an na.rm argument, which is FALSE by default). The order of the arguments is important because of lazy evaluation.

             NA     NaN    NULL         logical(0)   integer(0)
    &&       NA     NA     error        NA           NA
    all      NA     NA     TRUE         TRUE         TRUE
    &        NA     NA     error        logical(0)   logical(0)
    ||       TRUE   TRUE   error        TRUE         TRUE
    any      TRUE   TRUE   TRUE         TRUE         TRUE
    |        TRUE   TRUE   error        logical(0)   logical(0)
    sum      NA     NaN    1            1            1
    +        NA     NaN    numeric(0)   numeric(0)   numeric(0)
    prod     NA     NaN    1            1            1
    *        NA     NaN    numeric(0)   numeric(0)   numeric(0)
    min      NA     NaN    1            1            1
    pmin     NA     NaN    numeric(0)   numeric(0)   numeric(0)
    max      NA     NaN    1            1            1
    pmax     NA     NaN    numeric(0)   numeric(0)   numeric(0)

    We can see that the vectorised and reducing numerical functions are all consistent. The logical functions are not: the first three return NA for NA and NaN, while the fourth to sixth return TRUE. It would be more consistent if the first three returned FALSE, or if all of them returned NA and offered an extra na.rm argument. It seems relatively hard to find a simple rule for all cases, and especially the differing behaviour for NULL is confusing. Another way to organise the functions would be to differentiate between “numerical” and “logical” operators first and then between binary, reducing and vectorised variants, as below (we leave out the last column, which is redundant because of coercion):

    `f(x,1)`   NA     NaN    NULL         logical(0)
    &&         NA     NA     error        NA
    ||         TRUE   TRUE   error        TRUE
    all        NA     NA     TRUE         TRUE
    any        TRUE   TRUE   TRUE         TRUE
    &          NA     NA     error        logical(0)
    |          TRUE   TRUE   error        logical(0)
    sum        NA     NaN    1            1
    prod       NA     NaN    1            1
    min        NA     NaN    1            1
    max        NA     NaN    1            1
    +          NA     NaN    numeric(0)   numeric(0)
    *          NA     NaN    numeric(0)   numeric(0)
    pmin       NA     NaN    numeric(0)   numeric(0)
    pmax       NA     NaN    numeric(0)   numeric(0)

    The other point is naming conventions. We think they are clear, but it could be useful to provide the missing binary operators and name them, for example, ++, **, <>, >< to keep everything consistent.

  3. Q: How does paste() fit into this structure? What is the scalar binary function that underlies paste()? What are the sep and collapse arguments to paste() equivalent to? Are there any paste variants that don’t have existing R implementations?

    A: paste() behaves like a mix of both. If you supply only length-one arguments, it behaves like a reducing function:

    paste("a", "b", sep = "")
    #> [1] "ab"
    paste("a", "b","", sep = "") 
    #> [1] "ab"

    If you supply at least one argument of length greater than one, it behaves like a vectorised function:

    paste(1:3)
    #> [1] "1" "2" "3"
    paste(1:3, 1:2)
    #> [1] "1 1" "2 2" "3 1"
    paste(1:3, 1:2, 1)
    #> [1] "1 1 1" "2 2 1" "3 1 1"

    We think it should be possible to implement a new paste() starting from

    p_binary <- function(x, y = "") {
      stopifnot(length(x) == 1, length(y) == 1)
      paste0(x,y)
    }

    The sep argument is equivalent to appending sep to every ... input supplied to paste() except the last one, and then binding these results together. In relations:

    paste(n1, n2, ...,nm , sep = sep) <=>
    paste0(paste0(n1, sep), paste(n2, n3, ..., nm, sep = sep)) <=>
    paste0(paste0(n1, sep), paste0(n2, sep), ..., paste0(nn, sep), paste0(nm))

    We can check this for scalar and non scalar input

    # scalar:
    paste("a", "b", "c", sep = "_")
    #> [1] "a_b_c"
    paste0(paste0("a", "_"), paste("b", "c", sep = "_"))
    #> [1] "a_b_c"
    paste0(paste0("a", "_"), paste0("b", "_"), paste0("c"))
    #> [1] "a_b_c"
    
    # non scalar
    paste(1:2, "b", "c", sep = "_")
    #> [1] "1_b_c" "2_b_c"
    paste0(paste0(1:2, "_"), paste("b", "c", sep = "_"))
    #> [1] "1_b_c" "2_b_c"
    paste0(paste0(1:2, "_"), paste0("b", "_"), paste0("c"))
    #> [1] "1_b_c" "2_b_c"

    collapse just binds the outputs for non-scalar input together with the collapse string. In relations:

    for input A1, ..., An, where Ai = a1i:ami,
    
    paste(A1 , A2 , ...,  An, collapse = collapse) 
    <=>
    paste0(
          paste0(paste(  a11,   a12, ...,   a1n), collapse),
          paste0(paste(  a21,   a22, ...,   a2n), collapse),
          .................................................
          paste0(paste(am-11, am-12, ..., am-1n), collapse),      
                 paste(  am1,   am2, ...,   amn)
          )

    One can see this easily by intuition from examples:

    paste(1:5, 1:5, 6, sep = "", collapse = "_x_")
    #> [1] "116_x_226_x_336_x_446_x_556"
    paste(1,2,3,4, collapse = "_x_")
    #> [1] "1 2 3 4"
    paste(1:2,1:2,2:3,3:4, collapse = "_x_")
    #> [1] "1 1 2 3_x_2 2 3 4"

    We think the only paste() variant that is not implemented in base R is an array version. At least we are not aware of anything like row_paste() or paste_apply().

26.4 S3

  1. Q: The most important S3 objects in base R are factors, data frames, difftimes, and date/times (Dates, POSIXct, POSIXlt). You’ve already seen the attributes and base type that factors are built on. What base types and attributes are the others built on?

    A: difftime: Difftimes are built on doubles (the number of time units) with a units attribute and the class set to “difftime” (see also the constructor in exercise 3 below).

    data frame: Data frames are built on top of (named) lists. Together with the row.names attribute and after setting the class to “data.frame”, we get a classical data frame

    df_build <- structure(list(1:2, 3:4),
                          names = c("a", "b"),
                          row.names = 1:2, 
                          class = "data.frame")
    
    df_classic <- data.frame(a = 1:2, b = 3:4)
    
    identical(df_build, df_classic)
    #> [1] TRUE

    date/times (Dates, POSIXct, POSIXlt): Date is just a double with the class attribute set to “Date”

    date_build <- structure(0, class = "Date")
    date_classic <- as.Date("1970-01-01")
    identical(date_build, date_classic)
    #> [1] TRUE

    POSIXct is a class for date/times that inherits from POSIXt and is also built on doubles. The only attribute is tzone (the time zone)

    POSIXct_build <- structure(1, class = c("POSIXct", "POSIXt"), tzone = "CET")
    POSIXct_classic <- .POSIXct(1, tz = "CET") # note that tz's default is NULL
    identical(POSIXct_build, POSIXct_classic)
    #> [1] TRUE

    POSIXlt is another date/time class that inherits from POSIXt. It is built on top of a named list and a tzone attribute. Differences between POSIXct and POSIXlt are described in ?DateTimeClasses.

    POSIXlt_build <- structure(list(sec = 30,
                                    min = 30L,
                                    hour = 14L,
                                    mday = 1L,
                                    mon = 0L,
                                    year = 70L,
                                    wday = 4L,
                                    yday = 0L,
                                    isdst = 0L,
                                    zone = "CET",
                                    gmtoff = 3600L),
                               tzone = c("", "CET", "CEST"),
                               class = c("POSIXlt", "POSIXt"))
    POSIXlt_classic <- as.POSIXlt(.POSIXct(13.5 * 3600 + 30))
    identical(POSIXlt_build, POSIXlt_classic)
    #> [1] FALSE
  2. Q: Draw a Venn diagram illustrating the relationships between functions, generics, and methods.

    A: Functions don’t have to be generics or methods, but generics and methods are both functions. It is also possible for a function to be both a method and a generic at the same time, which seems rather awkward; the author of the textbook doesn’t recommend it either, see ?pryr::ftype:

    This function figures out whether the input function is a regular/primitive/internal function, a internal/S3/S4 generic, or a S3/S4/RC method. This is function is slightly simplified as it’s possible for a method from one class to be a generic for another class, but that seems like such a bad idea that hopefully no one has done it.

  3. Q: Write a constructor for difftime objects. What base type are they built on? What attributes do they use? You’ll need to consult the documentation, read some code, and perform some experiments.

    A: Our constructor should be named new_class_name, have one argument for its base type and one for each attribute, and it should check the base types of these arguments as well.

    new_difftime <- function(x, units = "auto") {
      stopifnot(is.double(x), is.character(units))
    
      structure(x, units = units, class = "difftime")
    }

    However, since the following result prints awkwardly

    new_difftime(3)
    #> Time difference of 3 auto

    we take a little more “inspiration” from the original difftime() function and make the corresponding changes. Basically, we need to implement logic for the units attribute in case it is set to "auto", and convert the value of the underlying double from seconds to the chosen unit, as commented in the following

    new_difftime <- function(x, units = "auto") {
      stopifnot(is.double(x), is.character(units))
    
      # case units == "auto":
      if (units == "auto") 
        # when all time differences are NA, units should be "secs"
        units <- if (all(is.na(x))){
          "secs"
        } else {
          # otherwise set the units regarding to the minimal time difference
          x_min <- min(abs(x), na.rm = TRUE)
          if (!is.finite(x_min) || x_min < 60) {
            "secs"
          } else if (x_min < 3600) {
            "mins"
          } else if (x_min < 86400) {
            "hours"
          } else {
            "days"
          }
        }
    
      # we rescale the underlying double, according to the units
      x <- switch(units, 
                  secs = x,
                  mins = x/60,
                  hours = x/3600,
                  days = x/86400,
                  weeks = x/(7 * 86400))
    
      structure(x, units = units, class = "difftime")
    }
    
    # test
    new_difftime(c(NA, -3600, 86400))
    #> Time differences in hours
    #> [1] NA -1 24

26.4.1 Inheritance

  1. Q: The ordered class is a subclass of factor, but it’s implemented in a very ad hoc way in base R. Implement it in a principled way by building a constructor and providing vec_restore() method.

    f1 <- factor("a", c("a", "b"))
    as.factor(f1)  
    #> [1] a
    #> Levels: a b
    as.ordered(f1) # loses levels
    #> [1] a
    #> Levels: a

    A: TODO: the old exercise text ended with “an as_ordered generic”. Check whether the answer needs to be updated.

    ordered is a subclass of factor, so we need to do the following

    • for factors: add a subclass argument to the constructor and helper
    • for ordered: add a constructor
    • write an as_ordered() generic with methods ordered, factor and default

    We use the factor constructor from the textbook and add the subclass argument

    new_factor <- function(x, levels, ..., subclass = NULL) {
      stopifnot(is.integer(x))
      stopifnot(is.character(levels))
    
      structure(
        x,
        levels = levels,
        class = c(subclass, "factor")
      )
    }

    We also use the validator for factors from the textbook

    validate_factor <- function(x) {
      values <- unclass(x)
      levels <- attr(x, "levels")
    
      if (!all(!is.na(values) & values > 0)) {
        stop(
          "All `x` values must be non-missing and greater than zero",
          call. = FALSE
        )
      }
    
      if (length(levels) < max(values)) {
        stop(
          "There must at least as many `levels` as possible values in `x`",
          call. = FALSE
        )
      }
    
      x
    }

    And we add the subclass argument for the helper from the textbook and the exercises

    factor <- function(x, levels = unique(x), ... , subclass = NULL) {
      ind <- match(x, levels)
    
      # error when values occur, which are not in the levels
      if(any(is.na(ind))){
        stop("The following values do not occur in the levels: ",
             paste(setdiff(x,levels), collapse = ", "), ".", 
             call. = FALSE)
        }
    
      validate_factor(new_factor(ind, levels, subclass = subclass))
    }

    A constructor for ordered is already implemented in the sloop package:

    new_ordered <- function (x, levels) {
      stopifnot(is.integer(x))
      stopifnot(is.character(levels))
      structure(x, levels = levels, class = c("ordered", "factor"))
    }

    The implementation of the generic and of the first two methods is straightforward

    as_ordered <- function(x, ...) {
      UseMethod("as_ordered")
    }
    
    as_ordered.ordered <- function(x, ...) x
    as_ordered.default <- function(x, ...) {
      stop(
        "Don't know how to coerce object of class ", 
        paste(class(x), collapse = "/"), " into an ordered factor", 
        call. = FALSE
      )
    }

    For the factor method of as_ordered() we use the factor helper, since it saves us some typing:

    as_ordered.factor <- function(x, ...) {
      factor(x, attr(x, "levels"), subclass = "ordered")
    }

    Finally, our new method preserves all levels:

    as_ordered(f1)
    #> [1] a
    #> Levels: a < b

    For a real scenario, we might want to add an as_factor.ordered() method to the as_factor() generic from the textbook.

26.5 S4

26.5.1 Generics and methods

  1. Q: What’s the difference between the generics generated by these two calls?

    setGeneric("myGeneric", function(x) standardGeneric("myGeneric"))
    setGeneric("myGeneric", function(x) {
      standardGeneric("myGeneric")
    })

    A: The first call defines a standard generic and the second one creates a nonstandard generic. One can confirm this directly by printing (showing, in S4 jargon) the function.

    setGeneric("myGeneric", function(x) standardGeneric("myGeneric"))
    #> [1] "myGeneric"
    myGeneric
    #> standardGeneric for "myGeneric" defined from package ".GlobalEnv"
    #> 
    #> function (x) 
    #> standardGeneric("myGeneric")
    #> <environment: 0x5cb0cc0>
    #> Methods may be defined for arguments: x
    #> Use  showMethods("myGeneric")  for currently available ones.
    
    setGeneric("myGeneric", function(x) {
      standardGeneric("myGeneric")
    })
    #> [1] "myGeneric"
    myGeneric
    #> nonstandardGenericFunction for "myGeneric" defined from package ".GlobalEnv"
    #> 
    #> function (x) 
    #> {
    #>     standardGeneric("myGeneric")
    #> }
    #> <environment: 0x5deec20>
    #> Methods may be defined for arguments: x
    #> Use  showMethods("myGeneric")  for currently available ones.

26.6 Expressions

26.6.1 Abstract syntax trees

  1. Q: Use ast() and experimentation to figure out the three arguments to an if() call. What would you call them? Which arguments are required and which are optional?

    A: You can write an if() statement in several ways: with or without else, formatted or in one line and also in prefix notation. Here are several versions focussing on the possibility of leaving out curly brackets.

    lobstr::ast(if (TRUE) {} else {})
    #> █─`if` 
    #> ├─TRUE 
    #> ├─█─`{` 
    #> └─█─`{`
    lobstr::ast(if (TRUE) 1 else 2)
    #> █─`if` 
    #> ├─TRUE 
    #> ├─1 
    #> └─2
    lobstr::ast(`if`(TRUE, 1, 2))
    #> █─`if` 
    #> ├─TRUE 
    #> ├─1 
    #> └─2

    One possible way of naming the arguments would be: condition (1), conclusion (2), alternative (3).

    The condition and the conclusion are always required. The alternative is optional; it is only used when the condition evaluates to FALSE and an else branch is supplied.

  2. Q: What are the arguments to the for() and while() calls?

    A: for() requires an index (called var in the docs), a sequence and an expression, for example

    `for`(i, 1:3, {print(i)})
    #> [1] 1
    #> [1] 2
    #> [1] 3

    while() requires a condition and an expression. Again, an example in prefix notation:

    set.seed(123)
    `while`((i <- rnorm(1)) < 1, {print(i)})
    #> [1] -0.56
    #> [1] -0.23
    i
    #> [1] 1.56

    Note that a minimal expression can consist of just an empty block, {}.

  3. Q: Two arithmetic operators can be used in both prefix and infix style. What are they?

    A: I am not sure how this is meant to be understood. Theoretically, every arithmetic operator can be written in prefix notation via backticks. On the other hand, + and - seem to be the only ones that can also be used in prefix style without backticks.

    x <- 1
    
    +(x)
    #> [1] 1
    -(x)
    #> [1] -1

    However, when we look more closely, the call tree is not what we would expect from a prefix function

    lobstr::ast(+(x))  
    #> █─`+` 
    #> └─█─`(` 
    #>   └─x
    lobstr::ast(-(x))  
    #> █─`-` 
    #> └─█─`(` 
    #>   └─x

    So maybe it is meant to look like this…

    lobstr::ast(+x)
    #> █─`+` 
    #> └─x
    lobstr::ast(-x)
    #> █─`-` 
    #> └─x

    Of course this doesn’t make too much sense either, since ?Syntax explains that R clearly differentiates between unary and binary + and - operators, and a unary operator is not really what we mean when we speak about infix operators.

    However, if we don’t differentiate in this way, this is probably the intended solution, since + and - are obviously also infix functions:

    lobstr::ast(x + y)
    #> █─`+` 
    #> ├─x 
    #> └─y
    lobstr::ast(x - y)
    #> █─`-` 
    #> ├─x 
    #> └─y

26.7 Quasiquotation (new)

  1. Q: Why does as.Date.default() use substitute() and deparse()? Why does pairwise.t.test() use them? Read the source code.

    A: as.Date.default() uses them to convert unexpected input expressions (neither dates nor NAs) into a character string, which is returned as part of an error message.

    pairwise.t.test() uses them to convert the names of its data inputs (response vector x and grouping factor g) into character strings, which are then formatted into part of the desired output.

  2. Q: pairwise.t.test() assumes that deparse() always returns a length one character vector. Can you construct an input that violates this expectation? What happens?

    A: We can pass an expression to one of pairwise.t.test()’s data input arguments that exceeds the default cutoff width used by deparse(). The expression will then be split into a character vector of length greater than 1. The deparsed data inputs are pasted directly (read the source code!) with “and” as separator, and the result is only used for display in the output. So only the data.name component changes (it will contain more than one “and”).

    d=1
    pairwise.t.test(2, d+d+d+d+d+d+d+d+d+d+d+d+d+d+d+d+d)
    #> 
    #>  Pairwise comparisons using t tests with pooled SD 
    #> 
    #> data:  2 and d + d + d + d + d + d + d + d + d + d + d + d + d + d + d + d +  2 and     d 
    #> 
    #> <0 x 0 matrix>
    #> 
    #> P value adjustment method: holm

26.8 FO

26.8.1 Behavioural FOs

  1. Q: What does the following function do? What would be a good name for it?

    f <- function(g) {
      force(g)
      result <- NULL
      function(...) {
        if (is.null(result)) {
          result <<- g(...)
        }
        result
      }
    }
    runif2 <- f(runif)
    runif2(5)
    #> [1] 0.528 0.892 0.551 0.457 0.957
    runif2(10)
    #> [1] 0.528 0.892 0.551 0.457 0.957

    A: It returns a new version of the input function. That version will always return the result of its first run (as long as this result is not NULL), no matter how the input changes. Good names could be first_run() or initial_return().

  2. Q: Modify delay_by() so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called g <- delay_by(1, f); g(); Sys.sleep(2); g() there shouldn’t be an extra delay.

    A:
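
    A possible sketch (the name delay_by_v2 is ours; it assumes the delay_by(delay, f) interface from the textbook): remember when the returned function was last called and only sleep for the remaining time:

    delay_by_v2 <- function(delay, f) {
      force(f)
      # initialise so that the very first call is not delayed
      last_call <- Sys.time() - delay
      function(...) {
        wait <- delay - as.numeric(difftime(Sys.time(), last_call, units = "secs"))
        if (wait > 0) Sys.sleep(wait)
        last_call <<- Sys.time()
        f(...)
      }
    }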
  3. Q: Write wait_until() which delays execution until a specific time.

    A:

    wait_until <- function(time, f) {
      force(f)
      function(...) {
        while (Sys.time() < time) {}
        return(f(...))
      }
    }
    
    # a little test
    ptm <- proc.time()
    m <- wait_until(Sys.time() + 10, mean)
    m(1:3)
    proc.time() - ptm
  4. Q: There are three places we could have added a memoise call: why did we choose the one we did?

    download <- memoise(dot_every(10, delay_by(1, download_file)))
    download <- dot_every(10, memoise(delay_by(1, download_file)))
    download <- dot_every(10, delay_by(1, memoise(download_file)))

    A: The second was chosen. It’s easy to see why, if we eliminate the other two options:

    • The first version only prints a dot at every tenth download() call with a new input. This is because dot_every() is inside of memoise() and the counter created by dot_every() is not “activated” if the input is known.

    • The third version takes one second for every call, even if we already know the result and don’t download anything again.

  5. Q: Why is the remember() function inefficient? How could you implement it in more efficient way?

  6. Q: Why does the following code, from stackoverflow, not do what you expect?

    # return a linear function with slope a and intercept b.
    f <- function(a, b) function(x) a * x + b
    
    # create a list of functions with different parameters.
    fs <- Map(f, a = c(0, 1), b = c(0, 1))
    
    fs[[1]](3)
    #> [1] 0
    # should return 0 * 3 + 0 = 0

    How can you modify f so that it works correctly?

    A: As you can read at the stackoverflow link, the question arose because fs[[1]](3) originally returned 4, which is due to lazy evaluation. Two users solved this via force():

    f <- function(a, b) {force(a); force(b); function(x) a * x + b}

    However, you can see from the result shown in the question that R’s behaviour has changed in this case, and as Jan Kislinger points out on twitter:

    The real question should be: “How did they modify #rstats so that it works correctly?” otherwise it’s a tricky question :D

    Note that the same issue appears in the textbook:

    In the following example, we take a list of functions and delay each one. But when we try to evaluate the mean, we get the sum instead.

    funs <- list(mean = mean, sum = sum)
    funs_m <- lapply(funs, delay_by, delay = 0.1)
    
    funs_m$mean(1:10)
    #> [1] 5.5

    Which, as one can see, is not true anymore… it actually changed in R version 3.2:

    Higher order functions such as the apply functions and Reduce() now force arguments to the functions they apply in order to eliminate undesirable interactions between lazy evaluation and variable capture in closures. This resolves PR#16093.

    For those further interested: PR#16093 will lead you to the thread “iterated lapply” in the R-devel archives. Note that the behaviour in for loops still matches the “old lapply()” behaviour.

26.8.2 Output FOs

  1. Q: Create a negative() FO that flips the sign of the output of the function to which it is applied.

    A:

    negative <- function(f){
      force(f)
      function(...){
        -f(...)
      }
    }
  2. Q: The evaluate package makes it easy to capture all the outputs (results, text, messages, warnings, errors, and plots) from an expression. Create a function like capture_it() that also captures the warnings and errors generated by a function.

    A: One way is just to capture the output of tryCatch() with identity handlers for errors and warnings:

    capture_trials <- function(f){
      force(f)
      function(...){
        capture.output(tryCatch(f(...),
                                error = function(e) e,
                                warning = function(w) w)
        )
      }
    }
    
    # we test the behaviour
    log_t <- capture_trials(log)
    elements <- list(1:10, c(-1, 10), c(TRUE, FALSE), letters)
    results <- lapply(elements, function(x) log_t(x))
    results
    #> [[1]]
    #> [1] " [1] 0.000 0.693 1.099 1.386 1.609 1.792 1.946 2.079 2.197 2.303"
    #> 
    #> [[2]]
    #> [1] "<simpleWarning in f(...): NaNs produced>"
    #> 
    #> [[3]]
    #> [1] "[1]    0 -Inf"
    #> 
    #> [[4]]
    #> [1] "<simpleError in f(...): non-numeric argument to mathematical function>"
    
    # going further, we could capture the output for every single element:
    # results_detailed <- lapply(elements, function(x) lapply(x, log_t))
    # results_detailed
  3. Q: Create a FO that tracks files created or deleted in the working directory (Hint: use dir() and setdiff().) What other global effects of functions might you want to track?

    A:
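
    A possible sketch (the name track_dir() is ours): compare dir() before and after the call. Other global effects one might want to track in the same way include changes to options(), the working directory, environment variables, the random seed, open connections, or loaded packages.

    track_dir <- function(f, path = ".") {
      force(f)
      function(...) {
        before <- dir(path)
        on.exit({
          after <- dir(path)
          created <- setdiff(after, before)
          deleted <- setdiff(before, after)
          if (length(created) > 0) message("Files created: ", paste(created, collapse = ", "))
          if (length(deleted) > 0) message("Files deleted: ", paste(deleted, collapse = ", "))
        }, add = TRUE)
        f(...)
      }
    }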

26.8.3 Input FOs

  1. Q: Our previous download() function only downloads a single file. How can you use partial() and lapply() to create a function that downloads multiple files at once? What are the pros and cons of using partial() vs. writing a function by hand?

  2. Q: Read the source code for plyr::colwise(). How does the code work? What are colwise()’s three main tasks? How could you make colwise() simpler by implementing each task as a function operator? (Hint: think about partial().)

    A: We describe how it works by commenting the source code:

    function (.fun, .cols = true, ...) 
      {
      # We check if .cols is not a function, since it is possible to supply a
      # predicate function.
      # if so, the .cols arguments will be "quoted", and filter() will 
      # be a function that checks and evaluates these .cols within its other argument
      if (!is.function(.cols)) {
        .cols <- as.quoted(.cols)
        filter <- function(df) eval.quoted(.cols, df)
      }
      # otherwise, filter will be Filter(), which applies the function 
      # in .cols to every element of its other argument
      else {
        filter <- function(df) Filter(.cols, df)
      }
      # the ... arguments are caught in the list dots
      dots <- list(...)
      # a function is created, which will also be the return value.
      # it checks if its input is a data frame
      function(df, ...) {
        stopifnot(is.data.frame(df))
        # if df is split (in "plyr" speaking), this will be taken into account...
        df <- strip_splits(df)
        # now the columns of the data frame are chosen, depending on the input of .cols
        # this can be chosen directly, via a predicate function, or all columns (default)
        filtered <- filter(df)
        # if this means, that no columns are selected, an empty data frame will be returned
        if (length(filtered) == 0) 
          return(data.frame())
        # otherwise lapply will be called on all filtered columns, with 
        # the .fun argument, which has to be provided by the user, and some other
        # arguments provided by the user, when calling the function (...) and
        # when defining the function (dots)
        out <- do.call("lapply", c(list(filtered, .fun, ...), 
                               dots))
        # the output will be named and converted from list into a data frame again
        names(out) <- names(filtered)
        quickdf(out)
      }
    }
    
    <environment: namespace:plyr>
  3. Q: Write FOs that convert a function to return a matrix instead of a data frame, or a data frame instead of a matrix. If you understand S3, call them as.data.frame.function() and as.matrix.function().

    A:

    as.matrix.function <- function(f){
      force(f)
      function(...){
        as.matrix(f(...))
      }
    }
    
    as.data.frame.function <- function(f){
      force(f)
      function(...){
        as.data.frame(f(...))
      }
    }
  4. Q: You’ve seen five functions that modify a function to change its output from one form to another. What are they? Draw a table of the various combinations of types of outputs: what should go in the rows and what should go in the columns? What function operators might you want to write to fill in the missing cells? Come up with example use cases.

  5. Q: Look at all the examples of using an anonymous function to partially apply a function in this and the previous chapter. Replace the anonymous function with partial(). What do you think of the result? Is it easier or harder to read?

    A: The results are easy to read. Especially the Map() examples gain in readability:

    library(pryr)
    #> Registered S3 method overwritten by 'pryr':
    #>   method      from
    #>   print.bytes Rcpp
    #> 
    #> Attaching package: 'pryr'
    #> The following object is masked _by_ '.GlobalEnv':
    #> 
    #>     f
    ## From Functionals
    # 1
    trims <- c(0, 0.1, 0.2, 0.5)
    x <- rcauchy(1000)
    unlist(lapply(trims, function(trim) mean(x, trim = trim)))
    #> [1] -0.00498  0.05088  0.03304  0.02733
    unlist(lapply(trims, partial(mean, x)))
    #> [1] -0.00498  0.05088  0.03304  0.02733
    
    # 2
    xs <- replicate(5, runif(10), simplify = FALSE)
    ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)
    unlist(Map(function(x, w) weighted.mean(x, w, na.rm = TRUE), xs, ws))
    #> [1] 0.453 0.521 0.500 0.443 0.525
    unlist(Map(partial(weighted.mean, na.rm = TRUE), xs, ws))
    #> [1] 0.453 0.521 0.500 0.443 0.525
    
    # 3
    add <- function(x, y, na.rm = FALSE) {
      if (na.rm && (is.na(x) || is.na(y))) rm_na(x, y, 0) else x + y
    }
    
    r_add <- function(xs, na.rm = TRUE) {
      Reduce(function(x, y) add(x, y, na.rm = na.rm), xs)
    }
    
    r_add_compact <- function(xs, na.rm = TRUE) {
      Reduce(partial(add, na.rm = na.rm), xs)
    }
    
    r_add(1:4)
    #> [1] 10
    r_add_compact(1:4)
    #> [1] 10
    
    # 4
    v_add1 <- function(x, y, na.rm = FALSE) {
      stopifnot(length(x) == length(y), is.numeric(x), is.numeric(y))
      if (length(x) == 0) return(numeric())
      simplify2array(
        Map(function(x, y) add(x, y, na.rm = na.rm), x, y)
      )
    }
    
    v_add1_compact <- function(x, y, na.rm = FALSE) {
      stopifnot(length(x) == length(y), is.numeric(x), is.numeric(y))
      if (length(x) == 0) return(numeric())
      simplify2array(
        Map(partial(add, na.rm = na.rm), x, y)
      )
    }
    
    v_add1(1:3, 2:4)
    #> [1] 3 5 7
    v_add1_compact(1:3, 2:4)
    #> [1] 3 5 7
    
    # 5
    c_add <- function(xs, na.rm = FALSE) {
      Reduce(function(x, y) add(x, y, na.rm = na.rm), xs,
             accumulate = TRUE)
    }
    
    c_add_compact <- function(xs, na.rm = FALSE) {
      Reduce(partial(add, na.rm = na.rm), xs,
             accumulate = TRUE)
    }
    
    c_add(1:3)
    #> [1] 1 3 6
    c_add_compact(1:3)
    #> [1] 1 3 6
    
    ## From Function operators
    # 6
    f <- function(x) x ^ 2
    partial(f)
    #> function (...) 
    #> f(...)
    
    # 7
    # Map(function(x, y) f(x, y, zs), xs, ys)
    # Map(partial(f, zs = zs), xs, yz)
    
    # 8
    # f <- function(a) g(a, b = 1)
    # f <- partial(g, b = 1)
    
    # 9
    compact <- function(x) Filter(Negate(is.null), x)
    compact <- partial(Filter, Negate(is.null))
    
    # 10
    # Map(function(x, y) f(x, y, zs), xs, ys)
    # Map(partial(f, zs = zs), xs, ys)
    
    # 11
    funs2 <- list(
      sum = function(...) sum(..., na.rm = TRUE),
      mean = function(...) mean(..., na.rm = TRUE),
      median = function(...) median(..., na.rm = TRUE)
    )
    
    funs2 <- list(
      sum = partial(sum, na.rm = TRUE),
      mean = partial(mean, na.rm = TRUE),
      median = partial(median, na.rm = TRUE)
    )
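    
    # A quick sanity check (not part of the original examples): the
    # partial()-ed funs2 still remove NAs as intended.
    funs2$mean(c(1, 2, NA))
    #> [1] 1.5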

26.8.4 Combining FOs

  1. Q: Implement your own version of compose() using Reduce and %o%. For bonus points, do it without calling function.

    A: We use the definition from the textbook:

    compose <- function(f, g) {
      function(...) f(g(...))
    }
    
    "%o%" <- compose

    We then build two versions: one via an anonymous function and one via partial():

    compose_red <- function(fs) {
      Reduce(function(f, g) function(...) f(g(...)), fs)
    }
    compose_red(c(mean, length, unique))(1:10)
    #> [1] 10
    
    compose_red_bonus <- function(fs) {
      Reduce(partial(partial(`%o%`)), fs)
    }
    compose_red_bonus(c(mean, length, unique))(1:10)
    #> [1] 10
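
    Since %o% is itself a binary function, it could also be passed to Reduce() directly; a small variant of the above (not from the textbook):

    compose_red_direct <- function(fs) Reduce(`%o%`, fs)
    compose_red_direct(c(mean, length, unique))(1:10)
    #> [1] 10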
  2. Q: Extend and() and or() to deal with any number of input functions. Can you do it with Reduce()? Can you keep them lazy (e.g., for and(), the function returns once it sees the first FALSE)?

    A: We use and() and or() as defined in the textbook. They are lazy, since they are built on && and ||. Their reduced versions stay lazy as well, as we show at the end of the code and in the short demonstration after it:

    and <- function(f1, f2) {
      force(f1); force(f2)
      function(...) {
        f1(...) && f2(...)
      }
    }
    
    and_red <- function(fs){
      Reduce(function(f, g) and(f, g), fs)
    }
    
    or <- function(f1, f2) {
      force(f1); force(f2)
      function(...) {
        f1(...) || f2(...)
      }
    }
    
    or_red <- function(fs){
      Reduce(function(f, g) or(f, g), fs)
    }
    
    # A predicate that errors before the first TRUE result propagates its error
    tryCatch(
      or_red(c(is.logical, is.logical, stop, is.character))("a"), 
      error = function(e) e
    )
    #> <simpleError in f1(...): a>
    
    # A predicate that errors after the first TRUE result is never evaluated
    or_red(c(is.logical, is.logical, is.character, stop))("a")
    #> [1] TRUE
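
    The same laziness can be seen for and_red(): a predicate placed after the first FALSE is never evaluated, so the stop() below is never reached:

    and_red(c(is.logical, stop, is.character))("a")
    #> [1] FALSE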
  3. Q: Implement the xor() binary operator. Implement it using the existing xor() function. Implement it as a combination of and() and or(). What are the advantages and disadvantages of each approach? Also think about what you’ll call the resulting function to avoid a clash with the existing xor() function, and how you might change the names of and(), not(), and or() to keep them consistent.

    A: Both binary versions are straightforward to implement, as are their reduced counterparts. Note, however, that the binary versions cannot simply be handed vectors of functions (the "parallel" and "reduced combination" calls below fail); combining more than two predicates requires the reduced versions, whose chained semantics are shown at the end:

    xor_fb1 <- function(f1, f2){
      force(f1); force(f2)
      function(...){
        xor(f1(...), f2(...)) 
      }
    }
    
    xor_fb2 <- function(f1, f2){
      force(f1); force(f2)
      function(...){
        or(f1, f2)(...) && !(and(f1, f2)(...))
      }
    }
    
    # binary combination
    xor_fb1(is.logical, is.character)("a")
    #> [1] TRUE
    xor_fb2(is.logical, is.character)("a")
    #> [1] TRUE
    
    # parallel combination (results in an error)
    xor_fb1(c(is.logical, is.character), c(is.logical, is.character))("a")
    #> Error in f1(...): could not find function "f1"
    xor_fb2(c(is.logical, is.character), c(is.logical, is.character))("a")
    #> Error in f1(...): could not find function "f1"
    
    # reduced combination (results in an error)
    xor_fb1(c(is.logical, is.character, is.logical, is.character))("a")
    #> Error in force(f2): argument "f2" is missing, with no default
    xor_fb2(c(is.logical, is.character, is.logical, is.character))("a")
    #> Error in force(f2): argument "f2" is missing, with no default
    
    ### Reduced version
    xor_fb1_red <- function(fs){
      Reduce(function(f, g) xor_fb1(f, g), fs)
    }
    
    xor_fb2_red <- function(fs){
      Reduce(function(f, g) xor_fb2(f, g), fs)
    }
    
    # The reduced versions chain binary xors, so they effectively check
    # whether an odd number of the predicates returns TRUE. Note that this
    # parity behaviour differs from an "exactly one is TRUE" interpretation.
    
    # FALSE, TRUE, FALSE, TRUE -> two TRUEs (even) -> FALSE
    xor_fb1_red(c(is.logical, is.character, is.logical, is.character))("a")
    #> [1] FALSE
    xor_fb2_red(c(is.logical, is.character, is.logical, is.character))("a")
    #> [1] FALSE
    
    # FALSE, FALSE, TRUE, FALSE -> one TRUE (odd) -> TRUE
    xor_fb1_red(c(is.logical, is.logical, is.character, is.logical))("a")
    #> [1] TRUE
    xor_fb2_red(c(is.logical, is.logical, is.character, is.logical))("a")
    #> [1] TRUE
    
    # FALSE, FALSE, TRUE, TRUE -> two TRUEs (even) -> FALSE
    xor_fb1_red(c(is.logical, is.logical, is.character, is.character))("a")
    #> [1] FALSE
    xor_fb2_red(c(is.logical, is.logical, is.character, is.character))("a")
    #> [1] FALSE
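
    A third variant builds xor purely from the combinators. This assumes a not() helper in the same spirit as and() and or() (not defined above, so we sketch one here):

    not <- function(f) {
      force(f)
      function(...) !f(...)
    }
    
    xor_fb3 <- function(f1, f2) {
      force(f1); force(f2)
      and(or(f1, f2), not(and(f1, f2)))
    }
    
    xor_fb3(is.logical, is.character)("a")
    #> [1] TRUE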
  4. Q: Above, we implemented boolean algebra for functions that return a logical function. Implement elementary algebra (plus(), minus(), multiply(), divide(), exponentiate(), log()) for functions that return numeric vectors.

    A:

    plus <- function(f1, f2) {
      force(f1); force(f2)
      function(...) {
        f1(...) + f2(...)
      }
    }
    
    minus <- function(f1, f2) {
      force(f1); force(f2)
      function(...) {
        f1(...) - f2(...)
      }
    }
    
    multiply <- function(f1, f2) {
      force(f1); force(f2)
      function(...) {
        f1(...) * f2(...)
      }
    }
    
    divide <- function(f1, f2) {
      force(f1); force(f2)
      function(...) {
        f1(...) / f2(...)
      }
    }
    
    exponentiate <- function(f1, f2) {
      force(f1); force(f2)
      function(...) {
        f1(...) ^ f2(...)
      }
    }
    
    # we rename log to log_ since log() already exists
    log_ <- function(f1, f2) {
      force(f1); force(f2)
      function(...) {
        log(f1(...), f2(...))
      }
    }
    
    # Test
    mns <- minus(mean, function(x) x^2)
    mns(1:5)
    #> [1]   2  -1  -6 -13 -22
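
    As a small usage sketch (not part of the exercise), these combinators can also be combined with each other, e.g. a population-variance function built from minus() and multiply():

    pop_var <- minus(function(x) mean(x ^ 2), multiply(mean, mean))
    pop_var(1:5)
    #> [1] 2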

26.9 Expressions (again)

26.9.1 Data structures

  1. Q: How is rlang::maybe_missing() implemented? Why does it work?

    A: Let us take a look at the function’s source code to see what’s going on:

    rlang::maybe_missing
    function (x) 
    {
      # is_missing() checks whether one of the following is TRUE:
      # 1. via substitute(): typeof(x) is a symbol and missing(x) is TRUE
      # 2. x is identical to missing_arg()
      if (is_missing(x)) {
        missing_arg()  # returns the missing argument
                       # (implemented in lower-level code via .Call())
      }
      else {
        x  # when it's not missing, x is simply returned
      }
    }
    <bytecode: 0x00000000195ed740>
    <environment: namespace:rlang>

    First, it checks whether the argument is missing. If so, the missing argument is returned; otherwise the argument (x) itself is returned.

26.9.2 Parsing and deparsing

  1. Q: Why does as.Date.default() use substitute() and deparse()? Why does pairwise.t.test() use them? Read the source code.

    A: Both functions use deparse(substitute(...)) to turn the expression the user supplied into a string. as.Date.default() includes this string in its error message, so the message names the actual argument that could not be converted to a Date. pairwise.t.test() uses it to construct the data.name label (e.g. "x and g") that is printed together with the test results.
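
    A minimal sketch of the underlying idiom (describe_arg() is a hypothetical helper, not taken from either function):

    describe_arg <- function(x) deparse(substitute(x))
    describe_arg(mtcars$mpg + 1)
    #> [1] "mtcars$mpg + 1"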

26.9.3 R’s grammar

  1. Q: deparse() produces vectors when the input is long. For example, the following call produces a vector of length two:

    expr <- rlang::expr(g(a + b + c + d + e + f + g + h + i + j + k + l + m +
      n + o + p + q + r + s + t + u + v + w + x + y + z))
    
    deparse(expr)
    #> [1] "g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + "
    #> [2] "    p + q + r + s + t + u + v + w + x + y + z)"

    What do expr_text(), expr_name(), and expr_label() do with this input?

    A:

    • expr_text() pastes the deparsed lines into a single string, using \n (newline characters) as separators:
    cat(rlang::expr_text(expr)) # cat() is used so the newline prints as a line break
    #> g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + 
    #>     p + q + r + s + t + u + v + w + x + y + z)
    • expr_name() abbreviates the call to the form g(...) and deparses it into a single string:
    rlang::expr_name(expr)
    #> [1] "g(...)"
    • expr_label() does the same as expr_name(), but additionally surrounds the output with backticks:
    rlang::expr_label(expr)
    #> [1] "`g(...)`"