6 Functions

6.1 Function fundamentals

1. Q: Given a name, like "mean", match.fun() lets you find a function. Given a function, can you find its name? Why doesn’t that make sense in R?

A: A name can only point to a single object, but an object can be pointed to by 0, 1, or many names, so a function has no single name to recover. What are the names of the functions in the following block?

function(x) sd(x) / mean(x)
#> function(x) sd(x) / mean(x)

f1 <- function(x) (x - min(x)) / (max(x) - min(x))
f2 <- f1
f3 <- f1
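
The many-names case can be checked directly. The following is a minimal sketch: find_names() is a hypothetical helper introduced here for illustration (it is not part of base R), and it only searches the environment in which the function was created.

```r
f1 <- function(x) (x - min(x)) / (max(x) - min(x))
f2 <- f1
f3 <- f1

# Hypothetical helper: return every name in `env` that is bound to `fun`.
# Assumption: only the function's own enclosing environment is searched.
find_names <- function(fun, env = environment(fun)) {
  nms <- ls(env)
  is_same <- vapply(
    nms,
    function(nm) identical(get(nm, envir = env), fun),
    logical(1)
  )
  nms[is_same]
}

find_names(f1)
#> [1] "f1" "f2" "f3"

# An anonymous function is bound to no name at all
find_names(function(x) sd(x) / mean(x))
#> character(0)
```

Because three names are returned for the same object, asking for "the" name of a function has no unique answer.
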
2. Q: It’s possible (although typically not useful) to call an anonymous function. Which of the two approaches below is correct? Why?

function(x) 3()
#> function(x) 3()
(function(x) 3)()
#> [1] 3

A: The second approach is correct.

The anonymous function function(x) 3 is surrounded by a pair of parentheses before it is called by (). These extra parentheses separate the function call from the anonymous function’s body. Without them, R returns a function with the invalid body 3(), which throws an error when we call it. This is easier to see if we name the function:

f <- function(x) 3()
f
#> function(x) 3()
f()
#> Error in f(): attempt to apply non-function
3. Q: A good rule of thumb is that an anonymous function should fit on one line and shouldn’t need to use {}. Review your code. Where could you have used an anonymous function instead of a named function? Where should you have used a named function instead of an anonymous function?

A: Anonymous functions allow concise and elegant code in certain situations. However, they lack a descriptive name, and when re-reading the code it can take a while to figure out what they do (even if it’s future you reading). That’s why it’s helpful to give long and complex functions a descriptive name. It may be worthwhile to look at your own projects or other people’s code to reflect on this part of your coding style.

1. Q: What function allows you to tell if an object is a function? What function allows you to tell if a function is a primitive function?

A: Use is.function() to test whether an object is a function. Use is.primitive() to test specifically for primitive functions.
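
A short illustration of both predicates, using only base R functions:

```r
is.function(mean)     # TRUE: a closure written in R
is.function(sum)      # TRUE: primitives are functions too
is.primitive(sum)     # TRUE: implemented directly in C
is.primitive(mean)    # FALSE
is.function("mean")   # FALSE: a string naming a function is not a function
```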

2. Q: This code makes a list of all functions in the base package.

objs <- mget(ls("package:base", all = TRUE), inherits = TRUE)
funs <- Filter(is.function, objs)

Use it to answer the following questions:

1. Which base function has the most arguments?

2. How many base functions have no arguments? What’s special about those functions?

3. How could you adapt the code to find all primitive functions?

A: Let’s look at each sub-question separately:

1. To find the function with the most arguments, we first compute the length of formals() for each function:

library(purrr)

n_args <- funs %>%
  map(formals) %>%
  map_int(length)

Then use table() to see the distribution, and [ to find the largest:

table(n_args)
#> n_args
#>   0   1   2   3   4   5   6   7   8   9  10  11  13  15  16  22
#> 247 223 361 195 118  83  36  14  12   3   2   1   3   1   2   1
names(n_args)[n_args == 22]
#> [1] "scan"
2. We can also use n_args to find the number of functions with no arguments:

sum(n_args == 0)
#> [1] 247

However, this overcounts because formals() returns NULL for primitive functions, and length(NULL) is 0. To fix that, we can first remove the primitive functions:

n_args2 <- funs %>%
  discard(is.primitive) %>%
  map(formals) %>%
  map_int(length)

sum(n_args2 == 0)
#> [1] 47

Indeed, most of the functions that appear to have no arguments are actually primitive functions.

3. To find all primitive functions, we can change the predicate in Filter() from is.function() to is.primitive():

funs <- Filter(is.primitive, objs)
length(funs)
3. Q: What are the three important components of a function?

A: These components are the function’s body(), formals() and environment(). However, as mentioned in the textbook:

There is one exception to the rule that functions have three components. Primitive functions, like sum(), call C code directly with .Primitive() and contain no R code. Therefore their formals(), body(), and environment() are all NULL.
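
The three components can be inspected directly. A small sketch, where g is an arbitrary example function introduced here:

```r
g <- function(x, y = 2) x + y

formals(g)       # pairlist with the entries x and y
body(g)          # the expression x + y
environment(g)   # the environment in which g was created

# Primitive functions have none of the three components
is.null(formals(sum))
is.null(body(sum))
is.null(environment(sum))
```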

4. Q: When does printing a function not show what environment it was created in?

A: Primitive functions and functions created in the global environment do not print their environment.
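
The difference shows up when the printed output is captured. This is a sketch under the assumption that the code is run at the top level, so that top_level() is created in the global environment; make_adder(), add_one() and top_level() are names introduced for illustration.

```r
make_adder <- function(n) {
  function(x) x + n   # created inside make_adder()'s execution environment
}
add_one <- make_adder(1)
top_level <- function(x) x + 1   # assumed to be created in the global environment

printed_inner <- capture.output(print(add_one))
printed_top   <- capture.output(print(top_level))
printed_prim  <- capture.output(print(sum))

# Only the function created inside another function prints an <environment: ...> line
any(grepl("<environment:", printed_inner, fixed = TRUE))
any(grepl("<environment:", printed_top, fixed = TRUE))
any(grepl("<environment:", printed_prim, fixed = TRUE))
```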

6.2 Lexical Scoping

1. Q: What does the following code return? Why? Describe how each of the three c’s is interpreted.

c <- 10
c(c = c)

A: This code returns a named numeric vector of length one, with the value 10 and the name "c". The first c represents the c() function, the second c is interpreted as a (quoted) name and the third c as a value.
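
The three roles can be separated by inspecting the result (res is a name introduced here):

```r
c <- 10
res <- c(c = c)

res           # named numeric vector
names(res)    # "c": the second c, used as a name
unname(res)   # 10: the third c, looked up as a value

rm(c)         # remove the variable again; the function c() is unaffected
```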

2. Q: What are the four principles that govern how R looks for values?

A: R’s lexical scoping rules are based on these four principles:

• Name masking
• Functions vs. variables
• A fresh start
• Dynamic lookup
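
Each principle can be illustrated with a few lines of code. The following is a sketch; all function and variable names in it are introduced for illustration only.

```r
# Name masking: names defined inside a function mask names defined outside
x <- 10
mask <- function() {
  x <- 20
  x
}
mask()
#> [1] 20

# Functions vs. variables: when a name is used in a call,
# R ignores non-function objects while looking it up
sq <- function(v) v^2
fun_vs_var <- function() {
  sq <- 3
  sq(sq)
}
fun_vs_var()
#> [1] 9

# A fresh start: every call gets a new environment, so `a` never survives a call
fresh <- function() {
  a <- if (exists("a", inherits = FALSE)) a + 1 else 1
  a
}
fresh()
#> [1] 1
fresh()
#> [1] 1

# Dynamic lookup: values are looked up when the function runs, not when it is created
k <- 1
dyn <- function() k + 1
k <- 5
dyn()
#> [1] 6
```
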
3. Q: What does the following function return? Make a prediction before running the code yourself.

f <- function(x) {
  f <- function(x) {
    f <- function(x) {
      x ^ 2
    }
    f(x) + 1
  }
  f(x) * 2
}
f(10)

A: Within this function two more functions also named f() are defined and called. Because the functions are each executed in their own environment R will look up and use the functions defined in these environments. The innermost f() is called last, though it is the first function to return a value. Because of this the order of the calculation passes “from the inside to the outside” and the function returns ((10 ^ 2) + 1) * 2, i.e. 202.
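
The calculation can be traced step by step. A sketch that gives the three nested functions distinct names (f_inner, f_middle and f_outer are introduced here for illustration):

```r
f_inner  <- function(x) x ^ 2
f_middle <- function(x) f_inner(x) + 1
f_outer  <- function(x) f_middle(x) * 2

f_inner(10)
#> [1] 100
f_middle(10)
#> [1] 101
f_outer(10)
#> [1] 202
```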

6.3 Lazy evaluation

1. Q: What important property of && makes x_ok() work?

x_ok <- function(x) {
  !is.null(x) && length(x) == 1 && x > 0
}

x_ok(NULL)
#> [1] FALSE
x_ok(1)
#> [1] TRUE
x_ok(1:3)
#> [1] FALSE

What is different with this code? Why is this behaviour undesirable here?

x_ok <- function(x) {
  !is.null(x) & length(x) == 1 & x > 0
}

x_ok(NULL)
#> logical(0)
x_ok(1)
#> [1] TRUE
x_ok(1:3)
#> [1] FALSE FALSE FALSE

A:

We expect x_ok() to validate its input via certain criteria: it must not be NULL, but have length 1 and a value greater than 0. Meaningful outcomes for this assertion will be TRUE, FALSE or NA.

The desired behaviour is reached by combining the assertions through && instead of &. && does not perform elementwise comparisons; it works on single logical values (older versions of R silently used only the first element of each operand, and since R 4.3.0 an operand of length greater than one is an error). It also uses lazy evaluation, in the sense that evaluation “proceeds only until the result is determined” (from ?Logic).

For some situations (x = 1) both operators will lead to the same result. But this is not always the case. For x = NULL, the &&-operator will stop after the !is.null statement and return the result. The following conditions won’t even be evaluated! If the other conditions are also evaluated (by the use of &), the outcome changes: NULL > 0 returns logical(0), which is not helpful in this case.

We can also see the difference in behaviour when we set x = 1:3. The &&-operator returns the result from length(x) == 1, which is FALSE. With &, the (vectorised) x > 0 condition is evaluated as well, and the result is a logical vector of length three.
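
The short-circuit behaviour can be made visible by placing a call to stop() on the right-hand side. A minimal sketch (res is a name introduced here):

```r
# &&: the right-hand side is never evaluated once the left-hand side is FALSE
FALSE && stop("not reached")
#> [1] FALSE

# &: both sides are evaluated, so the same expression signals the error
res <- tryCatch(
  FALSE & stop("reached"),
  error = function(e) conditionMessage(e)
)
res
#> [1] "reached"

# & works elementwise
c(TRUE, FALSE) & c(TRUE, TRUE)
#> [1]  TRUE FALSE
```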

2. Q: What does this function return? Why? Which principle does it illustrate?

f2 <- function(x = z) {
  z <- 100
  x
}
f2()

A: The function returns 100. The default arguments are evaluated in the function environment. Because of lazy evaluation these arguments are not evaluated before they are accessed. At the time x is accessed z has already been bound to the value 100.
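
The contrast with a user-supplied argument makes the rule clearer: defaults are evaluated inside the function, supplied arguments in the calling environment. A sketch, where the wrapper g() is introduced for illustration:

```r
f2 <- function(x = z) {
  z <- 100
  x
}

g <- function() {
  z <- 1
  c(
    default  = f2(),   # x = z is evaluated inside f2(), where z is 100
    supplied = f2(z)   # z is evaluated in g(), where z is 1
  )
}
g()
#>  default supplied
#>      100        1
```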

3. Q: What does this function return? Why? Which principle does it illustrate?

y <- 10
f1 <- function(x = {y <- 1; 2}, y = 0) {
  c(x, y)
}
f1()
y

A: The function returns c(2, 1). This is due to name masking and lazy evaluation. When x is accessed within c(), the promise x = {y <- 1; 2} is evaluated inside f1()’s environment. y is bound to the value 1 and the return value of the { call (2) is assigned to x. When y is accessed within c(), it already has the value 1 and R doesn’t need to look it up any further. Therefore, the promise y = 0 won’t be evaluated. Also, because y is assigned within f1()’s environment, the value of the global variable y is left untouched.
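
The result depends on the order in which the promises are forced. A sketch with a variant of the function (f1_swapped is a name introduced here) that accesses y before x:

```r
y <- 10
f1_swapped <- function(x = {y <- 1; 2}, y = 0) {
  c(y, x)   # y is forced first, so its default 0 is used
}
f1_swapped()
#> [1] 0 2

y   # the global y is still untouched
#> [1] 10
```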

4. Q: In hist(), the default value of xlim is range(breaks), the default value for breaks is "Sturges", and

range("Sturges")
#> [1] "Sturges" "Sturges"

Explain how hist() works to get a correct xlim value.

A: The xlim argument of hist() defines the range of the histogram’s x-axis. In order to provide a valid axis, xlim must contain a numeric vector of exactly two unique values. Consequently, for the default xlim = range(breaks), breaks must evaluate to a vector with at least two unique values.

During execution hist() overwrites the breaks argument. The breaks argument is quite flexible and allows users to provide the breakpoints directly or compute them in several ways. Therefore the specific behaviour depends highly on the input. But hist() ensures that breaks evaluates to a numeric vector containing at least two unique elements before xlim is computed.
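
The pattern can be reproduced in miniature. The function toy_hist() below is a hypothetical stand-in, not the code of hist(); it only mimics the idea that breaks is overwritten before the lazily evaluated default of xlim is forced. The fixed break points are an assumption made for the example.

```r
toy_hist <- function(breaks = "Sturges", xlim = range(breaks)) {
  # Stand-in for the real computation: replace the method name by numeric breaks
  breaks <- c(0, 5, 10)
  xlim   # forced only now, so range() sees the numeric breaks
}
toy_hist()
#> [1]  0 10

# The real function returns the numeric breaks it computed
h <- hist(c(1, 2, 2, 3, 5, 8), plot = FALSE)
is.numeric(h$breaks)
#> [1] TRUE
```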

5. Q: Explain why this function works. Why is it confusing?

show_time <- function(x = stop("Error!")) {
  stop <- function(...) Sys.time()
  print(x)
}
show_time()
#> [1] "2019-05-19 14:54:27 UTC"

A: Before show_time() accesses x (default stop("Error!")), the stop() function is masked by function(...) Sys.time(). Because default arguments are evaluated in the function environment, print(x) will be evaluated as print(Sys.time()).

This function is confusing, because its behaviour changes when x’s value is supplied directly. Now the value from the calling environment will be used and the overwriting of stop won’t affect the outcome any more.

show_time(x = stop("Error!"))
#> Error in print(x): Error!
6. Q: How many arguments are required when calling library()?

A: library() doesn’t require any arguments. When called without arguments library() (invisibly) returns a list of class “libraryIQR”, which contains a results matrix with one row per installed package and three columns. These columns contain entries for the name of the package (“Package”), the path to the package (“LibPath”) and the title of the package (“Title”). library() also has its own print method (print.libraryIQR), which displays this information conveniently in its own window.

This behaviour is also documented under the details section of the help page for ?library:

If library is called with no package or help argument, it lists all available packages in the libraries specified by lib.loc, and returns the corresponding information in an object of class “libraryIQR”. (The structure of this class may change in future versions.) Use .packages(all = TRUE) to obtain just the names of all available packages, and installed.packages() for even more information.

Because the package and help arguments of library() do not show a default value, it’s easy to overlook the possibility of calling library() without these arguments. (Instead of providing NULLs as default values library() uses missing() to check if these arguments were provided.)

str(formals(library))
#> Dotted pair list of 13
#>  $ package        : symbol
#>  $ help           : symbol
#>  $ pos            : num 2
#>  $ lib.loc        : NULL
#>  $ character.only : logi FALSE
#>  $ logical.return : logi FALSE
#>  $ warn.conflicts : symbol
#>  $ quietly        : logi FALSE
#>  $ verbose        : language getOption("verbose")
#>  $ mask.ok        : symbol
#>  $ exclude        : symbol
#>  $ include.only   : symbol
#>  $ attach.required: language missing(include.only)

6.4 ... (dot-dot-dot)

1. Q: Explain the following results:

sum(1, 2, 3)
#> [1] 6
mean(1, 2, 3)
#> [1] 1

sum(1, 2, 3, na.omit = TRUE)
#> [1] 7
mean(1, 2, 3, na.omit = TRUE)
#> [1] 1

A: Let’s inspect the arguments and their order for both functions. For sum() these are ... and na.rm:

str(sum)
#> function (..., na.rm = FALSE)

For the ... argument sum() expects numeric, complex or logical vector input (see ?sum). Unfortunately, when ... is used, misspelled arguments (!) like na.omit won’t raise an error (when no further input checks are implemented). So instead, na.omit is treated as a logical and becomes part of the ... argument. It will be coerced to 1 and be part of the sum. All other arguments are left unchanged. Therefore sum(1, 2, 3) returns 6 and sum(1, 2, 3, na.omit = TRUE) returns 7.

In contrast, the generic function mean() expects x, trim, na.rm and ... for its default method.

str(mean.default)
#> function (x, trim = 0, na.rm = FALSE, ...)

Because na.omit is not one of mean()’s named arguments (and also not a candidate for partial matching), na.omit again becomes part of the ... argument. The other supplied objects are matched by their order, i.e.: x = 1, trim = 2 and na.rm = 3. Because x is of length 1 and not NA, the settings of trim and na.rm do not affect the calculation of the mean. Both calls (mean(1, 2, 3) and mean(1, 2, 3, na.omit = TRUE)) return 1.

1. Q: Explain how to find the documentation for the named arguments in the following function call:

plot(1:10, col = "red", pch = 20, xlab = "x", col.lab = "blue")

A: First we type ?plot in the console and check the “Usage” section:

plot(x, y, ...)

The arguments we want to learn more about are part of the ... argument. We can find information for xlab and follow the recommendation to visit ?par for the other arguments. Here we type “col” into the search bar, which leads us to the section “Color Specification”.
We also search for the pch argument, which leads to the recommendation to check ?points. Finally col.lab is also directly documented within ?par.

2. Q: Why does plot(1:10, col = "red") only colour the points, not the axes or labels? Read the source code of plot.default() to find out.

A: To learn about the internals of plot.default() we add browser() to the first line of the code and interactively run plot(1:10, col = "red"). This way we can see how the plot is built and learn where the axes are added.

This leads us to the function call

localTitle(main = main, sub = sub, xlab = xlab, ylab = ylab, ...)

The localTitle() function was defined in the first lines of plot.default() as:

localTitle <- function(..., col, bg, pch, cex, lty, lwd) title(...)

The call to localTitle() receives the col parameter as part of plot.default()’s ... argument. Because col is a named argument of localTitle(), it is captured there and is not forwarded to title() through title(...). ?title tells us that the title() function specifies four parts of the plot: main (title of the plot), sub (sub-title of the plot) and both axis labels. Because of this it would introduce ambiguity inside title() to use col directly. Instead one has the option to supply the colour via the ... argument as col.lab or as part of xlab (similar for ylab) in the form xlab = list(c("index"), col = "red").

6.5 Exiting a function

1. Q: What does load() return? Why don’t you normally see these values?

A: load() loads objects saved to disk in .Rdata files by save(). When run successfully, load() invisibly returns a character vector containing the names of the newly loaded objects. To print these names to the console, one can set the argument verbose to TRUE or surround the call in parentheses to trigger R’s auto-printing mechanism.

2. Q: What does write.table() return? What would be more useful?

A: write.table() writes an object, usually a data frame or a matrix, to disk. The function invisibly returns NULL. It would be more useful if write.table() would (invisibly) return the input data, x.
This would allow saving intermediate results and directly continuing with further processing steps without breaking the flow of the code (i.e. breaking it into different lines). One package which uses this pattern is the readr package, which is part of the “tidyverse” ecosystem.

3. Q: How does the chdir parameter of source() compare to in_dir()? Why might you prefer one approach to the other?

The in_dir() approach was given in the book as

in_dir <- function(dir, code) {
  old <- setwd(dir)
  on.exit(setwd(old))

  force(code)
}

A: in_dir() takes a path to a working directory as an argument. First the working directory is changed accordingly. on.exit() ensures that the modification to the working directory is reset to the initial value when the function exits.

In source() the chdir argument specifies if the working directory should be changed during the evaluation of the file argument (which in this case has to be a pathname).

4. Q: Write a function that opens a graphics device, runs the supplied code, and closes the graphics device (always, regardless of whether or not the plotting code worked).

A: To control the graphics device we use pdf() and dev.off(). To ensure a clean termination on.exit() is used.

plot_pdf <- function(code) {
  pdf("test.pdf")
  on.exit(dev.off(), add = TRUE)
  code
}

5. Q: We can use on.exit() to implement a simple version of capture.output().

capture.output2 <- function(code) {
  temp <- tempfile()
  on.exit(file.remove(temp), add = TRUE)

  sink(temp)
  on.exit(sink(), add = TRUE)

  force(code)
  readLines(temp)
}
capture.output2(cat("a", "b", "c", sep = "\n"))
#> [1] "a" "b" "c"

Compare capture.output() to capture.output2(). How do the functions differ? What features have I removed to make the key ideas easier to see? How have I rewritten the key ideas to be easier to understand?

A: Using body(capture.output) we inspect the source code of the original capture.output() function. capture.output() is quite a bit longer (39 lines vs. 7 lines).
capture.output() writes out entire methods, such as readLines(). Instead capture.output2() calls these methods directly. This brevity and modularity makes capture.output2() easier to understand (given you know the underlying methods). However capture.output2() does miss a couple of features: capture.output() appears to handle important exceptions and it also offers a choice between overwriting or appending to a file.

6.6 Function forms

1. Q: Rewrite the following code snippets into prefix form:

1 + 2 + 3

1 + (2 + 3)

if (length(x) <= 5) x[[5]] else x[[n]]

A: Let’s rewrite the expressions to match the exact syntax from the code above. Because prefix functions already define the execution order, we may omit the parentheses in the second expression.

`+`(`+`(1, 2), 3)

`+`(1, `(`(`+`(2, 3)))
`+`(1, `+`(2, 3))

`if`(`<=`(length(x), 5), `[[`(x, 5), `[[`(x, n))

2. Q: Clarify the following list of odd function calls:

x <- sample(replace = TRUE, 20, x = c(1:10, NA))
y <- runif(min = 0, max = 1, 20)
cor(m = "k", y = y, u = "p", x = x)

A: None of these functions provides a ... argument. Therefore the function arguments are first matched exactly, then via partial matching and finally by position. This leads us to the following explicit function calls:

x <- sample(c(1:10, NA), size = 20, replace = TRUE)
y <- runif(20, min = 0, max = 1)
cor(x, y, use = "pairwise.complete.obs", method = "kendall")

3. Q: Explain why the following code fails:

modify(get("x"), 1) <- 10
#> Error: target of assignment expands to non-language object

A: First, let’s define x and recall the definition of modify() from the textbook:

x <- 1:3

`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

R internally transforms the code and the transformed code reproduces the error above.

get("x") <- `modify<-`(get("x"), 1, 10)
#> Error in get("x") <- `modify<-`(get("x"), 1, 10) :
#>   target of assignment expands to non-language object

The error occurs during the assignment, because no corresponding replacement function, i.e. get<-, exists for get(). To confirm this we can reproduce the error via the following simple example.

get("x") <- 2
#> Error in get("x") <- 2 : target of assignment expands to non-language object

4. Q: Create a replacement function that modifies a random location in a vector.

A: Let’s define random<- like this:

`random<-` <- function(x, value) {
  idx <- sample(length(x), 1)
  x[idx] <- value
  x
}

5. Q: Write your own version of + that will paste its inputs together if they are character vectors but behaves as usual otherwise. In other words, make this code work:

1 + 2
#> [1] 3

"a" + "b"
#> [1] "ab"

A: To achieve this behaviour, we need to override the + operator. We need to take care not to use the + operator itself inside the function definition, because this would lead to an undesired infinite recursion. We also add b = 0L as a default value to keep the behaviour of + as a unary operator, i.e. to keep + 1 working and not throwing an error.

`+` <- function(a, b = 0L) {
  if (is.character(a) && is.character(b)) {
    paste0(a, b)
  } else {
    base::`+`(a, b)
  }
}

# test functionality
+ 1
#> [1] 1

1 + 2
#> [1] 3

"a" + "b"
#> [1] "ab"

# restore the original + operator
rm(`+`)

6. Q: Create a list of all the replacement functions found in the base package. Which ones are primitive functions? (Hint: use apropos().)

A: The hint suggests looking for functions with a specific naming pattern: replacement functions conventionally end in <-. We can search for these objects with a regular expression (<-$).

apropos("<-$")
#>  [1] ".rowNamesDF<-"    "[[<-"             "[<-"
#>  [4] "@<-"              "<-"               "<<-"
#>  [7] "$<-"              "as<-"             "attr<-"
#> [10] "attributes<-"     "body<-"           "body<-"
#> [13] "class<-"          "coerce<-"         "colnames<-"
#> [16] "comment<-"        "contrasts<-"      "diag<-"
#> [19] "dim<-"            "dimnames<-"       "el<-"
#> [22] "elNamed<-"        "Encoding<-"       "environment<-"
#> [25] "formals<-"        "functionBody<-"   "is.na<-"
#> [28] "languageEl<-"     "length<-"         "levels<-"
#> [31] "mode<-"           "modify<-"         "mostattributes<-"
#> [34] "names<-"          "oldClass<-"       "packageSlot<-"
#> [37] "parent.env<-"     "pluck<-"          "regmatches<-"
#> [40] "row.names<-"      "rownames<-"       "S3Class<-"
#> [43] "S3Part<-"         "slot<-"           "split<-"
#> [46] "storage.mode<-"   "substr<-"         "substring<-"
#> [49] "tsp<-"            "units<-"          "window<-"

However, instead of apropos() we will use ls() and adapt a bit of the code from a previous exercise. (This makes it easier to work with environments explicitly.) We first find all the objects in the base package whose names end in <-, then filter to only look at functions:

repl_nms <- ls(baseenv(), all.names = TRUE, pattern = "<-$")
repl_objects <- mget(repl_nms, baseenv())
repl_functions <- Filter(is.function, repl_objects)

length(repl_functions)
#> [1] 35

Additionally, we also filter for primitive functions. Overall base R contains 35 replacement functions. The following 17 of them are also primitive functions:

names(Filter(is.primitive, repl_functions))
#>  [1] "[[<-"           "[<-"            "@<-"            "<-"
#>  [5] "<<-"            "$<-"            "attr<-"         "attributes<-"
#>  [9] "class<-"        "dim<-"          "dimnames<-"     "environment<-"
#> [13] "length<-"       "levels<-"       "names<-"        "oldClass<-"
#> [17] "storage.mode<-"
7. Q: What are valid names for user-created infix functions?

A: Let’s cite Advanced R here (section on “Function Forms”):

… names of infix functions are more flexible than regular R functions: they can contain any sequence of characters except “%”.
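
In addition, the name of a user-created infix function must start and end with %. A short sketch with two such operators, both introduced here for illustration:

```r
# The name must be quoted with backticks when the function is defined
`%+%` <- function(a, b) paste0(a, b)
"new " %+% "string"
#> [1] "new string"

# Even a single space is a valid name between the two % signs
`% %` <- function(a, b) paste(a, b)
"a" % % "b"
#> [1] "a b"
```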

8. Q: Create an infix xor() operator.

A: We could create an infix %xor% like this:

`%xor%` <- function(a, b) {
  xor(a, b)
}
TRUE %xor% TRUE
#> [1] FALSE
FALSE %xor% TRUE
#> [1] TRUE
9. Q: Create infix versions of the set functions intersect(), union(), and setdiff(). You might call them %n%, %u%, and %/% to match conventions from mathematics.

A: These infix operators could be defined in the following way. (%/% is chosen instead of %\%, because \ serves as an escape character.)

`%n%` <- function(a, b) {
  intersect(a, b)
}

`%u%` <- function(a, b) {
  union(a, b)
}

# Note: this masks base R's integer division operator %/%
`%/%` <- function(a, b) {
  setdiff(a, b)
}

x <- c("a", "b", "d")
y <- c("a", "c", "d")

x %u% y
#> [1] "a" "b" "d" "c"
x %n% y
#> [1] "a" "d"
x %/% y
#> [1] "b"