# 4 Functions

## 4.1 Function fundamentals

1. Q: Given a name, like "mean", match.fun() lets you find a function. Given a function, can you find its name? Why doesn’t that make sense in R?

A: If you know a function’s body(), formals() and environment(), you can search for bindings to a matching function. However, this won’t work for primitive functions, since all three properties are NULL for them. Anonymous functions won’t be found either, because they are not bound to a name. Conversely, several names in an environment may be bound to the same function (or to distinct functions with identical body(), formals() and environment()), so the result need not be unique. More generally: in R a name is bound to an object, but an object (e.g. a function) doesn’t have a name; at most it has one or more bindings.

Also: You “can only find the parents of an environment, not its children, so this is necessarily only going to be a partial search” (Hadley Wickham).
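
To illustrate why the reverse lookup is ambiguous, here is a minimal sketch. The helper find_names() is hypothetical (it is not part of base R) and only searches a single environment:

```r
# Hypothetical helper: every name in `env` bound to a function identical to `fun`
find_names <- function(fun, env = globalenv()) {
  nms <- ls(env, all.names = TRUE)
  hits <- vapply(
    nms,
    function(nm) identical(get(nm, envir = env), fun),
    logical(1)
  )
  nms[hits]
}

e <- new.env()
e$f <- function(x) x + 1
e$g <- e$f          # a second binding to the same function

find_names(e$f, e)  # two names, so "the" name of the function is not unique
#> [1] "f" "g"
```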

2. Q: It’s possible (although typically not useful) to call an anonymous function. Which of the two approaches below is correct? Why?

function(x) 3()
#> function(x) 3()
(function(x) 3)()
#> [1] 3

A: The second approach is correct. With the first syntax the function is not called; instead we get back a function object whose body is the invalid call 3(). When we then call this function properly, we consequently get an error:

(function(x) 3())()
#> Error in (function(x) 3())(): attempt to apply non-function

In contrast, the second syntax uses parentheses to delimit the anonymous function, so that its body is just 3. The trailing pair of parentheses then calls the anonymous function.

3. Q: A good rule of thumb is that an anonymous function should fit on one line and shouldn’t need to use {}. Review your code. Where could you have used an anonymous function instead of a named function? Where should you have used a named function instead of an anonymous function?

A: :) Please take a look at your recent projects. You can also make a mental note to assign names to your functions whenever they start to span multiple lines.

4. Q: What function allows you to tell if an object is a function? What function allows you to tell if a function is a primitive function?

A: You can test objects with is.function() and is.primitive().
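
As a quick illustration (a sketch using only base R functions), the two predicates behave as follows:

```r
is.function(mean)     # a closure is a function
#> [1] TRUE
is.function("mean")   # a character string is not
#> [1] FALSE
is.primitive(mean)    # mean() is implemented as an R closure
#> [1] FALSE
is.primitive(sum)     # sum() is a primitive
#> [1] TRUE
```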

5. Q: This code makes a list of all functions in the base package.

objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)

Use it to answer the following questions:

1. Which base function has the most arguments?

2. How many base functions have no arguments? What’s special about those functions?

3. How could you adapt the code to find all primitive functions?

A:

1. First we create a named vector containing the number of arguments of each function, and then we subset it with the index of its maximum entry:
f_arg_length <- sapply(funs, function(x) length(formals(x)))
f_arg_length[which.max(f_arg_length)]
#> scan
#>   22
2. We count the functions for which formals() is NULL or has length 0. It turns out that all of these functions have formals equal to NULL, which suggests that they might be primitive functions:
sum(sapply(funs, function(x) is.null(formals(x)) || length(formals(x)) == 0))
#> [1] 224
sum(sapply(funs, function(x) !is.null(formals(x)) && length(formals(x)) == 0))
#> [1] 0
sum(sapply(funs, function(x) is.null(formals(x))))
#> [1] 224
sum(sapply(funs, function(x) is.null(formals(x)) && is.primitive(x)))
#> [1] 183
However, not all functions with formals equal to NULL are primitive: closures defined without any arguments have NULL formals too.
3. Change the predicate in Filter() to is.primitive:
funs <- Filter(is.primitive, objs)
6. Q: What are the three important components of a function?

A: body(), formals() and environment().

There is one exception to the rule that functions have three components. Primitive functions, like sum(), call C code directly with .Primitive() and contain no R code. Therefore their formals(), body(), and environment() are all NULL.
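
A small sketch of the three components, using a throwaway function f (a name chosen only for this illustration) and the primitive sum():

```r
f <- function(x, y = 2) x + y

names(formals(f))   # the argument list
#> [1] "x" "y"
body(f)             # the code inside the function
#> x + y
environment(f)      # where f was created (the global environment at top level)

# For a primitive, all three components are NULL
is.null(formals(sum))
#> [1] TRUE
is.null(body(sum))
#> [1] TRUE
is.null(environment(sum))
#> [1] TRUE
```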

7. Q: When does printing a function not show what environment it was created in?

A: When it was created in the global environment.

## 4.2 Lexical Scoping

1. Q: What does the following code return? Why? Describe how each of the three c’s is interpreted.

c <- 10
c(c = c)

A: A named vector of length one, whose single element has the value 10 and the name “c”. The first “c” is the c() function, the second is the name of the first element and the third is the value of the variable c.
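
The three roles can be checked directly. This is only a sketch; res is a name introduced here for illustration:

```r
c <- 10
res <- c(c = c)   # function c(), element name "c", value of the variable c

names(res)
#> [1] "c"
res[["c"]]
#> [1] 10

rm(c)             # remove the variable again; the function c() is unaffected
```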

2. Q: What are the four principles that govern how R looks for values?

A: As stated in the book:

There are four basic principles behind R’s implementation of lexical scoping:

• name masking
• functions vs. variables
• a fresh start
• dynamic lookup
3. Q: What does the following function return? Make a prediction before running the code yourself.

f <- function(x) {
  f <- function(x) {
    f <- function(x) {
      x ^ 2
    }
    f(x) + 1
  }
  f(x) * 2
}
f(10)

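A: 202. The innermost f() squares its argument (10 ^ 2 = 100), the middle f() adds 1 (101) and the outermost f() doubles the result (202). Inside each function body, the locally defined f masks the f of the enclosing environment (name masking).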
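
The same computation becomes easier to follow once each level gets its own name. The names square(), plus_one() and times_two() are made up for this sketch:

```r
square    <- function(x) x ^ 2              # innermost f
plus_one  <- function(x) square(x) + 1      # middle f
times_two <- function(x) plus_one(x) * 2    # outermost f

times_two(10)
#> [1] 202
```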

## 4.3 Lazy evaluation

1. Q: What important property of && makes x_ok() work?

x_ok <- function(x) {
!is.null(x) && length(x) == 1 && x > 0
}

x_ok(NULL)
#> [1] FALSE
x_ok(1)
#> [1] TRUE
x_ok(1:3)
#> [1] FALSE

What is different with this code? Why is this behaviour undesirable here?

x_ok <- function(x) {
!is.null(x) & length(x) == 1 & x > 0
}

x_ok(NULL)
#> logical(0)
x_ok(1)
#> [1] TRUE
x_ok(1:3)
#> [1] FALSE FALSE FALSE

A: The intended usage of x_ok is to check if an argument provided to a function is not NULL, has length 1 and is greater than 0. To work with this function, we only want to know if this is TRUE, FALSE or NA (unknown). Therefore the first version behaves as desired as we can see in the first and the third example.

The property of && that makes x_ok() work is short-circuit (lazy) evaluation.

&& evaluates its operands from left to right and checks whether each one (which is expected to have length one) is TRUE, FALSE or something else (another valid value, which gets coerced to logical). As soon as an operand is FALSE, evaluation stops and FALSE is returned. If no operand is FALSE but at least one is not TRUE, NA is returned. This makes sense: if something is neither true nor false, its logical value is unknown, which corresponds to NA.

In contrast, & works elementwise (which is undesired in this case) and always evaluates both of its operands. From ?&, where the last quoted sentence describes the longer form &&:

… The shorter form performs elementwise comparisons in much the same way as arithmetic operators. Evaluation proceeds only until the result is determined.

When combining the results, & takes the lengths of its arguments into account, so a zero-length operand yields a zero-length result. Therefore FALSE & logical(0) returns logical(0), in contrast to FALSE && logical(0), which returns FALSE.

Returning to the first example, we can now see that this is what leads to the (in this case) undesired behaviour, where neither TRUE, FALSE nor NA gets returned:

FALSE && NULL > 0
#> [1] FALSE
FALSE &  NULL > 0
#> logical(0)
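
The difference in evaluation can be made visible with a side effect. In this sketch, probe() and the counter calls are hypothetical names used only to record whether the right-hand side was evaluated:

```r
calls <- 0
probe <- function() {
  calls <<- calls + 1   # record that the right-hand side was evaluated
  TRUE
}

FALSE && probe()   # short-circuits: probe() is never called
#> [1] FALSE
calls
#> [1] 0

FALSE & probe()    # both sides are evaluated before combining
#> [1] FALSE
calls
#> [1] 1
```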
2. Q: The definition of force() is simple:

force
#> function (x)
#> x
#> <bytecode: 0xefdd58>
#> <environment: namespace:base>

Why is it better to force(x) instead of just x?

A: To be clear: force(x) is just syntactic sugar for x. However, as stated in the first edition of the textbook:

using this function clearly indicates that you’re forcing evaluation, not that you’ve accidentally typed x.
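
A typical place where forcing matters is a function factory. The following is a sketch with made-up names (make_adder(), make_adder_lazy(), start), not code from the textbook:

```r
make_adder <- function(n) {
  force(n)              # evaluate the promise now
  function(x) x + n
}

make_adder_lazy <- function(n) {
  function(x) x + n     # n is still an unevaluated promise here
}

start <- 1
add <- make_adder(start)
add_lazy <- make_adder_lazy(start)
start <- 100            # changed before either adder is called

add(10)        # n was fixed when make_adder() ran
#> [1] 11
add_lazy(10)   # n is only evaluated now, when start is already 100
#> [1] 110
```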

3. Q: What does this function return? Why? Which principle does it illustrate?

f2 <- function(x = z) {
z <- 100
x
}
f2()

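A: It returns 100 and illustrates lazy evaluation: the default value x = z is only evaluated when x is first used, and it is evaluated inside the function, where z has already been bound to 100.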

4. Q: What does this function return? Why? Which principle does it illustrate?

A: The output of f1() shows that the default argument for y is not accessed. Instead, the value assigned to y in the expression set as the default argument for x is used. Also, calling f1() doesn’t change the value of y in the global environment.

y <- 10
f1 <- function(x = {y <- 1; 2}, y = 0) {
  c(x, y)
}
f1()
#> [1] 2 1
y
#> [1] 10

The principle here is “value caching of function arguments”: R caches the value of an argument when it is accessed for the first time. Here x is accessed before y (because of c(x, y)). When the default expression given to x is evaluated, a value for y is also created (and cached). Therefore the default argument for y will no longer be evaluated.

If we change the function’s body to c(y, x), the default value given for y will be used.

f2 <- function(x = {y <- 1; 2}, y = 0) {
c(y, x)
}
f2()
#> [1] 0 2
5. Q: In hist(), the default value of xlim is range(breaks), the default value for breaks is "Sturges", and

range("Sturges")
#> [1] "Sturges" "Sturges"

Explain how hist() works to get a correct xlim value.

A: Before the hist() function creates the final plot where xlim is provided and finally evaluated, the hist() function internally updates and checks the value of breaks several times to ensure that it is finally a numeric vector with at least two elements.

The detailed behaviour is very specific to the input. According to ?hist this must be one of:
• a vector giving the breakpoints between histogram cells,
• a function to compute the vector of breakpoints,
• a single number giving the number of cells for the histogram,
• a character string naming an algorithm to compute the number of cells (see ‘Details’),
• a function to compute the number of cells.

Further:

In the last three cases the number is a suggestion only; as the breakpoints will be set to pretty values, the number is limited to 1e6 (with a warning if it was larger). If breaks is a function, the x vector is supplied to it as the only argument (and the number of breaks is only limited by the amount of available memory).

In case of breaks = "Sturges" this means that breaks is:
• checked for whether it was supplied (a corresponding flag is set; if breaks is missing but nclass is supplied, nclass is used instead)
• checked that its length is greater than 1 (a corresponding flag is set)
• converted to lower case and matched to “sturges”
• set to an integer value via sturges = nclass.Sturges(x) inside a switch() statement
• checked that it is now numeric, finite and at least 1
• set to 1000000 if it is greater than 1000000
• turned into a numeric vector of length (possibly) greater than one via pretty(range(x), n = breaks, min.n = 1)
• checked that its length is now greater than 1 and not NA
• checked that the differences between the breaks are strictly positive
• during and after this process more variables are calculated and checks are made and finally plot() gets called, where xlim gets evaluated.
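
The two central steps from the list above, nclass.Sturges() followed by pretty(), can be replayed by hand. This is only a sketch with made-up data; it skips all the checks that hist() performs:

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 10)   # example data for illustration

n <- nclass.Sturges(x)                  # suggested number of cells
n
#> [1] 5

breaks <- pretty(range(x), n = n, min.n = 1)   # numeric break points
range(breaks)                                  # what xlim finally evaluates to

h <- hist(x, plot = FALSE)              # same break points as computed by hand
all.equal(h$breaks, breaks)
#> [1] TRUE
```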
6. Q: Explain why this function works. Why is it confusing?

show_time <- function(x = stop("Error!")) {
stop <- function(...) Sys.time()
print(x)
}
show_time()
#> [1] "2018-12-07 22:24:52 UTC"

A: It works because functions are objects that can be modified and overwritten, and because of lazy evaluation. Before the x argument is evaluated, stop() is masked by a locally defined function, and this is the function that gets called in the last line, where x is finally evaluated.

It’s quite confusing, since there is no relation between the default value of x and its actual meaning in the context of the show_time() function. The user won’t have any chance to guess the meaning of x or its default value without looking up the documentation (if any exists) or analysing the source code.

7. Q: How many arguments are required when calling library()?

A: Surprisingly, no argument is required. Looking at ?library, we can see under Usage that library() has nine arguments, two of which have no default value:

library(package, help, pos = 2, lib.loc = NULL, character.only = FALSE, logical.return = FALSE, warn.conflicts = TRUE, quietly = FALSE, verbose = getOption("verbose"))

However, when we call library() without any arguments, we get a list of all available packages under the current library paths (.libPaths()), as also documented in the Details section of the help file:

If library is called with no package or help argument, it lists all available packages in the libraries specified by lib.loc, and returns the corresponding information in an object of class “libraryIQR”. (The structure of this class may change in future versions.) Use .packages(all = TRUE) to obtain just the names of all available packages, and installed.packages() for even more information.

## 4.4 ... (dot-dot-dot)

1. Q: Explain the following results:

sum(1, 2, 3)
#> [1] 6
mean(1, 2, 3)
#> [1] 1

sum(1, 2, 3, na.omit = TRUE)
#> [1] 7
mean(1, 2, 3, na.omit = TRUE)
#> [1] 1

A: The arguments of sum() are ... and na.rm. For ... sum() expects “numeric or complex or logical vectors” as documented in ?sum. So any input that is not explicitly named na.rm is treated as part of the ... argument and used for summation.

In contrast, mean() expects x (typically a vector) as its first argument, trim (the fraction of observations to be trimmed from each end of x) as its second, and na.rm as its third. Since neither trim = 2 nor na.rm = 3 changes the mean of the single value 1, we get 1 as the result.

In the next call, na.omit is supplied via the ... argument to sum(), which treats it as a logical vector (TRUE is coerced to 1) and adds it to the other arguments.

Finally, in the last call to mean(), na.omit = TRUE is not an argument of the default method and is not used for the calculation, since mean() computes its value only from its x argument.
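
Spelling out the argument matching makes the results easier to see. This is a small sketch using only the calls discussed above:

```r
sum(1, 2, 3, na.omit = TRUE)       # TRUE is coerced to 1 and added: 1 + 2 + 3 + 1
#> [1] 7
mean(x = 1, trim = 2, na.rm = 3)   # what mean(1, 2, 3) amounts to
#> [1] 1
mean(c(1, 2, 3))                   # the mean of the three values
#> [1] 2
```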

2. Q: In the following call, explain how to find the documentation for the named arguments in the following function call:

plot(1:10, col = "red", pch = 20, xlab = "x", col.lab = "blue")

A: First we type ?plot in the console and scan the usage section:

plot(x, y, ...)

Obviously we have to look under the ... bullet in the arguments section.

There we can find a bullet for xlab (check), and follow the recommendation to visit ?par for further arguments.

From there we type “col” into the search bar, which leads us to a recommendation to search further under Color Specification (check). Again using the search, we find a bullet for the pch argument. From there we also get the recommendation to look under ?points for more specific settings (check). Finally, we use the search functionality to find col.lab, which is also a bullet inside ?par.

3. Q: Why does plot(1:10, col = "red") only colour the points, not the axes or labels? Read the source code of plot.default() to find out.

A: It is easiest to start by adding browser() to the first line of plot.default() and interactively running plot(1:10, col = "red"). In this way we can see how the plot is built during the last lines and, in particular, find out where the axes are added. This leads us to the function call:

localTitle(main = main, sub = sub, xlab = xlab, ylab = ylab, ...)

The localTitle() function was defined in the first lines of plot.default() as:

localTitle <- function(..., col, bg, pch, cex, lty, lwd) title(...)

So the call to localTitle() receives the col parameter through the ... of plot.default(). However, since col is a named formal of localTitle(), it is matched there and is not passed on to title(). Following the source code of title() leads us to a line of C code. Instead of following further, we can stay in R and look at ?title, which clarifies that title() specifies four parts of the plot: main (title of the plot), sub (sub-title of the plot) and both axis labels. A single col would therefore be ambiguous inside title(). Instead, one has the option to supply the colour via the ... argument as col.lab, or as part of xlab (similarly for ylab) in the form xlab = list("index", col = "red").
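
The effect of the named formals after ... can be reproduced without any plotting. In this sketch, local_title() is a hypothetical stand-in for localTitle() that merely reports which arguments would be forwarded:

```r
# Hypothetical stand-in: return the names of the arguments that remain in `...`
local_title <- function(..., col, bg, pch, cex, lty, lwd) names(list(...))

local_title(main = "Main", xlab = "x", col = "red", col.lab = "blue")
#> [1] "main"    "xlab"    "col.lab"
```

col is matched by the named formal and dropped, while col.lab stays in ... because arguments after ... are only matched by their exact names.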

## 4.5 Exiting a function

1. Q: What does load() return? Why don’t you normally see these values?

A: load() reloads datasets written with the function save(). It invisibly returns a character vector of the names of the objects created. To see these names, one can set the verbose argument to TRUE, which triggers the corresponding if statement in the function’s body. Alternatively, wrapping the load() call in parentheses autoprints the returned value.
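
A short sketch of both ways to get at the return value; the objects a and b and the temporary file are made up for this example:

```r
tmp <- tempfile(fileext = ".rda")
a <- 1
b <- "two"
save(a, b, file = tmp)
rm(a, b)

loaded <- load(tmp)   # nothing is printed, since the value is returned invisibly
loaded                # the names of the restored objects

(load(tmp))           # the parentheses make the invisible value print

file.remove(tmp)
```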

2. Q: What does write.table() return? What would be more useful?

A: It invisibly returns NULL. It would be more useful to invisibly return the (data frame) object to be written, as the readr package does, for example. In this way it would be possible to save intermediate results from a sequence of processing steps directly, e.g. within a magrittr pipeline.

3. Q: How does the chdir parameter of source() compare to in_dir()? Why might you prefer one approach to the other? The in_dir() approach was given in the book as

in_dir <- function(dir, code) {
old <- setwd(dir)
on.exit(setwd(old))

force(code)
}

A: in_dir() takes a path to a working directory as an argument. At the beginning of the function the working directory is changed to this path, and the call to on.exit() guarantees that the previous working directory is restored when the function finishes.

In source(), the chdir argument specifies whether the working directory should be changed, for the duration of the evaluation, to the directory containing file (if file is a pathname). In that case source() saves the current working directory (the output of getwd()), changes to the directory of the file, and restores the saved directory via on.exit(). The chdir approach is therefore tied to the location of the sourced file, whereas in_dir() works with any directory and any code.
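
That in_dir() restores the previous working directory can be checked with a short sketch (before, inside and after are names introduced here; tempdir() merely serves as an existing directory):

```r
in_dir <- function(dir, code) {
  old <- setwd(dir)      # setwd() returns the previous working directory
  on.exit(setwd(old))    # restore it when the function exits

  force(code)
}

before <- getwd()
inside <- in_dir(tempdir(), getwd())   # evaluated while tempdir() is the working directory
after  <- getwd()

identical(before, after)
#> [1] TRUE
```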

4. Q: Write a function that opens a graphics device, runs the supplied code, and closes the graphics device (always, regardless of whether or not the plotting code worked).

A:

plot_pdf <- function(code){
pdf("test.pdf")
on.exit(dev.off())
code
}
5. Q: We can use on.exit() to implement a simple version of capture.output().

capture.output2 <- function(code) {
temp <- tempfile()
on.exit(file.remove(temp), add = TRUE)

sink(temp)
on.exit(sink(), add = TRUE)

force(code)
readLines(temp)
}
capture.output2(cat("a", "b", "c", sep = "\n"))
#> [1] "a" "b" "c"

Compare capture.output() to capture.output2(). How do the functions differ? What features have I removed to make the key ideas easier to see? How have I rewritten the key ideas to be easier to understand?

A: Using body(capture.output), we can see the source code of the original capture.output() function. capture.output() is considerably longer (39 lines vs. 7 lines). The reason for this is that capture.output2() is more modular, since capture.output() writes out entire methods like readLines() instead of invoking them. This makes capture.output2() easier to understand if you understand the underlying methods.

However, capture.output2() does remove potentially important functionality: capture.output() appears to handle important exceptions that capture.output2() does not, and it offers the ability to choose between overwriting and appending to a file.

## 4.6 Function forms

1. Q: Rewrite the following code snippets into prefix form:

1 + 2 + 3

1 + (2 + 3)

if (length(x) <= 5) x[[5]] else x[[n]]

A:

`+`(`+`(1, 2), 3)

`+`(1, `(`(`+`(2, 3)))

`if`(`<=`(length(x), 5), `[[`(x, 5), `[[`(x, n))
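
A rewrite into prefix form can be checked by comparing it with the original expression. The values of x and n below are made up for this sketch:

```r
x <- 1:10
n <- 10

identical(`+`(`+`(1, 2), 3), 1 + 2 + 3)
#> [1] TRUE
identical(`+`(1, `(`(`+`(2, 3))), 1 + (2 + 3))
#> [1] TRUE
identical(
  `if`(`<=`(length(x), 5), `[[`(x, 5), `[[`(x, n)),
  if (length(x) <= 5) x[[5]] else x[[n]]
)
#> [1] TRUE
```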
2. Q: Clarify the following list of odd function calls:

x <- sample(replace = TRUE, 20, x = c(1:10, NA))
# -> sample(x = c(1:10, NA), size = 20, replace = TRUE)
y <- runif(min = 0, max = 1, 20)
# -> runif(n = 20, min = 0, max = 1)
cor(m = "k", y = y, u = "p", x = x)
# -> cor(x = x, y = y, use = "pairwise.complete.obs", method = "kendall")
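
The expanded cor() call can be verified by comparing both spellings. In this sketch the seed is arbitrary and short and long are names introduced for the comparison:

```r
set.seed(1)   # arbitrary seed, only for reproducibility
x <- sample(replace = TRUE, 20, x = c(1:10, NA))
y <- runif(min = 0, max = 1, 20)

short <- cor(m = "k", y = y, u = "p", x = x)   # partial matching of names and values
long  <- cor(x = x, y = y, use = "pairwise.complete.obs", method = "kendall")

identical(short, long)
#> [1] TRUE
```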
3. Q: Explain why the following code fails:

modify(get("x"), 1) <- 10
#> Error: target of assignment expands to non-language object

A: First, let’s define x and recall the definition of modify() from the textbook:

x <- 1:3

`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

As described in the textbook R turns the code behind the scenes into

get("x") <- `modify<-`(get("x"), 1, 10)
#> Error in get("x") <- `modify<-`(get("x"), 1, 10) : target of assignment expands to non-language object

which cannot work, because get() has no equivalent replacement function. To confirm this claim, we can reproduce the error via a simpler example

get("x") <- 2
#> Error in get("x") <- 2 : target of assignment expands to non-language object

and modify the example to use a function with an available replacement function:

modify(names(x), 1) <- 10
names(x)
#> [1] "10" NA   NA
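
For comparison, a sketch of the same replacement function applied to a plain variable, where the rewritten call is valid (x and y are example vectors):

```r
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

x <- 1:3
modify(x, 1) <- 10L          # replacement-function syntax
x
#> [1] 10  2  3

y <- 1:3
y <- `modify<-`(y, 1, 10L)   # what R evaluates behind the scenes
identical(x, y)
#> [1] TRUE
```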
4. Q: Create a replacement function that modifies a random location in a vector.

A:

`random<-` <- function(x, value) {
  x[sample(length(x), 1)] <- value
  x
}
5. Q: Write your own version of + that will paste its inputs together if they are character vectors but behaves as usual otherwise. In other words, make this code work:

1 + 2
#> [1] 3

"a" + "b"
#> [1] "ab"

A: We can simply override the + operator. In this case we need to take a bit of care not to use the + operator itself inside the function definition, since otherwise we would end up in an infinite recursion (a special case of an infinite loop). We also add b = 0L as a default value to keep the behaviour of + as a unary operator, i.e. to keep + 1 working and not throwing an error:

`+` <- function(a, b = 0L) {
  if (is.character(a) && is.character(b)) {
    return(paste0(a, b))
  }
  a -- b  # a - (-b) equals a + b without calling `+`
}

# tests
+ 1
#> [1] 1

1 + 2
#> [1] 3

"a" + "b"
#> [1] "ab"

# return back to the original + operator behaviour
rm(`+`)
6. Q: Create a list of all the replacement functions found in the base package. Which ones are primitive functions? (Hint: use apropos().)

A: We can find replacement functions by searching for functions whose names end in “<-”:

repls <- funs[grepl("<-$", names(funs))]
Filter(is.primitive, repls)
7. Q: What are valid names for user-created infix functions?

A: As the section on “Function forms” in Advanced R tells us, the “names of infix functions are more flexible than regular R functions: they can contain any sequence of characters except ‘%’.” In addition, the names of user-created infix functions must begin and end with %.

8. Q: Create an infix xor() operator.

A:

`%xor_%` <- function(a, b) {
  (a | b) & !(a & b)
}
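
A quick check of the operator against base R's xor() on all four input combinations; this is only a sketch and a and b are example vectors:

```r
`%xor_%` <- function(a, b) {
  (a | b) & !(a & b)
}

a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

a %xor_% b
#> [1] FALSE  TRUE  TRUE FALSE
identical(a %xor_% b, xor(a, b))
#> [1] TRUE
```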
9. Q: Create infix versions of the set functions intersect(), union(), and setdiff(). You might call them %n%, %u%, and %/% to match conventions from mathematics.

A:

`%union_%` <- function(a, b) {
  unique(c(a, b))
}

`%intersect_%` <- function(a, b) {
  unique(c(a[a %in% b], b[b %in% a]))
}

`%setdiff_%` <- function(a, b) {
  unique(a[!a %in% b])  # unique() drops duplicates, as setdiff() does
}

## 4.7 Old exercises

1. Q: What does this function return? Why? Which principle does it illustrate?

A: It returns 3 and illustrates lazy evaluation. As you can see, y becomes 1, but only when x is evaluated (before y) inside the function (otherwise it is 0):

f2 <- function(x = {y <- 1; 2}, y = 0) {
y
}
f2()
#> [1] 0

Note that funny things can happen if we switch the evaluation order (even within one line)

f3 <- function(x = {y <- 1; 2}, y = 0) {
y + x
}
f3()
#> [1] 2

or we evaluate y once before and once after the evaluation of x

f4 <- function(x = {y <- 1; 2}, y = 0) {
y_before_x <- y
x
y_after_x <- y
c(y_before_x, y_after_x)
}
f4()
#> [1] 0 1