# 9 Function operators

## 9.1 Existing FOs

**Q**: Base R provides a function operator in the form of`Vectorize()`

. What does it do? When might you use it?**A**:**Q**: Read the source code for`possibly()`

. How does it work?**A**:**Q**: Read the source code for`safely()`

. How does it work?**A**:

## 9.2 Case study: creating your own FOs

**Q**: Weigh the pros and cons of`download.file %>% dot_every(10) %>% delay_by(0.1)`

vs`download.file %>% delay_by(0.1) %>% dot_every(10)`

.

**A**:

**Q**: Should you memoise`file.download()`

? Why/why not?

**A**:

**Q**: Create a FO that reports whenever a file is created or deleted in the working directory, using`dir()`

and`setdiff()`

. What other global function effects might you want to track?

**A**:

**Q**: Write a FO that logs a timestamp and message to a file every time a function is run.

**A**:

**Q**: Modify`delay_by()`

so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called`g <- delay_by(1, f); g(); Sys.sleep(2); g()`

there shouldn’t be an extra delay.

**A**:

## 9.3 Old exercises

## 9.4 Behavioural FOs

**Q**: Write a FO that logs a time stamp and message to a file every time a function is run.**A**: Note that the example will create a file file in your current working directory:`logger <- function(f, filename){ force(f) filename_tmp <- paste(filename, basename(tempfile()), sep = "_") write(paste("created at:", Sys.time()), filename_tmp, append = TRUE) function(..., message = "you can add a message at each call") { write(paste0("used at: ", Sys.time(), ", ", message), filename_tmp, append = TRUE) f(...) } } # the following line creates a file, which name starts with "mean_log_" mean2 <- logger(mean, "mean_log") mean2(1:4, message = "first time") mean2(1:4, message = "second_time")`

**Q**: What does the following function do? What would be a good name for it?`f <- function(g) { force(g) result <- NULL function(...) { if (is.null(result)) { result <<- g(...) } result } } runif2 <- f(runif) runif2(5) #> [1] 0.5651561 0.3223773 0.5743120 0.4445636 0.4411795 runif2(10) #> [1] 0.5651561 0.3223773 0.5743120 0.4445636 0.4411795`

**A**: It returns a new version of the inputfunction. That version will always return the result of it’s first run (in case this not`NULL`

), no matter how the input changes. Good names could be`first_run()`

or`initial_return()`

.**Q**: Modify`delay_by()`

so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called`g <- delay_by(1, f); g(); Sys.sleep(2); g()`

there shouldn’t be an extra delay.**A**: We can do this with three little tricks (and the help of 42):`delay_by_v2 <- function(delay, f) { force(f) # we initialise the timestamp for the last run. We set a specific default value, # to ensure that the first run of the returned function will never be delayed last_runtime <- Sys.time() - (delay + 42) function(...) { # we continually check if enough time passed with an (empty) while statement. while (Sys.time() < last_runtime + delay) {} # we override the start for the next waiting interval. # Note that this is done on exit (after the function is evaluated) on.exit(last_runtime <<- Sys.time()) return(f(...)) } }`

Alternatively to the empty while statement we could have used

`Sys.sleep()`

. I would not recommend this solution, since`?Sys.sleep`

indicates that`Sys.sleep()`

might have some overhead and seems not to be as exact as we need.**Q**: Write`wait_until()`

which delays execution until a specific time.**A**:**Q**: There are three places we could have added a memoise call: why did we choose the one we did?`download <- memoise(dot_every(10, delay_by(1, download_file))) download <- dot_every(10, memoise(delay_by(1, download_file))) download <- dot_every(10, delay_by(1, memoise(download_file)))`

**A**: The second was chosen. It’s easy to see why, if we eliminate the other two options:The first version only prints a dot at every tenth

`download()`

call with a new input. This is because`dot_every()`

is inside of`memoise()`

and the counter created by`dot_every()`

is not “activated” if the input is known.The third version takes one second for every call. Even if we already know the result and don’t download anything again.

**Q**: Why is the`remember()`

function inefficient? How could you implement it in more efficient way?**Q**: Why does the following code, from stackoverflow, not do what you expect?`# return a linear function with slope a and intercept b. f <- function(a, b) function(x) a * x + b # create a list of functions with different parameters. fs <- Map(f, a = c(0, 1), b = c(0, 1)) fs[[1]](3) #> [1] 0 # should return 0 * 3 + 0 = 0`

How can you modify

`f`

so that it works correctly?**A**: You can read in the stackoverflow link that the question arose, because the original return of`fs[[1]](3)`

was`4`

, which is due to lazy evaluation and could be solved by two users via`force()`

:However you can see in the result within the question that

**R**’s behaviour was changed in this case and as Jan Kislinger points out on twitter:The real question should be: “How did they modify #rstats so that it works correctly?” otherwise it’s a tricky question :D

Note that the same issue appears in the textbook:

In the following example, we take a list of functions and delay each one. But when we try to evaluate the mean, we get the sum instead.

`funs <- list(mean = mean, sum = sum) funs_m <- lapply(funs, delay_by, delay = 0.1) funs_m$mean(1:10) #> [1] 5.5`

Which (as one can see) is not true anymore…actually it changed in R version

**3.2**:Higher order functions such as the apply functions and Reduce() now force arguments to the functions they apply in order to eliminate undesirable interactions between lazy evaluation and variable capture in closures. This resolves PR#16093.

For further interested: PR#16093 will lead you to the subject “iterated lapply” within the R-devel Archives. Note that the behaviour in for loops is still as “the old

`lapply()`

” behaviour.

## 9.5 Output FOs

**Q**: Create a`negative()`

FO that flips the sign of the output of the function to which it is applied.**A**:**Q**: The`evaluate`

package makes it easy to capture all the outputs (results, text, messages, warnings, errors, and plots) from an expression. Create a function like`capture_it()`

that also captures the warnings and errors generated by a function.**A**: One way is just to capture the output of`tryCatch()`

with identity handlers for errors and warnings:`capture_trials <- function(f){ force(f) function(...){ capture.output(tryCatch(f(...), error = function(e) e, warning = function(w) w) ) } } # we test the behaviour log_t <- capture_trials(log) elements <- list(1:10, c(-1, 10), c(TRUE, FALSE), letters) results <- lapply(elements, function(x) log_t(x)) results #> [[1]] #> [1] " [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101" #> [2] " [8] 2.0794415 2.1972246 2.3025851" #> #> [[2]] #> [1] "<simpleWarning in f(...): NaNs produced>" #> #> [[3]] #> [1] "[1] 0 -Inf" #> #> [[4]] #> [1] "<simpleError in f(...): non-numeric argument to mathematical function>" # further # results_detailed <- lapply(elements, function(x) lapply(x, function(y))log2(x)) # results_detailed`

**Q**: Create a FO that tracks files created or deleted in the working directory (Hint: use`dir()`

and`setdiff()`

.) What other global effects of functions might you want to track?**A**: We start with a short version to show the idea:`track_dir <- function(f){ force(f) function(...){ dir_old <- dir() on.exit(if(!setequal(dir(), dir_old)){ message("files in your working directory were deleted or added by this function")}) f(...) } } # the following test will create the file "delete_me" in your working directory td <- track_dir(dir.create) td("delete_me")`

Of course we can provide more information on the type of changes:

`track_dir <- function(f){ force(f) function(...){ dir_old <- dir() on.exit(if(!setequal(dir(), dir_old)){ message("Files in your working directory were deleted or added by this function.")}, add = TRUE) on.exit(if(length(setdiff(dir_old, dir()) != 0)){ message(paste0("The following files were deleted: ", paste(setdiff(dir_old, dir()), collapse = ", ") ))}, add = TRUE) on.exit(if(length(setdiff(dir(), dir_old) != 0)){ message(paste0("The following files were added: ", paste(setdiff(dir(), dir_old), collapse = ", ") ))}, add = TRUE) f(...) } } # the following test will again create two files in your working directory td <- track_dir(sapply) td(c("delete_me", "me_too"), dir.create)`

Other global effects that might be worth tracking include changes regarding:

- the search path and/or introduced
`conflicts()`

`options()`

and`par()`

which modify global settings- the path of the working directory
- environment variables
- the locale.

- the search path and/or introduced

## 9.6 Input FOs

**Q**: Our previous`download()`

function only downloads a single file. How can you use`partial()`

and`lapply()`

to create a function that downloads multiple files at once? What are the pros and cons of using`partial()`

vs. writing a function by hand?**Q**: Read the source code for`plyr::colwise()`

. How does the code work? What are`colwise()`

’s three main tasks? How could you make`colwise()`

simpler by implementing each task as a function operator? (Hint: think about`partial()`

.)**A**: We describe how it works by commenting the source code:`function (.fun, .cols = true, ...) { # We check if .cols is not a function, since it is possible to supply a # predicate function. # if so, the .cols arguments will be "quoted", and filter() will # be a function that checks and evaluates these .cols within its other argument if (!is.function(.cols)) { .cols <- as.quoted(.cols) filter <- function(df) eval.quoted(.cols, df) } # otherwise, filter will be be Filter(), which applies the function # in .cols to every element of its other argument else { filter <- function(df) Filter(.cols, df) } # the ... arguments are caught in the list dots dots <- list(...) # a function is created, which will also be the return value. # it checks if its input is a data frame function(df, ...) { stopifnot(is.data.frame(df)) # if df is split (in "plyr" speaking), this will be taken into account... df <- strip_splits(df) # now the columns of the data frame are chosen, depending on the input of .cols # this can chosen directly, via a predicate function, or all columns (default) filtered <- filter(df) # if this means, that no columns are selected, an empty data frame will be returned if (length(filtered) == 0) return(data.frame()) # otherwise lapply will be called on all filtered columns, with # the .fun argument, which has to be provided by the user, and some other # arguments provided by the user, when calling the function (...) and # when defining the function (dots) out <- do.call("lapply", c(list(filtered, .fun, ...), dots)) # the output will be named and converted from list into a data frame again names(out) <- names(filtered) quickdf(out) } } <environment: namespace:plyr>`

**Q**: Write FOs that convert a function to return a matrix instead of a data frame, or a data frame instead of a matrix. If you understand S3, call them`as.data.frame.function()`

and`as.matrix.function()`

.**A**:**Q**: You’ve seen five functions that modify a function to change its output from one form to another. What are they? Draw a table of the various combinations of types of outputs: what should go in the rows and what should go in the columns? What function operators might you want to write to fill in the missing cells? Come up with example use cases.**Q**: Look at all the examples of using an anonymous function to partially apply a function in this and the previous chapter. Replace the anonymous function with`partial()`

. What do you think of the result? Is it easier or harder to read?**A**: The results are easy to read. Especially the`Map()`

examples profit in readability:`library(pryr) ## From Functionals # 1 trims <- c(0, 0.1, 0.2, 0.5) x <- rcauchy(1000) unlist(lapply(trims, function(trim) mean(x, trim = trim))) #> [1] -1.74303794 -0.13030677 -0.07694337 -0.07839265 unlist(lapply(trims, partial(mean, x))) #> [1] -1.74303794 -0.13030677 -0.07694337 -0.07839265 # 2 xs <- replicate(5, runif(10), simplify = FALSE) ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE) unlist(Map(function(x, w) weighted.mean(x, w, na.rm = TRUE), xs, ws)) #> [1] 0.3890304 0.6066202 0.4287779 0.3139000 0.5473885 unlist(Map(partial(weighted.mean, na.rm = TRUE), xs, ws)) #> [1] 0.3890304 0.6066202 0.4287779 0.3139000 0.5473885 # 3 add <- function(x, y, na.rm = FALSE) { if (na.rm && (is.na(x) || is.na(y))) rm_na(x, y, 0) else x + y } r_add <- function(xs, na.rm = TRUE) { Reduce(function(x, y) add(x, y, na.rm = na.rm), xs) } r_add_compact <- function(xs, na.rm = TRUE) { Reduce(partial(add, na.rm = na.rm), xs) } r_add(1:4) #> [1] 10 r_add_compact(1:4) #> [1] 10 # 4 v_add1 <- function(x, y, na.rm = FALSE) { stopifnot(length(x) == length(y), is.numeric(x), is.numeric(y)) if (length(x) == 0) return(numeric()) simplify2array( Map(function(x, y) add(x, y, na.rm = na.rm), x, y) ) } v_add1_compact <- function(x, y, na.rm = FALSE) { stopifnot(length(x) == length(y), is.numeric(x), is.numeric(y)) if (length(x) == 0) return(numeric()) simplify2array( Map(partial(add, na.rm = na.rm), x, y) ) } v_add1(1:3, 2:4) #> [1] 3 5 7 v_add1_compact(1:3, 2:4) #> [1] 3 5 7 # 5 c_add <- function(xs, na.rm = FALSE) { Reduce(function(x, y) add(x, y, na.rm = na.rm), xs, accumulate = TRUE) } c_add_compact <- function(xs, na.rm = FALSE) { Reduce(partial(add, na.rm = na.rm), xs, accumulate = TRUE) } c_add(1:3) #> [1] 1 3 6 c_add_compact(1:3) #> [1] 1 3 6 ## From Function operators # 6 f <- function(x) x ^ 2 partial(f) #> function (...) #> f(...) # 7 # Map(function(x, y) f(x, y, zs), xs, ys) # Map(partial(f, zs = zs), xs, yz) # 8 # f <- function(a) g(a, b = 1) # f <- partial(g, b = 1) # 9 compact <- function(x) Filter(Negate(is.null), x) compact <- partial(Filter, Negate(is.null)) # 10 # Map(function(x, y) f(x, y, zs), xs, ys) # Map(partial(f, zs = zs), xs, ys) # 11 funs2 <- list( sum = function(...) sum(..., na.rm = TRUE), mean = function(...) mean(..., na.rm = TRUE), median = function(...) median(..., na.rm = TRUE) ) funs2 <- list( sum = partial(sum, na.rm = TRUE), mean = partial(mean, na.rm = TRUE), median = partial(median, na.rm = TRUE) )`

## 9.7 Combining FOs

**Q**: Implement your own version of`compose()`

using`Reduce`

and`%o%`

. For bonus points, do it without calling`function`

.**A**: We use the definition from the textbook:And then we build two versions. One via an anonymous function and one via

`partial()`

:**Q**: Extend`and()`

and`or()`

to deal with any number of input functions. Can you do it with`Reduce()`

? Can you keep them lazy (e.g., for`and()`

, the function returns once it sees the first`FALSE`

)?**A**: We use`and()`

and`or()`

as defined in the textbook. They are lazy, since they are build up on`&&`

and`||`

. Also their reduced versions stay lazy, as we will show at the end of the code`and <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) && f2(...) } } and_red <- function(fs){ Reduce(function(f, g) and(f, g), fs) } or <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) || f2(...) } } or_red <- function(fs){ Reduce(function(f, g) or(f, g), fs) } # Errors before the first TRUE will be returned tryCatch( or_red(c(is.logical, is.logical, stop, is.character))("a"), error = function(e) e ) #> <simpleError in f1(...): a> # Errors after the first TRUE won't be returned or_red(c(is.logical, is.logical, is.character, stop))("a") #> [1] TRUE`

**Q**: Implement the`xor()`

binary operator. Implement it using the existing`xor()`

function. Implement it as a combination of`and()`

and`or()`

. What are the advantages and disadvantages of each approach? Also think about what you’ll call the resulting function to avoid a clash with the existing`xor()`

function, and how you might change the names of`and()`

,`not()`

, and`or()`

to keep them consistent.**A**: Both versions are implemented straight forward, as also the reduced versions. However, the parallel versions need a little bit more care:`xor_fb1 <- function(f1, f2){ force(f1); force(f2) function(...){ xor(f1(...), f2(...)) } } xor_fb2 <- function(f1, f2){ force(f1); force(f2) function(...){ or(f1, f2)(...) && !(and(f1, f2)(...)) } } # binary combination xor_fb1(is.logical, is.character)("a") #> [1] TRUE xor_fb2(is.logical, is.character)("a") #> [1] TRUE # parallel combination (results in an error) xor_fb1(c(is.logical, is.character), c(is.logical, is.character))("a") #> Error in f1(...): could not find function "f1" xor_fb2(c(is.logical, is.character), c(is.logical, is.character))("a") #> Error in f1(...): could not find function "f1" # reduced combination (results in an error) xor_fb1(c(is.logical, is.character, is.logical, is.character))("a") #> Error in force(f2): argument "f2" is missing, with no default xor_fb2(c(is.logical, is.character, is.logical, is.character))("a") #> Error in force(f2): argument "f2" is missing, with no default ### Reduced version xor_fb1_red <- function(fs){ Reduce(function(f, g) xor_fb1(f, g), fs) } xor_fb2_red <- function(fs){ Reduce(function(f, g) xor_fb2(f, g), fs) } # should return TRUE xor_fb1_red(c(is.logical, is.character, is.logical, is.character))("a") #> [1] FALSE xor_fb2_red(c(is.logical, is.character, is.logical, is.character))("a") #> [1] FALSE # should return FALSE xor_fb1_red(c(is.logical, is.logical, is.character, is.logical))("a") #> [1] TRUE xor_fb2_red(c(is.logical, is.logical, is.character, is.logical))("a") #> [1] TRUE # should return FALSE xor_fb1_red(c(is.logical, is.logical, is.character, is.character))("a") #> [1] FALSE xor_fb2_red(c(is.logical, is.logical, is.character, is.character))("a") #> [1] FALSE`

**Q**: Above, we implemented boolean algebra for functions that return a logical function. Implement elementary algebra (`plus()`

,`minus()`

,`multiply()`

,`divide()`

,`exponentiate()`

,`log()`

) for functions that return numeric vectors.**A**:`plus <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) + f2(...) } } minus <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) - f2(...) } } multiply <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) * f2(...) } } divide <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) / f2(...) } } exponentiate <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) ^ f2(...) } } # we rename log to log_ since log() already exists log_ <- function(f1, f2) { force(f1); force(f2) function(...) { log(f1(...), f2(...)) } } # Test mns <- minus(mean, function(x) x^2) mns(1:5)`