11 Function operators

library(purrr)

11.1 Existing function operators

  1. Q: Base R provides a function operator in the form of Vectorize(). What does it do? When might you use it?

    A: In R a lot of functions are “vectorised”. This means, that a computation is iteratively applied to each element of a “vectorised argument”. Further - in the context of R - the term vectorised implies that these operations are implemented in a compiled language such as C or Fortran as these languages provide faster evaluation of for loops than R does.

    Despite what the function’s name and its documentation (?Vectorize) tell us, Vectorize() is not able to speed up the provided function. It rather changes the input format of the supplied arguments (vectorize.args), so that they are iterated over. Because of this Vectorize() works best for scalar functions.

    In essence, Vectorize() is mostly a wrapper for mapply(). Let’s take a look at an example from the documentation:

    vrep <- Vectorize(rep.int)
    vrep
    #> function (x, times) 
    #> {
    #>     args <- lapply(as.list(match.call())[-1L], eval, parent.frame())
    #>     names <- if (is.null(names(args))) 
    #>         character(length(args))
    #>     else names(args)
    #>     dovec <- names %in% vectorize.args
    #>     do.call("mapply", c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]), 
    #>         SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES))
    #> }
    #> <environment: 0x946438>
    
    # Application
    vrep(1:2, 3:4)
    #> [[1]]
    #> [1] 1 1 1
    #> 
    #> [[2]]
    #> [1] 2 2 2 2
    
    # Naming arguments still works
    vrep(times = 1:2, x = 3:4)  
    #> [[1]]
    #> [1] 3
    #> 
    #> [[2]]
    #> [1] 4 4

    Vectorize() provides a convenient and concise notation to iterate over multiple arguments. If you want your vectorised function to be type stable you might want to set SIMPLIFY = FALSE inside of Vectorize().

  2. Q: Read the source code for possibly(). How does it work?

    A: possibly() is a function operator that modifies functions to return a specified default value in case of an error (otherwise) and to suppress any error messages (quiet = TRUE).

    While reading the source code, we notice that possibly() internally uses purrr::as_mapper(). This enables users to supply not only functions, but also formulas or atomics via the same syntax as known from other functions in the purrr package. Besides this, the new default value (otherwise) gets evaluated once to make it (almost) immutable from now on.

    The main functionality of possibly() is provided by base::tryCatch(). In this part the supplied function (.f) gets wrapped and the error and interrupt handling are specified.

    possibly
    #> function (.f, otherwise, quiet = TRUE) 
    #> {
    #>     .f <- as_mapper(.f)
    #>     force(otherwise)
    #>     function(...) {
    #>         tryCatch(.f(...), error = function(e) {
    #>             if (!quiet) 
    #>                 message("Error: ", e$message)
    #>             otherwise
    #>         }, interrupt = function(e) {
    #>             stop("Terminated by user", call. = FALSE)
    #>         })
    #>     }
    #> }
    #> <bytecode: 0x1675680>
    #> <environment: namespace:purrr>
  3. Q: Read the source code for safely(). How does it work?

    A: safely() returns a function created by capture_error(), which we have to inspect in order to understand what this function operator does:

    safely
    #> function (.f, otherwise = NULL, quiet = TRUE) 
    #> {
    #>     .f <- as_mapper(.f)
    #>     function(...) capture_error(.f(...), otherwise, quiet)
    #> }
    #> <bytecode: 0x37a2df0>
    #> <environment: namespace:purrr>
    purrr:::capture_error
    #> function (code, otherwise = NULL, quiet = TRUE) 
    #> {
    #>     tryCatch(list(result = code, error = NULL), error = function(e) {
    #>         if (!quiet) 
    #>             message("Error: ", e$message)
    #>         list(result = otherwise, error = e)
    #>     }, interrupt = function(e) {
    #>         stop("Terminated by user", call. = FALSE)
    #>     })
    #> }
    #> <bytecode: 0x3258398>
    #> <environment: namespace:purrr>

    Here tryCatch() evaluates the code, which is passed to it, within a list with the two elements “results” and “error”. If the code evaluates without an error, the returned value be assigned to the “results” element of this list.

    In cas of an error, the list will have the same structure, but “results” will be otherwise (NULL by default) and the “error” element will contain the error message. Take a look at the textbook or the documentation of safely() to see how you can take advantage of this behaviour, for example when fitting many models with R.

11.2 Case study: Creating your own function operators

  1. Q: Weigh the pros and cons of download.file %>% dot_every(10) %>% delay_by(0.1) vs download.file %>% delay_by(0.1) %>% dot_every(10).

    A:

  2. Q: Should you memoise file.download()? Why/why not?

    A: Before you consider to memoise file.download(), ensure that the files you want to download under a specific URL don’t change. Otherwise, it makes sense to memoise file.download() in scenarios where objects are downloaded repeatedly and downloads might take a little longer. However, the downside when caching results in memory is that the regarding amount of memory will not be available for further tasks during the R session. Therefore, it doesn’t make sense to memoise file.download() when it is needed to download very large files, lots of different files or time is not an issue. As the meaning of these terms varies regarding differing situations, it always depends on the use case and one has to weigh up these trade offs carefully.

  3. Q: Create a function operator that reports whenever a file is created or deleted in the working directory, using dir() and setdiff(). What other global function effects might you want to track?

    A: We start with a short version to show the idea:

    track_dir <- function(f){
      force(f)
      function(...){
        dir_old <- dir()
        on.exit(if(!setequal(dir(), dir_old)){
          message("files in your working directory were deleted or added by this function")})
        f(...)
      }
    }
    
    # the following test will create the file "delete_me" in your working directory
    td <- track_dir(dir.create)
    td("delete_me")

    Of course we can provide more information on the type of changes:

    track_dir <- function(f){
      force(f)
      function(...){
        dir_old <- dir()
    
        on.exit(if(!setequal(dir(), dir_old)){
          message("Files in your working directory were deleted or added by this
                  function.")}, add = TRUE)
        on.exit(if(length(setdiff(dir_old, dir()) != 0)){
          message(paste0("The following files were deleted: ",
                         paste(setdiff(dir_old, dir()), collapse = ", ")
                         ))}, add = TRUE)
        on.exit(if(length(setdiff(dir(), dir_old) != 0)){
          message(paste0("The following files were added: ", 
                         paste(setdiff(dir(), dir_old), collapse = ", ")
                         ))}, add = TRUE)
    
        f(...)
      }
    }
    
    # the following test will again create two files in your working directory
    td <- track_dir(sapply)
    td(c("delete_me", "me_too"), dir.create)

    Other global effects that might be worth tracking include changes regarding:

    • the search path and/or introduced conflicts()
    • options() and par() which modify global settings
    • the path of the working directory
    • environment variables
    • the locale.
  4. Q: Write a function operator that logs a timestamp and message to a file every time a function is run.

    A: Note that the example will create a file file in your current working directory:

    logger <- function(f, filename){
      force(f)
      filename_tmp <- paste(filename, basename(tempfile()), sep = "_")
      write(paste("created at:", Sys.time()), filename_tmp, append = TRUE)
      function(..., message = "you can add a message at each call") {
        write(paste0("used at: ", Sys.time(), ", ", message), filename_tmp, append = TRUE)
        f(...)
      }
    }
    
    # the following line creates a file, which name starts with "mean_log_"
    mean2 <- logger(mean, "mean_log") 
    mean2(1:4, message = "first time") 
    mean2(1:4, message = "second_time")
  5. Q: Modify delay_by() so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called g <- delay_by(1, f); g(); Sys.sleep(2); g() there shouldn’t be an extra delay.

    A: We can do this with three little tricks (and the help of 42):

    delay_by_v2 <- function(delay, f) {
      force(f)
      # we initialise the timestamp for the last run. We set a specific default value,
      # to ensure that the first run of the returned function will never be delayed
      last_runtime <- Sys.time() - (delay + 42)
      function(...) {
        # we continually check if enough time passed with an (empty) while statement.
        while (Sys.time() < last_runtime + delay) {}
        # we override the start for the next waiting interval.
        # Note that this is done on exit (after the function is evaluated)
        on.exit(last_runtime <<- Sys.time()) 
        return(f(...))
      }
    }

    Alternatively to the empty while statement we could have used Sys.sleep(). I would not recommend this solution, since ?Sys.sleep indicates that Sys.sleep() might have some overhead and seems not to be as exact as we need.