11 Function operators

11.1 Existing function operators

  1. Q: Base R provides a function operator in the form of Vectorize(). What does it do? When might you use it?

    A: In R a lot of functions are “vectorised”. Vectorised has two meanings. First, it means (broadly) that a function inputs a vector or vectors, and does something to each element. Secondly, it usually implies that these operations are implemented in a compiled language such as C or Fortran, so that the implementation is very fast.

    However, despite what the function’s name implies, Vectorize() is not able to speed up the provided function. It rather changes the input format of the supplied arguments (vectorize.args), so that they can be iterated over.

    In essence, Vectorize() is mostly a wrapper for mapply(). Let’s take a look at an example from the documentation:

    Vectorize() provides a convenient and concise notation to iterate over multiple arguments, but has some major drawbacks that mean you generally shouldn’t use it. See https://www.jimhester.com/2018/04/12/vectorize/ for more details.

  2. Q: Read the source code for possibly(). How does it work?

    A: possibly() modifies functions to return a specified default value in case of an error (otherwise) and to suppress any error messages (quiet = TRUE).

    While reading the source code, we notice that possibly() internally uses purrr::as_mapper(). This enables users to supply not only functions, but also formulas or atomics via the same syntax as known from other functions in the purrr package. Besides this, the new default value (otherwise) gets evaluated once to make it (almost) immutable from now on.

    The main functionality of possibly() is provided by base::tryCatch(). In this part the supplied function (.f) gets wrapped and the error and interrupt handling are specified.

  3. Q: Read the source code for safely(). How does it work?

    A: safely() modifies functions to return a list, containing the elements “result” and “error”. It works in a similar fashion as possibly() and besides using as_mapper(), safely() also provides the otherwise and quiet argument. However, in order to provide the result and the error in a consistent way, the tryCatch() part of the implementation returns a list with the same structure in both cases. In the case of successful evaluation “error” equals to NULL and in case of an error “result” equals to otherwise, which is NULL by default.

    As the tryCatch() part is hidden in the internal purrr:::capture_output() function, we provide it here in addition to safely():

    Take a look at the textbook or the documentation of safely() to see how you can take advantage of this behaviour, for example when fitting many models.

11.2 Case study: Creating your own function operators

  1. Q: Weigh the pros and cons of download.file %>% dot_every(10) %>% delay_by(0.1) vs download.file %>% delay_by(0.1) %>% dot_every(10).

    A:

  2. Q: Should you memoise file.download()? Why/why not?

    A: Memoising file.download() will only work if the files are immutable; i.e. if the file at a given url is always same. There’s no point memoising unless this is true. Even if this is true, however, memoise has to store the results in memory, and large files will potentially take up a lot of memory.

    This implies that it’s probably not beneficial to memoise file.download() in most cases. The only exception is if you are downloading small files many times, and the file at a given url is guaranteed not to change.

  3. Q: Create a function operator that reports whenever a file is created or deleted in the working directory, using dir() and setdiff(). What other global function effects might you want to track?

    A: We first start with a function that simply reports the difference between two vectors of files:

    Then we wrap it up in a function operator

    And try it out by creating wrappers around file.create() and file.remove():

    To create a more serious version of track_dir() one might provide optionality to set the full.names and recursive arguments of dir() to TRUE. This would enable to also track the creation/deletion of hidden files and files in folders contained in the working directory.

    Other global effects that might be worth tracking include changes regarding:

    • the search path and possibly introduced conflicts()
    • options() and par() which modify global settings
    • the path of the working directory
    • environment variables
  4. Q: Write a function operator that logs a timestamp and message to a file every time a function is run.

    A:

  5. Q: Modify delay_by() so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called g <- delay_by(1, f); g(); Sys.sleep(2); g() there shouldn’t be an extra delay.

    A: We can do this with three little tricks (and the help of 42):