28 Functional programming

28.1 Anonymous functions

  1. Q: Given a function, like "mean", match.fun() lets you find a function. Given a function, can you find its name? Why doesn’t that make sense in R?
    A: If you know a function's body(), formals() and environment(), you can search for names that are bound to an identical function, but the result is not reliable. Primitive functions return NULL for all three properties, so they cannot be told apart this way. Anonymous functions won’t be found, because they are not bound to a name. Conversely, several names in an environment may be bound to functions with the same body(), formals() and environment(), so the solution need not be unique. More generally: in R a name has an object, but an object (here, a function) doesn’t have a name; at most, one or more names are bound to it.
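The search described above can be sketched as follows. The helper fun_names() and the environment e are hypothetical names introduced for illustration; the sketch assumes that identical() is an acceptable test for "the same function".

```r
# Hypothetical helper: return every name in `env` that is bound to `fn`.
fun_names <- function(fn, env) {
  nms <- ls(env)
  is_match <- vapply(
    nms,
    function(nm) identical(get(nm, envir = env), fn),
    logical(1)
  )
  nms[is_match]
}

e <- new.env()
assign("a", mean, envir = e)
assign("b", mean, envir = e)  # a second binding to the same function
assign("c", sd, envir = e)

fun_names(mean, e)             # two names, so the answer is not unique
fun_names(function(x) x, e)    # an anonymous function has no name at all
```

The first call returns more than one name and the second returns none, which is why "the name of a function" is not a well-defined notion.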

  2. Q: Use lapply() and an anonymous function to find the coefficient of variation (the standard deviation divided by the mean) for all columns in the mtcars dataset.

    A: lapply(mtcars, function(x) sd(x)/mean(x)).

  3. Q: Use integrate() and an anonymous function to find the area under the curve for the following functions. Use Wolfram Alpha to check your answers.

    1. y = x ^ 2 - x, x in [0, 10]
    2. y = sin(x) + cos(x), x in [-\(\pi\), \(\pi\)]
    3. y = exp(x) / x, x in [10, 20]

    A:
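One possible approach, as a sketch: pass each curve to integrate() as an anonymous function. The variable names area_1 to area_3 are chosen here for illustration. The first two results can be checked by hand, since the first integral is 1000/3 - 50 and the second is 0 analytically.

```r
# 1. y = x ^ 2 - x on [0, 10]
area_1 <- integrate(function(x) x^2 - x, 0, 10)

# 2. y = sin(x) + cos(x) on [-pi, pi]
area_2 <- integrate(function(x) sin(x) + cos(x), -pi, pi)

# 3. y = exp(x) / x on [10, 20]
area_3 <- integrate(function(x) exp(x) / x, 10, 20)

# integrate() returns a list; the estimate is stored in $value and the
# estimated absolute error in $abs.error.
area_1$value
area_2$value
area_3$value
```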

  4. Q: A good rule of thumb is that an anonymous function should fit on one line and shouldn’t need to use {}. Review your code. Where could you have used an anonymous function instead of a named function? Where should you have used a named function instead of an anonymous function?
    A:

28.2 Closures

  1. Q: Why are functions created by other functions called closures?
    A: As stated in the book:

    because they enclose the environment of the parent function and can access all its variables.
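A small sketch of what "enclosing" means in practice. The factory power() and the functions square() and cube() are hypothetical examples, not part of the exercise.

```r
# Function factory: each call creates a new execution environment
# that the returned function keeps a reference to.
power <- function(exponent) {
  force(exponent)  # evaluate the argument now rather than lazily
  function(x) x^exponent
}

square <- power(2)
cube <- power(3)

square(4)
cube(2)

# The enclosed variable lives in the closure's environment.
environment(square)$exponent
environment(cube)$exponent
```

Each closure carries its own environment, so square() and cube() do not interfere with each other.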

  2. Q: What does the following statistical function do? What would be a better name for it? (The existing name is a bit of a hint.)

    A: It returns the logarithm of x when lambda equals zero and (x ^ lambda - 1) / lambda otherwise. A better name might be box_cox_transformation (the one-parameter version); you can read about it at https://en.wikipedia.org/wiki/Power_transform.
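The transformation described in this answer can be sketched as a function factory. The name box_cox is a suggestion made here, and the code is reconstructed from the description in the answer rather than copied from the exercise.

```r
# One-parameter Box-Cox transformation as a function factory.
box_cox <- function(lambda) {
  if (lambda == 0) {
    function(x) log(x)
  } else {
    function(x) (x^lambda - 1) / lambda
  }
}

box_cox(0)(exp(1))   # log branch
box_cox(2)(3)        # (3^2 - 1) / 2
box_cox(1e-6)(2)     # close to log(2): the log branch is the limit lambda -> 0
```

The parentheses around x^lambda - 1 matter: without them only the constant 1 would be divided by lambda.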

  3. Q: What does approxfun() do? What does it return?
    A: approxfun() takes a set of two-dimensional data points plus some extra specifications as arguments and returns a function that interpolates them, either piecewise linearly or piecewise constantly. By default, the returned function is defined only on the range of the given x-values.
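A short sketch of both interpolation methods. The data points x and y and the names lin and step are made up for illustration.

```r
x <- c(1, 2, 3)
y <- c(2, 4, 8)

# Piecewise linear interpolation (the default method).
lin <- approxfun(x, y)
lin(1.5)   # halfway between y = 2 and y = 4
lin(4)     # outside the range of x: NA under the default rule = 1

# Piecewise constant interpolation; with the default f = 0 the value
# of the left neighbour is used.
step <- approxfun(x, y, method = "constant")
step(1.5)
step(2.5)
```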

  4. Q: What does ecdf() do? What does it return?
    A: “ecdf” stands for empirical cumulative distribution function. For a numeric vector, ecdf() returns the corresponding distribution function (of class “ecdf”, which inherits from class “stepfun”). You can describe its behaviour in two steps. In the first part of its body, the (x, y) pairs for the nodes of the distribution function are calculated. In the second part, these pairs are passed to approxfun().
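A brief illustration with a made-up sample; the name Fn is arbitrary.

```r
Fn <- ecdf(c(1, 2, 2, 3))

class(Fn)    # "ecdf" comes first, "stepfun" is inherited
knots(Fn)    # the unique sorted observations, where the function jumps

# Proportion of observations less than or equal to each query value.
Fn(c(0, 1, 2, 2.5, 3, 10))
```

The returned object is itself a function, so it can be called on new values like any other closure.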

  5. Q: Create a function that creates functions that compute the ith central moment of a numeric vector. You can test it by running the following code:

    A: For a discrete formulation look here
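A minimal sketch, assuming the discrete formulation in which the ith central moment is the mean of (x - mean(x))^i. The checks at the end are assumptions made here: the first central moment should be zero, and the second should equal the sample variance rescaled by (n - 1) / n.

```r
moment <- function(i) {
  force(i)  # fix the order of the moment when the closure is created
  function(x) sum((x - mean(x))^i) / length(x)
}

m1 <- moment(1)
m2 <- moment(2)

set.seed(1)
x <- runif(100)

# First central moment is zero up to floating point error.
all.equal(m1(x), 0)
# Second central moment is the variance with denominator n instead of n - 1.
all.equal(m2(x), var(x) * 99 / 100)
```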

  6. Q: Create a function pick() that takes an index, i, as an argument and returns a function with an argument x that subsets x with i.

    A:
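One possible solution, as a sketch. It assumes that "subsets x with i" means extracting a single element with [[; use [ instead if a sub-vector or sub-list is wanted.

```r
pick <- function(i) {
  force(i)  # capture the index when the closure is created
  function(x) x[[i]]
}

# pick(5) behaves like the equivalent anonymous function.
identical(
  lapply(mtcars, pick(5)),
  lapply(mtcars, function(x) x[[5]])
)
```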

28.3 Lists of functions

  1. Q: Implement a summary function that works like base::summary(), but uses a list of functions. Modify the function so it returns a closure, making it possible to use it as a function factory.

    A: We have two options: we can imitate base::summary() completely, or create a new summary based on our own preferences. Neither is easy, since both involve a lot of design decisions. We choose the second option, since we only want a first draft that applies what we have learned and gives some feeling for the challenges that might appear.

    Our new summary function, summary2, should have sensible default actions for specific data types, and these defaults should of course be changeable, as this is also part of the exercise. To limit our efforts, we focus on summaries for data frames. Everything else is explained in the comments on the code:

    # The arguments of our function factory are the lists of functions that are
    # applied to data frame columns, depending on their type.
    # We focus on the most important types: character, integer, double, logical,
    # factor and date. Each argument defaults to NULL; if you supply a list of
    # functions, it overrides the real default for that type, which is set
    # inside the function factory.
    summary2 <- function(character_functions = NULL, integer_functions = NULL,
                         double_functions = NULL, logical_functions = NULL, 
                         factor_functions = NULL, date_functions = NULL){
    
      # The following functional is later applied six times to the data frame,
      # once for every column type handled by our function
      apply_typefunction <- function(df, pred, functions){
        lapply(df[vapply(df, pred, logical(1))],
               function(x) unlist(lapply(functions, function(y) y(x))))
      }
    
      # The following lists of functions are roughly similar to those used
      # by base::summary, so we define them once...
      default_1 <- list(Table = table)
      default_2 <- list(Min = min, `1st Qu.` = function(x) quantile(x)[[2]],
                        Median = median, Mean = mean,
                        `3rd Qu.` = function(x) quantile(x)[[4]], Max = max)
    
      # All function lists that were not specified when calling the
      # function factory are now set to their default values
      if (is.null(character_functions)) character_functions <- default_1
      if (is.null(integer_functions))   integer_functions   <- default_2
      if (is.null(double_functions))    double_functions    <- default_2
      if (is.null(logical_functions))   logical_functions   <- default_1
      if (is.null(factor_functions))    factor_functions    <- default_1
      if (is.null(date_functions))      date_functions      <- default_2
    
      # Finally the returned function is created
      function(df){
    
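        # For every column type, the specific functions are applied to the
        # appropriate columns. Note that Date columns are typically stored as
        # doubles, so is.double matches them as well as the date predicate.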
        characters <- apply_typefunction(df, is.character, character_functions)
        integers   <- apply_typefunction(df, is.integer  , integer_functions  )
        doubles    <- apply_typefunction(df, is.double   , double_functions   )
        logicals   <- apply_typefunction(df, is.logical  , logical_functions  )
    
        factors    <- apply_typefunction(df, is.factor   , factor_functions   )
        dates      <- apply_typefunction(df, function(x) inherits(x, 'Date'), 
                                         date_functions)
    
        # The results are collected in a list. Empty lists, which appear when a
        # column type does not occur, are removed from the output.
        # There are a lot of formatting steps, like ordering, naming and converting
        # output, that we could do, but we think that the idea is more important for now
        out <- list(characters, integers, doubles, logicals, 
                factors, dates)
        out[lengths(out) != 0]
      }
    }
    
    # Now we can apply the function factory
    summary2_default <- summary2()
    # And the resulting function
    summary2_default(df = iris)
    #> [[1]]
    #> [[1]]$Sepal.Length
    #>      Min  1st Qu.   Median     Mean  3rd Qu.      Max 
    #> 4.300000 5.100000 5.800000 5.843333 6.400000 7.900000 
    #> 
    #> [[1]]$Sepal.Width
    #>      Min  1st Qu.   Median     Mean  3rd Qu.      Max 
    #> 2.000000 2.800000 3.000000 3.057333 3.300000 4.400000 
    #> 
    #> [[1]]$Petal.Length
    #>     Min 1st Qu.  Median    Mean 3rd Qu.     Max 
    #>   1.000   1.600   4.350   3.758   5.100   6.900 
    #> 
    #> [[1]]$Petal.Width
    #>      Min  1st Qu.   Median     Mean  3rd Qu.      Max 
    #> 0.100000 0.300000 1.300000 1.199333 1.800000 2.500000 
    #> 
    #> 
    #> [[2]]
    #> [[2]]$Species
    #>     Table.setosa Table.versicolor  Table.virginica 
    #>               50               50               50
    
    # Unfortunately, this fails if there are any NAs in integer columns
    df_nas <- data.frame(integers_na = c(NA, 2:19))
    summary2_default(df_nas)
    #> Error in quantile.default(x): missing values and NaN's not allowed if 'na.rm' is FALSE
    
    # But since we can define new functions for integer columns, we can solve this
    summary2_naversion <- summary2(integer_functions = list(
      Mean_na = function(x) mean(x, na.rm = TRUE),
      Median_na = function(x) median(x, na.rm = TRUE),
      NAs = function(x) sum(is.na(x)))
      )
    summary2_naversion(df_nas)
    #> [[1]]
    #> [[1]]$integers_na
    #>   Mean_na Median_na       NAs 
    #>      10.5      10.5       1.0

  2. Q: Which of the following commands is equivalent to with(x, f(z))?

    1. x$f(x$z).
    2. f(x$z).
    3. x$f(z).
    4. f(z).
    5. It depends.

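    A: Strictly speaking, option 5 (“It depends”) is correct, because with() looks up both f and z in x first and only then in the calling environment. In the usual case, where x is a data frame or list that contains z but not f, option 2, f(x$z), is equivalent. If x contains both f and z, option 1 applies; if x contains only f, option 3 applies; and if x contains neither, option 4, f(z), gives the same result.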
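How with(x, f(z)) resolves f and z depends on what x contains. A small sketch with hypothetical lists x1 to x4, together with a function f and a value z defined in the calling environment:

```r
f <- function(z) z + 1
z <- 100

# x contains only z: equivalent to f(x$z)
x1 <- list(z = 1)
with(x1, f(z)) == f(x1$z)

# x contains both f and z: equivalent to x$f(x$z)
x2 <- list(f = function(z) z * 10, z = 1)
with(x2, f(z)) == x2$f(x2$z)

# x contains only f: equivalent to x$f(z)
x3 <- list(f = function(z) z * 10)
with(x3, f(z)) == x3$f(z)

# x contains neither: equivalent to f(z)
x4 <- list()
with(x4, f(z)) == f(z)
```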

28.4 Case study: numerical integration

  1. Q: Instead of creating individual functions (e.g., midpoint(), trapezoid(), simpson(), etc.), we could store them in a list. If we did that, how would that change the code? Can you create the list of functions from a list of coefficients for the Newton-Cotes formulae?
    A:
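A possible sketch, not a definitive answer. It assumes a composite() helper that sums a rule over n pieces and a factory newton_cotes() that builds a closed Newton-Cotes rule from its coefficients; both are written here for illustration. Open rules such as the midpoint rule are not covered by this simplified factory.

```r
# Composite rule: split [a, b] into n pieces and sum the rule over each piece.
composite <- function(f, a, b, n = 10, rule) {
  points <- seq(a, b, length.out = n + 1)
  area <- 0
  for (i in seq_len(n)) {
    area <- area + rule(f, points[i], points[i + 1])
  }
  area
}

# Closed Newton-Cotes rule built from its coefficients (assumes at
# least two coefficients, i.e. nodes at both end points).
newton_cotes <- function(coef) {
  force(coef)
  k <- length(coef) - 1  # number of sub-intervals between the nodes
  function(f, a, b) {
    nodes <- a + (0:k) * (b - a) / k
    (b - a) / sum(coef) * sum(coef * f(nodes))
  }
}

# The list of rules is created from a list of coefficients.
rules <- lapply(
  list(
    trapezoid = c(1, 1),
    simpson   = c(1, 4, 1),
    boole     = c(7, 32, 12, 32, 7)
  ),
  newton_cotes
)

# With the rules in a list, all of them can be applied in one call.
# The exact integral of sin() over [0, pi] is 2.
estimates <- vapply(
  rules,
  function(rule) composite(sin, 0, pi, n = 10, rule = rule),
  numeric(1)
)
estimates
```

Storing the rules in a list means the calling code iterates over the list instead of naming each rule, so adding a rule only requires adding a coefficient vector.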

  2. Q: The trade-off between integration rules is that more complex rules are slower to compute, but need fewer pieces. For sin() in the range [0, \(\pi\)], determine the number of pieces needed so that each rule will be equally accurate. Illustrate your results with a graph. How do they change for different functions? sin(1 / x^2) is particularly challenging.
    A: