19 Quasiquotation

Prerequisites

To further compute on the language, we mainly use the rlang package in this section.

library(rlang)

19.1 Motivation

  1. Q: For each function in the following base R code, identify which arguments are quoted and which are evaluated.

    library(MASS)
    
    mtcars2 <- subset(mtcars, cyl == 4)
    
    with(mtcars2, sum(vs))
    sum(mtcars2$am)
    
    rm(mtcars2)

    A:

    library(MASS)  # MASS -> quoted
    # library also accepts "MASS", which would be evaluated
    
    mtcars2 <- subset(mtcars, cyl == 4)  # mtcars -> evaluated
                                         # cyl    -> quoted
    
    with(mtcars2, sum(vs))  # mtcars2 -> evaluated
                            # sum(vs) -> quoted
    sum(mtcars2$am)  # mtcars2$am -> evaluated
                     # am -> quoted (via `$`)    
    
    rm(mtcars2)  # mtcars2 -> quoted

    Some of the arguments (mtcars or mtcars2) are objects that can be found in the global environment, so evaluating them on their own returns the object. Others, such as cyl, sum(vs) or am, only make sense when they are evaluated within a particular environment (here, the data frame). That is why they are quoted.
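
    A quick way to check whether an argument is quoted is to evaluate it outside of the function call. The following base R sketch applies that check to the subset() call from the exercise; the helper variable names (standalone, small) are only illustrative.

```r
# Evaluated argument: `mtcars` works the same outside the function call.
is.data.frame(mtcars)

# Quoted argument: `cyl == 4` fails on its own, because `cyl` only
# exists as a column of mtcars, not as an object in the environment.
standalone <- try(cyl == 4, silent = TRUE)
inherits(standalone, "try-error")

# subset() captures the expression and evaluates it with the
# columns of the data frame in scope.
small <- subset(mtcars, cyl == 4)
nrow(small)
```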

  2. Q: For each function in the following tidyverse code, identify which arguments are quoted and which are evaluated.

    library(dplyr)
    library(ggplot2)
    
    by_cyl <- mtcars %>%
      group_by(cyl) %>%
      summarise(mean = mean(mpg))
    
    ggplot(by_cyl, aes(cyl, mean)) + geom_point()

    A:

    library(dplyr)    # dplyr -> quoted
    library(ggplot2)  # ggplot2 -> quoted
    
    by_cyl <- mtcars %>%  # mtcars -> evaluated
      group_by(cyl) %>%   # cyl -> quoted
      summarise(mean = mean(mpg))  # mean, mean() and mpg -> quoted
    
    ggplot(by_cyl,  # by_cyl -> evaluated
           aes(cyl, mean)) +  # aes() -> evaluated
                              # cyl, mean -> quoted (via aes)
      geom_point() 

    The column names in piped dplyr statements need to be quoted so that they can be looked up in the data frame rather than in the calling environment. The names of new variables on the LHS of the summarise() expression are quoted as well. The expressions on the RHS are also quoted; they are evaluated later, in the context of the (grouped) data frame.

19.2 Quoting

  1. Q: How is expr() implemented? Look at its source code.

    A: expr() simply passes its argument to enexpr().

    expr
    #> function (expr) 
    #> {
    #>     enexpr(expr)
    #> }
    #> <bytecode: 0x39d3038>
    #> <environment: namespace:rlang>
  2. Q: Compare and contrast the following two functions. Can you predict the output before running them?

    f1 <- function(x, y) {
      exprs(x = x, y = y)
    }
    f2 <- function(x, y) {
      enexprs(x = x, y = y)
    }
    f1(a + b, c + d)
    #> $x
    #> x
    #> 
    #> $y
    #> y
    f2(a + b, c + d)
    #> $x
    #> a + b
    #> 
    #> $y
    #> c + d

    A: Both functions are able to capture multiple arguments and will return a named list of expressions. f1() will return the arguments defined within the body of f1(), because exprs() captures the expressions as specified by the developer during the definition of f1. f2() will return the arguments supplied to f2() as specified by the user when the function is called.
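
    One way to see this difference is to call both functions with different inputs and compare the results. This is a small sketch reusing the definitions of f1() and f2() from the question.

```r
library(rlang)

f1 <- function(x, y) exprs(x = x, y = y)
f2 <- function(x, y) enexprs(x = x, y = y)

# f1() always returns the symbols x and y, whatever the caller supplies
identical(f1(a + b, c + d), f1(1, 2))
#> [1] TRUE

# f2() returns what the caller supplied, so the results differ
identical(f2(a + b, c + d), f2(1, 2))
#> [1] FALSE
```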

  3. Q: What happens if you try to use enexpr() with an expression (i.e. enexpr(x + y))? What happens if enexpr() is passed a missing argument?

    A: In the first case we’ll get an error:

    library(rlang)
    
    on_expr <- function(x) {enexpr(expr(x))}
    on_expr(x + y)
    #> Error: `arg` must be a symbol

    In the second case a missing argument is returned:

    on_missing <- function(x) {enexpr(x)}
    on_missing()
    is_missing(on_missing())
    #> [1] TRUE
  4. Q: How are exprs(a) and exprs(a = ) different? Think about both the input and the output.

    A: In exprs(a) the input a is interpreted as a symbol for an unnamed argument. Consequently the output shows an unnamed list with the first element containing the symbol a. In exprs(a = ) the first argument is named a, but then no value is provided. This leads to the output of a named list with the first element named a, which contains the missing argument.

    out1 <- exprs(a)
    str(out1)
    #> List of 1
    #>  $ : symbol a
    out2 <- exprs(a = )
    str(out2)
    #> List of 1
    #>  $ a: symbol
    is_missing(out2$a)
    #> [1] TRUE
  5. Q: What are other differences between exprs() and alist()? Read the documentation for the named arguments of exprs() to find out.

    A: exprs() provides the additional arguments .named (default FALSE), .ignore_empty (one of "trailing", "none" or "all") and .unquote_names (default TRUE). .named ensures that all dots are named. .ignore_empty specifies which empty arguments are dropped: only the last one ("trailing", the default), none of them ("none") or all of them ("all"). .unquote_names specifies whether := should be treated like =. := is useful because it supports unquoting (!!) on the left-hand side.
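
    The following sketch illustrates two of these differences: automatic naming via .named and unquoting a name on the left-hand side of :=. The variable names (named, nm, unquoted, plain) are only illustrative.

```r
library(rlang)

# .named = TRUE fills in missing names by deparsing the expressions
named <- exprs(a, b = x + y, .named = TRUE)
names(named)
#> [1] "a" "b"

# `:=` supports unquoting on the left-hand side
nm <- "total"
unquoted <- exprs(!!nm := x + y)
names(unquoted)
#> [1] "total"

# alist() has no quasiquotation: `!!nm` is kept as two nested calls to `!`
plain <- alist(!!nm)
plain[[1]]
#> !!nm
```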

  6. Q: The documentation for substitute() says:

    Substitution takes place by examining each component of the parse tree as follows:

    • If it is not a bound symbol in env, it is unchanged.
    • If it is a promise object (i.e., a formal argument to a function) the expression slot of the promise replaces the symbol.
    • If it is an ordinary variable, its value is substituted;
    • Unless env is .GlobalEnv in which case the symbol is left unchanged.

    Create examples that illustrate each of the four different cases.
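
    One possible set of examples, as a base R sketch. The function names f() and g() and the variable z are hypothetical and chosen only for illustration.

```r
# 1. Symbol not bound in env: left unchanged (y is not in the list)
substitute(x + y, list(x = 1))
#> 1 + y

# 2. Promise (a function argument): replaced by its expression slot
f <- function(arg) substitute(arg)
f(a + b)
#> a + b

# 3. Ordinary variable in a function environment: replaced by its value
g <- function() {
  z <- 2
  substitute(z * 10)
}
g()
#> 2 * 10

# 4. env is the global environment: the symbol is left unchanged
z <- 2
substitute(z * 10, globalenv())
#> z * 10
```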

19.3 Unquoting

  1. Q: Given the following components:

    xy <- expr(x + y)
    xz <- expr(x + z)
    yz <- expr(y + z)
    abc <- exprs(a, b, c)

    Use quasiquotation to construct the following calls:

    (x + y) / (y + z)
    -(x + z) ^ (y + z)
    (x + y) + (y + z) - (x + y)
    atan2(x + y, y + z)
    sum(x + y, x + y, y + z)
    sum(a, b, c)
    mean(c(a, b, c), na.rm = TRUE)
    foo(a = x + y, b = y + z)

    A:

    #1  (x + y) / (y + z)
    expr(!!xy / !!yz)
    #> (x + y)/(y + z)
    #2  -(x + z) ^ (y + z)
    expr(-(!!xz)^(!!yz))
    #> -(x + z)^(y + z)
    #3  (x + y) + (y + z) - (x + y)
    expr(!!xy + !!yz - !!xy)
    #> x + y + (y + z) - (x + y)
    #4  atan2(x + y, y + z)
    expr(atan2(!!xy, !!yz))
    #> atan2(x + y, y + z)
    #5  sum(x + y, x + y, y + z)
    expr(sum(!!xy, !!xy, !!yz))
    #> sum(x + y, x + y, y + z)
    #6  sum(a, b, c)
    expr(sum(!!!abc))
    #> sum(a, b, c)
    #7  mean(c(a, b, c), na.rm = TRUE)
    expr(mean(c(!!!abc), na.rm = TRUE))
    #> mean(c(a, b, c), na.rm = TRUE)
    #8  foo(a = x + y, b = y + z)
    expr(foo(a = !!xy, b = !!yz))
    #> foo(a = x + y, b = y + z)
  2. Q: The following two calls print the same, but are actually different:

    (a <- expr(mean(1:10)))
    #> mean(1:10)
    (b <- expr(mean(!!(1:10))))
    #> mean(1:10)
    identical(a, b)
    #> [1] FALSE

    What’s the difference? Which one is more natural?

    A: In a, 1:10 is captured as an unevaluated call to `:`, so it is only evaluated when the whole call is evaluated. In b, !! evaluates 1:10 immediately and inlines the resulting integer vector (1, 2, 3, …, 10) into the call, so b's second element is a vector rather than a call:

    as.list(a)
    #> [[1]]
    #> mean
    #> 
    #> [[2]]
    #> 1:10
    as.list(b)
    #> [[1]]
    #> mean
    #> 
    #> [[2]]
    #>  [1]  1  2  3  4  5  6  7  8  9 10

    The same distinction exists in base R: call() evaluates its ... arguments unless they are protected with quote(). This lets us see the consequences directly:

    arg <- seq(10)
    call1 <- call("mean", arg)
    print(call1)
    #> mean(1:10)
    call2 <- call("mean", quote(arg))
    print(call2)
    #> mean(arg)
    eval(call1)
    #> [1] 5.5
    eval(call2)
    #> [1] 5.5

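    The version that keeps the expression (a, or call2 in the base R example) is more natural. It contains only code as the user typed it, and its arguments are evaluated lazily in the environment where the call is eventually evaluated, just like in a normal function call. The inlined version freezes the value at the time the call is constructed, and a call containing a whole vector cannot be produced by parsing code.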
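
    The practical difference between keeping an expression and inlining a value shows up when the underlying variable changes after the call has been built. A minimal sketch, using a variable x that is not part of the original exercise:

```r
library(rlang)

x <- 1:10
a <- expr(mean(x))    # keeps the symbol x
b <- expr(mean(!!x))  # inlines the current value of x

x <- 1:20

eval(a)  # looks up x at evaluation time
#> [1] 10.5
eval(b)  # still uses the inlined vector 1:10
#> [1] 5.5
```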

19.4 Dot-dot-dot (...)

  1. Q: One way to implement exec() is shown below. Describe how it works. What are the key ideas?

    exec <- function(f, ..., .env = caller_env()) {
      args <- list2(...)
      do.call(f, args, envir = .env)
    }

    A: exec() takes a function together with its arguments and an environment as input. The idea is to construct a call from the function and its arguments and evaluate it in the supplied environment. As the ... argument is handled via list2(), exec supports tidy dots (quasiquotation), which means that one may unquote arguments via !!! and names on the LHS of := via !!.
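
    A short usage sketch of these two features. To avoid masking rlang::exec(), the definition from the question is copied under the hypothetical name my_exec(); args and arg_name are illustrative variables.

```r
library(rlang)

# Copy of the implementation from the question, under a different name
my_exec <- function(f, ..., .env = caller_env()) {
  args <- list2(...)
  do.call(f, args, envir = .env)
}

# Splice a list of arguments with !!!
args <- list(c(1, NA, 3), na.rm = TRUE)
my_exec(mean, !!!args)
#> [1] 2

# Supply an argument name indirectly with !! on the LHS of :=
arg_name <- "na.rm"
my_exec(mean, c(1, NA, 3), !!arg_name := TRUE)
#> [1] 2
```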

  2. Q: Carefully read the source code for interaction(), expand.grid(), and par(). Compare and contrast the techniques they use for switching between dots and list behaviour.

  3. Q: Explain the problem with this definition of set_attr().

    set_attr <- function(x, ...) {
      attr <- rlang::list2(...)
      attributes(x) <- attr
      x
    }
    set_attr(1:10, x = 10)
    #> Error in attributes(x) <- attr: attributes must be named

    A: The error message correctly tells us that attributes must be named. The underlying problem is that the first argument of set_attr() is itself called x. In the call set_attr(1:10, x = 10), the named argument x = 10 is matched to the formal x, so 1:10 is the only value left for ... and it arrives there unnamed. set_attr() therefore tries to set 1:10 as an unnamed attribute of 10, and the error occurs.

    The function becomes clearer and less error-prone when we rename the first argument to .x. Then x = 10 is collected by ..., and 1:10 gets the (named) attribute x = 10 assigned:

    set_attr <- function(.x, ...) {
      attr <- rlang::list2(...)
    
      attributes(.x) <- attr
      .x
    }
    
    set_attr(1:10, x = 10)
    #>  [1]  1  2  3  4  5  6  7  8  9 10
    #> attr(,"x")
    #> [1] 10
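
    The argument matching behind the original error can be made visible with a small base R helper. show_matching() is a hypothetical function that only reports what ended up in x and what ended up in the dots.

```r
# Same signature as the problematic set_attr(): first argument named x
show_matching <- function(x, ...) {
  list(x = x, dots = list(...))
}

m <- show_matching(1:10, x = 10)

m$x            # the named argument wins the match for x
#> [1] 10
names(m$dots)  # 1:10 lands in the dots, without a name
#> NULL
```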

19.5 Case studies

  1. Q: In the linear-model example, we could replace the expr() in reduce(summands, ~ expr(!!.x + !!.y)) with call2(): reduce(summands, call2, "+"). Compare and contrast the two approaches. Which do you think is easier to read?

  2. Q: Re-implement the Box-Cox transform defined below using unquoting and new_function():

    bc <- function(lambda) {
      if (lambda == 0) {
        function(x) log(x)
      } else {
        function(x) (x ^ lambda - 1) / lambda
      }
    }

    A:

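    bc2 <- function(lambda) {
      # lambda is an ordinary value here, so the comparison needs no unquoting;
      # `!!` is only meaningful inside quoting functions such as expr()
      if (lambda == 0) {
        new_function(exprs(x = ), expr(log(x)))
      } else {
        # inline the value of lambda into the body of the new function
        new_function(exprs(x = ), expr((x^(!!lambda) - 1) / !!lambda))
      }
    }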
    
    bc2(0)
    #> function (x) 
    #> log(x)
    #> <environment: 0x5382288>
    bc2(2)
    #> function (x) 
    #> (x^2 - 1)/2
    #> <environment: 0x53e77d8>
    bc2(2)(2)
    #> [1] 1.5
  3. Q: Re-implement the simple compose() defined below using quasiquotation and new_function():

    compose <- function(f, g) {
      function(...) f(g(...))
    }

    A: The implementation is straightforward. However, it can be tricky to get all the brackets right on the first try:

    compose2 <- function(f, g){
      f <- enexpr(f)
      g <- enexpr(g)
    
      new_function(exprs(... = ), expr((!!f)((!!g)(...))))
    }
    
    compose(sin, cos)
    #> function(...) f(g(...))
    #> <environment: 0x5930298>
    compose(sin, cos)(pi)
    #> [1] -0.841
    compose2(sin, cos)
    #> function (...) 
    #> sin(cos(...))
    #> <environment: 0x605fa08>
    compose2(sin, cos)(pi)
    #> [1] -0.841

19.6 Old exercises Unquoting

  1. Q: What does the following command return? What information is lost? Why?

    expr({
      x +              y # comment  
    })

    A: When we look at the captured expression, we see that the extra whitespace and the comment are lost. R ignores them when parsing an expression. They do not need to be represented in the AST, because they do not affect the evaluation of the expression.

    library(rlang)
    captured_expression <- expr({
      x +              y # comment  
    })
    
    captured_expression
    #> {
    #>     x + y
    #> }

    However, it is possible to retrieve the original input through the attributes of the captured expression:

    attributes(captured_expression)
    #> $srcref
    #> $srcref[[1]]
    #> {
    #> 
    #> $srcref[[2]]
    #> x +              y
    #> 
    #> 
    #> $srcfile
    #> <text> 
    #> 
    #> $wholeSrcref
    #> library(rlang)
    #> captured_expression <- expr({
    #>   x +              y # comment  
    #> }

19.7 Unquoting

  1. Q: Explain why both !0 + !0 and !1 + !1 return FALSE while !0 + !1 returns TRUE.

    A: To answer this question we look at the AST of the first example:

    library(lobstr)
    
    ast(!0 + !0)
    #> █─`!` 
    #> └─█─`+` 
    #>   ├─0 
    #>   └─█─`!` 
    #>     └─0

    The ! operator has lower precedence than +, so !0 + !0 is parsed as !(0 + !0), as the AST shows. The coercion rules are the same in all three examples, so we can evaluate each of them from the innermost call outwards:

    • !0 + !0:
      The second zero is coerced to FALSE, so !0 becomes TRUE.
      0 + TRUE is 1, because TRUE is coerced to 1.
      !1 is !TRUE, which is FALSE.
    • !1 + !1:
      !1 is FALSE.
      1 + FALSE is 1.
      !1 is !TRUE, which is FALSE.
    • !0 + !1:
      !1 is FALSE.
      0 + FALSE is 0.
      !0 is !FALSE, which is TRUE.
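
    The parse structure can also be checked with base R alone, and explicit parentheses show what the expression would evaluate to if ! were applied first. A small sketch; the variable e is only illustrative.

```r
e <- quote(!0 + !0)

e[[1]]       # the outermost call is `!`
#> `!`
e[[2]][[1]]  # its argument is a call to `+`
#> `+`

!0 + !0      # evaluated as !(0 + !0)
#> [1] FALSE
(!0) + (!0)  # with parentheses: TRUE + TRUE
#> [1] 2
```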
  2. Q: Base functions match.fun(), page(), and ls() all try to automatically determine whether you want standard or non-standard evaluation. Each uses a different approach. Figure out the essence of each approach by reading the source code, then compare and contrast the techniques.

19.8 Case studies

  1. Q: Implement arrange_desc(), a variant of dplyr::arrange() that sorts in descending order by default.

    A: We just have to catch the ... from arrange() as an expression and modify the expression to be wrapped inside desc(). Afterwards we evaluate this new code within a regular arrange() call:

    library(dplyr)
    library(purrr)
    
    arrange_desc <- function(.data, ...){
      increasing <- enexprs(...)
      decreasing <- map(increasing, ~ expr(desc(!!.x)))
    
      arrange(.data, !!!decreasing)
    }

    Let’s try it out:

    d <- data.frame(abc = letters[1:6],
                    id1 = 1:6,
                    id2 = rep(1:2, 3))
    
    # old behaviour
    d %>% arrange(id2, id1)
    #>   abc id1 id2
    #> 1   a   1   1
    #> 2   c   3   1
    #> 3   e   5   1
    #> 4   b   2   2
    #> 5   d   4   2
    #> 6   f   6   2
    
    # new descending behaviour
    d %>% arrange_desc(id2, id1)
    #>   abc id1 id2
    #> 1   f   6   2
    #> 2   d   4   2
    #> 3   b   2   2
    #> 4   e   5   1
    #> 5   c   3   1
    #> 6   a   1   1
  2. Q: Implement filter_or(), a variant of dplyr::filter() that combines multiple arguments using | instead of &.

    A: This time we just need to collapse the ... arguments with |. Therefore we can use purrr::reduce() and afterwards we just need to evaluate the new code within a regular filter call:

    filter_or <- function(.data, ...){
      normal <- enexprs(...)
    
      normal_or <- reduce(normal, function(x, y) expr(!!x | !!y))
    
      filter(.data, !!normal_or)
    }
    
    # and test it
    d <- data.frame(x = 1:6, y = 6:1)
    filter_or(d, x < 3, y < 3)
    #>   x y
    #> 1 1 6
    #> 2 2 5
    #> 3 5 2
    #> 4 6 1
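
    To see what the reduction step produces, the combined expression can be inspected on its own. This sketch uses base R's Reduce() in place of purrr::reduce() and evaluates the result with eval() instead of dplyr::filter(), so that it only depends on rlang. Because the result is a single call rather than a list, it is unquoted with !! (splicing with !!! is meant for lists of expressions).

```r
library(rlang)

conds <- exprs(x < 3, y < 3)

# Collapse the list of conditions into one call joined by `|`
combined <- Reduce(function(lhs, rhs) expr(!!lhs | !!rhs), conds)
combined
#> x < 3 | y < 3

is.call(combined)  # a single expression, not a list
#> [1] TRUE

# Evaluate the combined condition with the data frame's columns in scope
d <- data.frame(x = 1:6, y = 6:1)
d[eval(combined, d), ]
```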
  3. Q: Implement partition_rows() which, like partition_cols(), returns two data frames, one containing the selected rows, and the other containing the rows that weren’t selected.

    A: We just have to decide if we focus on integer subsetting via dplyr::slice() or logical subsetting via dplyr::filter(). The rest is straightforward. Since the implementations of both subsetting styles are completely equivalent we just choose one without any particular reason:

    partition_rows <- function(.data, ...){
      included <- enexprs(...)
      excluded <- map(included, ~ expr(!(!!.x)))
    
      list(
        incl = filter(.data, !!!included),
        excl = filter(.data, !!!excluded)
      )
    }
    
    d <- data.frame(x = 1:6, y = 6:1)
    partition_rows(d, x <= 3)
    #> $incl
    #>   x y
    #> 1 1 6
    #> 2 2 5
    #> 3 3 4
    #> 
    #> $excl
    #>   x y
    #> 1 4 3
    #> 2 5 2
    #> 3 6 1
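
    One caveat when more than one condition is supplied: negating each condition separately selects the rows that fail every condition, which is not the complement of the rows that pass all of them. The following sketch builds both variants with rlang and base R only (Reduce(), lapply() and eval() stand in for the purrr and dplyr calls), and all variable names are illustrative.

```r
library(rlang)

d <- data.frame(x = 1:6, y = 6:1)
conds <- exprs(x <= 3, y > 4)

# Helper: join a list of expressions with `&`
join_and <- function(es) Reduce(function(lhs, rhs) expr(!!lhs & !!rhs), es)

all_conds  <- join_and(conds)          # x <= 3 & y > 4
complement <- expr(!(!!all_conds))     # !(x <= 3 & y > 4)

# Negating each condition separately, as a contrast
per_cond <- join_and(lapply(conds, function(cond) expr(!(!!cond))))

incl       <- d[eval(all_conds, d), ]
excl       <- d[eval(complement, d), ]
excl_naive <- d[eval(per_cond, d), ]

nrow(incl) + nrow(excl) == nrow(d)        # true complement covers all rows
nrow(incl) + nrow(excl_naive) == nrow(d)  # per-condition negation drops a row
```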
  4. Q: Add error handling to slice(). Give clear error messages if either along or index have invalid values (i.e. not numeric, not length 1, too small, or too big).