# 19 Quasiquotation

## Prerequisites

To continue computing on the language, we keep using the rlang package in this chapter.

library(rlang)

## 19.1 Motivation

1. Q: For each function in the following base R code, identify which arguments are quoted and which are evaluated.

library(MASS)

mtcars2 <- subset(mtcars, cyl == 4)

with(mtcars2, sum(vs))
sum(mtcars2$am)

rm(mtcars2)

A: For each argument we first follow the advice from the textbook and execute the argument outside of the respective function. Since MASS, cyl, vs and am are not objects contained in the global environment, their execution raises an “object not found” error. This way we confirm that the respective function arguments are quoted. For the other arguments, we may inspect the source code (and the documentation) to check if any quoting mechanisms are applied or the arguments are evaluated.

library(MASS)  # MASS -> quoted

library() also accepts character vectors and doesn’t quote when character.only is set to TRUE, so library(MASS, character.only = TRUE) will raise an error.

mtcars2 <- subset(mtcars, cyl == 4)  # mtcars -> evaluated
                                     # cyl -> quoted

with(mtcars2, sum(vs))  # mtcars2 -> evaluated
                        # sum(vs) -> quoted

sum(mtcars2$am)  # mtcars2$am -> evaluated
                 # am -> quoted by $()

When we inspect the source code of rm(), we notice that rm() catches its ... argument as an unevaluated call (in this case a pairlist) via match.call(). This call is then converted into a string for further evaluation.

rm(mtcars2)  # mtcars2 -> quoted
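
The check described above can also be scripted. The following is a minimal sketch, assuming that no object named cyl exists in the current session; the names res and pkg are only illustrative.

```r
# Evaluating cyl on its own fails, because no such object exists
# (assumption: cyl has not been defined in the current session)
res <- tryCatch(cyl, error = function(e) conditionMessage(e))
is.character(res)
#> [1] TRUE

# Inside subset() the same symbol is looked up in mtcars, so it must be quoted
nrow(subset(mtcars, cyl == 4))
#> [1] 11

# With character.only = TRUE, library() evaluates its argument instead
pkg <- "stats"
library(pkg, character.only = TRUE)
```

The last call works because pkg is evaluated to the string "stats"; a bare package name would be looked up as an object in this mode.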

2. Q: For each function in the following tidyverse code, identify which arguments are quoted and which are evaluated.

library(dplyr)
library(ggplot2)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(mpg))

ggplot(by_cyl, aes(cyl, mean)) + geom_point()

A: From the previous exercise we’ve already learned that library() quotes its first argument.

library(dplyr)    # dplyr   -> quoted
library(ggplot2)  # ggplot2 -> quoted

In similar fashion, it becomes clear that cyl is quoted by group_by().

by_cyl <- mtcars %>%             # mtcars -> evaluated
  group_by(cyl) %>%              # cyl -> quoted
  summarise(mean = mean(mpg))    # mean = mean(mpg) -> quoted

To find out what happens in summarise(), we inspect the source code. Tracing down the S3-dispatch of summarise(), we see that the ... argument is quoted in the underlying summarise.tbl_df() method.

dplyr::summarise
#> function (.data, ...)
#> {
#>     UseMethod("summarise")
#> }
#> <bytecode: 0x5998760>
#> <environment: namespace:dplyr>

dplyr:::summarise.tbl_df
#> function (.data, ...)
#> {
#>     dots <- enquos(..., .named = TRUE)
#>     summarise_impl(.data, dots, environment(), caller_env())
#> }
#> <bytecode: 0x67f85a0>
#> <environment: namespace:dplyr>

In the following ggplot2 expression the cyl- and mean-objects are quoted.

ggplot(by_cyl,            # by_cyl -> evaluated
       aes(cyl, mean)) +  # aes() -> evaluated
                          # cyl, mean -> quoted (via aes)
  geom_point()

We can confirm this also by inspecting aes()’s source code.

ggplot2::aes
#> function (x, y, ...)
#> {
#>     exprs <- enquos(x = x, y = y, ..., .ignore_empty = "all")
#>     aes <- new_aes(exprs, env = parent.frame())
#>     rename_aes(aes)
#> }
#> <bytecode: 0x57cbe50>
#> <environment: namespace:ggplot2>
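
To see what this capturing step produces, here is a small sketch that mimics the enquos() call with a hypothetical stand-in function, capture_mapping(). It is not part of ggplot2 and leaves out everything else aes() does.

```r
library(rlang)

# Hypothetical stand-in for aes(): captures x, y and the dots as quosures
capture_mapping <- function(x, y, ...) {
  enquos(x = x, y = y, ..., .ignore_empty = "all")
}

m <- capture_mapping(cyl, mean)
names(m)
#> [1] "x" "y"

# The captured expressions are plain symbols; nothing has been evaluated
quo_get_expr(m$x)
#> cyl
```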

## 19.2 Quoting

1. Q: How is expr() implemented? Look at its source code.

A: expr() acts as a simple wrapper, which passes its argument to enexpr().

expr
#> function (expr)
#> {
#>     enexpr(expr)
#> }
#> <bytecode: 0x6590a28>
#> <environment: namespace:rlang>

2. Q: Compare and contrast the following two functions. Can you predict the output before running them?

f1 <- function(x, y) {
  exprs(x = x, y = y)
}
f2 <- function(x, y) {
  enexprs(x = x, y = y)
}
f1(a + b, c + d)
f2(a + b, c + d)

A: Both functions are able to capture multiple arguments and will return a named list of expressions. f1() will return the arguments defined within the body of f1(). This happens because exprs() captures the expressions as specified by the developer during the definition of f1.

f1(a + b, c + d)
#> $x
#> x
#> 
#> $y
#> y

f2() will return the arguments supplied to f2() as specified by the user when the function is called.

f2(a + b, c + d)
#> $x
#> a + b
#> 
#> $y
#> c + d

3. Q: What happens if you try to use enexpr() with an expression (i.e. enexpr(x + y))? What happens if enexpr() is passed a missing argument?

A: In the first case an error is thrown:

on_expr <- function(x) {enexpr(expr(x))}
on_expr(x + y)
#> Error: arg must be a symbol

In the second case a missing argument is returned:

on_missing <- function(x) {enexpr(x)}
on_missing()
is_missing(on_missing())
#> [1] TRUE

4. Q: How are exprs(a) and exprs(a = ) different? Think about both the input and the output.

A: In exprs(a) the input a is interpreted as a symbol for an unnamed argument. Consequently the output shows an unnamed list with the first element containing the symbol a.

out1 <- exprs(a)
str(out1)
#> List of 1
#>  $ : symbol a

In exprs(a = ) the first argument is named a, but then no value is provided. This leads to the output of a named list with the first element named a, which contains the missing argument.

out2 <- exprs(a = )
str(out2)
#> List of 1
#>  $ a: symbol
is_missing(out2$a)
#> [1] TRUE

5. Q: What are other differences between exprs() and alist()? Read the documentation for the named arguments of exprs() to find out.

A: exprs() provides the additional arguments .named (= FALSE), .ignore_empty (c("trailing", "none", "all")) and .unquote_names (TRUE). .named makes it possible to ensure that all dots are named. .ignore_empty specifies how empty arguments are handled: "trailing" ignores only a trailing empty argument in the dots, "all" ignores every empty argument, and "none" keeps them all. Further, via .unquote_names one can specify if := should be treated like =. := can be useful as it supports unquoting (!!) on the left-hand side.

6. Q: The documentation for substitute() says:

Substitution takes place by examining each component of the parse tree as follows:

• If it is not a bound symbol in env, it is unchanged.
• If it is a promise object (i.e., a formal argument to a function) the expression slot of the promise replaces the symbol.
• If it is an ordinary variable, its value is substituted, unless env is .GlobalEnv in which case the symbol is left unchanged.

Create examples that illustrate each of the above cases.

A: Let’s create a new environment my_env, which contains no objects. In this case substitute() will just return its first argument (expr):

my_env <- env()
substitute(x, my_env)
#> x

When we create a function with a single argument that is returned directly after substitution, the function just returns the provided expression:

foo <- function(x) substitute(x)

foo(x + y * sin(0))
#> x + y * sin(0)

In case substitute() can find (parts of) the expression in env, it will literally substitute them, unless env is .GlobalEnv.

my_env$x <- 7
substitute(x, my_env)
#> [1] 7

x <- 7
substitute(x, .GlobalEnv)
#> x
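
The promise case and the ordinary-variable case can also be seen together inside a single function. A minimal sketch, where sub_both() is a hypothetical helper introduced only for illustration:

```r
# Hypothetical helper: x is a promise, y an ordinary local variable
sub_both <- function(x) {
  y <- 10
  substitute(x + y)
}

sub_both(a * b)
#> a * b + 10
```

The symbol x is replaced by the expression supplied by the caller, while y is replaced by its value, because the function's execution environment is not the global environment.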

## 19.3 Unquoting

1. Q: Given the following components:

xy <- expr(x + y)
xz <- expr(x + z)
yz <- expr(y + z)
abc <- exprs(a, b, c)

Use quasiquotation to construct the following calls:

(x + y) / (y + z)               # (1)
-(x + z) ^ (y + z)              # (2)
(x + y) + (y + z) - (x + y)     # (3)
atan2(x + y, y + z)             # (4)
sum(x + y, x + y, y + z)        # (5)
sum(a, b, c)                    # (6)
mean(c(a, b, c), na.rm = TRUE)  # (7)
foo(a = x + y, b = y + z)       # (8)

A: We combine and unquote the given quoted expressions to construct the desired calls like this:

expr(!!xy / !!yz)                    # (1)
#> (x + y)/(y + z)

expr(-(!!xz)^(!!yz))                 # (2)
#> -(x + z)^(y + z)

expr(!!xy + !!yz - !!xy)             # (3)
#> x + y + (y + z) - (x + y)

expr(atan2(!!xy, !!yz))              # (4)
#> atan2(x + y, y + z)

expr(sum(!!xy, !!xy, !!yz))          # (5)
#> sum(x + y, x + y, y + z)

expr(sum(!!!abc))                    # (6)
#> sum(a, b, c)

expr(mean(c(!!!abc), na.rm = TRUE))  # (7)
#> mean(c(a, b, c), na.rm = TRUE)

expr(foo(a = !!xy, b = !!yz))        # (8)
#> foo(a = x + y, b = y + z)
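
One way to sanity-check such constructed calls is to evaluate them against concrete values. The sketch below does this for (1) and (6); the list vals holds hypothetical numbers chosen only for this check.

```r
library(rlang)

xy <- expr(x + y)
yz <- expr(y + z)
abc <- exprs(a, b, c)

# Hypothetical values, used only to evaluate the constructed calls
vals <- list(x = 1, y = 2, z = 3, a = 10, b = 20, c = 30)

ratio <- eval(expr(!!xy / !!yz), vals)  # (1 + 2) / (2 + 3)
total <- eval(expr(sum(!!!abc)), vals)  # 10 + 20 + 30

ratio
#> [1] 0.6
total
#> [1] 60
```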

2. Q: The following two calls print the same, but are actually different:

(a <- expr(mean(1:10)))
#> mean(1:10)
(b <- expr(mean(!!(1:10))))
#> mean(1:10)
identical(a, b)
#> [1] FALSE

What’s the difference? Which one is more natural?

A: It’s easiest to see the difference with lobstr::ast():

lobstr::ast(mean(1:10))
#> █─mean
#> └─█─:
#>   ├─1
#>   └─10
lobstr::ast(mean(!!(1:10)))
#> █─mean
#> └─<inline integer>

In the expression assigned to b the call 1:10 is evaluated to an integer vector, while still being a call object in a.

The first version (a) seems more natural. It captures lazy evaluation, with a promise that is evaluated when the function is called. The second version (b) inlines a vector directly into a call.
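
The difference also shows up when inspecting the second element of each call directly, as in this short sketch:

```r
library(rlang)

a <- expr(mean(1:10))
b <- expr(mean(!!(1:10)))

class(a[[2]])
#> [1] "call"
class(b[[2]])
#> [1] "integer"

# Both calls still evaluate to the same result
c(eval(a), eval(b))
#> [1] 5.5 5.5
```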

## 19.4 Dot-dot-dot (...)

1. Q: One way to implement exec() is shown below. Describe how it works. What are the key ideas?

exec <- function(f, ..., .env = caller_env()) {
  args <- list2(...)
  do.call(f, args, envir = .env)
}

A: exec() takes a function and its arguments as input, as well as an environment. This allows us to construct a call from the function and the arguments and to evaluate it in the supplied environment. As the ... argument is handled via list2(), exec() supports tidy dots (quasiquotation), which means that arguments and names (on the left-hand side of :=) can be unquoted via !! and !!!.
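
Both features can be tried with the exec() function exported by rlang. The following is a small sketch; the argument values and the variable arg_name are made up for illustration.

```r
library(rlang)

# Splice a list of arguments into the call with !!!
args <- list(c(1:9, 100), na.rm = TRUE)
exec("mean", !!!args)
#> [1] 14.5

# Unquote an argument name on the left-hand side of :=
arg_name <- "trim"
exec("mean", c(1:9, 100), !!arg_name := 0.1)
#> [1] 5.5
```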

2. Q: Carefully read the source code for interaction(), expand.grid(), and par(). Compare and contrast the techniques they use for switching between dots and list behaviour.

A: All three functions capture the dots via args <- list(...). interaction() and expand.grid() return early in case of length(args) == 0.

As interaction() computes factors regarding combinations of args elements, interaction() iterates over args and doesn’t differentiate further between list and dots behaviour. Only the case length(args) == 1 && is.list(args[[1]]) is treated specially, via args <- args[[1]]. Consequently, lists nested more than one level deep raise errors in other parts of the code.

# These work and return the same
identical(
  interaction(     a = c("a", "b", "c", "d"), b = c("e", "f")),
  interaction(list(a = c("a", "b", "c", "d"), b = c("e", "f")))
)
#> [1] TRUE

# This doesn't work
interaction(list(list(a = c("a", "b", "c", "d"), b = c("e", "f"))))
#> Error in order(y): unimplemented type 'list' in 'orderVector1'

expand.grid() switches in exactly the same way as interaction(). That is, it also assigns args <- args[[1]] when length(args) == 1 && is.list(args[[1]]) is TRUE.

par() preprocesses args the most in order to ensure that it becomes a list (or NULL). First, in case no dots were supplied (length(list(...)) == 0), par() creates a list from an internal character vector (partly depending on par()’s no.readonly argument). Further, in case all elements of args are character vectors (all(unlist(lapply(args, is.character)))), args is turned into a list via as.list(unlist(args)). When args is of length one with its first element being a list or NULL, args becomes args <- args[[1]].
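
The switch shared by interaction() and expand.grid() can be reduced to a few lines. The following is a minimal sketch of the pattern, with collect_args() as a hypothetical helper rather than code taken from either function:

```r
# Hypothetical helper that accepts either several arguments or one list
collect_args <- function(...) {
  args <- list(...)
  # A single list argument is unwrapped and treated like the dots
  if (length(args) == 1 && is.list(args[[1]])) {
    args <- args[[1]]
  }
  args
}

identical(
  collect_args(a = 1:2, b = "x"),
  collect_args(list(a = 1:2, b = "x"))
)
#> [1] TRUE
```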

3. Q: Explain the problem with this definition of set_attr()

set_attr <- function(x, ...) {
  attr <- rlang::list2(...)
  attributes(x) <- attr
  x
}
set_attr(1:10, x = 10)
#> Error in attributes(x) <- attr: attributes must be named

A: set_attr() expects an object to be passed as the x argument and its new attributes via the dots. Unfortunately, this prevents us from providing attributes named x, as these would collide with the argument name of our object. Even omitting the object’s argument name doesn’t help in this case, as can be seen in the example, where the object is consequently treated as an unnamed attribute.

However, we may name the first argument .x, which seems clearer and less likely to invoke errors. In this case 1:10 will get the (named) attribute x = 10 assigned:

set_attr <- function(.x, ...) {
  attr <- rlang::list2(...)

  attributes(.x) <- attr
  .x
}

set_attr(1:10, x = 10)
#>  [1]  1  2  3  4  5  6  7  8  9 10
#> attr(,"x")
#> [1] 10
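
Because the dots are collected with list2(), the attributes can also be supplied as a list and spliced in with !!!. A brief sketch, in which new_attrs is a hypothetical attribute list:

```r
library(rlang)

set_attr <- function(.x, ...) {
  attr <- list2(...)
  attributes(.x) <- attr
  .x
}

# Hypothetical attribute list, spliced into the dots with !!!
new_attrs <- list(x = 10, unit = "cm")
y <- set_attr(1:3, !!!new_attrs)

attr(y, "x")
#> [1] 10
attr(y, "unit")
#> [1] "cm"
```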

## 19.5 Case studies

1. Q: In the linear-model example, we could replace the expr() in reduce(summands, ~ expr(!!.x + !!.y)) with call2(): reduce(summands, call2, "+"). Compare and contrast the two approaches. Which do you think is easier to read?

A: We would consider the first version to be more readable. There seems to be a little more boilerplate code at first, but the unquoting syntax is very readable. Overall the whole expression seems more explicit and less complex.
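
To compare the two styles side by side, the sketch below builds the same sum both ways. It makes two assumptions: summands is a made-up list of terms, and base R's Reduce() is swapped in for purrr's reduce() so that the example only needs rlang.

```r
library(rlang)

# Hypothetical summands; the linear-model example builds similar terms
summands <- exprs(x1, x2 * 2, x3)

# Base R's Reduce() is used here in place of purrr::reduce()
with_expr  <- Reduce(function(lhs, rhs) expr(!!lhs + !!rhs), summands)
with_call2 <- Reduce(function(lhs, rhs) call2("+", lhs, rhs), summands)

with_expr
#> x1 + x2 * 2 + x3
with_call2
#> x1 + x2 * 2 + x3
```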

2. Q: Re-implement the Box-Cox transform defined below using unquoting and new_function():

bc <- function(lambda) {
  if (lambda == 0) {
    function(x) log(x)
  } else {
    function(x) (x ^ lambda - 1) / lambda
  }
}

A: Here new_function() allows us to create a function factory using tidy evaluation.

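bc2 <- function(lambda) {
  # lambda is used as a value: compared here and inlined into the body below.
  # (!! only unquotes inside quoting functions such as expr().)
  if (lambda == 0) {
    new_function(exprs(x = ), expr(log(x)))
  } else {
    new_function(exprs(x = ), expr((x ^ (!!lambda) - 1) / !!lambda))
  }
}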

bc2(0)
#> function (x)
#> log(x)
#> <environment: 0x5c76870>
bc2(2)
#> function (x)
#> (x^2 - 1)/2
#> <environment: 0x5ccab20>
bc2(2)(2)
#> [1] 1.5
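
One practical consequence of unquoting is that the value of lambda is fixed in the function body, whereas a closure looks it up when called. A small sketch of this difference, with bc_closure and bc_inlined as illustrative names:

```r
library(rlang)

lambda <- 2

# Closure: lambda is looked up in the enclosing environment when called
bc_closure <- function(x) (x ^ lambda - 1) / lambda

# Generated function: the value of lambda is inlined into the body
bc_inlined <- new_function(
  exprs(x = ),
  expr((x ^ (!!lambda) - 1) / !!lambda)
)

lambda <- 3  # changing lambda afterwards only affects the closure

bc_closure(2)
#> [1] 2.333333
bc_inlined(2)
#> [1] 1.5
```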

3. Q: Re-implement the simple compose() defined below using quasiquotation and new_function():

compose <- function(f, g) {
  function(...) f(g(...))
}

A: The implementation is fairly straightforward, even though a lot of parentheses are required:

compose2 <- function(f, g) {
  f <- enexpr(f)
  g <- enexpr(g)

  new_function(exprs(... = ), expr((!!f)((!!g)(...))))
}

compose(sin, cos)
#> function(...) f(g(...))
#> <environment: 0x660e338>
compose(sin, cos)(pi)
#> [1] -0.841
compose2(sin, cos)
#> function (...)
#> sin(cos(...))
#> <environment: 0x6805f10>
compose2(sin, cos)(pi)
#> [1] -0.841