15 Quasiquotation
Prerequisites
To continue computing on the language, we keep using the rlang package in this chapter.
15.1 Motivation
Q1: For each function in the following base R code, identify which arguments are quoted and which are evaluated.
library(MASS)
mtcars2 <- subset(mtcars, cyl == 4)
with(mtcars2, sum(vs))
sum(mtcars2$am)
rm(mtcars2)
A: For each argument we first follow the advice from Advanced R and execute the argument outside of the respective function. Since MASS, cyl, vs and am are not objects contained in the global environment, their execution raises an “Object not found” error. This way we confirm that the respective function arguments are quoted. For the other arguments, we may inspect the source code (and the documentation) to check if any quoting mechanisms are applied or the arguments are evaluated.
library() also accepts character vectors and doesn’t quote when character.only is set to TRUE, so library(MASS, character.only = TRUE) would raise an error.
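The two modes of library() can be sketched as follows. The variable pkg is a hypothetical name introduced only for this illustration, and the sketch assumes that no package called "pkg" is installed.

```r
# `pkg` is a hypothetical variable holding the name of an installed package.
pkg <- "stats"

# With character.only = TRUE the first argument is evaluated,
# so the value "stats" is used as the package name.
library(pkg, character.only = TRUE)

# Without it the argument is quoted: R looks for a package that is
# literally called "pkg" and fails (assuming no such package exists).
quoted_attempt <- tryCatch(
  {
    library(pkg)
    "attached"
  },
  error = function(e) "error"
)
quoted_attempt
```

The tryCatch() wrapper is only there so that the failing call does not stop the script.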
mtcars2 <- subset(mtcars, cyl == 4) # mtcars -> evaluated
                                    # cyl -> quoted
with(mtcars2, sum(vs))              # mtcars2 -> evaluated
                                    # sum(vs) -> quoted
sum(mtcars2$am)                     # mtcars2$am -> evaluated
                                    # am -> quoted by $()
When we inspect the source code of rm(), we notice that rm() catches its ... argument as an unevaluated call (in this case a pairlist) via match.call(). This call is then converted into a string for further evaluation.
rm(mtcars2) # mtcars2 -> quoted
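As a small sketch of the difference between the quoted ... and the evaluated list argument of rm() (the object names tmp_value and obj_name are hypothetical):

```r
tmp_value <- 1            # hypothetical object that should be removed
obj_name <- "tmp_value"   # its name, stored as a string

# rm(obj_name) would quote its argument and remove `obj_name` itself.
# The `list` argument is evaluated, so the object named by the string
# is removed instead.
rm(list = obj_name)

still_there <- exists("tmp_value", inherits = FALSE)
still_there
```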
Q2: For each function in the following tidyverse code, identify which arguments are quoted and which are evaluated.
library(dplyr)
library(ggplot2)
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(mpg))
ggplot(by_cyl, aes(cyl, mean)) + geom_point()
A: From the previous exercise we’ve already learned that library() quotes its first argument.
In similar fashion, it becomes clear that cyl is quoted by group_by().
by_cyl <- mtcars %>%           # mtcars -> evaluated
  group_by(cyl) %>%            # cyl -> quoted
  summarise(mean = mean(mpg))  # mean = mean(mpg) -> quoted
To find out what happens in summarise(), we inspect the source code. Tracing down the S3-dispatch of summarise(), we see that the ... argument is quoted in dplyr:::summarise_cols() which is called in the underlying summarise.data.frame() method.
dplyr::summarise
#> function (.data, ..., .groups = NULL)
#> {
#> UseMethod("summarise")
#> }
#> <bytecode: 0x7fdd17d26010>
#> <environment: namespace:dplyr>
dplyr:::summarise.data.frame
#> function (.data, ..., .groups = NULL)
#> {
#> cols <- summarise_cols(.data, ...)
#> out <- summarise_build(.data, cols)
#> if (identical(.groups, "rowwise")) {
#> out <- rowwise_df(out, character())
#> }
#> out
#> }
#> <bytecode: 0x7fdd1844e908>
#> <environment: namespace:dplyr>
dplyr:::summarise_cols
#> function (.data, ...)
#> {
#> mask <- DataMask$new(.data, caller_env())
#> dots <- enquos(...)
#> dots_names <- names(dots)
#> auto_named_dots <- names(enquos(..., .named = TRUE))
#> cols <- list()
#> sizes <- 1L
#> chunks <- vector("list", length(dots))
#> types <- vector("list", length(dots))
#>
#> ## function definition abbreviated for clarity ##
#> }
#> <bytecode: 0x55b540c07ca0>
#> <environment: namespace:dplyr>
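The quoting step performed by enquos() can be tried in isolation. The following is a minimal sketch, not dplyr's implementation; capture_dots() is a hypothetical helper that only mimics the first step of summarise_cols().

```r
library(rlang)

# `capture_dots()` is a hypothetical helper: it quotes everything that is
# passed via `...` and returns a list of quosures.
capture_dots <- function(...) {
  enquos(...)
}

# mpg does not exist here, which is fine because nothing is evaluated.
dots <- capture_dots(mean = mean(mpg))

names(dots)               # the argument name is kept
quo_get_expr(dots$mean)   # the captured, unevaluated expression
```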
In the following ggplot2 expression the cyl- and mean-objects are quoted.
ggplot(by_cyl,            # by_cyl -> evaluated
       aes(cyl, mean)) +  # aes() -> evaluated
                          # cyl, mean -> quoted (via aes)
  geom_point()
We can confirm this also by inspecting aes()’s source code.
ggplot2::aes
#> function (x, y, ...)
#> {
#> exprs <- enquos(x = x, y = y, ..., .ignore_empty = "all")
#> aes <- new_aes(exprs, env = parent.frame())
#> rename_aes(aes)
#> }
#> <bytecode: 0x7fdd18833fc8>
#> <environment: namespace:ggplot2>
15.2 Quoting
Q1: How is expr() implemented? Look at its source code.
A: expr() acts as a simple wrapper, which passes its argument to enexpr().
expr
#> function (expr)
#> {
#> enexpr(expr)
#> }
#> <bytecode: 0x7fdd189e9c80>
#> <environment: namespace:rlang>
Q2: Compare and contrast the following two functions. Can you predict the output before running them?
f1 <- function(x, y) {
  exprs(x = x, y = y)
}
f2 <- function(x, y) {
  enexprs(x = x, y = y)
}
f1(a + b, c + d)
f2(a + b, c + d)
A: Both functions are able to capture multiple arguments and will return a named list of expressions. f1() will return the arguments defined within the body of f1(). This happens because exprs() captures the expressions as specified by the developer during the definition of f1().
f1(a + b, c + d)
#> $x
#> x
#>
#> $y
#> y
f2() will return the arguments supplied to f2() as specified by the user when the function is called.
f2(a + b, c + d)
#> $x
#> a + b
#>
#> $y
#> c + d
Q3: What happens if you try to use enexpr() with an expression (i.e. enexpr(x + y))? What happens if enexpr() is passed a missing argument?
A: In the first case an error is thrown:
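The failing case can be reproduced with a short sketch. The helper on_expr() is a hypothetical name, and the error is caught with tryCatch() so that the exact message, which may differ between rlang versions, is not relied upon.

```r
library(rlang)

# `on_expr()` is a hypothetical helper: enexpr() receives a call (x + y)
# instead of the bare name of a function argument.
on_expr <- function(x) {
  enexpr(x + y)
}

result <- tryCatch(
  on_expr(1),
  error = function(e) "error"
)
result
```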
In the second case a missing argument is returned:
on_missing <- function(x) {enexpr(x)}
on_missing()
is_missing(on_missing())
#> [1] TRUE
Q4: How are exprs(a) and exprs(a = ) different? Think about both the input and the output.
A: In exprs(a) the input a is interpreted as a symbol for an unnamed argument. Consequently, the output shows an unnamed list with the first element containing the symbol a.
In exprs(a = ) the first argument is named a, but then no value is provided. This leads to the output of a named list with the first element named a, which contains the missing argument.
out2 <- exprs(a = )
str(out2)
#> List of 1
#> $ a: symbol
is_missing(out2$a)
#> [1] TRUE
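Both cases can be checked side by side. This is a small sketch; out1 is a name introduced here by analogy with out2.

```r
library(rlang)

out1 <- exprs(a)     # unnamed argument: the symbol `a` is captured
out2 <- exprs(a = )  # argument named `a` without a value

first_is_symbol <- is_symbol(out1[[1]], "a")  # the element is the symbol a
out2_names <- names(out2)                     # the element is named "a"
value_is_missing <- is_missing(out2$a)        # and holds the missing argument
```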
Q5: What are other differences between exprs() and alist()? Read the documentation for the named arguments of exprs() to find out.
A: exprs() provides the additional arguments .named (= FALSE), .ignore_empty (c("trailing", "none", "all")) and .unquote_names (TRUE). .named ensures that all dots are named. .ignore_empty specifies how empty arguments are handled: an empty argument is ignored only if it is the last one ("trailing"), never ("none") or always ("all"). Further, .unquote_names controls whether := is treated like =. := can be useful as it supports unquoting (!!) on the left-hand side.
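A few of these differences can be sketched directly. The variables val and arg_name are hypothetical names used only for this illustration.

```r
library(rlang)

val <- 10

# alist() performs no unquoting: the call !!val is stored as is.
from_alist <- alist(x = !!val)
# exprs() unquotes: the value 10 is inlined.
from_exprs <- exprs(x = !!val)

# .named = TRUE derives missing names from the expressions themselves.
auto_names <- names(exprs(x + y, .named = TRUE))

# := allows unquoting on the left-hand side.
arg_name <- "z"
unquoted_names <- names(exprs(!!arg_name := a))
```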
Q6: The documentation for substitute() says:
Substitution takes place by examining each component of the parse tree as follows:
- If it is not a bound symbol in env, it is unchanged.
- If it is a promise object (i.e. a formal argument to a function) the expression slot of the promise replaces the symbol.
- If it is an ordinary variable, its value is substituted, unless env is .GlobalEnv in which case the symbol is left unchanged.
Create examples that illustrate each of the above cases.
A: Let’s create a new environment my_env, which contains no objects. In this case substitute() will just return its first argument (expr):
my_env <- env()
substitute(x, my_env)
#> x
When we create a function containing an argument, which is directly returned after substitution, this function just returns the provided expression:
foo <- function(x) substitute(x)
foo(x + y * sin(0))
#> x + y * sin(0)
In case substitute() can find (parts of) the expression in env, it will literally substitute them, unless env is .GlobalEnv, in which case the symbol is left unchanged.
my_env$x <- 7
substitute(x, my_env)
#> [1] 7
x <- 7
substitute(x, .GlobalEnv)
#> x
15.3 Unquoting
Q1: Given the following components:
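The component definitions are not reproduced in this copy of the exercise. Judging from the names used in the answer (xy, xz, yz, abc) and the printed results, they are presumably the following; treat this as a reconstruction rather than the original listing.

```r
library(rlang)

# Reconstructed from the names and outputs used in the answer.
xy <- expr(x + y)
xz <- expr(x + z)
yz <- expr(y + z)
abc <- exprs(a, b, c)
```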
Use quasiquotation to construct the following calls:
(x + y) / (y + z) # (1)
-(x + z) ^ (y + z) # (2)
(x + y) + (y + z) - (x + y) # (3)
atan2(x + y, y + z) # (4)
sum(x + y, x + y, y + z) # (5)
sum(a, b, c) # (6)
mean(c(a, b, c), na.rm = TRUE) # (7)
foo(a = x + y, b = y + z) # (8)
A: We combine and unquote the given quoted expressions to construct the desired calls like this:
expr(!!xy / !!yz) # (1)
#> (x + y)/(y + z)
expr(-(!!xz)^(!!yz)) # (2)
#> -(x + z)^(y + z)
expr(((!!xy)) + !!yz-!!xy) # (3)
#> (x + y) + (y + z) - (x + y)
expr(atan2(!!xy, !!yz)) # (4)
#> atan2(x + y, y + z)
expr(sum(!!xy, !!xy, !!yz)) # (5)
#> sum(x + y, x + y, y + z)
expr(sum(!!!abc)) # (6)
#> sum(a, b, c)
expr(mean(c(!!!abc), na.rm = TRUE)) # (7)
#> mean(c(a, b, c), na.rm = TRUE)
expr(foo(a = !!xy, b = !!yz)) # (8)
#> foo(a = x + y, b = y + z)
Q2: The following two calls print the same, but are actually different:
(a <- expr(mean(1:10)))
#> mean(1:10)
(b <- expr(mean(!!(1:10))))
#> mean(1:10)
identical(a, b)
#> [1] FALSE
What’s the difference? Which one is more natural?
A: It’s easiest to see the difference with lobstr::ast():
lobstr::ast(mean(1:10))
#> █─mean
#> └─█─`:`
#> ├─1
#> └─10
lobstr::ast(mean(!!(1:10)))
#> █─mean
#> └─<inline integer>
In the expression mean(!!(1:10)) the call 1:10 is evaluated to an integer vector, while still being a call object in mean(1:10).
The first version (mean(1:10)) seems more natural. It captures lazy evaluation, with a promise that is evaluated when the function is called. The second version (mean(!!(1:10))) inlines a vector directly into a call.
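The difference can also be checked without lobstr by looking at the type of the second element of each call. This is a small sketch using only base R predicates.

```r
library(rlang)

a <- expr(mean(1:10))
b <- expr(mean(!!(1:10)))

a_arg_is_call <- is.call(a[[2]])        # the argument is still the call 1:10
b_arg_is_vector <- is.integer(b[[2]])   # the argument is an inlined integer vector

# Both calls nevertheless evaluate to the same result.
same_result <- identical(eval(a), eval(b))
```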
15.4 ... (dot-dot-dot)
Q1: One way to implement exec() is shown below. Describe how it works. What are the key ideas?
exec <- function(f, ..., .env = caller_env()) {
  args <- list2(...)
  do.call(f, args, envir = .env)
}
A: exec() takes a function (f), its arguments (...) and an environment (.env) as input. This allows us to construct a call from f and ... and evaluate this call in the supplied environment. As the ... argument is handled via list2(), exec() supports tidy dots (quasiquotation), which means that arguments and names (on the left-hand side of :=) can be unquoted via !! and !!!.
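A short usage sketch of this definition follows. It is stored under the hypothetical name my_exec() to avoid masking rlang::exec(); args and arg_name are illustrative names as well.

```r
library(rlang)

# Same body as the definition in the exercise, under a hypothetical name.
my_exec <- function(f, ..., .env = caller_env()) {
  args <- list2(...)
  do.call(f, args, envir = .env)
}

# Splice a list of arguments with !!!.
args <- list(x = c(1, 2, NA), na.rm = TRUE)
spliced <- my_exec(mean, !!!args)

# Unquote an argument name on the left-hand side of :=.
arg_name <- "na.rm"
named <- my_exec(mean, c(1, 2, NA), !!arg_name := TRUE)
```

Both calls are equivalent to mean(c(1, 2, NA), na.rm = TRUE).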
Q2: Carefully read the source code for interaction(), expand.grid(), and par(). Compare and contrast the techniques they use for switching between dots and list behaviour.
A: All three functions capture the dots via args <- list(...).
interaction() computes factor interactions between the captured input factors by iterating over the args. When a list is provided this is detected via length(args) == 1 && is.list(args[[1]]) and one level of the list is stripped through args <- args[[1]]. The rest of the function’s code doesn’t differentiate further between list and dots behaviour.
# Both calls create the same output
interaction( a = c("a", "b", "c", "d"), b = c("e", "f")) # dots
#> [1] a.e b.f c.e d.f
#> Levels: a.e b.e c.e d.e a.f b.f c.f d.f
interaction(list(a = c("a", "b", "c", "d"), b = c("e", "f"))) # list
#> [1] a.e b.f c.e d.f
#> Levels: a.e b.e c.e d.e a.f b.f c.f d.f
expand.grid() uses the same strategy and also assigns args <- args[[1]] in case of length(args) == 1 && is.list(args[[1]]).
par() does the most pre-processing to ensure a valid structure of the args argument. When no dots are provided (!length(args)) it creates a list of arguments from an internal character vector (partly depending on its no.readonly argument). Further, given that all elements of args are character vectors (all(unlist(lapply(args, is.character)))), args is turned into a list via as.list(unlist(args)) (this flattens nested lists). Similar to the other functions, one level of args gets stripped via args <- args[[1L]], when args is of length one and its first element is a list.
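The shared pattern of stripping one list level can be sketched in a few lines. arg_names() is a hypothetical function written for this illustration; it is not taken from any of the three functions.

```r
# `arg_names()` is a hypothetical function that accepts its inputs either
# via the dots or as a single list, using the check described above.
arg_names <- function(...) {
  args <- list(...)
  if (length(args) == 1 && is.list(args[[1]])) {
    args <- args[[1]]  # strip one level of the list
  }
  names(args)
}

from_dots <- arg_names(a = 1, b = 2)
from_list <- arg_names(list(a = 1, b = 2))
```

One consequence of this design is that a single list argument can never be treated as one ordinary element, which is why rlang's !!! makes the splicing explicit instead.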
Q3: Explain the problem with this definition of set_attr()
set_attr <- function(x, ...) {
  attr <- rlang::list2(...)
  attributes(x) <- attr
  x
}
set_attr(1:10, x = 10)
#> Error in attributes(x) <- attr: attributes must be named
A: set_attr() expects an object named x and its attributes, supplied via the dots. Unfortunately, this prevents us from providing attributes named x, as these would collide with the argument name of our object. Even omitting the object’s argument name doesn’t help in this case, as can be seen in the example, where the object is consequently treated as an unnamed attribute.
However, we may name the first argument .x, which seems clearer and less likely to invoke errors. In this case 1:10 will get the (named) attribute x = 10 assigned:
set_attr <- function(.x, ...) {
  attr <- rlang::list2(...)
  attributes(.x) <- attr
  .x
}
set_attr(1:10, x = 10)
#> [1] 1 2 3 4 5 6 7 8 9 10
#> attr(,"x")
#> [1] 10
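Because the dots are collected with list2(), the revised function also accepts a spliced list of attributes. The following sketch repeats the revised definition so that it runs on its own; new_attrs is a hypothetical list.

```r
library(rlang)

# Revised definition from above, repeated to keep the sketch self-contained.
set_attr <- function(.x, ...) {
  attr <- list2(...)
  attributes(.x) <- attr
  .x
}

# `new_attrs` is a hypothetical list of attributes, spliced in with !!!.
new_attrs <- list(x = 10, unit = "cm")
out <- set_attr(1:3, !!!new_attrs)
attributes(out)
```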
15.5 Case studies
Q1: In the linear-model example, we could replace the expr() in reduce(summands, ~ expr(!!.x + !!.y)) with call2(): reduce(summands, call2, "+"). Compare and contrast the two approaches. Which do you think is easier to read?
A: We would consider the first version to be more readable. There seems to be a little more boilerplate code at first, but the unquoting syntax is very readable. Overall, the whole expression seems more explicit and less complex.
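To compare the two styles on a concrete input, here is a minimal sketch. The summands are hypothetical placeholders (the original example builds them from coefficients), and the call2() variant is written with an explicit lambda so that the function name is visibly the first argument of call2(); that spelling is a choice made for this sketch.

```r
library(rlang)
library(purrr)

# Hypothetical summands standing in for the terms of the linear model.
summands <- exprs(x1, x2, x3)

# Quasiquotation: the shape of the resulting call is visible in the code.
with_expr <- reduce(summands, ~ expr(!!.x + !!.y))

# call2(): the call is built from the function name and its arguments.
with_call2 <- reduce(summands, ~ call2("+", .x, .y))

with_expr
with_call2
```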
Q2: Re-implement the Box-Cox transform defined below using unquoting and new_function():
bc <- function(lambda) {
  if (lambda == 0) {
    function(x) log(x)
  } else {
    function(x) (x ^ lambda - 1) / lambda
  }
}
A: Here new_function() allows us to create a function factory using tidy evaluation.
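bc2 <- function(lambda) {
  # lambda is evaluated for the comparison; !! only unquotes inside expr(),
  # elsewhere it is plain double negation
  if (lambda == 0) {
    new_function(exprs(x = ), expr(log(x)))
  } else {
    # inline the value of lambda into the body of the new function
    new_function(exprs(x = ), expr((x ^ (!!lambda) - 1) / !!lambda))
  }
}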
bc2(0)
#> function (x)
#> log(x)
#> <environment: 0x7fdd18273740>
bc2(2)
#> function (x)
#> (x^2 - 1)/2
#> <environment: 0x7fdd18302e48>
bc2(2)(2)
#> [1] 1.5
Q3: Re-implement the simple compose() defined below using quasiquotation and new_function():
compose <- function(f, g) {
  function(...) f(g(...))
}
A: The implementation is fairly straightforward, even though a lot of parentheses are required:
compose2 <- function(f, g) {
  f <- enexpr(f)
  g <- enexpr(g)
  new_function(exprs(... = ), expr((!!f)((!!g)(...))))
}
compose(sin, cos)
#> function(...) f(g(...))
#> <environment: 0x7fdd142a53b0>
compose(sin, cos)(pi)
#> [1] -0.841
compose2(sin, cos)
#> function (...)
#> sin(cos(...))
#> <environment: 0x7fdd1785c668>
compose2(sin, cos)(pi)
#> [1] -0.841