# 18 Expressions

## 18.1 Abstract syntax trees

1. Q: Use ast() and experimentation to figure out the three arguments to an if() call. What would you call them? Which arguments are required and which are optional?

A: An if statement can be written in several ways: with or without else, across several lines or on one line, and in prefix notation. The versions below focus on what happens when the curly brackets are left out.

lobstr::ast(if (TRUE) {} else {})
#> █─if
#> ├─TRUE
#> ├─█─{
#> └─█─{
lobstr::ast(if (TRUE) 1 else 2)
#> █─if
#> ├─TRUE
#> ├─1
#> └─2
lobstr::ast(`if`(TRUE, 1, 2))
#> █─if
#> ├─TRUE
#> ├─1
#> └─2

One possible way of naming the arguments would be: condition (1), conclusion (2), alternative (3).

The condition is always required. When the condition is TRUE, the conclusion is needed as well. When the condition is FALSE and the statement includes an else branch, the alternative is needed too.
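
The same components can be inspected without lobstr. The following is a minimal base R sketch; the variable names with_else and without_else are only illustrative.

```r
# Quote two if statements and inspect their components with base R.
with_else    <- quote(if (x > 1) "a" else "b")
without_else <- quote(if (x > 1) "a")

length(with_else)     # the function `if` plus three arguments
length(without_else)  # the alternative is simply absent

# Drop the function itself to see only the arguments
as.list(with_else)[-1]
```

The first element of each call is the symbol if; the remaining elements are the condition, the conclusion and (if present) the alternative.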

2. Q: What does the call tree of an if statement with multiple else if conditions look like? Why?

A: The AST of nested else if statements might look a bit confusing because it contains multiple brackets. However, we can see that in the else part of the AST just another expression is being evaluated, which happens to be an if statement, and so forth.

lobstr::ast(
  if (FALSE) {
    1
  } else if (FALSE) {
    2
  } else if (TRUE) {
    3
  }
)
#> █─if
#> ├─FALSE
#> ├─█─{
#> │ └─1
#> └─█─if
#>   ├─FALSE
#>   ├─█─{
#>   │ └─2
#>   └─█─if
#>     ├─TRUE
#>     └─█─{
#>       └─3

We can see the structure more clearly when we avoid the curly brackets through prefix notation.

lobstr::ast(`if`(FALSE, 1, `if`(FALSE, 2, `if`(TRUE, 3))))
#> █─if
#> ├─FALSE
#> ├─1
#> └─█─if
#>   ├─FALSE
#>   ├─2
#>   └─█─if
#>     ├─TRUE
#>     └─3
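
The nesting can also be checked by subsetting the quoted call directly. This is a small base R sketch; a, b, e and inner are placeholder names.

```r
# An else-if chain is an if call whose alternative is another if call.
e <- quote(if (a) 1 else if (b) 2 else 3)

inner <- e[[4]]                        # the alternative of the outer if
identical(inner[[1]], as.name("if"))   # is it again a call to `if`?
deparse(inner)
```
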
3. Q: What are the arguments to the for() and while() calls?

A: for() requires an index (called var in the docs), a sequence and an expression. For example, in prefix notation:

`for`(i, 1:3, {print(i)})
#> [1] 1
#> [1] 2
#> [1] 3

while() requires a condition and an expression. Again, an example in prefix notation:

set.seed(123)
`while`((i <- rnorm(1)) < 1, {print(i)})
#> [1] -0.5604756
#> [1] -0.2301775
i
#> [1] 1.558708

Note that a minimal expression can consist of {} only.
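
To see the arguments without running the loops, we can quote them and look at the pieces. A short base R sketch, with for_call and while_call as illustrative names:

```r
# Quote the loops instead of running them, then inspect the arguments.
for_call   <- quote(for (i in 1:3) print(i))
while_call <- quote(while (i < 3) i <- i + 1)

as.list(for_call)[-1]    # index, sequence, body
as.list(while_call)[-1]  # condition, body
```
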

4. Q: Two arithmetic operators can be used in both prefix and infix style. What are they?

A: The question is a little ambiguous. In principle every arithmetic operator can be written in prefix notation via backticks. However, + and - seem to be the only ones that can be written in prefix notation without backticks:

x <- 1

+(x)
#> [1] 1
-(x)
#> [1] -1

However, when we look more closely, the call tree is not what we would expect from a prefix function, because the parentheses show up as a separate ( call:

lobstr::ast(+(x))
#> █─+
#> └─█─(
#>   └─x
lobstr::ast(-(x))
#> █─-
#> └─█─(
#>   └─x

So maybe it is meant to look like this…

lobstr::ast(+x)
#> █─+
#> └─x
lobstr::ast(-x)
#> █─-
#> └─x

Of course this doesn’t make too much sense either: in ?Syntax one can read that R clearly differentiates between unary and binary + and - operators, and a unary operator is not really what we mean when we speak about infix operators.

However, if we don’t differentiate in this way, this is probably the intended solution, since + and - are obviously also infix functions:

lobstr::ast(x + y)
#> █─+
#> ├─x
#> └─y
lobstr::ast(x - y)
#> █─-
#> ├─x
#> └─y
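
The difference between the unary and the binary form is also visible in the length of the quoted calls. A minimal base R sketch:

```r
# Unary use: the call has the function plus one argument.
length(quote(-x))
# Binary use: the call has the function plus two arguments.
length(quote(x - y))

# With backticks, the same function accepts one or two arguments.
`-`(1)
`-`(3, 1)
```
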

## 18.2 R’s grammar

1. Q: R uses parentheses in two slightly different ways as illustrated by these two calls:

f((1))
`(`(1 + 1)

Compare and contrast the two uses by referencing the AST.

A: The trick with these examples is that ( can be a primitive function, but it is also part of R’s general prefix-call syntax.

In the AST of the first example we do not see the outer (, because it belongs to the call f() and is pure syntax, while the inner ( is treated as a function (symbol):

lobstr::ast(f((1)))
#> █─f
#> └─█─(
#>   └─1

In the second example, we can see that the outer ( is treated as a function and the inner ( belongs to its syntax:

lobstr::ast(`(`(1 + 1))
#> █─(
#> └─█─+
#>   ├─1
#>   └─1

For the sake of clarity, let’s also create a third example, where none of the ( is part of another function’s syntax:

lobstr::ast(((1 + 1)))
#> █─(
#> └─█─(
#>   └─█─+
#>     ├─1
#>     └─1
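
The two roles of ( can be verified by subsetting quoted calls. This is a base R sketch; plain and wrapped are illustrative names.

```r
# In f(1) the parentheses are call syntax: the argument is the constant 1.
plain <- quote(f(1))
is.call(plain[[2]])

# In f((1)) the inner parentheses are a call to the function `(`.
wrapped <- quote(f((1)))
is.call(wrapped[[2]])
identical(wrapped[[2]][[1]], as.name("("))
```
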
2. Q: = can also be used in two ways. Construct a simple example that shows both uses.

A: = has several syntactic meanings. It is an operator for assignment, and it is used within calls to match arguments by name and within function definitions to set default values. (It also appears as a character in the comparison operators ==, >=, <= and !=, but those are separate operators.)

The question probably aims at the difference between assignment and argument matching in a call.

When we play with ast(), we can directly see that the following is not possible:

lobstr::ast(a = 1)
#> Error in lobstr::ast(a = 1): unused argument (a = 1)

We get an error because a = makes R look for an argument called a. Since x is the only argument of lobstr::ast(), the call fails.

Working around this problem makes the solution to the question obvious.

Instead of a = 1, we pass the expression to ast() wrapped in parentheses, once matched by position and once matched by name:

lobstr::ast((a = 1))
#> █─(
#> └─█─=
#>   ├─a
#>   └─1
lobstr::ast(x = (a = 1))
#> █─(
#> └─█─=
#>   ├─a
#>   └─1

The second way is more explicit, but both return the same syntax tree. When we ignore the parentheses and compare the trees, we can see from the second call that the first = is just part of the call syntax (it matches the argument x), while the second one performs an assignment.
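
Both uses of = can also be told apart in base R by quoting a call to a placeholder function f. A small sketch under that assumption:

```r
# Argument matching: = only attaches a name to the argument.
named <- quote(f(a = 1))
names(named)          # the second element is named "a"
is.call(named[[2]])   # the argument itself is just the constant 1

# Assignment: inside parentheses, = becomes a call of its own.
assigned <- quote(f((a = 1)))
names(assigned)       # no argument names at all
assigned[[2]][[2]][[1]]
```
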

3. Q: What does !1 + !1 return? Why?

A: The first answer is quite simple

!1 + !1
#> [1] FALSE

To answer the “Why”, we have a look at the syntax tree first

lobstr::ast(!1 + !1)
#> █─!
#> └─█─+
#>   ├─1
#>   └─█─!
#>     └─1

So the second !1 is evaluated first, which results in FALSE, because every non-zero numeric is coerced to TRUE when a logical operator is applied to it.

Next, 1 + FALSE is evaluated to 1, since FALSE is coerced to 0.

Finally, !1 is evaluated to FALSE, because 1 is coerced to TRUE and ! negates it.

Note that if ! had a higher precedence than +, the intermediate result would be FALSE + FALSE, which would evaluate (again involving coercion) to 0.
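
Explicit parentheses let us compare the actual parse with the hypothetical one. A minimal sketch:

```r
# How R parses the expression: ! applies to everything to its right.
as_parsed <- !(1 + (!1))

# What we would get if ! bound more tightly than +.
tighter <- (!1) + (!1)

as_parsed
tighter
```
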

4. Q: Why does x1 <- x2 <- x3 <- 0 work? There are two reasons.

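A: One reason is that <- is right-associative, so the expression is parsed as x1 <- (x2 <- (x3 <- 0)). The other is that <- invisibly returns the assigned value, which the next assignment to the left can then use.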
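
A short base R sketch of what makes the chain work: the parse tree nests to the right, and each <- call hands its value on invisibly. The names e and res are illustrative.

```r
# The right-hand side of the outermost <- is itself an assignment.
e <- quote(x1 <- x2 <- x3 <- 0)
deparse(e[[3]])

# An assignment returns its value, but invisibly.
res <- withVisible(x3 <- 0)
res$value
res$visible

x1 <- x2 <- x3 <- 0
c(x1, x2, x3)
```
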

5. Q: Compare the ASTs x + y %+% z to x ^ y %+% z. What does that tell you about the precedence of custom infix functions?

A: Comparison of the syntax trees:

# for ast(x + y %+% z)
# y %+% z will be calculated first and the result will be added to x
lobstr::ast(x + y %+% z)
#> █─+
#> ├─x
#> └─█─%+%
#>   ├─y
#>   └─z

# for ast(x ^ y %+% z)
# x ^ y will be calculated first, and the result will be used as
# first argument of %+%()
lobstr::ast(x ^ y %+% z)
#> █─%+%
#> ├─█─^
#> │ ├─x
#> │ └─y
#> └─z

So we can conclude that custom infix functions must have a precedence between addition and exponentiation. The general precedence rules are documented in ?Syntax.
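
The position can be narrowed down further by testing against other operators. The sketch below uses a hypothetical helper top_fun() that returns the function at the root of a quoted call; %+% does not need to be defined, because the expressions are only parsed and never evaluated.

```r
# Hypothetical helper: name of the outermost function of a call.
top_fun <- function(e) as.character(e[[1]])

top_fun(quote(x + y %+% z))   # %+% binds tighter than +
top_fun(quote(x * y %+% z))   # ... and tighter than *
top_fun(quote(x ^ y %+% z))   # but less tightly than ^
top_fun(quote(-x %+% y))      # and less tightly than unary minus
```
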

## 18.3 Data structures

1. Q: Which two of the six types of atomic vector can’t appear in an expression? Why? Why can’t you create an expression that contains an atomic vector of length greater than one?

A: It is not possible to create an expression that evaluates to an atomic of length greater than one without using a function (i.e. the c() function). But expressions that include a function would be calls.

Let us illustrate this observation via the following example:

is.atomic(quote(1))       # atomic
#> [1] TRUE
is.atomic(quote(c(1,1)))  # not an atomic (it would just evaluate to an atomic).
#> [1] FALSE
is.call(quote(c(1,1)))    # still a call! (so at least a valid expression).
#> [1] TRUE

Two of the six atomic vector types do not work in expressions, the first one being raw. We assume that raw vectors can only be constructed through as.raw(), but this function would then create another call in the AST.

For similar reasons complex numbers also won’t work:

(function(x){is.atomic(x) & length(x) == 1})(quote(1 + 1.5i))
#> [1] FALSE

# however, imaginary parts of complex numbers work:
lobstr::ast(1i)
#> 0+1i
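
The same checks can be run in base R on quoted code. A minimal sketch:

```r
# A purely imaginary literal is a constant of type complex ...
typeof(quote(1i))
# ... but a number with real and imaginary part needs a call to `+`.
is.call(quote(1 + 1.5i))

# Raw and longer vectors only appear after evaluating a call.
is.call(quote(as.raw(1)))
typeof(eval(quote(as.raw(1))))
is.call(quote(c(1, 2)))
length(eval(quote(c(1, 2))))
```
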

2. Q: How is rlang::maybe_missing() implemented? Why does it work?

A: Let us take a look at the function’s source code to see what’s going on:

rlang::maybe_missing
function (x)
{
  # is_missing checks if one of the following is TRUE
  # 1. check via substitute if typeof(x) is symbol and missing(x) is TRUE
  # 2. check if x identical to missing_arg()
  if (is_missing(x)) {
    missing_arg()  # returns missing argument
                   # (implemented in lower level code -> .Call())
  }
  else {
    x  # when it's not missing, x is simply returned
  }
}
<bytecode: 0x00000000195ed740>
<environment: namespace:rlang>

First it is checked whether the argument is missing. If so, the missing argument is returned; otherwise the argument (x) itself is returned.

3. Q: rlang::call_standardise() doesn’t work so well for the following calls. Why? What makes mean() special?

library(rlang)

call_standardise(quote(mean(1:10, na.rm = TRUE)))
#> mean(x = 1:10, na.rm = TRUE)
call_standardise(quote(mean(n = T, 1:10)))
#> mean(x = 1:10, n = T)
call_standardise(quote(mean(x = 1:10, , TRUE)))
#> mean(x = 1:10, , TRUE)

A: The reason for this unexpected behaviour lies in the fact that mean() uses S3 dispatch (i.e., UseMethod) and therefore does not store its formals on mean(), but rather on mean.default(). rlang::call_standardise() can do much better when the S3 dispatch is explicit.

call_standardise(quote(mean.default(1:10, na.rm = TRUE)))
#> mean.default(x = 1:10, na.rm = TRUE)
call_standardise(quote(mean.default(n = T, 1:10)))
#> mean.default(x = 1:10, na.rm = T)
call_standardise(quote(mean.default(x = 1:10, , TRUE)))
#> mean.default(x = 1:10, na.rm = TRUE)
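
The difference in formals can be checked directly in base R. The last line uses match.call(), which we assume behaves like the standardisation shown above when it is given the method's definition.

```r
# The generic only knows about x and ...
names(formals(mean))

# The default method carries the remaining arguments.
names(formals(mean.default))

# Matching against the method's definition resolves na.rm by position.
match.call(mean.default, quote(mean.default(1:10, , TRUE)))
```
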
4. Q: Why does this code not make sense?

x <- expr(foo(x = 1))
names(x) <- c("x", "")

A: As stated in the book

The first element of a call is always the function that gets called.

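Naming the first element therefore makes no sense: it holds the function, which is never matched by name. Let’s look at what happens: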

x <- rlang::expr(foo(x = 1))
x
#> foo(x = 1)

names(x) <- c("x", "")
x
#> foo(1)

names(x) <- c("", "x")
x
#> foo(x = 1)
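
Converting the call to a list shows where the names end up. A base R sketch, using quote() in place of rlang::expr():

```r
x <- quote(foo(x = 1))
names(x)      # the function slot has an empty name, the argument is "x"

# The nonsensical assignment moves the name onto the function slot.
names(x) <- c("x", "")
as.list(x)
```
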

5. Q: Construct the expression if(x > 1) "a" else "b" using multiple calls to lang(). How does the structure code reflect the structure of the AST?

A: Similar to the prefix version we get

rlang::lang("if", rlang::expr(x > 1), "a", "b")
#> Warning: lang() is soft-deprecated as of rlang 0.2.0.
#> Please use call2() instead
#> This warning is displayed once per session.
#> if (x > 1) "a" else "b"

When we read the AST from left to right, we get the same structure: the function to evaluate, then the condition, which is itself a call and is evaluated first, and then two constants, one of which is returned depending on the condition:

lobstr::ast(`if`(x > 1, "a", "b"))
#> █─if
#> ├─█─>
#> │ ├─x
#> │ └─1
#> ├─"a"
#> └─"b"
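
Since lang() is soft-deprecated, here is a sketch of the same construction with one call per AST node. It uses base R's call() as a stand-in (the warning above suggests rlang::call2() as the replacement); cond and e are illustrative names.

```r
# Build the inner node first, then the outer one: one call() per AST branch.
cond <- call(">", quote(x), 1)
e    <- call("if", cond, "a", "b")

deparse(e)

# The constructed expression can be evaluated like any other.
x <- 2
eval(e)
```
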

## 18.4 Parsing and deparsing

1. Q: What happens if you attempt to parse an invalid expression? e.g. "a +" or "f())".

A: We get an error from the underlying parse function

rlang::parse_expr("a +")
#> Error in parse(text = x): <text>:2:0: unexpected end of input
#> 1: a +
#>    ^
rlang::parse_expr("f())")
#> Error in parse(text = x): <text>:1:4: unexpected ')'
#> 1: f())
#>        ^

parse(text = "a +")
#> Error in parse(text = "a +"): <text>:2:0: unexpected end of input
#> 1: a +
#>    ^
parse(text = "f())")
#> Error in parse(text = "f())"): <text>:1:4: unexpected ')'
#> 1: f())
#>        ^
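
If the input may be invalid, the parse error can be caught like any other condition. The helper try_parse() below is hypothetical and only a sketch of that idea:

```r
# Hypothetical helper: return the parsed expression, or the error message
# as a string when the text cannot be parsed.
try_parse <- function(text) {
  tryCatch(
    parse(text = text),
    error = function(e) conditionMessage(e)
  )
}

is.expression(try_parse("a + b"))  # valid input is parsed
is.character(try_parse("a +"))     # invalid input yields the message
is.character(try_parse("f())"))
```
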
2. Q: deparse() produces vectors when the input is long. For example, the following call produces a vector of length two:

expr <- rlang::expr(g(a + b + c + d + e + f + g + h + i + j + k + l + m +
n + o + p + q + r + s + t + u + v + w + x + y + z))

deparse(expr)
#> [1] "g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + "
#> [2] "    p + q + r + s + t + u + v + w + x + y + z)"

What do expr_text(), expr_name(), and expr_label() do with this input?

A:

• expr_text() collapses the deparsed strings into a single string, with \n (newline) as separator
cat(rlang::expr_text(expr)) # cat is used for printing with linebreak
#> g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o +
#>     p + q + r + s + t + u + v + w + x + y + z)
• expr_name() abbreviates the call to the form f(...) and deparses this expression into a string
rlang::expr_name(expr)
#> [1] "g(...)"
• expr_label() does the same as expr_name(), but additionally surrounds the output with backticks
rlang::expr_label(expr)
#> [1] "`g(...)`"
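
The collapsing step can be approximated in base R. This sketch assumes deparse()'s default line width and is not a copy of rlang's implementation; long, lines and one are illustrative names.

```r
long <- quote(g(a + b + c + d + e + f + g + h + i + j + k + l + m +
  n + o + p + q + r + s + t + u + v + w + x + y + z))

lines <- deparse(long)
length(lines)                       # more than one string

# Join the pieces with newlines to get a single string again.
one <- paste(lines, collapse = "\n")
length(one)
cat(one)
```
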
3. Q: Why does as.Date.default() use substitute() and deparse()? Why does pairwise.t.test() use them? Read the source code.

A:

4. Q: pairwise.t.test() assumes that deparse() always returns a length one character vector. Can you construct an input that violates this expectation? What happens?

A:

## 18.5 Case study: Walking the AST with recursive functions

1. Q: logical_abbr() returns TRUE for T(1, 2, 3). How could you modify logical_abbr_rec() so that it ignores function calls that use T or F?

A: We can apply similar logic to the multiple-assignment example from the textbook and treat this as a special case, handled by a helper called find_T_call() that finds T() calls and “bounces them out”:

find_T_call <- function(x) {
  if (is_call(x, "T")) {
    x <- as.list(x)[-1]
    purrr::some(x, logical_abbr_rec)
  } else {
    purrr::some(x, logical_abbr_rec)
  }
}

logical_abbr_rec <- function(x) {
  switch_expr(
    x,
    # Base cases
    constant = FALSE,
    symbol = as_string(x) %in% c("F", "T"),

    # Recursive cases
    pairlist = purrr::some(x, logical_abbr_rec),
    call = find_T_call(x)
  )
}

logical_abbr <- function(x) {
  logical_abbr_rec(enexpr(x))
}

Now let’s test our new logical_abbr() function:

logical_abbr(T(1, 2, 3))
#> [1] FALSE
logical_abbr(T(T, T(3, 4)))
#> [1] TRUE
logical_abbr(T(T))
#> [1] TRUE
logical_abbr(T())
#> [1] FALSE
logical_abbr()
#> [1] FALSE
logical_abbr(c(T, T, T))
#> [1] TRUE
2. Q: logical_abbr() works with expressions. It currently fails when you give it a function. Why not? How could you modify logical_abbr() to make it work? What components of a function will you need to recurse over?

f <- function(x = TRUE) {
g(x + T)
}
logical_abbr(!!f)

A: It currently fails because "closure" is not handled by switch_expr() within logical_abbr_rec(). To make it work, we would have to add a case there and write a function that inspects the formals and the body of the input function.
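
The components to recurse over can be pulled out with base R. The sketch below only shows where T hides in the example function; all.names() is used as a shortcut and does not distinguish T from a call to T(), so it is not a replacement for the recursive function.

```r
f <- function(x = TRUE) {
  g(x + T)
}

typeof(f)     # the type that switch_expr() does not know about

formals(f)    # a pairlist of default values
body(f)       # a call to `{`

# All symbols that occur in the body, including function names.
all.names(body(f))
```
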

3. Q: Modify find_assign() to also detect assignment using replacement functions, i.e. names(x) <- y.

A: Let’s see what the AST of such an assignment looks like:

lobstr::ast(names(x) <- x)
#> █─<-
#> ├─█─names
#> │ └─x
#> └─x

So we need to catch the case where the call is to <- and its second element (the assignment target) is itself a call rather than a symbol. In that case we return the deparsed target call, to show which object gets a new value assigned.

This is why we add the following block within another else branch in find_assign_call():

if (is_call(x, "<-") && is_call(x[[2]])) {
  lhs <- expr_text(x[[2]])
  children <- as.list(x)[-1]
}

Let us finish with the whole code including some tests for our new function:

flat_map_chr <- function(.x, .f, ...) {
  purrr::flatten_chr(purrr::map(.x, .f, ...))
}

find_assign <- function(x) unique(find_assign_rec(enexpr(x)))

find_assign_call <- function(x) {
  if (is_call(x, "<-") && is_symbol(x[[2]])) {
    lhs <- as_string(x[[2]])
    children <- as.list(x)[-1]
  } else if (is_call(x, "<-") && is_call(x[[2]])) {
    lhs <- expr_text(x[[2]])
    children <- as.list(x)[-1]
  } else {
    lhs <- character()
    children <- as.list(x)
  }

  c(lhs, flat_map_chr(children, find_assign_rec))
}

find_assign_rec <- function(x) {
  switch_expr(
    x,
    # Base cases
    constant = ,
    symbol = character(),
    # Recursive cases
    pairlist = flat_map_chr(x, find_assign_rec),
    call = find_assign_call(x)
  )
}

find_assign(x <- y)
#> [1] "x"
find_assign(names(x))
#> character(0)
find_assign(names(x) <- y)
#> [1] "names(x)"
find_assign(names(x(y)) <- y)
#> [1] "names(x(y))"
find_assign(names(x(y)) <- y <- z)
#> [1] "names(x(y))" "y"
4. Q: Write a function that extracts all calls to a specified function.

A: We just need to remove the else branch added above and check for any call (not necessarily <-) in the first if() of find_assign_call(). We record each call we find and return it as part of the character output. Everything else stays the same:

find_assign_call <- function(x) {
  if (is_call(x)) {
    lhs <- expr_text(x)
    children <- as.list(x)[-1]
  } else {
    lhs <- character()
    children <- as.list(x)
  }

  c(lhs, flat_map_chr(children, find_assign_rec))
}

find_assign_rec <- function(x) {
  switch_expr(
    x,
    # Base cases
    constant = ,
    symbol = character(),

    # Recursive cases
    pairlist = flat_map_chr(x, find_assign_rec),
    call = find_assign_call(x)
  )
}

find_assign(x <- y)
#> [1] "x <- y"
find_assign(names(x(y)) <- y <- z)
#> [1] "names(x(y)) <- y <- z" "names(x(y))"           "x(y)"
#> [4] "y <- z"
find_assign(mean(sum(1:3)))
#> [1] "mean(sum(1:3))" "sum(1:3)"       "1:3"
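
To restrict the output to one specified function, the function name can be passed along during the recursion. The following is a self-contained base R sketch; find_calls() and its argument fn are hypothetical names, and the sketch only descends into calls (it ignores pairlists and has not been checked against calls with empty arguments such as x[1, ]).

```r
# Hypothetical helper: collect all calls to the function named `fn`.
find_calls <- function(x, fn) {
  # Constants and symbols cannot contain calls
  if (!is.call(x)) {
    return(character())
  }

  # Record this call if its function matches the requested name
  own <- if (identical(x[[1]], as.name(fn))) {
    paste(deparse(x), collapse = " ")
  } else {
    character()
  }

  # Recurse into every element of the call
  nested <- lapply(as.list(x), find_calls, fn = fn)
  c(own, unlist(nested, use.names = FALSE))
}

find_calls(quote(mean(sum(1:3)) + sum(x)), "sum")
find_calls(quote(a + b), "sum")
```

In contrast to find_assign(), this version takes an already quoted expression, which keeps it independent of rlang.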