# 18 Expressions

## Prerequisites

To capture and compute on expressions, and to visualise them, we will load the rlang and the lobstr packages.

```
library(rlang)
library(lobstr)
```

## 18.1 Abstract syntax trees

**Q**: Reconstruct the code represented by the trees below:

```
#> █─f
#> └─█─g
#>   └─█─h

#> █─`+`
#> ├─█─`+`
#> │ ├─1
#> │ └─2
#> └─3

#> █─`*`
#> ├─█─`(`
#> │ └─█─`+`
#> │   ├─x
#> │   └─y
#> └─z
```

**A**: The source is with you. ;)

```
ast(f(g(h())))
#> █─f
#> └─█─g
#>   └─█─h
ast(1 + 2 + 3)
#> █─`+`
#> ├─█─`+`
#> │ ├─1
#> │ └─2
#> └─3
ast((x + y) * z)
#> █─`*`
#> ├─█─`(`
#> │ └─█─`+`
#> │   ├─x
#> │   └─y
#> └─z
```
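We can also cross-check the reconstruction without lobstr by indexing into the quoted expressions with base R. This sketch confirms that `1 + 2 + 3` is parsed as `(1 + 2) + 3`, with the first addition nested on the left. It also shows that the parentheses in `(x + y) * z` are a call to `(`:

```r
# 1 + 2 + 3 is parsed as (1 + 2) + 3: the first argument is itself a call
e1 <- quote(1 + 2 + 3)
e1[[1]]  # the function: `+`
e1[[2]]  # the nested call: 1 + 2
e1[[3]]  # the constant: 3

# (x + y) * z: the first argument of `*` is a call to `(`
e2 <- quote((x + y) * z)
e2[[2]][[1]]  # `(`
```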

**Q**: Draw the following trees by hand, then check your answers with `lobstr::ast()`.

```
f(g(h(i(1, 2, 3))))
f(1, g(2, h(3, i())))
f(g(1, 2), h(3, i(4, 5)))
```

**A**: TODO: Drawings by hand

```
ast(f(g(h(i(1, 2, 3)))))
#> █─f
#> └─█─g
#>   └─█─h
#>     └─█─i
#>       ├─1
#>       ├─2
#>       └─3
ast(f(1, g(2, h(3, i()))))
#> █─f
#> ├─1
#> └─█─g
#>   ├─2
#>   └─█─h
#>     ├─3
#>     └─█─i
ast(f(g(1, 2), h(3, i(4, 5))))
#> █─f
#> ├─█─g
#> │ ├─1
#> │ └─2
#> └─█─h
#>   ├─3
#>   └─█─i
#>     ├─4
#>     └─5
```

**Q**: What's happening with the ASTs below? (Hint: carefully read `?"^"`.)

```
lobstr::ast(`x` + `y`)
#> █─`+`
#> ├─x
#> └─y
lobstr::ast(x ** y)
#> █─`^`
#> ├─x
#> └─y
lobstr::ast(1 -> x)
#> █─`<-`
#> ├─x
#> └─1
```

**A**: In the first expression, the backticks only quote the names `x` and `y`. A backtick-quoted name is still an ordinary symbol, so the AST is the same as for `x + y`. In the second case, `**` is translated by R's parser into `^`, as documented in `?"^"`. In the last AST, the expression is flipped when R parses it, because `->` is converted into `<-`:

```
str(expr(a -> b))
#>  language b <- a
```

**Q**: What is special about the AST below? (Hint: re-read Section 6.2.1.)

```
lobstr::ast(function(x = 1, y = 2) {})
#> █─`function`
#> ├─█─x = 1
#> │ └─y = 2
#> ├─█─`{`
#> └─<inline srcref>
```

**A**: The last leaf of the AST is not explicitly specified in the expression. Instead, R adds a source reference (`srcref`) automatically when the code is parsed. This reference points to the function's source code.

**Q**: What does the call tree of an `if` statement with multiple `else if` conditions look like? Why?

**A**: The AST of nested `else if` statements might look a bit confusing because it contains multiple curly brackets. However, we can see that the `else` part of the AST is just another expression being evaluated. That expression happens to be an `if` statement, and so forth.

```
lobstr::ast(
  if (FALSE) {
    1
  } else if (FALSE) {
    2
  } else if (TRUE) {
    3
  }
)
#> █─`if`
#> ├─FALSE
#> ├─█─`{`
#> │ └─1
#> └─█─`if`
#>   ├─FALSE
#>   ├─█─`{`
#>   │ └─2
#>   └─█─`if`
#>     ├─TRUE
#>     └─█─`{`
#>       └─3
```

We can see the structure more clearly when we avoid the curly brackets through prefix notation.

```
lobstr::ast(`if`(FALSE, 1, `if`(FALSE, 2, `if`(TRUE, 3))))
#> █─`if`
#> ├─FALSE
#> ├─1
#> └─█─`if`
#>   ├─FALSE
#>   ├─2
#>   └─█─`if`
#>     ├─TRUE
#>     └─3
```
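The nesting can also be inspected by indexing into the call. In this sketch, the fourth element of the `if` call holds the `else` branch. For an `else if` chain, that branch is itself a call to `if`:

```r
e <- quote(if (FALSE) 1 else if (FALSE) 2 else 3)

e[[1]]       # `if`
e[[2]]       # the condition: FALSE
e[[3]]       # the "then" branch: 1
e[[4]]       # the "else" branch: if (FALSE) 2 else 3
e[[4]][[1]]  # `if` again, i.e. a nested if statement
```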

## 18.2 Expressions

**Q**: Which two of the six types of atomic vector can't appear in an expression? Why? Similarly, why can't you create an expression that contains an atomic vector of length greater than one?

**A**: You cannot create an expression that evaluates to an atomic vector of length greater than one without using a function, e.g. `c()`. But an expression that includes a function call is a call, not an atomic constant. The following example illustrates this:

```
is.atomic(quote(1))        # atomic
#> [1] TRUE
is.atomic(quote(c(1, 1)))  # not atomic (it would just evaluate to an atomic vector)
#> [1] FALSE
is.call(quote(c(1, 1)))    # still a call! (so at least a valid expression)
#> [1] TRUE
```

Two of the six atomic vector types of R do not work with expressions. The first one is raw. We assume that raws can only be constructed by using `as.raw()`, but that function would then create another call in the AST. For similar reasons, complex numbers with a real part also won't work, because `1 + 1.5i` is parsed as a call to `+`:

```
(function(x) is.atomic(x) && length(x) == 1)(quote(1 + 1.5i))
#> [1] FALSE
# however, purely imaginary numbers work:
lobstr::ast(1i)
#> 1i
```

**Q**: What happens when you subset a call object to remove the first element? e.g. `expr(read.csv("foo.csv", header = TRUE))[-1]`. Why?

**A**: When the first element of a call object is removed, the second element moves to the first position. That position is always treated as the function to call. Therefore, we are left with the call `"foo.csv"(header = TRUE)`.

**Q**: Describe the differences between the following call objects.

```
x <- 1:10

call2(median, x, na.rm = TRUE)
call2(expr(median), x, na.rm = TRUE)
call2(median, expr(x), na.rm = TRUE)
call2(expr(median), expr(x), na.rm = TRUE)
```

**A**: The call objects differ in their first two elements, which are sometimes evaluated before the call is constructed. In the first one, both `median()` and `x` are evaluated and inlined into the call. Therefore, the constructed call shows that `median` is a generic (its body calls `UseMethod()`) and that the `x` argument is `1:10`. The following calls contain the other combinations: in one only `x` is evaluated, and in the other only `median()`. In the final call, neither is evaluated.

**Q**: `rlang::call_standardise()` doesn't work so well for the following calls. Why? What makes `mean()` special?

```
library(rlang)
call_standardise(quote(mean(1:10, na.rm = TRUE)))
#> mean(x = 1:10, na.rm = TRUE)
call_standardise(quote(mean(n = T, 1:10)))
#> mean(x = 1:10, n = T)
call_standardise(quote(mean(x = 1:10, , TRUE)))
#> mean(x = 1:10, , TRUE)
```

**A**: This unexpected behaviour happens because `mean()` uses the `...` argument. It therefore cannot standardise the arguments passed through it. `mean()` uses S3 dispatch (i.e. `UseMethod()`), and the underlying `mean.default()` method specifies some more arguments. So `rlang::call_standardise()` can do much better when the S3 dispatch is made explicit.

```
call_standardise(quote(mean.default(1:10, na.rm = TRUE)))
#> mean.default(x = 1:10, na.rm = TRUE)
call_standardise(quote(mean.default(n = T, 1:10)))
#> mean.default(x = 1:10, na.rm = T)
call_standardise(quote(mean.default(x = 1:10, , TRUE)))
#> mean.default(x = 1:10, na.rm = TRUE)
```

**Q**: Why does this code not make sense?

```
x <- expr(foo(x = 1))
names(x) <- c("x", "")
```

**A**: As stated in the book, "The first element of a call is always the function that gets called." Let's look at what happens:

```
x <- rlang::expr(foo(x = 1))
x
#> foo(x = 1)
names(x) <- c("x", "")
x
#> foo(1)
names(x) <- c("", "x")
x
#> foo(x = 1)
```

So giving the first element a name just adds useless metadata. In this example, the argument also loses its name `x`.

**Q**: Construct the expression `if(x > 1) "a" else "b"` using multiple calls to `call2()`. How does the code structure reflect the structure of the AST?

**A**: Similar to the prefix version, we get:

```
call2("if", expr(x > 1), "a", "b")
#> if (x > 1) "a" else "b"
```

When we read the AST from left to right, we get the same structure:

- the function to call (`if`);
- the condition, which is itself a call (to `>`) and is evaluated first;
- two constants, of which only one is evaluated, depending on the condition.

```
lobstr::ast(`if`(x > 1, "a", "b"))
#> █─`if`
#> ├─█─`>`
#> │ ├─x
#> │ └─1
#> ├─"a"
#> └─"b"
```

## 18.3 Parsing and grammar

**Q**: R uses parentheses in two slightly different ways as illustrated by these two calls:

```
f((1))
`(`(1 + 1)
```

Compare and contrast the two uses by referencing the AST.

**A**: The trick with these examples is that `(` can be a primitive function, but it can also be part of R's general prefix function call syntax. In the AST of the first example, we will not see the outer `(`. It belongs to the call syntax of `f()` and is therefore not shown in the tree. The inner `(`, on the other hand, is treated as a function (symbol):

```
lobstr::ast(f((1)))
#> █─f
#> └─█─`(`
#>   └─1
```

In the second example, we can see that the outer `(` is treated as a function and the inner `(` belongs to its call syntax:

```
lobstr::ast(`(`(1 + 1))
#> █─`(`
#> └─█─`+`
#>   ├─1
#>   └─1
```

For the sake of clarity, let's also create a third example, where none of the parentheses is part of another function's call syntax:

```
lobstr::ast(((1 + 1)))
#> █─`(`
#> └─█─`(`
#>   └─█─`+`
#>     ├─1
#>     └─1
```

**Q**: `=` can also be used in two ways. Construct a simple example that shows both uses.

**A**: I was not previously aware of a case like `(`, where the `=` symbol has multiple syntactical meanings, but we can work it out. `=` is used as an assignment operator. The character also appears in the comparison operators `==`, `>=`, `<=` and `!=`. In addition, `=` is used within function calls to name arguments and within function definitions to set default values. The question probably aims at the difference between assignment and naming arguments in function calls.

When we experiment with `ast()`, we immediately see that the following is not possible:

```
lobstr::ast(a = 1)
#> Error in lobstr::ast(a = 1): unused argument (a = 1)
```

We get an error because `a = ` makes R look for an argument called `a`. Since `x` is the only argument of `lobstr::ast()`, the call fails. Once we build a workaround for this problem, the solution to the question becomes obvious.

Instead of passing `a = 1` directly, we wrap the expression in parentheses before passing it to `ast()`. We do this once with matching by position and once with matching by name:

```
lobstr::ast((a = 1))
#> █─`(`
#> └─█─`=`
#>   ├─a
#>   └─1
lobstr::ast(x = (a = 1))
#> █─`(`
#> └─█─`=`
#>   ├─a
#>   └─1
```

The second way is more explicit, but both return the same syntax tree. If we ignore the parentheses and compare the trees, the second call shows both uses. The first `=` (in `x = `) is just part of the function call syntax: it matches the argument `x` and does not appear in the tree. The second `=` is a call to the assignment function.

**Q**: Does `-2^2` yield 4 or -4? Why?

**A**: It yields `-4`, because `^` has a higher operator precedence than unary `-`. We can verify this by looking at the AST:

```
-2^2
#> [1] -4
lobstr::ast(-2^2)
#> █─`-`
#> └─█─`^`
#>   ├─2
#>   └─2
```

**Q**: What does `!1 + !1` return? Why?

**A**: The result is:

```
!1 + !1
#> [1] FALSE
```

To answer the "Why", we first look at the syntax tree:

```
lobstr::ast(!1 + !1)
#> █─`!`
#> └─█─`+`
#>   ├─1
#>   └─█─`!`
#>     └─1
```

The evaluation proceeds in three steps:

1. The second `!1` is evaluated, which results in `FALSE`. In R, every non-zero number is coerced to `TRUE` when a logical operator is applied to it.
2. `1 + FALSE` is evaluated to `1`, since `FALSE` is coerced to `0`.
3. `!1` is evaluated to `FALSE`, because it is the opposite of `TRUE`, which is what `1` is coerced to.

However, note that if `!` had a higher precedence, the intermediate result would be `FALSE + FALSE`. That would evaluate (again involving coercion) to `0`.

**Q**: Why does `x1 <- x2 <- x3 <- 0` work? Describe the two reasons.

**A**: One reason is that `<-` is right-associative, i.e. the expression is parsed as `x1 <- (x2 <- (x3 <- 0))`. The other reason is that `<-` invisibly returns the value it assigns. The result of `x3 <- 0` therefore becomes the value assigned to `x2`, and so on.

**Q**: Compare the ASTs of `x + y %+% z` and `x ^ y %+% z`. What have you learned about the precedence of custom infix functions?

**A**: Comparison of the syntax trees:

```
# for ast(x + y %+% z)
# y %+% z will be calculated first and the result will be added to x
lobstr::ast(x + y %+% z)
#> █─`+`
#> ├─x
#> └─█─`%+%`
#>   ├─y
#>   └─z

# for ast(x ^ y %+% z)
# x ^ y will be calculated first, and the result will be used as
# first argument of %+%()
lobstr::ast(x ^ y %+% z)
#> █─`%+%`
#> ├─█─`^`
#> │ ├─x
#> │ └─y
#> └─z
```

So we can conclude that custom infix functions must have a precedence between addition and exponentiation. The general precedence rules are documented, for example, in `?Syntax`.
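The two trees only bound the precedence of `%+%` from above and below. A few more comparisons narrow it down: according to `?Syntax`, `%any%` operators rank below `:` but above `*` and `/`. The helper `root()` is a hypothetical name introduced only for illustration. Because nothing is evaluated, `%+%` does not need to be defined:

```r
# Hypothetical helper: which function ends up at the root of a parsed expression?
root <- function(e) as.character(e[[1]])

root(quote(x * y %+% z))  # "*":   %+% binds tighter than *
root(quote(x / y %+% z))  # "/":   ... and tighter than /
root(quote(x:y %+% z))    # "%+%": : binds tighter than %+%
```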

**Q**: What happens if you call `parse_expr()` with a string that generates multiple expressions? e.g. `parse_expr("x + 1; y + 1")`

**A**: `parse_expr()` notices that more than one expression would be generated and throws an error.

```
parse_expr("x + 1; y + 1")
#> Error: More than one expression parsed
```

**Q**: What happens if you attempt to parse an invalid expression? e.g. `"a +"` or `"f())"`.

**A**: We get an error from the underlying `parse()` function:

```
rlang::parse_expr("a +")
#> Error in parse(text = x): <text>:2:0: unexpected end of input
#> 1: a +
#>    ^
rlang::parse_expr("f())")
#> Error in parse(text = x): <text>:1:4: unexpected ')'
#> 1: f())
#>        ^
parse(text = "a +")
#> Error in parse(text = "a +"): <text>:2:0: unexpected end of input
#> 1: a +
#>    ^
parse(text = "f())")
#> Error in parse(text = "f())"): <text>:1:4: unexpected ')'
#> 1: f())
#>        ^
```

**Q**: `deparse()` produces vectors when the input is long. For example, the following call produces a vector of length two:

```
expr <- expr(g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z))
deparse(expr)
```

What does `expr_text()` do instead?

**A**: `expr_text()` pastes the results from `deparse(expr)` together, using a line break (`\n`) as separator.

```
expr <- expr(g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z))
deparse(expr)
#> [1] "g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + "
#> [2] "    p + q + r + s + t + u + v + w + x + y + z)"
expr_text(expr)
#> [1] "g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + \n    p + q + r + s + t + u + v + w + x + y + z)"
```

**Q**: `pairwise.t.test()` assumes that `deparse()` always returns a length one character vector. Can you construct an input that violates this expectation? What happens?

**A**: We can pass an expression that exceeds the default cutoff width in `deparse()` to one of `pairwise.t.test()`'s data input arguments. The expression will be split into a character vector of length greater than 1. The deparsed data inputs are pasted together directly, with "and" as separator (read the source code!). The result is only used for display in the output, so only the `data.name` output changes: it will include more than one "and".

```
d <- 1
pairwise.t.test(2, d + d + d + d + d + d + d + d + d + d + d + d + d + d + d + d + d)
#>
#> Pairwise comparisons using t tests with pooled SD
#>
#> data: 2 and d + d + d + d + d + d + d + d + d + d + d + d + d + d + d + d + 2 and d
#>
#> <0 x 0 matrix>
#>
#> P value adjustment method: holm
```

## 18.4 Walking the AST with recursive functions

**Q**: `logical_abbr()` returns `TRUE` for `T(1, 2, 3)`. How could you modify `logical_abbr_rec()` so that it ignores function calls that use `T` or `F`?

**A**: We can apply the same logic as in the multiple assignment example from the textbook and treat this as a special case. The special case is handled within a helper function called `find_T_call()`, which finds `T()` and `F()` calls and "bounces them out":

```
# switch_expr() is the helper defined in the textbook
find_T_call <- function(x) {
  if (is_call(x, c("T", "F"))) {
    # drop the function name, but still inspect the arguments
    x <- as.list(x)[-1]
  }
  purrr::some(x, logical_abbr_rec)
}

logical_abbr_rec <- function(x) {
  switch_expr(x,
    # Base cases
    constant = FALSE,
    symbol = as_string(x) %in% c("F", "T"),

    # Recursive cases
    pairlist = purrr::some(x, logical_abbr_rec),
    call = find_T_call(x)
  )
}

logical_abbr <- function(x) {
  logical_abbr_rec(enexpr(x))
}
```

Now let's test our new `logical_abbr()` function:

```
logical_abbr(T(1, 2, 3))
#> [1] FALSE
logical_abbr(T(T, T(3, 4)))
#> [1] TRUE
logical_abbr(T(T))
#> [1] TRUE
logical_abbr(T())
#> [1] FALSE
logical_abbr()
#> [1] FALSE
logical_abbr(c(T, T, T))
#> [1] TRUE
```

**Q**: `logical_abbr()` works with expressions. It currently fails when you give it a function. Why not? How could you modify `logical_abbr()` to make it work? What components of a function will you need to recurse over?

```
f <- function(x = TRUE) {
  g(x + T)
}
logical_abbr(!!f)
```

**A**: It currently fails because `"closure"` is not handled within `switch_expr()` in `logical_abbr_rec()`. To make it work, we must add a case for it there. We also need a function that inspects the formals and the body of the input function, which are accessible via `formals()` and `body()`.

**Q**: Modify `find_assign()` to also detect assignment using replacement functions, i.e. `names(x) <- y`.

**A**: Let's see what the AST of such an assignment looks like:

```
ast(names(x) <- x)
#> █─`<-`
#> ├─█─names
#> │ └─x
#> └─x
```

So we need to catch the case where the call is `<-` and its second element (the left-hand side) is itself a call. In that case, we return the deparsed left-hand side to see which object got a new value assigned. This is why we add the following block within another `else` statement in `find_assign_call()`:

```
if (is_call(x, "<-") && is_call(x[[2]])) {
  lhs <- expr_text(x[[2]])
  children <- as.list(x)[-1]
}
```

Let us finish with the whole code, including some tests for our new function:

```
# switch_expr() is the helper defined in the textbook
flat_map_chr <- function(.x, .f, ...) {
  purrr::flatten_chr(purrr::map(.x, .f, ...))
}

find_assign <- function(x) unique(find_assign_rec(enexpr(x)))

find_assign_call <- function(x) {
  if (is_call(x, "<-") && is_symbol(x[[2]])) {
    lhs <- as_string(x[[2]])
    children <- as.list(x)[-1]
  } else {
    if (is_call(x, "<-") && is_call(x[[2]])) {
      # replacement function, e.g. names(x) <- y
      lhs <- expr_text(x[[2]])
      children <- as.list(x)[-1]
    } else {
      lhs <- character()
      children <- as.list(x)
    }
  }

  c(lhs, flat_map_chr(children, find_assign_rec))
}

find_assign_rec <- function(x) {
  switch_expr(x,
    # Base cases
    constant = ,
    symbol = character(),

    # Recursive cases
    pairlist = flat_map_chr(x, find_assign_rec),
    call = find_assign_call(x)
  )
}

find_assign(x <- y)
#> [1] "x"
find_assign(names(x))
#> character(0)
find_assign(names(x) <- y)
#> [1] "names(x)"
find_assign(names(x(y)) <- y)
#> [1] "names(x(y))"
find_assign(names(x(y)) <- y <- z)
#> [1] "names(x(y))" "y"
```

**Q**: Write a function that extracts all calls to a specified function.

**A**: We just need to delete the previously added `else` statement and check for any call (not necessarily `<-`) within the first `if()` in `find_assign_call()`. When we find a call, we save it and later return it as part of our character output. Everything else stays the same. Note that this version returns every call in the expression, not only the calls to one specified function:

```
find_assign_call <- function(x) {
  if (is_call(x)) {
    lhs <- expr_text(x)
    children <- as.list(x)[-1]
  } else {
    lhs <- character()
    children <- as.list(x)
  }

  c(lhs, flat_map_chr(children, find_assign_rec))
}

find_assign_rec <- function(x) {
  switch_expr(x,
    # Base cases
    constant = ,
    symbol = character(),

    # Recursive cases
    pairlist = flat_map_chr(x, find_assign_rec),
    call = find_assign_call(x)
  )
}

find_assign(x <- y)
#> [1] "x <- y"
find_assign(names(x(y)) <- y <- z)
#> [1] "names(x(y)) <- y <- z" "names(x(y))" "x(y)"
#> [4] "y <- z"
find_assign(mean(sum(1:3)))
#> [1] "mean(sum(1:3))" "sum(1:3)" "1:3"
```