18 Expressions

Prerequisites

To capture and compute on expressions, and to visualise them, we will load the rlang and the lobstr packages.

library(rlang)
library(lobstr)

18.1 Abstract syntax trees

  1. Q: Reconstruct the code represented by the trees below:

    #> █─f 
    #> └─█─g 
    #>   └─█─h
    #> █─`+` 
    #> ├─█─`+` 
    #> │ ├─1 
    #> │ └─2 
    #> └─3
    #> █─`*` 
    #> ├─█─`(` 
    #> │ └─█─`+` 
    #> │   ├─x 
    #> │   └─y 
    #> └─z

    A: The source is with you. ;)

    ast(f(g(h())))
    #> █─f 
    #> └─█─g 
    #>   └─█─h
    ast(1 + 2 + 3)
    #> █─`+` 
    #> ├─█─`+` 
    #> │ ├─1 
    #> │ └─2 
    #> └─3
    ast((x + y) * z)
    #> █─`*` 
    #> ├─█─`(` 
    #> │ └─█─`+` 
    #> │   ├─x 
    #> │   └─y 
    #> └─z
  2. Q: Draw the following trees by hand then check your answers with lobstr::ast().

    f(g(h(i(1, 2, 3))))
    f(1, g(2, h(3, i())))
    f(g(1, 2), h(3, i(4, 5)))

    A: TODO: Drawings by hand

    ast(f(g(h(i(1, 2, 3)))))
    #> █─f 
    #> └─█─g 
    #>   └─█─h 
    #>     └─█─i 
    #>       ├─1 
    #>       ├─2 
    #>       └─3
    ast(f(1, g(2, h(3, i()))))
    #> █─f 
    #> ├─1 
    #> └─█─g 
    #>   ├─2 
    #>   └─█─h 
    #>     ├─3 
    #>     └─█─i
    ast(f(g(1, 2), h(3, i(4, 5))))
    #> █─f 
    #> ├─█─g 
    #> │ ├─1 
    #> │ └─2 
    #> └─█─h 
    #>   ├─3 
    #>   └─█─i 
    #>     ├─4 
    #>     └─5
  3. Q: What’s happening with the ASTs below? (Hint: carefully read ?"^")

    lobstr::ast(`x` + `y`)
    #> █─`+` 
    #> ├─x 
    #> └─y
    lobstr::ast(x ** y)
    #> █─`^` 
    #> ├─x 
    #> └─y
    lobstr::ast(1 -> x)
    #> █─`<-` 
    #> ├─x 
    #> └─1

    A: As the AST starts function calls always with the name of the function, the call in the first expression is translated into its prefix form. In the second case, ** is translated by R’s parser into ^. In the last AST, the expression is flipped when R parses it:

    str(expr(a -> b))
    #>  language b <- a
  4. Q: What is special about the AST below? (Hint: re-read Section 6.2.1)

    lobstr::ast(function(x = 1, y = 2) {})
    #> █─`function` 
    #> ├─█─x = 1 
    #> │ └─y = 2 
    #> ├─█─`{` 
    #> └─<inline srcref>

    A: The last leaf of the AST is not explicitly specified in the expression. Instead the srcref attribute, which points to the functions source code, is created automatically by base R.

  5. Q: What does the call tree of an if statement with multiple else if conditions look like? Why?

    A: The ast of nested else if statements might look a bit confusing because it contains multiple brackets. However, we can see that in the else part of the ast just another expression is being evaluated, which happens to be an if statement and so forth.

    lobstr::ast(
    if (FALSE) {
      1
    } else if (FALSE) {
      2
    } else if (TRUE) {
      3
    }
    )
    #> █─`if` 
    #> ├─FALSE 
    #> ├─█─`{` 
    #> │ └─1 
    #> └─█─`if` 
    #>   ├─FALSE 
    #>   ├─█─`{` 
    #>   │ └─2 
    #>   └─█─`if` 
    #>     ├─TRUE 
    #>     └─█─`{` 
    #>       └─3

    We can see the structure more clearly when we avoid the curly brackets through prefix notation.

    lobstr::ast(`if`(FALSE, 1, `if`(FALSE, 2, `if`(TRUE, 3))))
    #> █─`if` 
    #> ├─FALSE 
    #> ├─1 
    #> └─█─`if` 
    #>   ├─FALSE 
    #>   ├─2 
    #>   └─█─`if` 
    #>     ├─TRUE 
    #>     └─3

18.2 Expressions

  1. Q: Which two of the six types of atomic vector can’t appear in an expression? Why? Similarly, why can’t you create an expression that contains an atomic vector of length greater than one?

    A: It is not possible to create an expression that evaluates to an atomic of length greater than one without using a function (i.e. the c() function). But expressions that include a function would be calls.

    Let us illustrate this observation via the following example:

    is.atomic(quote(1))       # atomic
    #> [1] TRUE
    is.atomic(quote(c(1,1)))  # not an atomic (it would just evaluate to an atomic).
    #> [1] FALSE
    is.call(quote(c(1,1)))    # still a call! (so at least a valid expression).
    #> [1] TRUE

    Two of the six atomic vector types of R do not work with expressions, the first one being raws. We assume, that raws may only be constructed through using as.raw(), but this function would then creating another call in the AST.

    For similar reasons complex numbers also won’t work:

    (function(x){is.atomic(x) & length(x) == 1})(quote(1 + 1.5i))
    #> [1] FALSE
    
    # however, imaginary parts of complex numbers work:
    lobstr::ast(1i)
    #> 1i

  2. Q: What happens when you subset a call object to remove the first element? e.g. expr(read.csv("foo.csv", header = TRUE))[-1]. Why?

    A: When the first element of a call object is removed, the second element moves to the first position, which is in general taken as the function to call. Therefore, we remain with the call "food.csv"(header = TRUE)

  3. Q: Describe the differences between the following call objects.

    x <- 1:10
    
    call2(median, x, na.rm = TRUE)
    call2(expr(median), x, na.rm = TRUE)
    call2(median, expr(x), na.rm = TRUE)
    call2(expr(median), expr(x), na.rm = TRUE)

    A: The call objects differ in their first two elements, which are sometimes evaluated before the call is constructed. In the first one, both median() and x are evaluated. Therefore, we can see in the constructed call that median is a generic and the x argument is 1:10. In the following calls we remain with differing combinations. Once, only x and once only only median() gets evaluated. In the final call both aren’t evaluated.

  4. Q: rlang::call_standardise() doesn’t work so well for the following calls. Why? What makes mean() special?

    library(rlang)
    
    call_standardise(quote(mean(1:10, na.rm = TRUE)))
    #> mean(x = 1:10, na.rm = TRUE)
    call_standardise(quote(mean(n = T, 1:10)))
    #> mean(x = 1:10, n = T)
    call_standardise(quote(mean(x = 1:10, , TRUE)))
    #> mean(x = 1:10, , TRUE)

    A: The reason for this unexpected behaviour lies in the fact that mean() uses the ... argument and therefore can not standardise the regarding arguments. Since mean() uses S3 dispatch (i.e., UseMethod) and the underlying mean.default() method specifies some more arguments, rlang::call_standardise() can do much better when the S3 dispatch is explicit.

    call_standardise(quote(mean.default(1:10, na.rm = TRUE)))
    #> mean.default(x = 1:10, na.rm = TRUE)
    call_standardise(quote(mean.default(n = T, 1:10)))
    #> mean.default(x = 1:10, na.rm = T)
    call_standardise(quote(mean.default(x = 1:10, , TRUE)))
    #> mean.default(x = 1:10, na.rm = TRUE)
  5. Q: Why does this code not make sense?

    x <- expr(foo(x = 1))
    names(x) <- c("x", "")

    A: As stated in the book

    The first element of a call is always the function that gets called.

    We can just look what will happen

    x <- rlang::expr(foo(x = 1))
    x
    #> foo(x = 1)
    
    names(x) <- c("x", "")
    x
    #> foo(1)
    
    names(x) <- c("", "x")
    x
    #> foo(x = 1)

    So giving the first element a name just adds useless metadata.

  6. Q: Construct the expression if(x > 1) "a" else "b" using multiple calls to call2(). How does the structure code reflect the structure of the AST?

    A: Similar to the prefix version we get

    call2("if", expr(x > 1), "a", "b")
    #> if (x > 1) "a" else "b"

    When we reed the AST from left to right, we get the same structure: Function to evaluate, expression, which is another function and becomes evaluated first and two constants which will be evaluated next

    lobstr::ast(`if`(x > 1, "a", "b"))
    #> █─`if` 
    #> ├─█─`>` 
    #> │ ├─x 
    #> │ └─1 
    #> ├─"a" 
    #> └─"b"

18.3 Parsing and grammar

  1. Q: R uses parentheses in two slightly different ways as illustrated by these two calls:

    f((1))
    `(`(1 + 1)

    Compare and contrast the two uses by referencing the AST.

    A: The trick with these examples lies in the fact, that ( can represent a primitive function but also be a part of R’s general prefix function syntax.

    So in the AST of the first example, we will not see the outer (, which belongs to f() and is therefore not shown in the syntax, while the inner ( is treated as a function (symbol):

    lobstr::ast(f((1)))
    #> █─f 
    #> └─█─`(` 
    #>   └─1

    In the second example, we can see that the outer ( is treated as a function and the inner ( belongs to its syntax:

    lobstr::ast(`(`(1 + 1))
    #> █─`(` 
    #> └─█─`+` 
    #>   ├─1 
    #>   └─1

    For the sake of clarity, let’s also create a third example, where none of the ( is part of another functions syntax:

    lobstr::ast(((1 + 1)))
    #> █─`(` 
    #> └─█─`(` 
    #>   └─█─`+` 
    #>     ├─1 
    #>     └─1
  2. Q: = can also be used in two ways. Construct a simple example that shows both uses.

    A: I was not exactly aware of a similar case with multiple syntactical meanings for the = symbol, but one can get there somehow. = is used as an operator for assignment. It is also part of the logical operators ==, >=, <=, != and is also used within functions to assign parameters or the definition of default settings.

    The question probably aims at the difference of global assignment and parameter definition within functions.

    So when we play with ast(), we can directly see that the following is not possible

    lobstr::ast(a = 1)
    #> Error in lobstr::ast(a = 1): unused argument (a = 1)

    We get an error, because a = makes R looking for an argument called a. Since x is the only argument of lobstr::ast(), we get an error.

    When we build our workaround for the problem, the solution to the question becomes obvious.

    Instead a = 1, we pass the expression via brackets to ast(). Once via matching by position and once via matching by name

    lobstr::ast((a = 1))
    #> █─`(` 
    #> └─█─`=` 
    #>   ├─a 
    #>   └─1
    lobstr::ast(x = (a = 1))
    #> █─`(` 
    #> └─█─`=` 
    #>   ├─a 
    #>   └─1

    The second way is more explicit, but both return the same syntax tree. When wee ignore the brackets and compare the trees, we can finally see from the second tree, that the first = is just part of the syntax and the second one is for the usage of assignment.

  3. Q: Does -2^2 yield 4 or -4? Why?

    A: It yields -4, because ^ has higher operator precedence than -, which we can verify by looking at the AST:

    -2^2
    #> [1] -4
    
    lobstr::ast(-2^2)
    #> █─`-` 
    #> └─█─`^` 
    #>   ├─2 
    #>   └─2
  4. Q: What does !1 + !1 return? Why?

    A: The first answer is quite simple

    !1 + !1
    #> [1] FALSE

    To answer the “Why”, we have a look at the syntax tree first

    lobstr::ast(!1 + !1)
    #> █─`!` 
    #> └─█─`+` 
    #>   ├─1 
    #>   └─█─`!` 
    #>     └─1

    So first, the second !1 becomes evaluated, which results in FALSE, because in R every non 0 numeric, becomes coerced to TRUE, when a logical operator is applied on it.

    Next 1 + FALSE is evaluated to 1, since FALSE is coerced to 0.

    Finally !1 is evaluated to FALSE, because it is the opposite of TRUE, which is what 1 becomes coerced to.

    However, note that if ! had a higher precedence, the result would get FALSE + FALSE as intermediate result, which would be evalutated (again involving coercion) to 0.

  5. Q: Why does x1 <- x2 <- x3 <- 0 work? Describe the two reasons.

    A: One reason is that <- is right-associative.

  6. Q: Compare the ASTs of x + y %+% z and x ^ y %+% z. What have you learned about the precedence of custom infix functions?

    A: Comparison of the syntax trees:

    # for ast(x + y %+% z)
    # y %+% z will be calculated first and the result will be added to x
    lobstr::ast(x + y %+% z)
    #> █─`+` 
    #> ├─x 
    #> └─█─`%+%` 
    #>   ├─y 
    #>   └─z
    
    # for ast(x ^ y %+% z) 
    # x ^ y will be calculated first, and the result will be used as 
    # first argument of %+%()
    lobstr::ast(x ^ y %+% z)
    #> █─`%+%` 
    #> ├─█─`^` 
    #> │ ├─x 
    #> │ └─y 
    #> └─z

    So we can conclude that custom infix functions must have a precedence between addition and exponentiation. The general precedence rules can be found for example here.

  7. Q: What happens if you call parse_expr() with a string that generates multiple expressions? e.g. parse_expr("x + 1; y + 1")

    A: parse_expr() notices that more than one expression would be generated and throws an error.

    parse_expr("x + 1; y + 1")
    #> Warning: `rlang__backtrace_on_error` is no longer experimental.
    #> It has been renamed to `rlang_backtrace_on_error`. Please update your RProfile.
    #> This warning is displayed once per session.
    #> Error: More than one expression parsed
  8. Q: What happens if you attempt to parse an invalid expression? e.g. "a +" or "f())".

    A: We get an error from the underlying parse function

    rlang::parse_expr("a +")
    #> Error in parse(text = x): <text>:2:0: unexpected end of input
    #> 1: a +
    #> ^
    rlang::parse_expr("f())")
    #> Error in parse(text = x): <text>:1:4: unexpected ')'
    #> 1: f())
    #> ^
    
    parse(text = "a +")
    #> Error in parse(text = "a +"): <text>:2:0: unexpected end of input
    #> 1: a +
    #> ^
    parse(text = "f())")
    #> Error in parse(text = "f())"): <text>:1:4: unexpected ')'
    #> 1: f())
    #> ^
  9. Q: deparse() produces vectors when the input is long. For example, the following call produces a vector of length two:

    expr <- expr(g(a + b + c + d + e + f + g + h + i + j + k + l + 
      m + n + o + p + q + r + s + t + u + v + w + x + y + z))
    
    deparse(expr)

    What does expr_text() do instead?

    A: expr_text() pastes the results from deparse(expr) together with a linebreak \n as separator.

    expr <- expr(g(a + b + c + d + e + f + g + h + i + j + k + l + 
      m + n + o + p + q + r + s + t + u + v + w + x + y + z))
    deparse(expr)
    #> [1] "g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + "
    #> [2] "    p + q + r + s + t + u + v + w + x + y + z)"
    expr_text(expr)
    #> [1] "g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + \n    p + q + r + s + t + u + v + w + x + y + z)"
  10. Q: pairwise.t.test() assumes that deparse() always returns a length one character vector. Can you construct an input that violates this expectation? What happens?

    A: We can pass an expression to one of pairwise.t.test()’s data input arguments, which exceeds the default cutoff width in deparse(). The expression will be split into a character vector of length greater 1. The deparsed data inputs are directly pasted (read the source code!) with “and” as separator and the result is just used to be displayed in the output. Just the data.name output will change (it will include more than one “and”).

    d=1
    pairwise.t.test(2, d+d+d+d+d+d+d+d+d+d+d+d+d+d+d+d+d)
    #> 
    #>  Pairwise comparisons using t tests with pooled SD 
    #> 
    #> data:  2 and d + d + d + d + d + d + d + d + d + d + d + d + d + d + d + d +  2 and     d 
    #> 
    #> <0 x 0 matrix>
    #> 
    #> P value adjustment method: holm

18.4 Walking the AST with recursive functions

  1. Q: logical_abbr() returns TRUE for T(1, 2, 3). How could you modify logical_abbr_rec() so that it ignores function calls that use T or F?

    A: We can apply a similar logic as in the multiple assignment example from the textbook and just treat this case as a special case handled within a sub function called find_T_call() which finds T() calls and “bounces them out”:

    find_T_call <- function(x) {
      if (is_call(x, "T")) {
      x <- as.list(x)[-1]
      purrr::some(x, logical_abbr_rec)
      } else {
      purrr::some(x, logical_abbr_rec)
      }
    }
    
    logical_abbr_rec <- function(x) {
      switch_expr(
      x,
      # Base cases
      constant = FALSE,
      symbol = as_string(x) %in% c("F", "T"),
    
      # Recursive cases
      pairlist = purrr::some(x, logical_abbr_rec),
      call = find_T_call(x)
      )
    }
    
    logical_abbr <- function(x) {
      logical_abbr_rec(enexpr(x))
    }

    Now lets test our new logical_abbr() function:

    logical_abbr(T(1, 2, 3))
    #> [1] FALSE
    logical_abbr(T(T, T(3, 4)))
    #> [1] TRUE
    logical_abbr(T(T))
    #> [1] TRUE
    logical_abbr(T())
    #> [1] FALSE
    logical_abbr()
    #> [1] FALSE
    logical_abbr(c(T, T, T))
    #> [1] TRUE
  2. Q: logical_abbr() works with expressions. It currently fails when you give it a function. Why not? How could you modify logical_abbr() to make it work? What components of a function will you need to recurse over?

    f <- function(x = TRUE) {
      g(x + T)
    }
    logical_abbr(!!f)

    A: It currently fails, because "closure" is not handled within switch_expr() within logical_abbr_rec(). If we wanted to make it work, we must open a case there and write a function to inspect the formals and the body of the input function.

  3. Q: Modify find assignment to also detect assignment using replacement functions, i.e. names(x) <- y.

    A: Let`s see what the AST of such an assignment looks like:

    ast(names(x) <- x)
    #> █─`<-` 
    #> ├─█─names 
    #> │ └─x 
    #> └─x

    So we need to catch the case where the first two elements are both calls. Further the first call is identical to <- and we must return only the second call to see which objects got new values assigned.

    This is why we add the following block Within another else statement in find_assign_call():

    if (is_call(x, "<-") && is_call(x[[2]])) {
      lhs <- expr_text(x[[2]])
      children <- as.list(x)[-1]
    }

    Let us finish with the whole code including some tests for our new function:

    flat_map_chr <- function(.x, .f, ...) {
      purrr::flatten_chr(purrr::map(.x, .f, ...))
    }
    
    find_assign <- function(x) unique(find_assign_rec(enexpr(x)))
    
    find_assign_call <- function(x) {
      if (is_call(x, "<-") && is_symbol(x[[2]])) {
        lhs <- as_string(x[[2]])
        children <- as.list(x)[-1]
      } else {
      if (is_call(x, "<-") && is_call(x[[2]])) {
        lhs <- expr_text(x[[2]])
        children <- as.list(x)[-1]
      } else {
        lhs <- character()
        children <- as.list(x)
      }}
    
      c(lhs, flat_map_chr(children, find_assign_rec))
    }
    
    find_assign_rec <- function(x) {
      switch_expr(
        x,
        # Base cases
        constant = ,symbol = character(),
        # Recursive cases
        pairlist = flat_map_chr(x, find_assign_rec),
        call = find_assign_call(x)
      )
    }
    
    find_assign(x <- y)
    #> [1] "x"
    find_assign(names(x))
    #> character(0)
    find_assign(names(x) <- y)
    #> [1] "names(x)"
    find_assign(names(x(y)) <- y)
    #> [1] "names(x(y))"
    find_assign(names(x(y)) <- y <- z)
    #> [1] "names(x(y))" "y"
  4. Q: Write a function that extracts all calls to a specified function.

    A: We just need to delete the former added else statement and check for a call (not necessarily <-) within the first if() in find_assign_call(). We save a call when we found one and return it later as part of our character output. Everything else stays the same:

    find_assign_call <- function(x) {
      if (is_call(x)) {
        lhs <- expr_text(x)
        children <- as.list(x)[-1]
        } else {
          lhs <- character()
          children <- as.list(x)
        }
    
      c(lhs, flat_map_chr(children, find_assign_rec))
    }
    
    find_assign_rec <- function(x) {
      switch_expr(x,
                  # Base cases
                  constant = ,
                  symbol = character(),
    
                  # Recursive cases
                  pairlist = flat_map_chr(x, find_assign_rec),
                  call = find_assign_call(x)
      )
    }
    
    find_assign(x <- y)
    #> [1] "x <- y"
    find_assign(names(x(y)) <- y <- z)
    #> [1] "names(x(y)) <- y <- z" "names(x(y))"           "x(y)"                 
    #> [4] "y <- z"
    find_assign(mean(sum(1:3)))
    #> [1] "mean(sum(1:3))" "sum(1:3)"       "1:3"