13 Expressions

13.1 Abstract syntax trees

  1. Q: Use ast() and experimentation to figure out the three arguments to an if() call. What would you call them? Which arguments are required and which are optional?

    A: You can write an if() statement in several ways: with or without else, formatted or in one line and also in prefix notation. Here are several versions focussing on the possibility of leaving out curly brackets.

    One possible way of naming the arguments would be: condition (1), conclusion (2), alternative (3).

    The condition is always required. If the condition is TRUE, the conclusion is also required. If the condition is FALSE and if() is combined with else, the alternative is also required.
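    A minimal base R sketch of this experiment, using quote() in place of lobstr::ast(). The names cond, yes and no are placeholders chosen for illustration:

```r
# Capture if statements without evaluating them.
with_else <- quote(if (cond) yes else no)
without_else <- quote(if (cond) yes)

# The first element is the function `if`; the rest are its arguments.
as.list(with_else)
length(with_else)     # function plus three arguments
length(without_else)  # function plus two arguments

# The same kind of call written in prefix notation.
res <- `if`(TRUE, "conclusion", "alternative")
```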

  2. Q: What does the call tree of an if statement with multiple else if conditions look like? Why?

    A: The AST of nested else if statements might look a bit confusing because it contains multiple brackets. However, we can see that the else part of the AST is just another expression being evaluated, which happens to be an if statement, and so forth.

    We can see the structure more clearly when we avoid the curly brackets through prefix notation.
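    As a sketch in base R (quote() instead of ast(); a and b are placeholder symbols), the alternative of the outer if is itself an if call, and the infix and prefix spellings parse to the same tree:

```r
nested <- quote(if (a) 1 else if (b) 2 else 3)

# The fourth element (the alternative) is another call to `if`.
alt <- nested[[4]]
is_if <- is.call(alt) && identical(alt[[1]], as.name("if"))

# The same statement in prefix notation, without any curly brackets.
prefix <- quote(`if`(a, 1, `if`(b, 2, 3)))
same <- identical(nested, prefix)
```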

  3. Q: What are the arguments to the for() and while() calls?

    A: for() requires an index (called var in the docs), a sequence and an expression, for example

    while() requires a condition and an expression. Again, an example in prefix notation:

    Note that a minimal expression can consist of { only.
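    A short base R sketch of both calls; the loop bodies and the variables total and x are made up for illustration:

```r
# Inspect the captured calls: function first, then the arguments.
for_call <- quote(for (i in 1:3) print(i))
while_call <- quote(while (x < 3) x <- x + 1)
as.list(for_call)    # `for`, index, sequence, expression
as.list(while_call)  # `while`, condition, expression

# Both loops also run when written in prefix notation.
total <- 0
`for`(i, 1:3, total <- total + i)

x <- 0
`while`(x < 3, x <- x + 1)

# A body consisting of { only is enough.
`for`(i, 1:3, {})
```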

  4. Q: Two arithmetic operators can be used in both prefix and infix style. What are they?

    A: I am not sure what exactly is meant here. In theory, every arithmetic operator can be written in prefix notation via backticks. On the other hand, + and - seem to be the only ones that can be written in prefix notation without backticks.

    However, when we look more closely, the call tree is not what we would expect from a prefix function

    So maybe it is meant to look like this…

    Of course, this doesn’t make too much sense either, since ?Syntax states that R clearly differentiates between unary and binary + and - operators, and a unary operator is not really what we mean when we speak about infix operators.

    However, if we don’t differentiate in this way, this is probably the solution, since it’s obviously also an infix function:
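    The unary and binary forms can be compared with a small base R sketch (quote() instead of ast(); x and y are placeholder symbols):

```r
unary <- quote(-x)
binary <- quote(x - y)

length(unary)   # function plus one argument
length(binary)  # function plus two arguments

# Both trees use the same function symbol, `-`.
same_fun <- identical(unary[[1]], binary[[1]])

# With backticks, the operators accept one or two arguments in prefix form.
negated <- `-`(5)
difference <- `-`(5, 2)
total <- `+`(1, 2)
```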

13.2 R’s grammar

  1. Q: R uses parentheses in two slightly different ways as illustrated by these two calls:

    Compare and contrast the two uses by referencing the AST.

    A: The trick with these examples lies in the fact that ( can represent a primitive function but can also be part of R’s general prefix function syntax.

    So in the AST of the first example, we will not see the outer (, which belongs to f() and is therefore not shown in the syntax tree, while the inner ( is treated as a function (symbol):

    In the second example, we can see that the outer ( is treated as a function and the inner ( belongs to its syntax:

    For the sake of clarity, let’s also create a third example, where none of the ( is part of another function’s syntax:
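    A base R sketch of the three cases, with quote() standing in for ast(). The calls f((1)) and `(`(1 + 1) are assumed to match the exercise; the variable names are illustrative:

```r
inside_f <- quote(f((1)))       # outer ( is call syntax, inner ( is a function
paren_call <- quote(`(`(1 + 1)) # ( called like any other prefix function
nested <- quote(((1 + 1)))      # both ( are functions

# In f((1)) the function is f and its single argument is a call to `(`.
inside_f[[1]]
inner_is_paren <- identical(inside_f[[2]][[1]], as.name("("))

# `(`(1 + 1) parses to the same tree as (1 + 1).
same_tree <- identical(paren_call, quote((1 + 1)))

# In ((1 + 1)) there are two nested calls to `(`.
two_parens <- identical(nested[[1]], as.name("(")) &&
  identical(nested[[2]][[1]], as.name("("))
```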

  2. Q: = can also be used in two ways. Construct a simple example that shows both uses.

    A: I was not aware of a similar case with multiple syntactic meanings for the = symbol, but one can get there. = is used as an assignment operator. It also appears in the comparison operators ==, >=, <= and !=, and it is used within function calls to pass named arguments and within function definitions to set default values.

    The question probably aims at the difference between assignment and naming arguments within function calls.

    So when we play with ast(), we can directly see that the following is not possible:

    We get an error, because a = makes R look for an argument called a. Since x is the only argument of lobstr::ast(), we get an error.

    When we build our workaround for the problem, the solution to the question becomes obvious.

    Instead of a = 1, we pass the expression to ast() wrapped in brackets, once via matching by position and once via matching by name:

    The second way is more explicit, but both return the same syntax tree. When we ignore the brackets and compare the trees, we can finally see from the second tree that the first = is just part of the call syntax and the second one is used for assignment.
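    Both roles of = can be seen in one captured call. This is a base R sketch with quote(); f, x and a are placeholder names:

```r
e <- quote(f(x = (a = 1)))

# The first = only names the argument; it does not appear as a call.
names(e)

# The second = is a real call to the function `=`, wrapped in `(`.
inner <- e[["x"]]
wrapped_in_paren <- identical(inner[[1]], as.name("("))
is_assignment <- identical(inner[[2]][[1]], as.name("="))
```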

  3. Q: What does !1 + !1 return? Why?

    A: The first answer is quite simple

    To answer the “Why”, we have a look at the syntax tree first

    So first, the second !1 is evaluated, which results in FALSE, because in R every non-zero numeric is coerced to TRUE when a logical operator is applied to it.

    Next 1 + FALSE is evaluated to 1, since FALSE is coerced to 0.

    Finally, the outer ! is applied to this result: !1 evaluates to FALSE, because 1 is coerced to TRUE and FALSE is its opposite.

    However, note that if ! had a higher precedence than +, the intermediate result would be FALSE + FALSE, which would be evaluated (again involving coercion) to 0.
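    The parse order and both results can be checked with a short base R sketch:

```r
e <- quote(!1 + !1)

# The outermost call is `!`, and its argument is the sum 1 + !1.
outer_is_not <- identical(e[[1]], as.name("!"))
arg_is_sum <- identical(e[[2]][[1]], as.name("+"))

# Evaluation as parsed: !(1 + FALSE), i.e. !1.
as_parsed <- eval(e)

# With explicit parentheses, the negations happen first.
negate_first <- (!1) + (!1)
```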

  4. Q: Why does x1 <- x2 <- x3 <- 0 work? There are two reasons.

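    A: One reason is that <- is right-associative, so the expression is parsed as x1 <- (x2 <- (x3 <- 0)). The other reason is that <- invisibly returns the assigned value, which then serves as the right-hand side of the next assignment.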
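    The nesting of the calls and the value returned by an assignment can be inspected with a small base R sketch; y is a throwaway variable:

```r
e <- quote(x1 <- x2 <- x3 <- 0)

# The left-hand side is x1; the right-hand side is the rest of the chain.
lhs <- e[[2]]
rhs <- e[[3]]

# An assignment returns its value, but marks it as invisible.
res <- withVisible(y <- 5)

x1 <- x2 <- x3 <- 0
```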

  5. Q: Compare the ASTs x + y %+% z to x ^ y %+% z. What does that tell you about the precedence of custom infix functions?

    A: Comparison of the syntax trees:

    So we can conclude that custom infix functions have a precedence between addition and exponentiation. The general precedence rules are documented in ?Syntax.
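    The comparison can be sketched in base R with quote(). %+% does not need to be defined, because the expressions are only parsed and never evaluated:

```r
a <- quote(x + y %+% z)
b <- quote(x ^ y %+% z)

# %+% binds tighter than +, so + is the outermost call.
a[[1]]
a[[3]]

# ^ binds tighter than %+%, so %+% is the outermost call.
b[[1]]
b[[2]]
```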

13.3 Data structures

  1. Q: Which two of the six types of atomic vector can’t appear in an expression? Why? Why can’t you create an expression that contains an atomic vector of length greater than one?

    A: It is not possible to create an expression that evaluates to an atomic vector of length greater than one without using a function (e.g. the c() function). But expressions that include a function would be calls.

    Let us illustrate this observation via the following example:

    Two of the six atomic vector types of R do not work with expressions, the first one being raws. We assume that raws can only be constructed by using as.raw(), but this function would then create another call in the AST.

    For similar reasons complex numbers also won’t work:
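    A base R sketch of these observations, using quote() and the is.*() predicates:

```r
# Length-one constants are stored directly in the expression.
num_const <- is.numeric(quote(1))
int_const <- is.integer(quote(1L))
chr_const <- is.character(quote("a"))
lgl_const <- is.logical(quote(TRUE))

# Longer vectors, raws and complex numbers with a real part need a function,
# so the captured expressions are calls rather than constants.
vec_is_call <- is.call(quote(c(1, 2)))
raw_is_call <- is.call(quote(as.raw(1)))
cplx_is_call <- is.call(quote(1 + 2i))
```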

  2. Q: How is rlang::maybe_missing() implemented? Why does it work?

    A: Let us take a look at the function’s source code to see what’s going on:

    First it is checked if the argument is missing. If so, the missing arg is returned, otherwise the argument (x) itself is returned.

  3. Q: rlang::call_standardise() doesn’t work so well for the following calls. Why? What makes mean() special?

    A: The reason for this unexpected behaviour lies in the fact that mean() uses S3 dispatch (i.e., UseMethod) and therefore does not store its formals on mean(), but rather on mean.default(). rlang::call_standardise() can do much better when the S3 dispatch is explicit.
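    The effect can be sketched with base R’s match.call(), used here as a stand-in for rlang::call_standardise() (an assumption made for illustration; the two are not guaranteed to behave identically). The call mean(n = T, 1:10) is taken as an example input:

```r
# The generic only knows x and ..., the default method knows the real arguments.
names(formals(mean))
names(formals(mean.default))

cl <- quote(mean(n = T, 1:10))

# Matched against the generic, n cannot be resolved and ends up in the dots.
via_generic <- match.call(mean, cl)

# Matched against the method, n is partially matched to na.rm.
via_method <- match.call(mean.default, cl)
```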

  4. Q: Why does this code not make sense?

    A: As stated in the book

    The first element of a call is always the function that gets called.

    We can simply look at what happens:

    So giving the first element a name just adds useless metadata.
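    A base R sketch of such an experiment. The call foo(x = 1), the names "fn" and "y", and the function foo() are assumptions made for illustration:

```r
x <- quote(foo(x = 1))

# Name the function position "fn" and rename the argument to "y".
names(x) <- c("fn", "y")

# The function position still holds foo; its name is stored but has no effect.
fun <- x[[1]]
stored_names <- names(x)

# The renamed argument, by contrast, matters when the call is evaluated.
foo <- function(y) y * 2
value <- eval(x)
```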

  5. Q: Construct the expression if(x > 1) "a" else "b" using multiple calls to lang(). How does the code structure reflect the structure of the AST?

    A: Similar to the prefix version we get

    When we read the AST from left to right, we get the same structure: the function to evaluate, then an expression that is itself a call and is evaluated first, and finally two constants, one of which is returned depending on the result.
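    The same construction can be sketched with base R’s call(), used here instead of lang() so that the example does not depend on a particular rlang version:

```r
# Inner call first: x > 1
cond <- call(">", quote(x), 1)

# Outer call: `if` with condition, conclusion and alternative.
e <- call("if", cond, "a", "b")

# The result matches the parsed statement.
same <- identical(e, quote(if (x > 1) "a" else "b"))

x <- 2
res <- eval(e)
```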

13.4 Parsing and deparsing

  1. Q: What happens if you attempt to parse an invalid expression? e.g. "a +" or "f())".

    A: We get an error from the underlying parse function
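    The behaviour of the underlying base R parse() function can be sketched as follows; try_parse() is a hypothetical helper that returns the condition object instead of stopping:

```r
# Return the parsed expression, or the error condition if parsing fails.
try_parse <- function(txt) {
  tryCatch(parse(text = txt), error = function(e) e)
}

incomplete <- try_parse("a +")
unbalanced <- try_parse("f())")
valid <- try_parse("a + b")

# The messages describe where parsing failed.
conditionMessage(incomplete)
conditionMessage(unbalanced)
```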

  2. Q: deparse() produces vectors when the input is long. For example, the following call produces a vector of length two:

    What do expr_text(), expr_name(), and expr_label() do with this input?

    A:

    • expr_text() pastes the output strings into one and inserts \n (newline characters) as separators
    • expr_name() recreates the call into the form f(…) and deparses this expression into a string
    • expr_label() does the same as expr_name(), but surrounds the output also with backticks
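    The multi-line output of deparse() and the pasting described for expr_text() can be approximated in base R. This is a sketch of the described behaviour, not rlang’s implementation, and the long call is an assumed example input:

```r
long_call <- quote(g(a + b + c + d + e + f + g + h + i + j + k + l + m + n +
  o + p + q + r + s + t + u + v + w + x + y + z))

# deparse() breaks long input into several strings.
pieces <- deparse(long_call)
length(pieces)

# Joining the pieces with newlines gives a single string.
one_string <- paste(pieces, collapse = "\n")
```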
  3. Q: Why does as.Date.default() use substitute() and deparse()? Why does pairwise.t.test() use them? Read the source code.

    A:

  4. Q: pairwise.t.test() assumes that deparse() always returns a length one character vector. Can you construct an input that violates this expectation? What happens?

    A:

13.5 Case study: Walking the AST with recursive functions

  1. Q: logical_abbr() returns TRUE for T(1, 2, 3). How could you modify logical_abbr_rec() so that it ignores function calls that use T or F?

    A: We can apply a similar logic as in the multiple assignment example from the textbook and treat this as a special case, handled within a helper function called find_T_call() which finds T() calls and “bounces them out”:

    Now let’s test our new logical_abbr() function:
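    A self-contained base R sketch of the idea, not the textbook’s rlang-based code: abbr_rec_sketch() is a hypothetical name, and instead of a separate find_T_call() helper it simply skips the function position when that position is T or F. Calls with empty arguments such as x[1, ] are not handled.

```r
abbr_rec_sketch <- function(x) {
  if (is.symbol(x)) {
    # Base case: a bare T or F
    as.character(x) %in% c("T", "F")
  } else if (is.call(x)) {
    parts <- as.list(x)
    # Ignore T or F when it is the function being called.
    if (is.symbol(parts[[1]]) && as.character(parts[[1]]) %in% c("T", "F")) {
      parts <- parts[-1]
    }
    any(vapply(parts, abbr_rec_sketch, logical(1)))
  } else if (is.pairlist(x)) {
    any(vapply(as.list(x), abbr_rec_sketch, logical(1)))
  } else {
    FALSE  # constants
  }
}

abbr_rec_sketch(quote(T(1, 2, 3)))        # T only as function name
abbr_rec_sketch(quote(T(1, T)))           # T also as argument
abbr_rec_sketch(quote(mean(x, na.rm = T)))
```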

  2. Q: logical_abbr() works with expressions. It currently fails when you give it a function. Why? How could you modify logical_abbr() to make it work? What components of a function will you need to recurse over?

    A: It currently fails because "closure" is not handled by switch_expr() within logical_abbr_rec(). To make it work, we would have to add a case there and write a function that inspects the formals and the body of the input function.
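    A base R sketch of such an extension; has_abbr_sketch() is a hypothetical stand-in for logical_abbr_rec(). It assumes that every formal argument has a default value, since arguments without defaults are stored as empty symbols and would need extra handling:

```r
has_abbr_sketch <- function(x) {
  if (is.function(x)) {
    # Recurse over the two relevant components of a function.
    has_abbr_sketch(formals(x)) || has_abbr_sketch(body(x))
  } else if (is.symbol(x)) {
    as.character(x) %in% c("T", "F")
  } else if (is.call(x) || is.pairlist(x)) {
    any(vapply(as.list(x), has_abbr_sketch, logical(1)))
  } else {
    FALSE  # constants
  }
}

in_body <- has_abbr_sketch(function(x = TRUE) g(x + T))
in_formals <- has_abbr_sketch(function(x = T) x)
clean <- has_abbr_sketch(function(x = TRUE) x + 1)
```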

  3. Q: Modify find_assign() to also detect assignment using replacement functions, i.e. names(x) <- y.

    A: Let’s see what the AST of such an assignment looks like:

    So we need to catch the case where the first element of the call is <- and the second element is itself a call. In that case we return this second element (e.g. names(x)) to see which objects got new values assigned.

    This is why we add the following block within another else statement in find_assign_call():

    Let us finish with the whole code including some tests for our new function:
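    A self-contained base R sketch of the same idea, not the textbook’s rlang-based implementation; find_assign_sketch() is a hypothetical name. Calls with empty arguments such as x[1, ] <- 2 are not handled.

```r
find_assign_sketch <- function(x) {
  if (!is.call(x)) {
    return(character())  # constants and symbols assign nothing
  }
  found <- character()
  if (identical(x[[1]], as.name("<-"))) {
    target <- x[[2]]
    if (is.symbol(target)) {
      found <- as.character(target)                    # a <- 1
    } else if (is.call(target)) {
      found <- paste(deparse(target), collapse = " ")  # names(x) <- y
    }
  }
  # Recurse into the arguments to find nested assignments.
  nested <- unlist(lapply(as.list(x)[-1], find_assign_sketch), use.names = FALSE)
  unique(c(found, nested))
}

res <- find_assign_sketch(quote({
  names(x) <- y
  a <- 1
  a <- 2
}))
```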

  4. Q: Write a function that extracts all calls to a specified function.

    A: We just need to delete the previously added else statement and check for a call (not necessarily <-) within the first if() in find_assign_call(). We save a call when we find one and return it later as part of our character output. Everything else stays the same:
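    A self-contained base R sketch of such a function; find_calls_sketch() and its fname argument are hypothetical names, and matching calls are returned as deparsed strings:

```r
find_calls_sketch <- function(x, fname) {
  if (!is.call(x)) {
    return(character())
  }
  fun <- x[[1]]
  # Save the call when its function matches the requested name.
  hit <- if (is.symbol(fun) && identical(as.character(fun), fname)) {
    paste(deparse(x), collapse = " ")
  } else {
    character()
  }
  # Recurse into all elements to find nested calls.
  nested <- unlist(
    lapply(as.list(x), find_calls_sketch, fname = fname),
    use.names = FALSE
  )
  c(hit, nested)
}

e <- quote(sum(1, mean(x), sum(2, 3)))
sum_calls <- find_calls_sketch(e, "sum")
mean_calls <- find_calls_sketch(e, "mean")
```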