18 Measuring performance
18.1 Profiling
Q1: Profile the following function with torture = TRUE. What is surprising? Read the source code of rm() to figure out what’s going on.
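For reference, a minimal definition of f() consistent with the answer below (the default value of n is our choice for illustration):
f <- function(n = 1e5) {
  x <- rep(1, n)  # allocate a vector x of length n
  rm(x)           # remove it again, so f() returns NULL invisibly
}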
A: We expect f() to create a vector (x) of length n, which is then removed so that f() just returns NULL. When we profile this function, it executes too fast for meaningful results.
profvis::profvis(f())
#> Error in parse_rprof(prof_output, expr_source): No parsing data available. Maybe your function was too fast?
Setting torture = TRUE triggers garbage collection after every memory allocation call, which may be useful for more exact memory profiling.
profvis::profvis(f(), torture = TRUE)
Surprisingly, profiling f() like this takes a very long time. What could be the reason?
We follow the hint in the question and inspect the source code of rm():
function (..., list = character(), pos = -1,
          envir = as.environment(pos),
          inherits = FALSE)
{
  dots <- match.call(expand.dots = FALSE)$...
  if (
    length(dots) && !all(
      vapply(dots, function(x) is.symbol(x) ||
        is.character(x), NA, USE.NAMES = FALSE)
    )
  )
    stop("... must contain names or character strings")
  names <- vapply(dots, as.character, "")
  if (length(names) == 0L)
    names <- character()
  list <- .Primitive("c")(list, names)
  .Internal(remove(list, envir, inherits))
}
rm() does a surprising amount of work to get the name of the object to delete because it relies on non-standard evaluation. We can make the job of rm() considerably simpler by using the list argument:
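A sketch of this variant (the helper name f2 is ours):
f2 <- function(n = 1e5) {
  x <- rep(1, n)
  rm(list = "x")  # pass the name as a string, avoiding non-standard evaluation
}

profvis::profvis(f2(), torture = TRUE)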
Unfortunately, this still takes too long, and we remain stuck in the profiling.
Anecdotally, one of the authors once managed to finish the profiling under an older R version, but the output did not seem very meaningful.
In conclusion, this question appears to be unanswerable for us, even for Hadley.
18.2 Microbenchmarking
Q1: Instead of using bench::mark(), you could use the built-in function system.time(). But system.time() is much less precise, so you’ll need to repeat each operation many times with a loop, and then divide to find the average time of each operation, as in the code below.
n <- 1e6
system.time(for (i in 1:n) sqrt(x)) / n
system.time(for (i in 1:n) x ^ 0.5) / n
How do the estimates from system.time() compare to those from bench::mark()? Why are they different?
A: We first microbenchmark these two expressions using bench::mark() and observe that the mean is not reported (as it is generally more affected by outliers).
n <- 1e6
x <- runif(100)
bench_df <- bench::mark(
  sqrt(x),
  x ^ 0.5,
  iterations = n
)
bench_df
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sqrt(x) 888ns 1.04µs 679098. 848B 10.9
#> 2 x^0.5 3.82µs 4.12µs 219102. 848B 7.01
We need to access the raw data so that we can compare the results of both benchmarking approaches.
t1_bench <- mean(unlist(bench_df[1, "time"]))
t2_bench <- mean(unlist(bench_df[2, "time"]))
t1_systime <- system.time(for (i in 1:n) sqrt(x)) / n
t2_systime <- system.time(for (i in 1:n) x ^ 0.5) / n
We see that both approaches get the order of magnitude right. We assume that the bench::mark() results may be a little more accurate because of its high-precision timer. There may also be overhead introduced by the for loop in the system.time() approach (a rough way to gauge this is sketched after the comparison below).
# Compare the results
t1_systime["elapsed"]
#> elapsed
#> 1.24e-06
t1_bench
#> [1] 1.88e-06
t2_systime["elapsed"]
#> elapsed
#> 4.18e-06
t2_bench
#> [1] 5e-06
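As a rough, hypothetical check of the loop overhead mentioned above, one could time a loop with an (almost) empty body:
# Per-iteration cost of the bare for loop itself; this overhead is included
# in the system.time() estimates above but not in the bench::mark() timings.
system.time(for (i in 1:n) NULL) / n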
Side Note: take a look at ?proc.time if you want to learn about the differences between “user”, “system”, and “elapsed” time.
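For illustration (our own example, not part of the original answer):
# "user" and "system" measure CPU time spent in R itself and in system calls
# made on behalf of R; "elapsed" is wall-clock time. A sleeping process
# therefore accumulates elapsed time but almost no CPU time.
system.time(Sys.sleep(1))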
Q2: Here are two other ways to compute the square root of a vector. Which do you think will be fastest? Which will be slowest? Use microbenchmarking to test your answers.
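The two alternatives meant here (they reappear in the benchmark below) are:
x ^ (1 / 2)
exp(log(x) / 2)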
A: To compare these approaches, we’ll bench::mark() them and sort the result by the median execution time.
x <- runif(100)
bm <- bench::mark(
  sqrt(x),
  x^0.5,
  x^(1 / 2),
  exp(log(x) / 2)
)
bm[order(bm$median), ]
#> # A tibble: 4 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sqrt(x) 908ns 1.1µs 656785. 848B 0
#> 2 exp(log(x)/2) 2.97µs 3.22µs 295051. 848B 29.5
#> 3 x^0.5 3.94µs 4.23µs 213837. 848B 0
#> 4 x^(1/2) 4.08µs 4.45µs 204753. 848B 0
As one might expect, the idiomatic primitive function sqrt() is the fastest. The approach exp(log(x) / 2), which builds on two other primitive functions, comes second, though it is already considerably slower. The other calculations are slower still: x ^ 0.5 is faster than x ^ (1 / 2), because 0.5 requires less computation than (1 / 2).