# 18 Measuring performance

## 18.1 Profiling

**Q1**: Profile the following function with `torture = TRUE`

. What is surprising? Read the source code of `rm()`

to figure out what’s going on.

**A**: We expect `f()`

to create a vector (`x`

) of length `n`

, which is then removed so that `f()`

just returns `NULL`

. When we profile this function, it executes too fast for meaningful results.

```
profvis::profvis(f())
#> Error in parse_rprof(prof_output, expr_source): No parsing data available. Maybe your function was too fast?
```

Setting `torture = TRUE`

triggers garbage collection after every memory allocation call, which may be useful for more exact memory profiling.

`profvis::profvis(f(), torture = TRUE)`

Surprisingly, profiling `f()`

like this takes a very long time. What could be the reason?

We follow the hint in the question and inspect the source code of `rm()`

:

```
function (..., list = character(), pos = -1,
envir = as.environment(pos),
inherits = FALSE)
{
dots <- match.call(expand.dots = FALSE)$...
if (
length(dots) && !all(
vapply(dots, function(x) is.symbol(x) ||
is.character(x), NA, USE.NAMES = FALSE)
)
)
stop("... must contain names or character strings")
names <- vapply(dots, as.character, "")
if (length(names) == 0L)
names <- character()
list <- .Primitive("c")(list, names)
.Internal(remove(list, envir, inherits))
}
```

`rm()`

does a surprising amount of work to get the name of the object to delete because it relies on non-standard evaluation.

We can make the job of `rm()`

considerably simpler by using the `list`

argument:

Unfortunately, this still takes too long, and we are literally stuck in profiling.

Anecdotally, one of the authors once finished the profiling under an older R version. But the output seemed to be not very meaningful.

In conclusion, this question appears to be unanswerable for us, even for Hadley.

## 18.2 Microbenchmarking

**Q1**: Instead of using `bench::mark()`

, you could use the built-in function `system.time()`

. But `system.time()`

is much less precise, so you’ll need to repeat each operation many times with a loop, and then divide to find the average time of each operation, as in the code below.

```
n <- 1e6
system.time(for (i in 1:n) sqrt(x)) / n
system.time(for (i in 1:n) x ^ 0.5) / n
```

How do the estimates from `system.time()`

compare to those from `bench::mark()`

? Why are they different?

**A**: We first microbenchmark these two expressions using `bench::mark()`

^{26} and observe that the mean is not reported (as it is generally more affected by outliers).

```
n <- 1e6
x <- runif(100)
bench_df <- bench::mark(
sqrt(x),
x ^ 0.5,
iterations = n
)
bench_df
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sqrt(x) 888ns 1.04µs 679098. 848B 10.9
#> 2 x^0.5 3.82µs 4.12µs 219102. 848B 7.01
```

We need to access the raw data, so we can compare the results of both benchmarking approaches.

```
t1_bench <- mean(unlist(bench_df[1, "time"]))
t2_bench <- mean(unlist(bench_df[2, "time"]))
t1_systime <- system.time(for (i in 1:n) sqrt(x)) / n
t2_systime <- system.time(for (i in 1:n) x ^ 0.5) / n
```

We see, that both approaches get the order of magnitude right. We assume, that the `bench::mark()`

-results may be a little more accurate, because of its high precision timer. There may also be overhead introduced by the for loop in the `system.time()`

-approach.

```
# Compare the results
t1_systime["elapsed"]
#> elapsed
#> 1.24e-06
t1_bench
#> [1] 1.88e-06
t2_systime["elapsed"]
#> elapsed
#> 4.18e-06
t2_bench
#> [1] 5e-06
```

Side Note: take a look at `?proc.time`

if you want to learn about the differences between “user,” “system” and “elapsed” time.

**Q2**: Here are two other ways to compute the square root of a vector. Which do you think will be fastest? Which will be slowest? Use microbenchmarking to test your answers.

**A**: To compare these approaches, we’ll `bench::mark()`

them and sort the result by the median execution time.

```
x <- runif(100)
bm <- bench::mark(
sqrt(x),
x^0.5,
x^(1 / 2),
exp(log(x) / 2)
)
bm[order(bm$median), ]
#> # A tibble: 4 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sqrt(x) 908ns 1.1µs 656785. 848B 0
#> 2 exp(log(x)/2) 2.97µs 3.22µs 295051. 848B 29.5
#> 3 x^0.5 3.94µs 4.23µs 213837. 848B 0
#> 4 x^(1/2) 4.08µs 4.45µs 204753. 848B 0
```

As one might expect the idiomatic primitive function `sqrt()`

is the fastest. The approach `exp(log(x) / 2)`

which builds on two other primitive functions is second, even though already considerably slower. The other calculations are even slower: `x ^ 0.5`

is faster than `x ^ (1 / 2)`

, because `0.5`

requires less computation than `(1 / 2)`

.