18 Measuring performance
18.1 Profiling
Q1: Profile the following function with torture = TRUE. What is surprising? Read the source code of rm() to figure out what’s going on.
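For reference, a minimal definition of f() consistent with the answer below (the default value of n is our choice for illustration):
f <- function(n = 1e5) {
  x <- rep(1, n)  # allocate a vector x of length n
  rm(x)           # remove it again, so f() returns NULL invisibly
}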
A: We expect f() to create a vector (x) of length n, which is then removed so that f() just returns NULL. When we profile this function, it executes too fast for meaningful results.
profvis::profvis(f())
#> Error in parse_rprof(prof_output, expr_source): No parsing data available. Maybe your function was too fast?
Setting torture = TRUE triggers garbage collection after every memory allocation call, which may be useful for more exact memory profiling.
profvis::profvis(f(), torture = TRUE)
Surprisingly, profiling f() like this takes a very long time. What could be the reason?
We follow the hint in the question and inspect the source code of rm():
function (..., list = character(), pos = -1,
          envir = as.environment(pos),
          inherits = FALSE)
{
  dots <- match.call(expand.dots = FALSE)$...
  if (
    length(dots) && !all(
      vapply(dots, function(x) is.symbol(x) ||
        is.character(x), NA, USE.NAMES = FALSE)
    )
  )
    stop("... must contain names or character strings")
  names <- vapply(dots, as.character, "")
  if (length(names) == 0L)
    names <- character()
  list <- .Primitive("c")(list, names)
  .Internal(remove(list, envir, inherits))
}
rm() does a surprising amount of work to get the name of the object to delete because it relies on non-standard evaluation. We can make the job of rm() considerably simpler by using the list argument:
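A sketch of this variant (the helper name f2 is ours):
f2 <- function(n = 1e5) {
  x <- rep(1, n)
  rm(list = "x")  # pass the name as a string, avoiding non-standard evaluation
}

profvis::profvis(f2(), torture = TRUE)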
Unfortunately, this still takes too long, and we remain stuck in the profiling.
Anecdotally, one of the authors once managed to finish the profiling under an older R version, but the output did not seem very meaningful.
In conclusion, this question appears to be unanswerable for us, even for Hadley.
18.2 Microbenchmarking
Q1: Instead of using bench::mark(), you could use the built-in function system.time(). But system.time() is much less precise, so you’ll need to repeat each operation many times with a loop, and then divide to find the average time of each operation, as in the code below.
n <- 1e6
system.time(for (i in 1:n) sqrt(x)) / n
system.time(for (i in 1:n) x ^ 0.5) / n
How do the estimates from system.time() compare to those from bench::mark()? Why are they different?
A: We first microbenchmark these two expressions using bench::mark() and observe that the mean is not reported (as it is generally more affected by outliers).
n <- 1e6
x <- runif(100)
bench_df <- bench::mark(
  sqrt(x),
  x ^ 0.5,
  iterations = n
)
bench_df
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sqrt(x) 888ns 1.04µs 679098. 848B 10.9
#> 2 x^0.5 3.82µs 4.12µs 219102. 848B 7.01
We need to access the raw data so that we can compare the results of both benchmarking approaches.
t1_bench <- mean(unlist(bench_df[1, "time"]))
t2_bench <- mean(unlist(bench_df[2, "time"]))
t1_systime <- system.time(for (i in 1:n) sqrt(x)) / n
t2_systime <- system.time(for (i in 1:n) x ^ 0.5) / n
We see that both approaches get the order of magnitude right. We assume that the bench::mark() results may be a little more accurate because of its high-precision timer. There may also be overhead introduced by the for loop in the system.time() approach (a rough way to gauge this is sketched after the comparison below).
# Compare the results
t1_systime["elapsed"]
#> elapsed
#> 1.24e-06
t1_bench
#> [1] 1.88e-06
t2_systime["elapsed"]
#> elapsed
#> 4.18e-06
t2_bench
#> [1] 5e-06
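As a rough, hypothetical check of the loop overhead mentioned above, one could time a loop with an (almost) empty body:
# Per-iteration cost of the bare for loop itself; this overhead is included
# in the system.time() estimates above but not in the bench::mark() timings.
system.time(for (i in 1:n) NULL) / n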
Side Note: take a look at ?proc.time if you want to learn about the differences between “user”, “system”, and “elapsed” time.
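For illustration (our own example, not part of the original answer):
# "user" and "system" measure CPU time spent in R itself and in system calls
# made on behalf of R; "elapsed" is wall-clock time. A sleeping process
# therefore accumulates elapsed time but almost no CPU time.
system.time(Sys.sleep(1))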
Q2: Here are two other ways to compute the square root of a vector. Which do you think will be fastest? Which will be slowest? Use microbenchmarking to test your answers.
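The two alternatives meant here (they reappear in the benchmark below) are:
x ^ (1 / 2)
exp(log(x) / 2)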
A: To compare these approaches, we’ll bench::mark() them and sort the result by the median execution time.
x <- runif(100)
bm <- bench::mark(
  sqrt(x),
  x^0.5,
  x^(1 / 2),
  exp(log(x) / 2)
)
bm[order(bm$median), ]
#> # A tibble: 4 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sqrt(x) 908ns 1.1µs 656785. 848B 0
#> 2 exp(log(x)/2) 2.97µs 3.22µs 295051. 848B 29.5
#> 3 x^0.5 3.94µs 4.23µs 213837. 848B 0
#> 4 x^(1/2) 4.08µs 4.45µs 204753. 848B 0
As one might expect, the idiomatic primitive function sqrt() is the fastest. The approach exp(log(x) / 2), which builds on two other primitive functions, comes second, though it is already considerably slower. The other calculations are slower still: x ^ 0.5 is faster than x ^ (1 / 2), because 0.5 requires less computation than (1 / 2).