Why does wrapping this loop in a function speed it up by 8x? [duplicate]

I'm trying to gain a deeper understanding of loops vs. *apply functions in R. Here, I did an experiment where I compute the first 10,000 triangular numbers in 3 different ways.

  1. unwrapped: a simple for loop.
  2. wrapped: the exact same loop as before, but wrapped in a function.
  3. vapply: using vapply and an anonymous function.

The results surprised me in two different ways.

  1. Why is wrapped 8x faster than unwrapped?! My intuition is that, since wrapped actually does more work (defining a function and then calling it), it should have been slower.
  2. Why are they both so much faster than vapply? I would have expected vapply to perform some kind of optimization that makes it at least as fast as the loops.
microbenchmark::microbenchmark(
  unwrapped = {
    x <- numeric(10000)
    for (i in 1:10000) {
      x[i] <- i * (i + 1) / 2
    }
    x
  },
  wrapped = {
    tri_nums <- function(n) {
      x <- numeric(n)
      for (i in 1:n) {
        x[i] <- i * (i + 1) / 2
      }
      x
    }
    tri_nums(10000)
  },
  vapply = vapply(1:10000, \(i) i * (i + 1) / 2, numeric(1)),
  check = 'equal'
)
#> Unit: microseconds
#>       expr      min       lq     mean    median       uq       max neval
#>  unwrapped 2652.487 3006.888 3445.896 3150.7555 3832.094  7029.949   100
#>    wrapped  398.534  414.010  455.333  439.7445  469.307   656.074   100
#>     vapply 4942.000 5154.639 5937.333 5453.2880 5969.760 13730.718   100

Created on 2023-01-04 with reprex v2.0.2

Threnode answered 4/1, 2023 at 22:38 Comment(3)
It's actually even better if you don't redefine the function on each pass. – Disfeature
Related (for the first comparison): Speed difference of loop Inside vs Outside Function; Why is running R code inside a function faster? – Spinneret
@Henrik, good find, this felt like it had to be a dupe but I didn't see those two. Thank you. For the record, I do add a bit more context than those two answers ... ¯\_(ツ)_/¯ – Disfeature

It's byte-compiling your function.

We can confirm that just-in-time (JIT) compilation is enabled with:

compiler::enableJIT(-1)
# [1] 3                        # <--- this is the previous JIT level

where a negative argument returns the current level without changing it, and a value of 3 means the highest JIT compilation level. I'm not certain exactly what each level does, but we can run a simple test to compare them. (See ?enableJIT for more info.)

compiler::enableJIT(0)
# [1] 3
tri_nums <- function(n) {
  x <- numeric(n)
  for (i in 1:n) {
    x[i] <- i * (i + 1) / 2
  }
  x
}
bench::mark(
  unwrapped = {
    x <- numeric(10000)
    for (i in 1:10000) {
      x[i] <- i * (i + 1) / 2
    }
    x
  },
  JIT0 = tri_nums(10000),
  vapply = vapply(1:10000, \(i) i * (i + 1) / 2, numeric(1))
)
# # A tibble: 3 × 13
#   expression      min   median `itr/sec` mem_al…¹ gc/se…² n_itr  n_gc total…³ result memory     time       gc      
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t> <list> <list>     <list>     <list>  
# 1 unwrapped    8.21ms    8.7ms      113.   78.2KB    7.07    48     3   424ms <dbl>  <Rprofmem> <bench_tm> <tibble>
# 2 JIT0         7.26ms   7.72ms      128.   78.2KB    9.84    52     4   407ms <dbl>  <Rprofmem> <bench_tm> <tibble>
# 3 vapply       5.97ms    6.5ms      152.   78.2KB    9.51    64     4   421ms <dbl>  <Rprofmem> <bench_tm> <tibble>
# # … with abbreviated variable names ¹​mem_alloc, ²​`gc/sec`, ³​total_time

(I can't benchmark all three levels in a single bench::mark() call, since I believe the JIT check happens both when the function is defined and when it is called. I'm really not qualified to speak to this level of R internals, so ... please correct me and/or add amplifying information.)

Doing this again for levels 1-3 and copy/pasting the relevant bench::mark rows, we see:

# # A tibble: 6 × 13
#   expression      min   median `itr/sec` mem_al…¹ gc/se…² n_itr  n_gc total…³ result memory     time       gc      
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t> <list> <list>     <list>     <list>  
# 1 unwrapped    8.21ms    8.7ms      113.   78.2KB    7.07    48     3   424ms <dbl>  <Rprofmem> <bench_tm> <tibble>
# 2 JIT0         7.26ms   7.72ms      128.   78.2KB    9.84    52     4   407ms <dbl>  <Rprofmem> <bench_tm> <tibble>
# 3 JIT1        419.6µs  502.5µs     1923.  108.7KB    0      962     0   500ms <dbl>  <Rprofmem> <bench_tm> <tibble>
# 4 JIT2        413.4µs  494.3µs     1971.  108.7KB    0      986     0   500ms <dbl>  <Rprofmem> <bench_tm> <tibble>
# 5 JIT3        426.7µs  498.3µs     1981.  108.7KB    0      991     0   500ms <dbl>  <Rprofmem> <bench_tm> <tibble>
# 6 vapply       5.97ms    6.5ms      152.   78.2KB    9.51    64     4   421ms <dbl>  <Rprofmem> <bench_tm> <tibble>
# # … with abbreviated variable names ¹​mem_alloc, ²​`gc/sec`, ³​total_time

showing that the vast majority of the gains come from the first level of byte-compiling (not too surprising given the simplicity of this function).
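For comparison, here is a quick sketch of my own (not part of the original benchmarks): since the loop body is pure arithmetic, the whole computation can also be done in a single vectorized expression, which avoids per-iteration interpreter overhead entirely, compiled or not.

```r
# A sketch (my addition, not from the benchmarks above): fully vectorized
# arithmetic computes all 10,000 triangular numbers in one expression.
n <- 10000
i <- seq_len(n)
x_vec <- i * (i + 1) / 2   # vectorized: one multiply, one add, one divide
head(x_vec)                # 1 3 6 10 15 21
```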


Note: for anybody actually running some of this code, you might want to ensure you end up back at the default level of 3:

compiler::enableJIT(3)
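As a related sketch (my addition): compiler::cmpfun() byte-compiles a closure explicitly, so you can get the compiled version without relying on the JIT at all.

```r
# Sketch: byte-compile the closure explicitly with compiler::cmpfun(),
# so the speed-up no longer depends on the JIT level.
tri_nums <- function(n) {
  x <- numeric(n)
  for (i in 1:n) x[i] <- i * (i + 1) / 2
  x
}
tri_nums_bc <- compiler::cmpfun(tri_nums)   # explicitly byte-compiled copy
identical(tri_nums(100), tri_nums_bc(100))  # TRUE: same results either way
```

With JIT left at the default level this makes little practical difference (the closure gets compiled on first call anyway), but with enableJIT(0) it reproduces the wrapped-vs-unwrapped gap without defining the function inside the benchmark.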
Disfeature answered 4/1, 2023 at 22:56 Comment(5)
This still leaves me with a mystery about vapply, but I guess that ought to be a separate question. – Threnode
Call tri_nums on the console and note that it ends with something like <bytecode: 0x0000021e9d5ec1f8>, suggesting that it was byte-compiled. – Disfeature
vapply is about as fast as the apply-family of functions can be ... but ever since R 3, for loops can be faster given basic CS principles. It's been a while, but every now and then I still see a question here on SO along the lines of "my prof told me for loops are stupid slow, so I have to convert this simple process to a more-complicated *apply thingy". – Disfeature
It's interesting that, in my experimentation at least, vapply doesn't get the byte-compiling boost: wrapping the vapply call in a function makes it slower. – Threnode
My guess is that byte-compiling does so well here because the operations you're doing are completely mathematical and so very well optimizable. Start throwing in custom logic and/or other things, and it is likely to narrow the margin between compiled and not. – Disfeature

© 2022 - 2025 — McMap. All rights reserved.