R microbenchmark: How to pass same argument to evaluated functions?

I'd like to evaluate the time it takes to extract data from a raster time series using different file types (GeoTIFF, binary) or objects (RasterBrick, RasterStack). I created a function that extracts the time series at a random point of the raster object, and I then use microbenchmark to time it.

Ex.:

library(raster)
library(microbenchmark)

# read a random point from a raster stack
sample_raster <- function(stack) {
  poi <- sample(ncell(stack), 1)
  raster::extract(stack, poi)
}

# opening the data using different methods
data_stack <- stack(list.files(pattern = '3B.*tif'))
data_brick <- brick('gpm_multiband.tif')

bench <- microbenchmark(
  sample_stack = sample_raster(data_stack),
  sample_brick = sample_raster(data_brick),
  times = 10
)

boxplot(bench)

# this fails because a different point is sampled for each expression
bench <- microbenchmark(
  sample_stack = sample_raster(data_stack),
  sample_brick = sample_raster(data_brick),
  times = 10,
  check = 'equal'
)

I included a sample of my dataset here.

With this I can see that sampling from a RasterBrick is faster than from a RasterStack (the R raster manual also says so -- good). The problem is that a different point is sampled in each evaluated expression, so I can't check whether the results are the same. What I'd like is to sample at the same location (poi) on both objects, but have the location differ between iterations. I tried the setup option of microbenchmark, but from what I can tell, setup is evaluated before each timed expression, not once per iteration, so generating a random poi in the setup will not give both expressions the same point.
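
A sketch of that attempt (hypothetical code; it assumes the setup expression makes poi visible to the timed expressions):

bench <- microbenchmark(
  sample_stack = raster::extract(data_stack, poi),
  sample_brick = raster::extract(data_brick, poi),
  times = 10,
  # setup runs before *each* timed expression, so the two
  # expressions still extract at different points
  setup = { poi <- sample(ncell(data_stack), 1) },
  check = 'equal'  # fails for the same reason as above
)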

Is it possible to pass the same argument to the functions being evaluated in microbenchmark?

Result

Solution using microbenchmark

As suggested (and explained below), I tried the bench package with the press call. But for some reason it was slower than setting the same seed at each microbenchmark iteration, as suggested by mnist, so I ended up going back to microbenchmark. This is the code I'm using:

library(microbenchmark)
library(raster)

annual_brick <- raster::brick('data/gpm_tif_annual/gpm_2016.tif')
annual_stack <- raster::stack('data/gpm_tif_annual/gpm_2016.tif')

# number of cells available for sampling
raster_size <- ncell(annual_brick)

# per-expression counters used to advance the seed at each run
x <- 0
y <- 0

bm <- microbenchmark(
  ext = {
    x <- x + 1
    set.seed(x)
    poi <- sample(raster_size, 1)
    raster::extract(annual_brick, poi)
  },
  slc = {
    y <- y + 1
    set.seed(y)
    poi <- sample(raster_size, 1)
    raster::extract(annual_stack, poi)
  },
  check = 'equal'
)
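
Since ext and slc each keep their own counter, both step through the seed sequence 1, 2, 3, ..., so the same set of cells is extracted by both expressions and check = 'equal' passes. As in the question, the timings can be inspected with:

boxplot(bm)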

Solution using bench::press

For completeness' sake, this is how I did it using bench::press. In the process, I also separated the code that selects the random cell from the point-sampling function, so I can time only the point-sampling part. Here is how I'm doing it:

library(bench)
library(raster)

annual_brick <- raster::brick('data/gpm_tif_annual/gpm_2016.tif')
annual_stack <- raster::stack('data/gpm_tif_annual/gpm_2016.tif')

bm <- bench::press(
  pois = sample(ncell(annual_brick), 10),
  mark(
    iterations = 1,
    sample_brick = raster::extract(annual_brick, pois),
    sample_stack = raster::extract(annual_stack, pois)
  )
)
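
Each value of pois gives one run of mark(), and bench::mark() by default also verifies that sample_brick and sample_stack return the same result, so the equality check comes for free here. With iterations = 1 each expression is timed only once per cell; what I'm comparing is the spread of timings across the 10 sampled cells.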
Caribbean answered 10/1, 2020 at 18:02

Maybe github.com/r-lib/bench is better suited for your needs than microbenchmark? Can you share some data? I find it hard to follow your question without knowing what the data looks like. – Danelledanete

The dataset I'm working with is a bit large. I'll try to separate a small chunk and make a reproducible example. – Caribbean

You could do dput(head(annual_brick, 20)). – Danelledanete

Would setting the seed with set.seed() work? – Extol

Wouldn't set.seed() be evaluated before each expression in the microbenchmark? So I'd have different seeds (and random points) for each expression. If I fix the seed, then I'd have the same point sampled over all iterations of the benchmark. – Caribbean

My approach would be to set the same seeds for each option in microbenchmark but change them prior to each function call. See the output and how the same seeds are eventually used for both calls:

x <- 0
y <- 0

microbenchmark::microbenchmark(
  "check1" = {
    # increase seed value by 1, then set it
    x <- x + 1
    print(paste("1", x))
    set.seed(x)
  },
  "check2" = {
    y <- y + 1
    print(paste("2", y))
    set.seed(y)
  }
)
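
Note that microbenchmark randomizes the order in which the expressions are evaluated (control = list(order = "random") is the default), so the printed seed values interleave unevenly. Because each expression increments its own counter, each one still steps through the seeds 1, 2, 3, ..., and both end up using the same set of seeds.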
Archdeacon answered 10/1, 2020 at 23:32

If I understand correctly, the OP has two requirements:

  1. The same data points should be sampled when timing the two expressions in order to check the results are identical.
  2. In addition, timing of the two expressions is to be repeated for different data points sampled.

Using the same random numbers

As suggested by Roman, set.seed() can be used to set the seed values for R's random number generator. If the same parameter is used, the sequence of generated random numbers will be the same.

sample_raster() can be modified to ensure that the random number generator is initialised the same way for each call.

sample_raster <- function(stack) {
  set.seed(1L)
  poi <- sample(ncell(stack), 1)
  raster::extract(stack, poi)
}

This meets requirement 1 but not requirement 2, as the same data samples will be used for all repetitions.

Different random numbers in repetitions

The OP has asked:

Is it possible to pass the same argument to the functions being evaluated in microbenchmark?

One possibility is to use a for loop or lapply() to loop over a sequence of seed values, as suggested in answers to a similar question.
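
A minimal sketch of that idea (hypothetical code; it reuses data_stack and data_brick from the question and times each expression once per seed):

library(microbenchmark)

results <- lapply(1:10, function(seed) {
  set.seed(seed)
  # the same poi is used for both expressions within one run
  poi <- sample(ncell(data_stack), 1)
  microbenchmark(
    sample_stack = raster::extract(data_stack, poi),
    sample_brick = raster::extract(data_brick, poi),
    times = 1L,
    check = 'equal'
  )
})

# combine the per-seed runs into one data frame for inspection
timings <- do.call(rbind, results)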

In this case, I suggest using the bench package for benchmarking. It has a press() function which runs bench::mark() across a grid of parameters.

For this, sample_raster() gets a second parameter:

sample_raster <- function(stack, seed) {
  set.seed(seed)
  poi <- sample(ncell(stack), 1L)
  # cat(deparse(substitute(stack)), seed, poi, "\n") # just to check, NOT for timings
  raster::extract(stack, poi)
}

The timings are executed for different seeds, as given in the vector seed_vec.

library(bench)
bm <- press(
  seed_vec = 1:10,
  mark(
    iterations = 1L,
    sample_stack = sample_raster(data_stack, seed_vec),
    sample_brick = sample_raster(data_brick, seed_vec)
  )
)

Note that the length of seed_vec now determines the number of repetitions with a different poi. The iterations parameter of mark() specifies how often the timings are to be repeated for the same seed / poi.

The results can be plotted using

library(ggplot2)
autoplot(bm)

or summarized using

library(dplyr)
bm %>% 
  group_by(expression = expression %>% as.character()) %>% 
  summarise(median = median(median), n_itr = n())
Commence answered 10/1, 2020 at 18:41

But then won't the sampled point be the same in every iteration of the benchmark process? – Caribbean

Yes, that is correct. Perhaps press() from the bench package might be an alternative here; there you can pass different values for set.seed() as parameters to the benchmark. – Commence
