I'd like to evaluate how long it takes to extract data from a raster time series using different file types (GeoTIFF, binary) or objects (RasterBrick, RasterStack). I wrote a function that extracts the time series at a random point of the raster object, and I then use microbenchmark to time it.
Ex.:
library(raster)
library(microbenchmark)
# read the time series at a random cell of a raster stack or brick
sample_raster <- function(stack) {
  poi <- sample(ncell(stack), 1)
  raster::extract(stack, poi)
}
# opening the data using different methods
data_stack <- stack(list.files(pattern = '3B.*tif'))
data_brick <- brick('gpm_multiband.tif')
bench <- microbenchmark(
  sample_stack = sample_raster(data_stack),
  sample_brick = sample_raster(data_brick),
  times = 10
)
boxplot(bench)
# this fails because the sampled point differs between the two expressions
bench <- microbenchmark(
  sample_stack = sample_raster(data_stack),
  sample_brick = sample_raster(data_brick),
  times = 10,
  check = 'equal'
)
I included a sample of my dataset here
With this I can see that sampling a RasterBrick is faster than sampling a RasterStack (the raster package manual says the same, so that's good). The problem is that each evaluated expression samples a different point, so I can't check that the results are the same. What I'd like to do is sample the same location (poi) on both objects, but have that location change from one iteration to the next. I tried the setup option in microbenchmark, but from what I figured out, setup is evaluated before each expression is timed, not once per iteration, so generating a random poi in setup will not work.
Is it possible to pass the same argument to the functions being evaluated in microbenchmark?
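To make the goal concrete, here is a minimal sketch of what "same poi for both objects, new poi each iteration" means, written as a hand-rolled timing loop rather than with microbenchmark. Everything in it is an assumption for illustration: the matrix `values` and the list `layers` are hypothetical stand-ins for the brick and the stack, and `extract_a` / `extract_b` stand in for the two `raster::extract` calls. `system.time` is also much coarser than microbenchmark, so this only shows the control flow, not a usable measurement.

```r
# Hypothetical stand-ins for the real rasters: 1000 cells x 50 time steps
set.seed(42)
n_cells <- 1000L
values <- matrix(runif(n_cells * 50L), nrow = n_cells)               # "brick": one block
layers <- lapply(seq_len(ncol(values)), function(j) values[, j])     # "stack": separate layers

# Stand-ins for raster::extract() on each object
extract_a <- function(cell) values[cell, ]
extract_b <- function(cell) vapply(layers, function(l) l[cell], numeric(1))

n_iter <- 10L
timings <- data.frame(iter = seq_len(n_iter), poi = NA_integer_,
                      a = NA_real_, b = NA_real_)

for (i in seq_len(n_iter)) {
  poi <- sample(n_cells, 1)  # drawn once per iteration, shared by both calls
  t_a <- system.time(res_a <- extract_a(poi))[["elapsed"]]
  t_b <- system.time(res_b <- extract_b(poi))[["elapsed"]]
  # same poi, so the two results can be compared directly
  stopifnot(isTRUE(all.equal(res_a, res_b)))
  timings[i, c("poi", "a", "b")] <- list(poi, t_a, t_b)
}
```

The point of the loop is only that the random draw happens outside the timed calls and is reused by both of them, which is the behaviour I'm trying to get from microbenchmark.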
Result
Solution using microbenchmark
As suggested (and explained below), I tried the bench package with the press call. But for some reason it was slower than setting the same seed at each microbenchmark iteration, as suggested by mnist. So I ended up going back to microbenchmark. This is the code I'm using:
library(microbenchmark)
library(raster)
annual_brick <- raster::brick('data/gpm_tif_annual/gpm_2016.tif')
annual_stack <- raster::stack('data/gpm_tif_annual/gpm_2016.tif')
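# number of cells in the raster, used as the sampling range
raster_size <- raster::ncell(annual_brick)
# one counter per expression, so the i-th run of each expression uses seed i
x <- 0
y <- 0
bm <- microbenchmark(
  ext = {
    x <- x + 1
    set.seed(x)
    poi <- sample(raster_size, 1)
    raster::extract(annual_brick, poi)
  },
  slc = {
    y <- y + 1
    set.seed(y)
    poi <- sample(raster_size, 1)
    raster::extract(annual_stack, poi)
  },
  check = 'equal'
)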
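The reason the two counters keep the points in sync can be checked without any raster data. The sketch below is an illustration only: `raster_size` is a made-up value and `make_sampler` is a hypothetical helper, not part of microbenchmark or raster. It calls two independent samplers in an uneven, interleaved order and compares the cells each one drew.

```r
# Hypothetical stand-in for ncell(annual_brick); not taken from the real data
raster_size <- 1000L

# Build a sampler that owns its counter and reseeds on every call,
# mirroring the x / y counters used in the benchmark expressions
make_sampler <- function(n_cells) {
  counter <- 0L
  function() {
    counter <<- counter + 1L
    set.seed(counter)
    sample(n_cells, 1)
  }
}

draw_a <- make_sampler(raster_size)
draw_b <- make_sampler(raster_size)

# Call the samplers in an arbitrary interleaved order
order_of_calls <- c("a", "a", "b", "a", "b", "b", "a", "b")
pois_a <- integer(0)
pois_b <- integer(0)
for (who in order_of_calls) {
  if (who == "a") {
    pois_a <- c(pois_a, draw_a())
  } else {
    pois_b <- c(pois_b, draw_b())
  }
}

# The i-th call of each sampler used seed i, so the sequences match
identical(pois_a, pois_b)
```

Because each expression reseeds from its own counter, the i-th draw depends only on i and not on the order in which the expressions are run. One trade-off is that the `set.seed` and `sample` calls sit inside the timed expressions, so their cost is included in both measurements.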
Solution using bench::press
For completeness' sake, this is how I did it using bench::press. In the process, I also separated the code that selects the random cell from the point sampling function, so I can time only the point sampling part of the code. Here is how I'm doing it:
library(bench)
library(raster)
annual_brick <- raster::brick('data/gpm_tif_annual/gpm_2016.tif')
annual_stack <- raster::stack('data/gpm_tif_annual/gpm_2016.tif')
bm <- bench::press(
  pois = sample(ncell(annual_brick), 10),
  mark(
    iterations = 1,
    sample_brick = raster::extract(annual_brick, pois),
    sample_stack = raster::extract(annual_stack, pois)
  )
)
microbenchmark? Can you share some data? I find it hard to follow your question without knowing what the data looks like. – Danelledanete
dput(head(annual_brick, 20)). – Danelledanete
set.seed() work? – Extol
set.seed() be evaluated before each expression in the microbenchmark? So I'd have different seeds (and random points) for each expression. If I fix the seed, then I'd have the same point sampled over all iterations of the benchmark. – Caribbean