Parallel processing of big rasters in R (Windows)

I'm using the doSNOW package, and more specifically the parLapply function (from snow, which doSNOW loads), to perform reclassification (and subsequently other operations) on a list of big raster datasets (OS: Windows x64).

The code looks roughly like this minimal example:

library(raster)
library(doSNOW)

#create list containing test rasters

x <- raster(ncol=10980,nrow=10980) 
x <- setValues(x,1:ncell(x)) 

list.x <- replicate( 9 , x )

#setting up cluster

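NumberOfCluster <- 8
cl <- makeCluster(NumberOfCluster)
registerDoSNOW(cl)  # registers the foreach backend; parLapply() itself does not need it
junk <- clusterEvalQ(cl, library(raster))  # load raster on every worker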

#perform calculations on each raster

list.x <- parLapply(cl,list.x,function(x) calc(x,function(x) { x * 10 }))

#stop cluster

stopCluster(cl)

The code itself works as intended. The problem occurs when I want to continue working with the results; I get this error message:

> plot(list.x[[1]])
Error in file(fn, "rb") : cannot open the connection
In addition: Warning message:
In file(fn, "rb") :
  cannot open file 'C:\Users\*****\AppData\Local\Temp\RtmpyKYdpY\raster\r_tmp_2016-02-29_133158_752_67867.gri': No such file or directory

As far as I understand, because the rasters are quite big, they are written to temporary files on disk instead of being kept in memory. Once I close the snow cluster, these files can no longer be accessed.
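A minimal base-R check of this explanation, assuming the parallel package's socket cluster behaves like the snow one used above: each worker is a separate R process with its own per-session temporary directory (a folder like the Rtmp... one in the error path), and R removes that folder when the process exits normally. The two-worker cluster and the text files are purely illustrative.

```r
library(parallel)

cl <- makeCluster(2)   # socket cluster: each worker is a separate R process

# Each worker reports its own per-session temporary directory
worker_dirs <- unlist(clusterEvalQ(cl, tempdir()))
print(worker_dirs)
print(tempdir())       # the master's directory is a different one

# Write a small file into each worker's temp directory and return its path
files <- unlist(clusterEvalQ(cl, {
  f <- tempfile(fileext = ".txt")
  writeLines("hello", f)
  f
}))
existed_before <- file.exists(files)

stopCluster(cl)

# Workers shut down asynchronously, so poll for up to 10 seconds
deadline <- Sys.time() + 10
while (any(file.exists(files)) && Sys.time() < deadline) Sys.sleep(0.2)
exists_after <- file.exists(files)

print(existed_before)
print(exists_after)
```

The same thing happens to raster's temporary .grd/.gri files: the master still holds Raster* objects, but the files they point to were inside the workers' temp directories.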

So my question is: how can I access the data once the cluster is closed? Can I keep using this approach?

Thanks!

Syrup asked 29/2, 2016 at 12:56

I had this exact problem while running the rasterize function inside a cluster in R.

All tests worked perfectly, but when I scaled up to very large, fine-resolution rasters I repeatedly got errors about temp files that I couldn't even find on my computer. The list object, which I needed to merge and write out as a single raster, was in R, but I could do nothing with it.

After watching the temp file directory while the cluster was running, I noticed that closing the cluster auto-deletes all the temp files it created (each worker is a separate R process, and its temp directory is removed when it exits). So I had to perform the merge and writeRaster steps inside the cluster; otherwise it failed with an error very similar to yours.
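A minimal sketch of one way to apply this, under stated assumptions: the merge and writeRaster calls run on the master while the workers are still alive, so the worker-side temp files can still be read. It uses the parallel package's makeCluster()/parLapply() in place of snow's, forces file-backed results on the workers with rasterOptions(todisk = TRUE) to mimic big rasters, and writes to a hypothetical output path; the tiles share one extent only to keep the example short.

```r
library(raster)
library(parallel)   # makeCluster()/parLapply(); snow's versions are similar

cl <- makeCluster(2)
invisible(clusterEvalQ(cl, {
  library(raster)
  rasterOptions(todisk = TRUE)  # force file-backed results, as with big rasters
}))

# Small stand-ins for the large tiles (same extent here, for brevity)
r <- raster(ncol = 100, nrow = 100)
r <- setValues(r, 1:ncell(r))
tiles <- list(r, r, r)

# Each result points to a temp file inside one worker's temp directory
res <- parLapply(cl, tiles, function(x) calc(x, function(v) v * 10))

# Merge and write BEFORE stopping the cluster, while those files still exist.
# Hypothetical output path; in practice use a permanent folder.
out_file <- file.path(tempdir(), "merged.grd")
merged <- writeRaster(do.call(merge, res), filename = out_file)

stopCluster(cl)

# 'merged' is backed by out_file, so it stays usable after stopCluster()
print(merged)
```

The native .grd format is used so the sketch does not depend on a GDAL driver; a .tif filename should work the same way where GDAL support is available.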

Vasileior answered 23/1, 2017 at 9:25
Thanks Sam! If you think about it, it's actually fairly obvious... Tried it and it works like a charm. - Syrup

You could pass specific filenames to calc (or, e.g., reclassify), and have your function return those filenames as a vector to be read into a stack:

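ff <- parSapply(cl, list.x, function(x) {
  f <- tempfile(fileext = '.tif')            # note: a path inside this worker's own temp directory
  calc(x, function(v) v * 10, filename = f)  # write the result to f
  f                                          # return the file name, not the raster
})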

s <- stack(ff)
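Note that tempfile() evaluated on a worker returns a path inside that worker's own session temp directory, which is removed when the worker process exits. A minimal sketch of the persistent-path variant, assuming the parallel package's makeCluster()/parSapply() in place of snow's and a hypothetical outdir folder chosen on the master and passed to the workers as an extra argument. The folder sits under the master's tempdir() only so the example cleans up after itself; in practice it would be a permanent path.

```r
library(raster)
library(parallel)

cl <- makeCluster(2)
invisible(clusterEvalQ(cl, library(raster)))

r <- raster(ncol = 100, nrow = 100)
r <- setValues(r, 1:ncell(r))
list.x <- list(r, r, r)

# Hypothetical output folder, created by the master so it outlives the workers
outdir <- file.path(tempdir(), "raster_out")
dir.create(outdir, showWarnings = FALSE)

ff <- parSapply(cl, list.x, function(x, outdir) {
  # tempfile() only generates a unique name; the folder belongs to the master
  f <- tempfile(tmpdir = outdir, fileext = ".grd")
  calc(x, function(v) v * 10, filename = f)
  f
}, outdir = outdir)

stopCluster(cl)

s <- stack(ff)  # the files are still there after the cluster is stopped
print(s)
```

Passing outdir as an argument to parSapply() (rather than relying on a global) keeps the worker function self-contained; clusterExport() would be an alternative.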

But also look at ?clusterR; I suspect it will work with reclassify. From the docs:

This function only works with functions that have a Raster* object as first argument and that operate on a cell by cell basis (i.e., there is no effect of neigboring cells) and return an object with the same number of cells as the input raster object. The first argument of the function called must be a Raster* object. There can only be one Raster* object argument. For example, it works with calc and it also works with overlay as long as you provide a single RasterStack or RasterBrick as the first argument.
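A minimal sketch of the clusterR route with calc, assuming raster's beginCluster()/endCluster() helpers and a hypothetical output file name. As far as I know, clusterR() has the workers compute cell values and the master process assemble and write the output, so the result file should not live in a worker's temp directory.

```r
library(raster)

r <- raster(ncol = 100, nrow = 100)
r <- setValues(r, 1:ncell(r))

beginCluster(2)  # starts the cluster that clusterR() uses by default

# Hypothetical output path for the result
out_file <- file.path(tempdir(), "times10.grd")
r10 <- clusterR(r, calc, args = list(fun = function(v) v * 10),
                filename = out_file)

endCluster()

# The result is still readable after the cluster has been stopped
print(r10)
```

The function passed through args must be cell-by-cell, per the documentation quoted above; focal-type operations that look at neighboring cells are not suitable for clusterR().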

Hirza answered 1/3, 2016 at 16:38
Still, when I try to access the data after the cluster is closed, R is unable to find the temp file and returns this error message: > plot(s[[1]]) Error in .local(.Object, ...) : C:\Users\******\AppData\Local\Temp\Rtmpsh1u3n\file1e482e517fd9.tif' does not exist in the file system, and is not recognised as a supported dataset name. - Syrup
Strange, it works for me. Maybe try saving the files to a persistent path, then. - Hirza
Thanks. I will have another look at clusterR, though... I might have moved on from this too fast. I also found this, which has a cluster function that looks promising. - Syrup