How can I find the pixel-wise standard deviation?

F

2

5

I have 20 rasters with same resolution and extent. It's a time series and each raster is for one year.

And I want to calculate the pixel-wise standard deviation of all rasters.So far, I am using the raster package.

qq2<-list(maxras1,maxras2,maxras3,maxras4,maxras5,maxras6,maxras7,maxras8,maxras9,maxras10)
qq2stack<-stack(qq2)
qq2mean<-mean(qq2stack)
qq2sd<-sd(qq2stack)

The mean works. But standard deviation is giving me this error:

Error in as.double(x) : 
  cannot coerce type 'S4' to vector of type 'double'

Franconia answered 13/11, 2015 at 19:52 Comment(3)

Pixel-wise as in you want the SD of the vector of values from the time series for each pixel? If so, take a look at calc(qq2stack, fun=sd) – Eran 13/11, 2015 at 20:23

Dr. Stevens: Yes, I want to calculate for every pixel. Basically each pixel has 10 values in 10 rasters and I want to calculate the SD for each pixel. – Franconia 13/11, 2015 at 20:24

Thank you. This works. Though it's rather slow. – Franconia 13/11, 2015 at 20:59

E

6

Unfortunately, as you note after my comment above, per-pixel analyses can be slow. I think the next thing for you to try is to parallelize the process. Assuming you have a multi-core processor you can take advantage of calc() and its built-in multi-process optimization:

cores <- 4
beginCluster(cores, type='SOCK')
calc(qq2stack, fun=sd)
endCluster()

This would result in a significant speed-up if your operating/hardware environment supports it. Obviously, you can increase the number of processes as well depending on your architecture.

Eran answered 13/11, 2015 at 23:39 Comment(0)

C

5

To address the slow performance of raster::calc for calculating the standard deviation of large rasters, I wrote a gist (here) that does it using GDAL's gdal_calc.py.

To calculate the sd of 5 rasters with 4 million cells (2000 x 2000), gdal_calc takes about 2 seconds, compared to about 60 seconds for raster::calc (see example below).

You'll need gdal_calc.py on the system path so that it can be found by Sys.which, or else modify gdal_calc <- Sys.which('gdal_calc.py') to specify the full path to gdal_calc.py.

Note that, from memory, this function supports a maximum of 26 input rasters, since gdal_calc refers to each by a letter of the alphabet. Perhaps this constraint can be circumvented - I'm not familiar enough with gdalnumeric syntax to work it out.

E.g.:

First, let's create a stack of five random, 2000 x 2000 rasters. We'll use raster::calc on this stack, but we'll also write them out to temp files, because gdal_calc needs file paths:

s <- stack(replicate(5, raster(matrix(runif(4000000), 2000))))  
ff <- replicate(5, tempfile(fileext='.tif')) # a vector of temp file paths
writeRaster(s, ff, bylayer=TRUE)

Here's the raster::calc version:

system.time(sd_calc <- calc(s, sd))

##  user  system elapsed 
## 79.83    0.08   80.00

and the gdal_sd (i.e., using gdal_calc.py) version:

devtools::source_gist('61c8062938e05a4c6b92') # source the gdal_sd gist
system.time(gdal_sd(infile=ff, outfile=f <- tempfile(fileext='.tif')))

## Calculating standard deviation and writing to file17b0146640ac.tif
##    user  system elapsed 
##    0.00    0.03    2.05

And a comparison of their values:

range(sd_calc[] - raster(f)[])
## [1] -1.110223e-16  1.110223e-16

Note that raster::calc is likely to be faster for small rasters, but clearly gdal_sd is a huge improvement for large rasters.

Documentation for the standard deviation function used by gdal_calc is given here. I've specified ddof=1 (i.e. delta degrees of freedom) such that results are consistent with those returned by R's sd.

For posterity, here is the source of gdal_sd as it was at the time of posting:

gdal_sd <- function(infile, outfile, quiet=TRUE) {
  require(rgdal)
  # infile:   The multiband raster file (or a vector of paths to multiple raster
  #           files) for which to calculate cell standard deviations.
  # outfile:  Path to raster output file.
  # quiet:    Logical. Should gdal_calc.py output be silenced?
  gdal_calc <- Sys.which('gdal_calc.py')
  if(gdal_calc=='') stop('gdal_calc.py not found on system.')
  if(file.exists(outfile)) stop('outfile already exists.')
  nbands <- sapply(infile, function(x) nrow(attr(GDALinfo(x), 'df')))
  if(length(infile) > 26 || nbands > 26) stop('Maximum number of inputs is 26.')
  if(length(nbands) > 1 & any(nbands > 1)) 
    warning('One or more rasters have multiple bands. First band used.')

  if(length(infile)==1) {
    inputs <- paste0('-', LETTERS[seq_len(nbands)], ' ', infile, ' --', 
           LETTERS[seq_len(nbands)], '_band ', seq_len(nbands), collapse=' ')
    n <- nbands
  } else {
    inputs <- paste0('-', LETTERS[seq_along(nbands)], ' ', infile, ' --', 
           LETTERS[seq_along(nbands)], '_band 1', collapse=' ')
    n <- length(infile)
  }

  message('Calculating standard deviation and writing to ', basename(outfile))
  cmd <- 'python %s %s --outfile=%s --calc="std([%s], 0, ddof=1)"'
  out <- system(
    sprintf(cmd, gdal_calc, inputs, outfile, 
            paste0(LETTERS[seq_len(n)], collapse=',')),
    show.output.on.console=!quiet, intern=TRUE
  )
  if(any(grepl('Error', out))) stop(out, call.=FALSE) else NULL
}

Conoid answered 14/11, 2015 at 23:16 Comment(0)

Recommended topics

Hot tags