G'day, I am working with a large dataset of ~125,000 lon/lat locations with dates, for species presence/absence records. For each location I want to work out what the weather was like on the date of the record and during the 3 months prior to that date. To do this I have downloaded daily weather data for a given weather variable (e.g., max temperature) covering the 5-year period in which the data were collected. I have a total of 1,826 raster files, each between 2 and 3 MB.
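To make the goal concrete, here is a minimal base-R sketch of how a record date could be mapped to the daily layers it needs. It assumes the files are in date order with one file per day, that "3 months" can be approximated as 90 days, and that the series starts on 2005-01-01; that start date and the helper name `window_layers` are hypothetical, since the actual dates are not given above.

```r
# Hypothetical start date: the post does not say which 5 years were downloaded.
start_date  <- as.Date("2005-01-01")
n_layers    <- 1826                      # one raster per day, as stated above
layer_dates <- seq(start_date, by = "day", length.out = n_layers)

# Return the layer indices covering `date` and the `days_before` days before it.
# Assumption: 3 months is approximated as 90 days.
window_layers <- function(date, days_before = 90) {
  last <- as.integer(as.Date(date) - start_date) + 1L
  if (last < 1L || last > n_layers) return(integer(0))  # date outside the series
  first <- max(1L, last - days_before)                  # clip at the first layer
  first:last
}

idx <- window_layers("2005-06-15")
length(idx)               # number of daily layers needed for this record
range(layer_dates[idx])   # first and last date in the window
```

Each record then needs at most 91 of the 1,826 layers rather than all of them.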
I had planned to stack all the raster files, then extract a value from every raster (1,826) for each point. This would produce a massive file that I could search for the dates I need. This is, however, not possible because I can't stack that many rasters. I tried splitting the rasters into stacks of 500; this works, but the files it produces are about 1 GB each and very slow to work with (125,000 rows by 500 columns). Also, when I try to bring all of these files into R to create one big data frame, it doesn't work.
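For a sense of scale, a rough back-of-the-envelope calculation of the full lookup table, assuming every value is held in memory as an 8-byte double (the in-memory size will differ from the CSV size on disk):

```r
n_points <- 125000
n_days   <- 1826

# Full table: every point against every daily raster, 8 bytes per double.
full_gb <- n_points * n_days * 8 / 1e9

# Only the date itself plus the 90 days before it (assumed ~3 months).
window_gb <- n_points * 91 * 8 / 1e9

c(full = full_gb, window = window_gb)
```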
I would like to know if there is a way to work with this amount of data in R, or if there is a package that could help. Could I use a package like ff? Does anyone have any suggestions for a less resource-intensive way to do what I want? I have thought about using something like lapply, but I have never used it before and am not really sure where to begin.
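As a starting point for the lapply idea, here is a rough sketch that reads one raster file at a time instead of building a stack. It assumes all files share the same grid, so the cell numbers for the points only need to be computed once. The function name `extract_per_file` is made up for illustration, and this has not been run on the real data.

```r
library(raster)

# Extract one column of values per raster file, opening a single file at a time.
# `files` is a character vector of raster paths; `xy` is a two-column
# lon/lat matrix. Assumption: every file has the same extent and resolution.
extract_per_file <- function(files, xy) {
  # Look up the cell number of each point once, using the first file's grid.
  cells <- cellFromXY(raster(files[1]), xy)
  cols <- lapply(files, function(f) {
    extract(raster(f), cells)   # values at the pre-computed cell numbers
  })
  vals <- do.call(cbind, cols)  # rows are points, columns are files
  colnames(vals) <- basename(files)
  vals
}
```

With the objects from the script below, this might be called as `vals <- extract_per_file(files, cbind(X$lon, X$lat))`. The result is still a 125,000 by 1,826 matrix, so it may be safer to run it on subsets of `files` and save each piece.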
Any help would be really great, thanks in advance for your time. The code I am currently using without success is below.
Kind regards, Adam
library(raster)
library(rgdal)
library(maptools)
library(shapefiles)
# To create weather data files, first set the working directory to the appropriate location (i.e., maxt)
# list of raster weather files
files <- list.files(getwd(), pattern = "\\.asc$")  # only names ending in .asc
length(files)
# raise the memory limit to 4000 MB (these functions are Windows-only)
memory.size(4000)
memory.limit(4000)
# read in lon/lat data
X<-read.table(file.choose(), header=TRUE, sep=',')
SP<- SpatialPoints(cbind(X$lon, X$lat))
# separate stacks into manageable sizes
s1 <- stack(files[1:500])
i1 <- extract(s1, SP, cellnumbers = TRUE, layer = 1, nl = 500)
write.table(i1, file = "maxt_vals_all_points_all_dates_1.csv", sep = ",", row.names = FALSE, col.names = TRUE)
rm(s1, i1)
s2 <- stack(files[501:1000])
i2 <- extract(s2, SP, cellnumbers = TRUE, layer = 1, nl = 500)
write.table(i2, file = "maxt_vals_all_points_all_dates_2.csv", sep = ",", row.names = FALSE, col.names = TRUE)
rm(s2, i2)
s3 <- stack(files[1001:1500])
i3 <- extract(s3, SP, cellnumbers = TRUE, layer = 1, nl = 500)
write.table(i3, file = "maxt_vals_all_points_all_dates_3.csv", sep = ",", row.names = FALSE, col.names = TRUE)
rm(s3, i3)
s4 <- stack(files[1501:1826])
# files 1501 to 1826 make 326 layers, not 325
i4 <- extract(s4, SP, cellnumbers = TRUE, layer = 1, nl = 326)
write.table(i4, file = "maxt_vals_all_points_all_dates_4.csv", sep = ",", row.names = FALSE, col.names = TRUE)
rm(s4, i4)
# read files back in to bind into the final file (NOT WORKING: the files are too big)
i1 <- read.table(file.choose(), header = TRUE, sep = ",")
i2 <- read.table(file.choose(), header = TRUE, sep = ",")
i3 <- read.table(file.choose(), header = TRUE, sep = ",")
i4 <- read.table(file.choose(), header = TRUE, sep = ",")
vals <- data.frame(X, i1, i2, i3, i4)
write.table(vals, file="maxt_master_lookup.csv", sep=",", row.names= FALSE, col.names= TRUE)
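One possible way around the slow CSV round trip, sketched with tiny stand-in matrices, is to save each chunk in R's binary format with `saveRDS()` and bind the chunks column-wise after reading them back. The chunk names and file paths here are hypothetical, and whether the combined result fits in memory on the real data is untested.

```r
# Two small stand-in chunks; in the script above these would be i1, i2, ...
chunk1 <- matrix(1:6,  nrow = 3, dimnames = list(NULL, c("d1", "d2")))
chunk2 <- matrix(7:12, nrow = 3, dimnames = list(NULL, c("d3", "d4")))

# Save each chunk as a binary .rds file (hypothetical file names).
paths <- file.path(tempdir(), c("chunk_1.rds", "chunk_2.rds"))
saveRDS(chunk1, paths[1])
saveRDS(chunk2, paths[2])

# Read the chunks back and bind them column-wise into one matrix.
vals <- do.call(cbind, lapply(paths, readRDS))
dim(vals)
```

Keeping the values as a numeric matrix rather than a data frame with 1,826 columns also tends to be lighter to handle.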
True defined here? Reproducible and tested code, please. – Subaltern