How to manage memory in agent-based modeling with R

I am building an agent-based model (ABM) with R, but I am running into memory issues when trying to use large objects. In particular, 8 3D arrays are created at initialization, and at each time step each 3D array is filled by different functions.

For the moment, the ABM runs over 1825 days and 2500 individuals are simulated moving across the landscape. The landscape has 1000 cells. With this configuration, I don't have memory issues.

At initialization,

  • 1 3D array is like:

    h <- array(NA, dim=c(1825, 48, 2500),
               dimnames=list(NULL, NULL, as.character(seq(1, 2500, 1))))
               ## 3rd dimension = individual ID
    
  • 1 3D array is like:

    p <- array(NA, dim=c(1825, 38, 1000),
               dimnames=list(NULL, NULL, as.character(seq(1, 1000, 1))))
               ## 3rd dimension = cell ID
    
  • 6 3D arrays are like:

    t <- array(NA, dim=c(1825, 41, 2500),
               dimnames=list(NULL, NULL, as.character(seq(1, 2500, 1))))
               ## 3rd dimension = individual ID
    

The arrays contain character/string data types.

Ideally, I would like to increase the number of individuals and/or the number of patches, but this is currently impossible due to memory issues. It seems there are tools available for managing memory, such as the bigmemory package and gc(). Are these tools efficient? I'm a beginner in programming and I have no experience with memory management or high-performance computing. Any advice is greatly appreciated; thanks for your time.

sessionInfo():

    R version 3.5.3 (2019-03-11)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

Hardman answered 31/5, 2019 at 22:33 Comment(3)
What are the second dimensions of the arrays? Properties of the individuals/cells? And do you need to keep all of the time points in memory, or can you save the status after each step to file and keep only the current status in memory? (That would reduce memory requirements by a factor of 2/1825.) – Laky
It's quite difficult to tell without seeing what exactly happens in those functions. Maybe it's possible to reduce the dimensions, because the initialization arrays alone take around 4 GB of memory just to fill them with NAs. I'm not sure if the bigmemory package can handle multi-dimensional arrays, but I would also consider the ff package. – Bourse
@Jan van der Laan Thank you very much for your answer. Yes, the second dimension corresponds to properties of the individuals. I can't keep only the current status in memory because I need array values at t - 1 and t - tf - 1, where tf is a duration parameter. – Hardman
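
A minimal sketch of the save-to-disk idea from the comments above (the object names, the tf value, and the shifting scheme are illustrative, not the asker's actual code): keep only the last tf + 2 daily slices in RAM, append each finished day to disk with saveRDS(), and shift the window forward.

    tf     <- 30                       # hypothetical duration parameter
    n_prop <- 48
    n_ind  <- 2500

    ## slice 1 = current day t, slice 2 = t - 1, ..., slice tf + 2 = t - tf - 1
    window <- array(NA_character_, dim = c(tf + 2, n_prop, n_ind))

    step_day <- function(day, window) {
      ## ... model functions fill window[1, , ] for the current day, reading
      ## window[2, , ] (t - 1) and window[tf + 2, , ] (t - tf - 1) ...
      saveRDS(window[1, , ], file = sprintf("h_day_%04d.rds", day))  # archive the finished day
      ## shift every slice one slot back to make room for the next day
      window[2:(tf + 2), , ] <- window[1:(tf + 1), , ]
      window
    }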

From my understanding, bigmemory works only on matrices, not on multi-dimensional arrays, but you could store a multi-dimensional array as a list of matrices.
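
For illustration, here is a minimal sketch of that list-of-matrices idea (dimensions borrowed from the question, but only 10 individuals for the demo; note that bigmemory stores numeric types only, so the character attributes in the question would first have to be recoded as integer codes):

library(bigmemory)

n_days <- 1825; n_props <- 41; n_ind <- 10    # small demo instead of 2500 individuals

## one file-backed matrix per individual, so the values live on disk, not in RAM
t_list <- lapply(seq_len(n_ind), function(i) {
  filebacked.big.matrix(nrow = n_days, ncol = n_props,
                        type = "double", init = NA,
                        backingfile    = sprintf("ind_%04d.bin", i),
                        descriptorfile = sprintf("ind_%04d.desc", i))
})
names(t_list) <- as.character(seq_len(n_ind))

t_list[["1"]][10, 5] <- 3.14    # day 10, property 5 of individual 1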

gc is just the garbage collector, and you don't really have to call it yourself since it runs automatically, but the manual also states:

It can be useful to call gc after a large object has been removed, as this may prompt R to return memory to the operating system.
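
For illustration (the object name and size here are arbitrary):

big_tmp <- matrix(0, nrow = 1e4, ncol = 1e4)   # roughly 800 MB of doubles
rm(big_tmp)                                    # drop the only reference to it
gc()                                           # may prompt R to return the freed memory to the OS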

I think the most useful package for your task would be ff. Here's a short example to illustrate the strength of the ff package, which stores data on disk and uses hardly any memory.

Initialization arrays with base-R:

p <- array(NA, dim=c(1825, 38, 1000),
           dimnames=list(NULL, NULL, as.character(seq(1, 1000, 1))))

format(object.size(p), units="Mb")

"264.6 Mb"

So in total, your initial arrays already take roughly 5 GB of memory, which will get you into trouble once the heavy computation starts.
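
As a back-of-the-envelope check (a logical NA occupies 4 bytes per element):

elems <- 1825 * 48 * 2500 +       # h
         1825 * 38 * 1000 +       # p
         6 * 1825 * 41 * 2500     # the six t-like arrays
elems * 4 / 1024^3                # just over 5 GB of NA placeholders before any real data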


Initialization arrays with ff:

library(ff)
myArr <- ff(NA, dim=c(1825, 38, 1000), 
            dimnames=list(NULL, NULL, as.character(seq(1, 1000, 1))),
            filename="arr.ffd", vmode="logical", overwrite = T)

format(object.size(myArr), units="Mb")

[1] "0.1 Mb"


Test for equality:

equals <- list()
for (i in 1:dim(p)[1]) {
  equals[[i]] <- all.equal(p[i,,],
                           myArr[i,,])
}
all(unlist(equals))

[1] TRUE
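
Filling the ff array then works just like a regular array, with writes going to the backing file instead of RAM. (The indices and values below are arbitrary; for the character data in the question you would presumably use an integer or factor vmode plus a lookup table of levels rather than "logical".)

myArr[1, , 1]   <- rep(TRUE, 38)          # one day, one cell, all 38 properties
myArr[2, 1:5, ] <- rep(FALSE, 5 * 1000)   # a block of properties across all cells
close(myArr)                              # flush buffers and release the backing file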

Bourse answered 4/6, 2019 at 10:1 Comment(0)

Is there any reason why you have to stick to the array data type?
If there are many NAs present in your arrays, then you are using more memory than you really need. This is the downside of dense arrays in R: every cell is allocated, even the NA ones. If the operations you are performing do not strictly require your data to be arrays, you can save some memory by remodelling it as a data.frame.

The example below shows what your data could look like as a data.table after transforming it from the array. Note that I had to explicitly use na.rm=FALSE, otherwise the result would have been a 0-row table.

devtools::install_github("Rdatatable/[email protected]")
library(data.table)

p <- array(NA, dim=c(1825, 38, 1000),
           dimnames=list(NULL, NULL, as.character(seq(1, 1000, 1))))
as.data.table(p, na.rm=FALSE)
#             V1    V2     V3  value
#          <int> <int> <char> <lgcl>
#       1:     1     1      1     NA
#       2:     1     1     10     NA
#       3:     1     1    100     NA
#       4:     1     1   1000     NA
#       5:     1     1    101     NA
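
A possible follow-up, continuing from the block above (the column names and the look-up are illustrative, not part of the answer): keep only the non-NA records in long format and key the table, so that a slice such as "all properties of cell 10 on day t - 1" can be retrieved quickly without holding the dense array in memory.

dt <- as.data.table(p, na.rm=TRUE)            # default behaviour: all-NA rows are dropped
setnames(dt, c("day", "property", "cell", "value"))
setkeyv(dt, c("cell", "day"))
day_now <- 100
dt[.("10", day_now - 1)]                      # cell 10 on day 99 (all NA here, since p was never filled)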

An alternative is to use the data.cube package. It will basically do what I described above for you behind the scenes. You still have the array [ operator, but data.cube objects won't work with R functions that expect an array on input, as those functions would coerce the data.cube to an array, losing all the memory benefits. The memory benefits can be significant; an example from the data.cube vignette:

array: 34.13 GB
data.cube: 0.01 GB
Ghana answered 10/6, 2019 at 12:57 Comment(3)
Wouldn't it be even better to remodel to matrices instead of data.frames? The size is almost the same, but calculations are often much faster. Or otherwise go straight to data.table? – Bourse
@Bourse straight to data.table :) – Ghana
Thank you for your answer. I use an array to save data for each individual (corresponding to the 3rd dimension). However, I convert the array to a data.frame to apply functions at each time step (1st dimension) and thus fill the array. I'm trying the data.cube package but I have problems installing it: Warning in install.packages: unable to access index for repository https://jangorecki.gitlab.io/data.cube/src/contrib: cannot open URL 'https://jangorecki.gitlab.io/data.cube/src/contrib/PACKAGES' – Hardman
