I have a list of fairly large objects that I want to apply a complicated function to in parallel, but my current method uses too much memory. I thought Reference Classes might help, but using mcapply
to modify them doesn't seem to work.
The function modifies the object itself, so I overwrite the original object with the new one. Since the object is a list and I'm only modifying a small part of it, I was hoping that R's copy-on-modify semantics would avoid having multiple copies made; however, in running it, it doesn't seem to be the case for what I'm doing. Here's a small example of the base R methods I have been using. It correctly resets the balance to zero.
## make a list of accounts, each with a balance
## and a function to reset the balance
foo <- lapply(1:5, function(x) list(balance=x))
reset1 <- function(x) {x$balance <- 0; x}
foo[[4]]$balance
## 4 ## BEFORE reset
foo <- mclapply(foo, reset1)
foo[[4]]$balance
## 0 ## AFTER reset
It seems that using Reference Classes might help as they are mutable, and when using lapply
it does do as I expect; the balance is reset to zero.
Account <- setRefClass("Account", fields=list(balance="numeric"),
methods=list(reset=function() {balance <<- 0}))
foo <- lapply(1:5, function(x) Account$new(balance=x))
foo[[4]]$balance
## 4
invisible(lapply(foo, function(x) x$reset()))
foo[[4]]$balance
## 0
But when I use mclapply
, it doesn't properly reset. Note that if you're on Windows or have mc.cores=1
, lapply
will be called instead.
foo <- lapply(1:5, function(x) Account$new(balance=x))
foo[[4]]$balance
## 4
invisible(mclapply(foo, function(x) x$reset()))
foo[[4]]$balance
## 4
What's going on? How can I work with Reference Classes in parallel? Is there a better way altogether to avoid unnecessary copying of objects?
mclapply
( I get 0 balance) . Do you I need to init core numbers before? – Alpenglowlapply
is simply called instead. Updated question to reflect. – Little