parallel computations on Reference Classes

I have a list of fairly large objects that I want to apply a complicated function to in parallel, but my current method uses too much memory. I thought Reference Classes might help, but using mclapply to modify them doesn't seem to work.

The function modifies the object itself, so I overwrite the original object with the new one. Since the object is a list and I'm only modifying a small part of it, I was hoping that R's copy-on-modify semantics would avoid making multiple copies; however, when I run it, that doesn't seem to be the case. Here's a small example of the base R approach I have been using. It correctly resets the balance to zero.

library(parallel)  # provides mclapply
## make a list of accounts, each with a balance
## and a function to reset the balance
foo <- lapply(1:5, function(x) list(balance=x))
reset1 <- function(x) {x$balance <- 0; x}
foo[[4]]$balance
## 4 ## BEFORE reset
foo <- mclapply(foo, reset1)
foo[[4]]$balance
## 0 ## AFTER reset

Reference Classes seemed like they might help because they are mutable, and with lapply they do behave as I expect: the balance is reset to zero.

Account <- setRefClass("Account", fields=list(balance="numeric"),
                       methods=list(reset=function() {balance <<- 0}))

foo <- lapply(1:5, function(x) Account$new(balance=x))
foo[[4]]$balance
## 4
invisible(lapply(foo, function(x) x$reset()))
foo[[4]]$balance
## 0
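The lapply version works because Reference Class objects have reference semantics: assigning one to a new name or passing it to a function does not copy it, so a method that changes a field changes it for every name bound to that object. A minimal sketch, reusing the question's Account class (the names `a`, `b`, and `c2` are only illustrative); `$copy()` is the method that makes a genuinely independent copy:

```r
# Same Account class as in the question
Account <- setRefClass("Account", fields = list(balance = "numeric"),
                       methods = list(reset = function() { balance <<- 0 }))

a <- Account$new(balance = 4)
b <- a            # no copy: b and a refer to the same object
b$reset()
a$balance         # 0, because the reset through b is visible through a

c2 <- a$copy()    # an independent copy of the object
c2$balance <- 10
a$balance         # still 0: changing the copy does not touch a
```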

But when I use mclapply, the balance is not reset. Note that on Windows, or with mc.cores=1, mclapply simply calls lapply instead.

foo <- lapply(1:5, function(x) Account$new(balance=x))
foo[[4]]$balance
## 4
invisible(mclapply(foo, function(x) x$reset()))
foo[[4]]$balance
## 4
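A likely explanation, assuming a Unix-alike: mclapply forks child processes, and each child works on its own copy-on-write copy of the parent's memory. The reset therefore happens on a copy inside the child, and only the function's return value is serialized back to the parent. The small check below (the names `acct`, `cores`, and `seen` are illustrative) shows the reset succeeding inside the workers while the parent's object is unchanged; on Windows it falls back to one core, i.e. plain lapply, so there the parent's object is modified:

```r
library(parallel)

Account <- setRefClass("Account", fields = list(balance = "numeric"),
                       methods = list(reset = function() { balance <<- 0 }))

acct <- Account$new(balance = 4)

# mc.cores > 1 is not supported on Windows, so fall back to 1 there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker resets acct and reports the balance it sees afterwards
seen <- mclapply(1:2, function(i) {
  acct$reset()
  acct$balance
}, mc.cores = cores)

unlist(seen)   # inside each worker the reset worked
acct$balance   # on a Unix-alike, still 4 in the parent process
```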

What's going on? How can I work with Reference Classes in parallel? Is there a better way altogether to avoid unnecessary copying of objects?

Little asked 6/12/2013 at 18:12 · Comments (4):
I can't reproduce your behavior. It resets fine for me with mclapply (I get a balance of 0). Do I need to set the number of cores first? (Alpenglow)
Well, that's interesting, @agstudy. I tried it again here and the same thing happens. What do you mean by your second sentence? It's not clear to me. (Little)
@agstudy, I wonder if you're on Windows, where lapply is simply called instead. I've updated the question to reflect this. (Little)
Similar question: https://mcmap.net/q/935368/-r-and-shared-memory-for-parallel-mclapply/210673 (Little)

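I think the forked processes, while they can read all the variables in the parent's workspace, cannot change them: each child works on its own copy of memory, and only the value returned by the function is sent back to the parent. Returning the modified object works, but I don't know yet whether it improves the memory issues.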

foo <- mclapply(foo, function(x) {x$reset(); x})
foo[[4]]$balance
## 0
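If the concern is how much data travels back from the workers, one option (a sketch, not tested against the original workload) is to have each worker return only the part of the object that changed and then write those values back into the parent's objects, which are mutable. Here a single number per account is serialized back rather than a whole object; `new_balance` and `cores` are illustrative names:

```r
library(parallel)

Account <- setRefClass("Account", fields = list(balance = "numeric"),
                       methods = list(reset = function() { balance <<- 0 }))

foo <- lapply(1:5, function(x) Account$new(balance = x))

# mc.cores > 1 is not supported on Windows, so fall back to 1 there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Workers mutate their own copies and return just the updated field
new_balance <- mclapply(foo, function(x) {
  x$reset()
  x$balance
}, mc.cores = cores)

# Write the results back into the parent's objects
for (i in seq_along(foo)) foo[[i]]$balance <- new_balance[[i]]

foo[[4]]$balance  # 0
```

Whether this actually lowers peak memory depends on the real objects; it only reduces what has to be serialized back to the parent.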
Little answered 6/12/2013 at 20:58