Running foreach without returning any value in R

I have a function doSomething() that runs inside a foreach loop and saves its calculations as .csv files. I therefore have no need for a return value from foreach; in fact, I don't want one, because it clutters my memory to the point where I cannot run as many iterations as I would like.

How can I force foreach not to return a value, or delete the return values of the iterations?

Here is a minimal example that illustrates my problem:

cl <- parallel::makePSOCKcluster(1)
doParallel::registerDoParallel(cl)

"%dopar%" <- foreach::"%dopar%"

doSomething <- function () {
  a <- as.numeric(1L)
}

foreach::foreach (i = 1:4) %dopar% {

  doSomething()

}

The output is:

[[1]]
[1] 1

[[2]]
[1] 1

[[3]]
[1] 1

[[4]]
[1] 1
Enrollee answered 5/2, 2020 at 8:49 Comment(7)
What is with doSomething(); NULL? – Hohenzollern
This would return a list of NULLs. – Enrollee
I think your issue is not the return value itself, it is the memory that causes you trouble, right? – Refrangible
@Refrangible yes, you are right. I ran some code over night on 31 cores and it used up nearly all of my 65 GB of memory. – Enrollee
Parallel computing in R works (as far as I have experienced) such that memory is allocated for each cluster node. That means if you have a big data set that each node needs for its calculation, this data is allocated multiple times, which leads to high RAM consumption. Since you want to write the output in each loop run and throw away the result afterwards, you could try the rm function and call the garbage collection in each function call. I am not sure if this helps, but at least you can try. – Refrangible
Thank you for your suggestion, I will try this. However, I see that the used memory increases somewhat linearly over time, which leads me to believe that the gigantic list created by foreach as a return value is the problem. – Enrollee
@Refrangible Indeed, using rm() and gc() in every worker yielded the desired result! Thank you for your help; if you want to add your own answer, I will accept it. – Enrollee

Parallel computing in R works (as far as I have experienced) such that memory is allocated separately for each cluster node.

That means if you have a big data set that each node needs for its calculation, this data is allocated multiple times, which leads to high RAM consumption. Since you want to write the output in each loop run and throw away the result afterwards, you can try the rm function and call the garbage collection (for example with gc) in each function call.
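For reference, here is a minimal sketch of how this could look inside the foreach body, using the question's doSomething() as a stand-in for the real calculation (the result_<i>.csv file names are made up for illustration):

library(foreach)
library(doParallel)

cl <- parallel::makePSOCKcluster(1)
doParallel::registerDoParallel(cl)

doSomething <- function() as.numeric(1L)        # stand-in for the real calculation

foreach::foreach(i = 1:4) %dopar% {
  res <- doSomething()
  write.csv(res, paste0("result_", i, ".csv"))  # placeholder file name; persist the result to disk
  rm(res)                                       # remove the object on the worker
  gc()                                          # run garbage collection on the worker
  NULL                                          # return nothing of any size
}

parallel::stopCluster(cl)

This way each iteration only leaves a NULL behind in the list that foreach collects, instead of the full result.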

This worked for E L M as mentioned above. Thanks for testing!

Refrangible answered 5/2, 2020 at 12:11 Comment(0)

From ?foreach:

The foreach and %do%/%dopar% operators provide a looping construct that can be viewed as a hybrid of the standard for loop and lapply function. It looks similar to the for loop, and it evaluates an expression, rather than a function (as in lapply), but its purpose is to return a value (a list, by default), rather than to cause side-effects.

The line

but its purpose is to return a value (a list, by default)

says that this is the intended behaviour of foreach. Not sure how you want to proceed from that...

Gleason answered 5/2, 2020 at 9:6 Comment(1)
Maybe there is a way to discard the return values of the iterations and have foreach return an empty list in the end? Or could you think of an alternative in my situation, maybe using a different parallelization tool? – Enrollee

As noted by dario, foreach returns a list. Therefore, what you want to do is use a plain for loop instead. You can use the write.csv function inside the loop to write the result of each iteration to a csv file.
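A minimal sketch of that sequential approach could look like this (the calculation and the result_<i>.csv file names are placeholders):

doSomething <- function() as.numeric(1L)        # stand-in for the real calculation

for (i in 1:4) {
  a <- doSomething()
  write.csv(a, paste0("result_", i, ".csv"))    # one file per iteration, nothing accumulated in memory
}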

For parallel computing, try using the parSapply function from the parallel package:

library(parallel)
cl <- parallel::makePSOCKcluster(1)
doParallel::registerDoParallel(cl)
# the anonymous function's argument receives the current element of 1:4
parSapply(cl, 1:4, function(doSomething) a <- as.numeric(1L))

Edit:

Combining this with Freakozoid's suggestion (setting the argument of the rm function to a):

library(parallel)
cl <- parallel::makePSOCKcluster(1)
doParallel::registerDoParallel(cl)
parSapply(cl, 1:4, function(doSomething) {
  a <- as.numeric(1L)
  write.csv(a, "output.csv")  # note: each iteration writes to the same file
  rm(a)                       # remove the object so nothing large is returned
})

will give you the resulting output as a csv file, as well as a list of NULLs. Since the list consists only of NULLs, it should not take up much space.

Please let me know the result.

Oligoclase answered 5/2, 2020 at 9:14 Comment(0)

As others mentioned, if you are only interested in the side effects of the function, returning NULL at the end means that no output is kept, saving on RAM.

If, on top of that, you want to reduce the visual clutter (avoid having a list of 100 NULLs), you can use the .final argument, setting it to something like .final = function(x) NULL.

library(foreach)
doSomething <- function ()  as.numeric(1L)

foreach::foreach(i = 1:4, .final = function(x) NULL) %do% {
  
  doSomething()
}
#> NULL

Created on 2022-05-24 by the reprex package (v2.0.1)

Chaplin answered 24/5, 2022 at 16:23 Comment(0)
