Saving multiple outputs of foreach dopar loop

I would like to know if/how it would be possible to return multiple outputs as part of foreach dopar loop.

Let's take a very simplistic example. Let's suppose I would like to do 2 operations as part of the foreach loop, and would like to return or save the results of both operations for each value of i.

For only one output to return, it would be as simple as:

library(foreach)
library(doParallel)
cl <- makeCluster(3)
registerDoParallel(cl)

oper1 <- foreach(i=1:100000) %dopar% {
    i+2
}

oper1 would be a list with 100000 elements, each holding the result of i+2 for the corresponding value of i.

Suppose now I would like to return or save the results of two different operations separately, e.g. i+2 and i+3. I tried the following:

oper1 = list()
oper2 <- foreach(i=1:100000) %dopar% {
    oper1[[i]] = i+2
    return(i+3)
}

hoping that the results of i+2 will be saved in the list oper1, and that the results of the second operation i+3 will be returned by foreach. However, nothing gets populated in the list oper1! In this case, only the result of i+3 gets returned from the loop.

Is there any way of returning or saving both outputs in two separate lists?

Clause answered 5/11, 2013 at 14:42 Comment(3)
Why not simply return(c(i+2,i+3))? If you really need them in separate lists, you can do that after foreach returns. - Hypoblast
This is a very simplistic example. In my real use case, the results of the two operations have different structures (a matrix and a vector, or a list and a scalar), so this won't work. - Clause
That would have been useful to mention in your question... In that case, use return(list(i+2,i+3)). - Hypoblast

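Don't try to use side effects with foreach or any other parallel programming package: each worker evaluates the loop body in its own R process, so an assignment such as oper1[[i]] = i+2 modifies only the worker's copy of oper1 and never reaches the calling session. Instead, return all of the values from the body of the foreach loop in a list. If you want your final result to be a list of two lists rather than a list of 100,000 lists, then specify a combine function that transposes the results: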

comb <- function(x, ...) {
  lapply(seq_along(x),
    function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}

oper <- foreach(i=1:10, .combine='comb', .multicombine=TRUE,
                .init=list(list(), list())) %dopar% {
  list(i+2, i+3)
}

oper1 <- oper[[1]]
oper2 <- oper[[2]]

Note that this combine function requires the use of the .init argument to set the value of x for the first invocation of the combine function.
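As a rough sketch of how the same combine function behaves when the two outputs have different structures (the case raised in the question's comments), the example below returns a matrix and a vector from each iteration. The names res, mats and vecs are illustrative only, and %do% is substituted for %dopar% purely so the sketch runs without a cluster; the arguments are the same either way.

```r
library(foreach)

# Combine function from the answer above: appends the i-th component of
# each new result to the i-th accumulated list.
comb <- function(x, ...) {
  lapply(seq_along(x),
    function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}

# Assumption: %do% is used here only so the sketch runs without a cluster.
res <- foreach(i = 1:3, .combine = comb, .multicombine = TRUE,
               .init = list(list(), list())) %do% {
  list(matrix(i, nrow = 2, ncol = 2),  # first output: a 2x2 matrix
       c(i, i^2))                      # second output: a numeric vector
}

mats <- res[[1]]  # list holding the matrix from each iteration
vecs <- res[[2]]  # list holding the vector from each iteration
```

Because each output is kept as a separate list element, the matrices and vectors are not flattened together. If a single object is wanted afterwards, something like do.call(rbind, vecs) can be applied to the relevant list once the loop has finished.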

Kiddy answered 5/11, 2013 at 23:24 Comment(2)
Huge +1 on this one. Took me a long time to figure out the .init=list() piece. Doesn't work so well without that! - Cognizance
The answer does not explain how this would work in general for data.frames, matrices, or vectors. - Haleyhalf

I prefer to use a class to hold multiple results for a %dopar% loop.

This example starts 3 worker processes, calculates multiple results in each loop iteration, then returns the list of results to the calling session.

Tested under RStudio, Windows 10, and R v3.3.2.

library(foreach)
library(doParallel)

# Create class which holds multiple results for each loop iteration.
# Each loop iteration populates two properties: $result1 and $result2.
# For a great tutorial on S3 classes, see: 
# http://www.cyclismo.org/tutorial/R/s3Classes.html#creating-an-s3-class
multiResultClass <- function(result1=NULL,result2=NULL)
{
  me <- list(
    result1 = result1,
    result2 = result2
  )

  ## Set the name for the class
  class(me) <- append(class(me),"multiResultClass")
  return(me)
}

cl <- makeCluster(3)
registerDoParallel(cl)
oper <- foreach(i=1:10) %dopar% {
   result <- multiResultClass()
   result$result1 <- i+1
   result$result2 <- i+2
   return(result)
}
stopCluster(cl)

# Results of the first iteration only; use oper[[i]] for iteration i.
oper1 <- oper[[1]]$result1
oper2 <- oper[[1]]$result2
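
To get one list per output, as the question asks, the per-iteration results can be split after the loop returns. The following is a minimal sketch, not part of the original answer: it uses a plain named list in place of the class, %do% in place of %dopar% so it runs without a cluster, and oper1_vec is an illustrative name.

```r
library(foreach)

# Stand-in for the loop above: one named list per iteration.
# Assumption: %do% is used only so the sketch runs without a cluster.
oper <- foreach(i = 1:10) %do% {
  list(result1 = i + 1, result2 = i + 2)
}

# Pull one field out of every element to get one list per output.
oper1 <- lapply(oper, function(r) r$result1)
oper2 <- lapply(oper, function(r) r$result2)

# When every value is a single number, vapply gives a plain numeric vector.
oper1_vec <- vapply(oper, function(r) r$result1, numeric(1))
```

The same lapply calls should work on the multiResultClass objects, since they are lists underneath.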
Aboveground answered 21/5, 2017 at 20:59 Comment(0)

This toy example shows how to return multiple results from a %dopar% loop.

This example:

  • Spins up 3 cores.
  • Renders a graph on each core.
  • Returns the graph and an attached message.
  • Prints each graph and its attached message.

I found this really useful for speeding up an R Markdown document that prints 1,800 graphs to PDF.

Tested under Windows 10, RStudio, and R v3.3.2.

R code:

# Demo of returning multiple results from a %dopar% loop.
library(foreach)
library(doParallel)
library(ggplot2)

cl <- makeCluster(3)
registerDoParallel(cl)

# Create class which holds multiple results for each loop iteration.
# Each loop iteration populates two properties: $resultPlot and $resultMessage.
# For a great tutorial on S3 classes, see: 
# http://www.cyclismo.org/tutorial/R/s3Classes.html#creating-an-s3-class
plotAndMessage <- function(resultPlot=NULL,resultMessage="?")
{
  me <- list(
    resultPlot = resultPlot,
    resultMessage = resultMessage
  )

  # Set the name for the class
  class(me) <- append(class(me),"plotAndMessage")
  return(me)
}

oper <- foreach(i=1:5, .packages=c("ggplot2")) %dopar% {

  x <- c(i:(i+2))
  y <- c(i:(i+2))
  df <- data.frame(x,y)
  p <- ggplot(df, aes(x,y))
  p <- p + geom_point()

  message <- paste("Hello, world! i=",i,"\n",sep="")

  result <- plotAndMessage()
  result$resultPlot <- p
  result$resultMessage <- message
  return(result)
}

# Print resultant plots and messages. Despite running on multiple cores,
# 'foreach' guarantees that the plots arrive back in the original order.
foreach(i=1:5) %do% {
  # Print message attached to plot.
  cat(oper[[i]]$resultMessage)
  # Print plot.
  print(oper[[i]]$resultPlot)
}

stopCluster(cl)
Aboveground answered 21/5, 2017 at 20:52 Comment(1)
This solution worked really well for me. I did change the last foreach loop to a regular for loop because it wasn't playing nice with my R Notebook. - Botts
