Converting nested for-loop to parallel in R
Asked Answered
C

1

6

Below you can find a piece of code in R which I would like to convert to run as a parallel process using several CPU's. I tried using foreach package, but didn't go far.. I could not find a good example how to make it work given the fact I have 3-level nested loop. Help would be extremely appreciated. Example of code below - I made a simple function so it can serve as an example:

celnum <- c(10,20,30)
t2 <- c(1,2,3)
allrepeat <- 10

samplefunction <- function(celnum,t2){

        x <- rnorm(100,celnum,t2)
        y = sample(x, 1)
        z = sample(x,1)

        result = y+z


        result 
}

Getting results conventional way:

z_grid <- matrix(, nrow = length(celnum), ncol = length(t2))

repetitions <- matrix(, nrow = allrepeat, ncol = 1)



set.seed=20
for(i in 1:length(celnum)){
        for (j in 1:length(t2)){
                for (k in 1:allrepeat) {
                        results <- samplefunction(celnum[i],t2[j]) 
                                repetitions[k] <- results
                                z_grid[i,j] <- mean(repetitions,na.rm=TRUE) 
                }  
        }
}

z_grid

Now trying to do the same using foreach:

set.seed=20

library(foreach)
library(doSNOW)

cl <- makeCluster(3, type = "SOCK")
registerDoSNOW(cl)

set.seed=20
output <- foreach(i=1:length(celnum),.combine='cbind' ) %:% 
        foreach (j=1:length(t2), .combine='c') %:%   
                foreach (k = 1:allrepeat) %do% {
                        mean(samplefunction(celnum[i],t2[j]) )
}  
output

This doesn't work as I would like it to, as it is returning a matrix of 30x2 dimensions instead 3x3. My intention is to simulate the scenario for i and j combinations k times, and would like to get an average of these k simulations for each combination of i and j.

Caudill answered 25/7, 2016 at 13:7 Comment(2)
perhaps you have a look at the documentationDeus
@Deus : thanks this is very good. I can run a simple loop with foreach. I do however have a difficult time converting this piece of code which gives me an output of a list of four matrices as a result, and where there is a 3-level nested loop to run the function. I am still on a steep learning curve with RCaudill
D
6

EDIT:

The nested for loops should look like this. Note that there is only one foreach and two for loops nested.

library(foreach)
library(doSNOW)

cl <- makeCluster(3, type = "SOCK")
registerDoSNOW(cl)

set.seed(20)
output <- foreach(k=1:allrepeat) %dopar% {
  df <- data.frame()
  for (i in 1:length(t2)) {
    for (j in 1:length(celnum)) {
      df[i,j] <- mean(samplefunction(celnum[i],t2[j]))
    }  
  }
  df
}

The Result output as well is a list. To calculate the cell means this post helped a lot.

library(plyr)
aaply(laply(output, as.matrix), c(2,3), mean)

#   X2
# X1       V1       V2       V3
#  1 20.30548 21.38818 18.49324
#  2 40.09506 40.64564 40.34847
#  3 60.10946 59.68913 58.66209

btw: you should ...

stopCluster(cl)

... afterwards.


Original Post:

At first, you have to determine which of the for loops you want to replace with the foreach loop.

Basically this decision is mostly influenced by the results of the loop and thus, how this results can be combined. Since you outsource single processes to the individual processors of your PC, only the last element will be returned. These results will be combined as stated in the .combine parameter (e.g. 'c', 'cbind', etc.). Since you are trying to generate two lists this might be probably not very easy for a first start. Thus, I would like to propose an example which outlines the functionality of the foreach loop nested within other for loops.

library(foreach)
library(doSNOW)

dat1 <- c(15.2, 12.58, 4.25, 1.05, 6.78, 9.22, 11.20)
dat2 <- data.frame(matrix(1:15, ncol = 3))


cl <- makeCluster(3, type = "SOCK")
registerDoSNOW(cl)

for (i in 1:nrow(dat2)) {
  FEresult <- foreach(j = 1:ncol(dat2), .combine = c, .inorder = TRUE) %dopar% {
    tmp <- dat1 * dat2[i, j]
    data.frame(tmp)
  }
  FEresult
  if (i == 1) {
    res <- FEresult
  } else {
    res <- rbind(res, FEresult)
  }
}

res

You will notice, that the result of this loop is a list.

Deus answered 25/7, 2016 at 16:10 Comment(5)
Thanks. I decided to simplify the function slightly (see edited post). Do you mind having a look and see what final tweak will help it work?Caudill
when I copy the loop, it works. perhaps you edit the error message into your question, so that I can have a look at itDeus
There is no error message but the results are not as desired. I would like to get the same results in output as in z_grid. The former is a [3,3] matrix of [i,j] where k simulations of this matrix are summed up as a mean in each matrix entry. Does it makes sense? In other words, I simulate the scenario for i and j combinations k times, and would like to get a mean of these k simulations for each combination of i and j. The current output with foreach gives me 30x3 output dimension and I am not even sure exactly how they are ordered.Caudill
thanks, works great on this example. however, when i am replacing this function with my more complicated function (fully working, all libraries installed, works with previous loops i had etc) somehow, with %dopar% I get an error message: Error in { : task 1 failed - "could not find function "SpatialPoints"" . When I do %do% there is no problem and the statement is executed. Ever encountered such problem?Caudill
you have to add the parameter foreach (..blabla.., .packages = c("sp", "rgdal") or whatever packages you want to use during the foreach loop. Thus, the packages are loaded in the individual processes.Deus

© 2022 - 2024 — McMap. All rights reserved.