Run a for loop in parallel in R

I have a for loop that is something like this:

finalMatrix = NULL
for (i in 1:150000) {
   tempMatrix = functionThatDoesSomething() #calling a function
   finalMatrix = cbind(finalMatrix, tempMatrix)
}
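
Growing finalMatrix with cbind() inside the loop copies the whole matrix on every iteration, so even the sequential loop slows down as it runs. A minimal sketch of a preallocated alternative, assuming (hypothetically) that functionThatDoesSomething() returns a numeric vector of the same length on every call. The stub function and the names n_iter and n_rows are placeholders, not part of the original code:

```r
# Hypothetical stand-in for the real function: returns a numeric vector
# of fixed length (an assumption made only for this sketch)
functionThatDoesSomething <- function(i) c(i, i^2, i^3)

n_iter <- 1000   # 150000 in the question; smaller here for illustration
n_rows <- 3      # assumed length of each returned vector

# Allocate the result once, then fill one column per iteration
finalMatrix <- matrix(NA_real_, nrow = n_rows, ncol = n_iter)
for (i in seq_len(n_iter)) {
  finalMatrix[, i] <- functionThatDoesSomething(i)
}
```

If the output size varies between calls, collecting the results in a list and calling do.call(cbind, results) once at the end avoids the repeated copying in the same way.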

Could you tell me how to make this run in parallel?

I tried this, based on an example online, but I am not sure whether the syntax is correct. It also didn't increase the speed much.

finalMatrix = foreach(i=1:150000, .combine=cbind) %dopar%  {
   tempMatrix = {}
   tempMatrix = functionThatDoesSomething() #calling a function

   cbind(finalMatrix, tempMatrix)

}
Draft asked 12/7, 2016 at 0:20. Comments (3):
Running things in parallel carries considerable overhead. You will only get a substantial speedup if each call to functionThatDoesSomething takes long enough for the overhead to be worth it. – Thigmotaxis
I think you need to do a lot more research before this question is ready. Look up the parallel and doParallel packages, for instance. – Gymnastics
You shouldn't need cbind(finalMatrix, tempMatrix) here. If you are using the .combine argument, just return the function output. – Jobber
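
To see why the extra cbind() is unnecessary: with .combine = cbind, foreach collects the value returned by each iteration and binds those values together itself. The base-R sketch below only mimics that result sequentially with Reduce(); it is not how foreach is implemented internally, and the helper oneIteration is hypothetical:

```r
# Hypothetical per-iteration work: returns a one-column matrix
oneIteration <- function(i) matrix(c(i, 2 * i), ncol = 1)

# Each iteration returns only its own piece...
pieces <- lapply(1:5, oneIteration)

# ...and a separate combine step binds the pieces together,
# which is the job .combine = cbind does in foreach
combined <- Reduce(cbind, pieces)
```

Because the combine step already appends every returned piece, also calling cbind(finalMatrix, ...) inside the loop body would bind results a second time (and finalMatrix is not even defined on the worker processes).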

Thanks for your feedback. I did look up the parallel package after I posted this question.

Finally, after a few tries, I got it running. I have added the code below in case it is useful to others:

library(foreach)
library(doParallel)

#setup parallel backend to use many processors
cores=detectCores()
cl <- makeCluster(cores[1]-1) #not to overload your computer
registerDoParallel(cl)

finalMatrix <- foreach(i=1:150000, .combine=cbind) %dopar% {
   tempMatrix = functionThatDoesSomething() #calling a function
   #do other things if you want

   tempMatrix #Equivalent to finalMatrix = cbind(finalMatrix, tempMatrix)
}
#stop cluster
stopCluster(cl)
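
For comparison, a sketch of the same pattern using only the base parallel package: parLapply() stands in for foreach/%dopar%, which removes the foreach and doParallel dependencies. This is a different API from the code above, not an equivalent rewrite of it; functionThatDoesSomething is a hypothetical stub, 1:1000 stands in for 1:150000, and the cap of 4 workers only keeps the demo small:

```r
library(parallel)

# Hypothetical stand-in for the real per-iteration work
functionThatDoesSomething <- function(i) c(i, sqrt(i))

# Leave one core free, use at least one worker, and cap at 4 for this demo
n_workers <- min(4L, max(1L, detectCores() - 1L, na.rm = TRUE))
cl <- makeCluster(n_workers)

# Run the iterations on the workers; stop the cluster even if something fails
results <- tryCatch(
  parLapply(cl, 1:1000, functionThatDoesSomething),
  finally = stopCluster(cl)
)

# One cbind() at the end instead of one per iteration
finalMatrix <- do.call(cbind, results)
```

By default parLapply() splits the input into one chunk per worker, so each worker receives a block of indices rather than one task at a time. If the real function relies on other global objects or packages, send them to the workers first with clusterExport() or clusterEvalQ().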

Note: if you allocate too many processes, you may get this error: Error in serialize(data, node$con) : error writing to connection
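
One way to stay clear of that limit is to compute the worker count defensively instead of hard-coding it. A sketch, assuming you want to leave one core free and put an upper bound on the number of workers; the helper name safeWorkerCount and the default cap of 8 are arbitrary choices for illustration:

```r
library(parallel)

# Hypothetical helper: pick a worker count that leaves one core free,
# never drops below 1, and never exceeds `cap`
safeWorkerCount <- function(cap = 8L, logical = TRUE) {
  n <- detectCores(logical = logical)
  if (is.na(n)) return(1L)  # detectCores() can return NA on some systems
  as.integer(max(1L, min(cap, n - 1L)))
}
```

The result can then be passed straight to makeCluster(safeWorkerCount()).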

Note: if .combine in the foreach statement is rbind, the returned object is built by appending the output of each iteration row-wise.
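
The difference between the two combine functions can be seen without a cluster. This base-R sketch binds the same per-iteration results (hypothetical named length-3 vectors) both ways, producing the shapes that .combine = cbind and .combine = rbind would give:

```r
# Four hypothetical per-iteration results, each a named length-3 vector
results <- lapply(1:4, function(i) c(a = i, b = 10 * i, c = 100 * i))

byColumn <- do.call(cbind, results)  # like .combine = cbind: one column per iteration
byRow    <- do.call(rbind, results)  # like .combine = rbind: one row per iteration
```

byColumn has 3 rows and 4 columns, byRow has 4 rows and 3 columns, and each is the transpose of the other.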

Hope this is useful for folks trying out parallel processing in R for the first time like me.

References:
http://www.r-bloggers.com/parallel-r-loops-for-windows-and-linux/
https://beckmw.wordpress.com/2014/01/21/a-brief-foray-into-parallel-processing-with-r/

Draft answered 12/7, 2016 at 17:43. Comments (5):
Can I return multiple different objects from a parallel loop? For example, I want to return a data frame and a vector/list. – Daughter
@user1700890, this might answer your question: #19792109 – Readily
Note that if your code within the %dopar% loop uses any functions from other packages, you'll have to put the library() call inside the loop. – Swiger
@Swiger This can also be accomplished with the .packages argument to foreach, e.g. foreach(i=1:150000, .combine=cbind, .packages=c("dplyr", "tidyr")) ... – Rye
For optimization purposes, be sure to compare detectCores() with detectCores(logical = FALSE). If your program keeps all your processors at 97% or more most of the time, it will most likely run faster with detectCores(logical = FALSE); otherwise, use detectCores(). – Dasheen
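
On returning several different objects per iteration: one common approach is to return a named list from each iteration and split the pieces apart afterwards. A base-R sketch of that idea, run sequentially with lapply() for simplicity; the same per-iteration function could be passed to parLapply() or used as a foreach body with the default list result. The names oneStep, df and vec are hypothetical:

```r
# Hypothetical per-iteration work returning two different objects
oneStep <- function(i) {
  list(df  = data.frame(id = i, value = i^2),
       vec = seq_len(i))
}

out <- lapply(1:3, oneStep)

# Gather each kind of object separately
allDf  <- do.call(rbind, lapply(out, `[[`, "df"))  # one data frame, one row per iteration
allVec <- lapply(out, `[[`, "vec")                 # list of the vectors
```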
