The following (simplified) script works fine on the master node of a Unix cluster (4 virtual cores):
library(foreach)
library(doParallel)
nc = detectCores()          # number of cores on the local machine
cl = makeCluster(nc)        # local socket cluster, one worker per core
registerDoParallel(cl)
foreach(i = 1:nrow(data_frame_1), .packages = c("package_1","package_2"), .export = c("variable_1","variable_2")) %dopar% {
  row_temp = data_frame_1[i,]
  # my_function is a placeholder for the actual function being called
  my_function(argument_1 = row_temp, argument_2 = variable_1, argument_3 = variable_2)
}
stopCluster(cl)
I would like to take advantage of the 16 nodes in the cluster (16 * 4 virtual cores in total).
I guess all I need to do is change the parallel backend specified by makeCluster. But how should I do that? The documentation is not very clear.
Based on this quite old (2013) post http://www.r-bloggers.com/the-wonders-of-foreach/ it seems that I should change the default type (sock or MPI). Which one should I use, and would it work on Unix?
EDIT
From this vignette by the authors of foreach:
By default, doParallel uses multicore functionality on Unix-like systems and snow functionality on Windows. Note that the multicore functionality only runs tasks on a single computer, not a cluster of computers. However, you can use the snow functionality to execute on a cluster, using Unix-like operating systems, Windows, or even a combination.
What does "you can use the snow functionality" mean? How should I do that?
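For reference, here is a minimal sketch of what I understand "snow functionality" to mean: instead of passing a core count to makeCluster, pass a character vector of host names, so that a socket (PSOCK) cluster is started with one worker per entry. The host names and the workers_per_host value below are assumptions for illustration, not taken from my actual cluster. I use "localhost" so the snippet runs on a single machine; with real node names this would presumably also require passwordless ssh from the master node and R (plus the needed packages) installed on every node.

```r
library(foreach)
library(doParallel)

# Hypothetical host list: replace with the real node names of the cluster,
# e.g. c("node01", "node02", ...). "localhost" keeps this sketch runnable
# on one machine.
hosts <- c("localhost")
workers_per_host <- 2   # assumed value; would be 4 for 4 virtual cores per node

# One entry per worker: repeating a host name starts several workers on it.
worker_list <- rep(hosts, each = workers_per_host)

# Socket cluster. Workers on hosts other than "localhost" are launched
# over ssh by default.
cl <- makeCluster(worker_list, type = "PSOCK")
registerDoParallel(cl)

# Toy task standing in for the real per-row computation.
squares <- foreach(i = 1:8, .combine = c) %dopar% {
  i^2
}

stopCluster(cl)
```

If this is the right approach, the foreach loop itself should not need to change; only the argument to makeCluster would differ between the single-node and multi-node runs.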