Set up a Torque/MOAB cluster to use multiple cores per node with a single loop

This is a follow-up to [How to set up doSNOW and SOCK cluster with Torque/MOAB scheduler?].

I have a memory-limited script that uses only a single foreach loop, but I'd like to get 2 iterations running on node1 and 2 iterations running on node2. The linked question starts a SOCK cluster with one worker per node for an outer loop and an MC cluster for an inner loop; as far as I can tell, with only a single loop that setup doesn't make use of the multiple cores on each node. If I call registerDoMC(2) after registerDoSNOW(cl), I get this warning:

Warning message:
closing unused connection 3 (<-compute-1-30.local:11880)

Thanks.

EDIT: The solution from the previous question works fine for the problem asked there. See my example below for what I want.

Starting an interactive job with 2 nodes and 2 cores per node:

qsub -I -l nodes=2:ppn=2

After starting R:

library(doParallel)
# read the node list that Torque wrote for this job
f <- Sys.getenv('PBS_NODEFILE')
nodes <- unique(if (nzchar(f)) readLines(f) else 'localhost')
print(nodes)

Here are the two nodes I'm running on:

[1] "compute-3-15" "compute-1-32"

Start the SOCK cluster on these two nodes:

cl <- makePSOCKcluster(nodes, outfile='')

I'm not sure why both workers appear to be on compute-3-15...?

starting worker pid=25473 on compute-3-15.local:11708 at 16:54:17.048
starting worker pid=14746 on compute-3-15.local:11708 at 16:54:17.523

But register the cluster and run a single foreach loop:

registerDoParallel(cl)
r <- foreach(i=seq(1,6), .combine='c') %dopar% { Sys.info()[['nodename']] }
print(r)

The output of r indicates that both nodes were used, though:

 [1] "compute-3-15.local" "compute-1-32.local" "compute-3-15.local"
 [4] "compute-1-32.local" "compute-3-15.local" "compute-3-15.local"

Now, what I'd really like is for that foreach loop to run on 4 cores, 2 on each node:

library(doMC)
registerDoMC(4)
r <- foreach(i=seq(1,6), .combine='c') %dopar% { Sys.info()[['nodename']] }
print(r)

The output indicates that only one node was used, though presumably both cores on that node.

[1] "compute-3-15.local" "compute-3-15.local" "compute-3-15.local"
[4] "compute-3-15.local" "compute-3-15.local" "compute-3-15.local"

How do I get a SINGLE foreach loop to use multiple cores on multiple nodes?

Marsala asked 21/1/2015 at 17:20. Comments (4):

Requesting that a job use more than one core per node is just a matter of modifying the qsub to -l nodes=1:ppn=X, where X is the number of cores you'd like to use. I don't know where the job submission comes in for your stack, but if you can place that at the appropriate part of the stack it'll solve your problem. – Rimose

I don't think that will do it. I'm requesting -l nodes=4:ppn=8, but when you set up a SOCK cluster with one worker per node, my understanding is that there are 4 instances of R, each on 1 core of each node. I would like to use 8 cores, 2 on each node, but registering MC seems to shut down the SOCK cluster if it's the same loop (as opposed to an inner loop). – Marsala

You need to make sure that your MPI script (or whatever is launching each process) understands the ppn piece of things. Also, for the record, if you want two cores per node, ppn should be equal to 2, not 8; ppn=8 means you want 8 cores per node. – Rimose

Unfortunately the system I'm using only allows single-job node use if you request the whole node (8 cores), hence ppn=8. So my question is: how do I tell R to use 2 cores on each of the 4 nodes in a single foreach instance? (The linked question shows it with nested parallelized loops.) – Marsala

To use multiple nodes with foreach/doParallel, specify a vector of hostnames when calling makePSOCKcluster. To use multiple cores on those hosts, simply list each hostname multiple times, causing makePSOCKcluster to start multiple workers per host.
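
For example, using the node names from the question, listing each hostname twice would start two workers per node (a minimal sketch of the idea):

library(doParallel)
cl <- makePSOCKcluster(c('compute-3-15', 'compute-3-15',
                         'compute-1-32', 'compute-1-32'))
registerDoParallel(cl)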

Since you're using the Torque resource manager, you could use the following function to generate the node list; it limits the maximum number of workers started on any one node:

getnodelist <- function(maxpernode=100) {
  # PBS_NODEFILE contains one line per allocated core, repeating each hostname
  f <- Sys.getenv('PBS_NODEFILE')
  x <- if (nzchar(f)) readLines(f) else rep('localhost', 3)
  # count how many cores were allocated on each node...
  d <- as.data.frame(table(x), stringsAsFactors=FALSE)
  # ...and repeat each hostname at most 'maxpernode' times
  rep(d$x, pmin(d$Freq, maxpernode))
}

Here's an example that uses this function to run no more than two workers on each node that was allocated by Torque:

library(doParallel)
nodelist <- getnodelist(2)
print(nodelist)
cl <- makePSOCKcluster(nodelist, outfile='')
registerDoParallel(cl)
r <- foreach(i=seq_along(nodelist), .combine='c') %dopar% {
  Sys.info()[['nodename']]
}
cat('results:\n')
print(r)

Note that you cannot use the doMC backend to execute tasks on multiple nodes, since doMC uses the mclapply function which can only create workers on the local machine. To use multiple nodes, you have to use a backend such as doParallel, doSNOW, or doMPI.
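
For reference, here's a sketch of the equivalent setup with the doSNOW backend, reusing the getnodelist helper from above:

library(doSNOW)
nodelist <- getnodelist(2)
cl <- makeCluster(nodelist, type='SOCK')  # one worker per list entry, as with makePSOCKcluster
registerDoSNOW(cl)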

Etheridge answered 31/1/2015 at 23:05. Comments (2):
Great, thanks! That's what I was looking for. On my system, PBS_NODEFILE already lists each node once per core, so simply f <- ... and x <- readLines(f) suffice for getnodelist. Is there an effective difference between a 'sock' cluster and doMC? It seems both could work on my personal machine, but only a sock cluster works with networked nodes. – Marsala

@Marsala The biggest difference is that doMC workers are forked when the foreach loop is executed, while makePSOCKcluster workers can be started via ssh and are persistent across multiple loops. This has a number of consequences, but foreach tries to make the backends behave as consistently as possible. – Etheridge
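
One practical consequence of that difference, sketched below with illustrative names: forked doMC workers inherit the master's loaded packages and global variables, whereas PSOCK workers start as fresh R sessions, so foreach/doParallel must ship packages (via .packages) and referenced variables to them.

library(doParallel)
cl <- makePSOCKcluster(2)            # fresh R sessions, not forks of the master
registerDoParallel(cl)
big <- matrix(rnorm(1e4), nrow=100)  # 'big' is an illustrative variable
r <- foreach(i=1:2, .combine='c', .packages='stats') %dopar% {
  # 'big' is automatically exported to each PSOCK worker by doParallel;
  # under doMC it would simply be inherited through the fork
  median(big[, i])
}
stopCluster(cl)
print(r)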
