Initializing an MPI cluster with snowfall in R
I've been trying to run Rmpi and snowfall on my university's clusters but for some reason no matter how many compute nodes I get allocated, my snowfall initialization keeps running on only one node.

Here's how I'm initializing it:

sfInit(parallel=TRUE, cpus=10, type="MPI")

Any ideas? I'll provide clarification as needed.

Prehensile asked 27/7, 2013 at 16:02 (3 comments)
What kind of MPI is your cluster running? – Vase
Open MPI, and no error message as far as I know (things run smoothly, but the sysadmin tells me it's running it all on one cluster even though I have 5 allocated). Also, I just realized I meant one node, not one cluster. – Prehensile
Any ideas on what could be causing it? Do I have to use socketHosts perhaps? – Prehensile
To run an Rmpi-based program on a cluster, you need to request multiple nodes through your batch queueing system and then execute your R script from the job script via a utility such as mpirun/mpiexec. Ideally, mpirun has been built to detect automatically which nodes the batch queueing system has allocated; otherwise, you will need to pass an mpirun argument such as --hostfile to tell it which nodes to use.

In your case, it sounds like you requested multiple nodes, so the problem is probably with the way the R script is executed. Some people don't realize that they need to use mpirun/mpiexec, with the result that the script runs on a single node. If you are using mpirun, it may be that your installation of Open MPI wasn't built with support for your batch queueing system. In that case, you would have to create an appropriate hostfile from information supplied by the batch queueing system, usually via an environment variable and/or a file.
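If you do need to point mpirun at the allocated nodes yourself, here is a minimal job-script sketch of that step. It assumes a Torque/PBS-style scheduler that publishes the node list in the file named by $PBS_NODEFILE; other schedulers use different mechanisms (e.g. $PE_HOSTFILE under Grid Engine/OGS, in a different format), so adapt it to your site:

```shell
#!/bin/sh
# Sketch of a job-script fragment: hand the scheduler's node list to mpirun.
# $PBS_NODEFILE is a Torque/PBS convention: one line per allocated core,
# with hostnames repeated once per core, which Open MPI reads as slots.
# Under Grid Engine/OGS, consult $PE_HOSTFILE instead (different format).
mpirun --hostfile "$PBS_NODEFILE" -np 1 R --slave -f par.R
```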

Here is a typical mpirun command that I use to execute my parallel R scripts from the job script:

mpirun -np 1 R --slave -f par.R

Since we build Open MPI with support for Torque, I don't use the --hostfile option: mpirun automatically figures out which nodes to use from the PBS_NODEFILE environment variable. The use of -np 1 may seem strange, but it is needed if your program is going to spawn workers, which is typically done when using the snow package. I've never used snowfall, but after looking over the source code, it appears to me that sfInit always calls makeMPIcluster with a "count" argument, which causes snow to spawn workers, so I think -np 1 is required for MPI clusters with snowfall. Otherwise, mpirun will start your R script on multiple nodes, and each copy will spawn 10 workers on its own node, which is not what you want. The trick is to set the sfInit "cpus" argument to a value consistent with the number of cores allocated to your job by the batch queueing system. You may find the Rmpi mpi.universe.size function useful for that.
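Putting the pieces together, a hedged sketch of what par.R might look like (untested; the exact worker-count arithmetic is an assumption that depends on your site — some setups pass mpi.universe.size() directly rather than reserving a slot for the master):

```r
# par.R -- launched from the job script with: mpirun -np 1 R --slave -f par.R
library(Rmpi)
library(snowfall)

# mpi.universe.size() reports the total MPI slots the scheduler gave the job.
# This sketch reserves one slot for the master process, hence the "- 1";
# adjust to match how your Open MPI and queueing system are configured.
nworkers <- mpi.universe.size() - 1

sfInit(parallel = TRUE, cpus = nworkers, type = "MPI")

# Quick check: report which node each spawned worker landed on.
nodes <- sfLapply(seq_len(nworkers), function(i) Sys.info()[["nodename"]])
print(unique(unlist(nodes)))

sfStop()
mpi.quit()
```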

If you think that all of this is done correctly, the problem may be with the way that the MPI cluster object is being created in your R script, but I suspect that it has to do with the use (or lack of use) of mpirun.

Tigerish answered 28/7, 2013 at 3:42 (4 comments)
Thanks, I think the problem is most likely that I did not use mpirun (I was not aware of this). The cluster's documentation for mpirun reads: "# Invoke mpirun. # Note: $NSLOTS is set by OGS to N, the number of processors # requested by the "-pe mpi_M_tasks_per_node N" option. # If M=4, possible N are: 4, 8, 12, 16, . . . # (see Table 2 in Runningjobs page) mpirun -np $NSLOTS mpi_program arg1 arg2 ..". If I want to have snowfall do all the work for me, should I just use -np 1? – Prehensile
@Prehensile As mentioned in my updated answer, I think you do need to use '-np 1' with snowfall. Rmpi-based packages are rather unusual in using MPI spawning, so the usual advice on how to set -np is wrong. – Tigerish
So if I'm allocated 16 compute nodes and I want to utilize all 16 cores on each of those nodes (which the system gives me access to), will setting cpus <- mpi.universe.size() run it on all those compute nodes and fully utilize them? – Prehensile
@Prehensile If you've requested and been allocated 16 nodes each with 16 cores, then I expect that mpi.universe.size will return a value of 256. I also expect that setting "cpus" to 256 will result in 16 workers being spawned on each of those 16 nodes. But that's a guess, since I'm not very familiar with OGS, and it depends on having OGS support built into your installation of Open MPI. – Tigerish
