Any use case for mpirun on a SLURM-managed cluster?

I was recently looking at this post about mpirun vs mpiexec and this post about srun vs sbatch, but I am wondering how mpirun relates to SLURM and srun.

In most examples I see, the scripts submitted with sbatch contain srun <program> to launch an MPI program, but I sometimes see ones that use mpirun or mpiexec instead. However, I don't understand why one would do this. As exemplified in another question I recently asked, it seems that using mpirun or mpiexec can produce all sorts of (implementation-dependent?) errors, and there is no reason not to use srun.
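
For concreteness, the kind of submission script I mean looks roughly like this (the program name and resource numbers are just placeholders):

    #!/bin/bash
    #SBATCH --job-name=mpi-test
    #SBATCH --nodes=2                # two nodes
    #SBATCH --ntasks-per-node=16     # 16 MPI ranks per node
    #SBATCH --time=00:10:00

    # launch the MPI program through SLURM's own launcher
    srun ./my_mpi_program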

Is this accurate, or is there a good reason to use mpirun or mpiexec instead of srun to execute programs on a SLURM-managed cluster?

Busily asked 12/7/2018 at 7:54 (3 comments)
mpirun starts a proxy on each node, and the proxies then start the MPI tasks (so the MPI tasks are not directly known by the resource manager). srun, on the other hand, starts the MPI tasks directly, but that requires some support (PMI or PMIx) from SLURM. – Milagro
@GillesGouaillardet So you would be using mpirun to get parallelization just across the (perhaps 16) cores of one node? Isn't the idea of srun to be able to create multiple processes across nodes and within them? – Busily
From a high-level point of view, the result is the same. srun (aka direct launch) uses the resource manager for the "wire-up" (i.e. initialization of the parallel job), whereas mpirun uses its own proxies. – Milagro

The answer depends heavily on the flavor of MPI you are using and how well it is integrated with SLURM.

For myself, and I fully appreciate that this is a matter of personal preference, I'd say that, since I have to juggle a multitude of different clusters and environments, I try to reduce the variability as much as possible. So if SLURM is available on the cluster I use, I will try to make all the run-time adjustments for my code via SLURM and sbatch, and let MPI inherit them.

For that, I define what I want and how I want my MPI code to run via my #SBATCH submission parameters: number of nodes, number of cores per process, number of processes per node, etc. Then the MPI launch will hopefully be as simple as possible via the mpirun, mpiexec, or similar command that the MPI library provides. Most (if not all) recent MPI libraries can directly detect that the job has been submitted within SLURM and inherit SLURM's process placement without any added effort. With Intel MPI, for example, I usually use mpirun -bootstrap slurm <mycode> and all processes are placed as expected. In fact, this -bootstrap slurm option might not even be necessary, but I keep it just in case.
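
As a sketch of what I mean (the module name, program name, and resource numbers are placeholders and will differ between clusters), the whole layout is expressed in the #SBATCH directives and mpirun simply inherits it:

    #!/bin/bash
    #SBATCH --nodes=4                # number of nodes
    #SBATCH --ntasks-per-node=8      # MPI processes per node
    #SBATCH --cpus-per-task=2        # cores per MPI process

    module load intel-mpi            # placeholder module name, site-dependent

    # Intel MPI's mpirun detects the SLURM allocation and inherits the
    # placement defined above; -bootstrap slurm just makes that explicit
    mpirun -bootstrap slurm ./my_code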

Conversely, using srun instead of the library's mpirun or mpiexec requires that the MPI code has been linked against SLURM's process management library. This may or may not be the case, so it may or may not do what you want. More importantly, even if it does work, it won't give you any extra advantage compared to just using the MPI default launcher, since the process management will already have been done by SLURM at job submission via sbatch. So for me, except for the rare quick-and-dirty test, whenever SLURM is used for batch scheduling, I don't use srun but rather the MPI library's default mpirun or mpiexec command.
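
For comparison, direct launch with srun would look something like the sketch below (again with placeholder names); it only works if the MPI library was built against a PMI/PMIx interface that the SLURM installation provides (srun --mpi=list shows which ones are available):

    #!/bin/bash
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=8

    # direct launch: SLURM itself starts the MPI ranks, so the MPI library
    # must support the PMI/PMIx flavor selected here
    srun --mpi=pmix ./my_code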

Duhamel answered 12/7/2018 at 8:32 (3 comments)
So if I am just logging onto a cluster managed with SLURM and running, for instance, module load intel-mpi mpi4py, is this a case where using srun makes sense? – Busily
Also, you say that "most (if not all) recent MPI libraries can directly detect that the job has been submitted within SLURM and inherit SLURM's process placement without any added effort"... is the question I linked, with errors using Intel MPI under SLURM, an exception, or was I just using something incorrectly? – Busily
I also don't see why using srun would require that the MPI code has been linked with SLURM's library, while having mpirun be aware of and inherit from SLURM's process placement automatically would not. – Busily
