The answer here is highly dependent on the flavor of MPI you are using and on its integration with SLURM.
For myself, and I fully appreciate that this is a matter of personal preference, having to juggle a multitude of different clusters and environments, I try to reduce the span of variability as much as possible. So if SLURM is available on the cluster I run on, I will try to make all the run-time adjustments for my code via SLURM and sbatch, and let MPI inherit them.
For that, I define what I want and how I want my MPI code to run via my #SBATCH submission parameters: number of nodes, number of cores per process, number of processes per node, etc. The MPI launch is then hopefully as simple as possible via the mpirun, mpiexec or similar command the MPI library provides. Most (if not all) recent MPI libraries can directly detect that the job has been submitted within SLURM and inherit SLURM's process placement without any extra effort. With Intel MPI, for example, I use mpirun -bootstrap slurm <mycode> and all processes are placed as expected. This -bootstrap slurm option might not even be necessary, but I keep it just in case.
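To make that concrete, a minimal job script along those lines could look like the sketch below. The resource numbers and the binary name ./my_mpi_app are placeholders, and the -bootstrap slurm flag is Intel MPI specific; with another library a plain mpirun (or mpiexec) would typically be enough.

```bash
#!/bin/bash
#SBATCH --job-name=mpi_job          # placeholder job name
#SBATCH --nodes=2                   # number of nodes (example value)
#SBATCH --ntasks-per-node=16        # MPI processes per node (example value)
#SBATCH --cpus-per-task=1           # cores per MPI process (example value)
#SBATCH --time=01:00:00

# The launcher inherits the number of processes and their placement
# from the SLURM allocation defined above; no -n / -ppn needed here.
mpirun -bootstrap slurm ./my_mpi_app
```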
Conversely, using srun instead of the library's mpirun or mpiexec requires that the MPI code has been linked against SLURM's process management library. This may or may not be the case, so it may or may not do what you want. But more importantly, even if it does work, it won't give you any extra advantage over the MPI default launcher, since the process management will already have been done by SLURM at job submission via sbatch.
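That said, if you do want to try direct launch, srun selects its process management interface via the --mpi option. This is only a sketch: whether pmi2 or pmix is actually available depends on how SLURM and your MPI library were built on your cluster.

```bash
# Inside the same sbatch script, direct launch through SLURM instead of
# the MPI library's launcher; needs PMI/PMIx support on both sides.
srun --mpi=pmix ./my_mpi_app        # or --mpi=pmi2, depending on the installation

# List the PMI flavors this SLURM installation supports:
srun --mpi=list
```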
So for me, except for rare cases of quick and dirty tests, whenever SLURM is used for batch scheduling, srun isn't what I use; I go with the MPI library's default mpirun or mpiexec command.
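One example of such a quick and dirty test is a placement sanity check before launching the real code, something like the sketch below. SLURM_PROCID and hostname are standard, but note this only shows where SLURM itself places tasks, not what your MPI launcher will do.

```bash
# Print which node each SLURM task lands on; SLURM_PROCID is the task
# rank assigned by SLURM, hostname identifies the node it runs on.
srun bash -c 'echo "task ${SLURM_PROCID} runs on $(hostname)"'
```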
Comments:

mpirun starts a proxy on each node and then starts the MPI tasks (i.e. the MPI tasks are not directly known by the resource manager); srun, on the other hand, directly starts the MPI tasks, but that requires some support (PMI or PMIx) from SLURM. – Milagro

Is mpirun just for getting parallelization across the (perhaps 16) cores of one node? Isn't the idea of srun to be able to create multiple processes across nodes and within them? – Busily

srun (aka direct launch) uses the resource manager for the "wire up" (e.g. initialization of the parallel job), whereas mpirun uses its own proxy. – Milagro