Why does mpirun behave as it does when used with Slurm?
I am using Intel MPI and have encountered some confusing behavior when using mpirun in conjunction with Slurm.

If I run (on a login node)

mpirun -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"

then I get the expected output: 0 and 1 are printed.

If, however, I salloc --time=30 --nodes=1 and run the same mpirun command from the interactive compute node, I get two 0s printed instead of the expected 0 and 1.

Then, if I change -n 2 to -n 3 (still on the compute node), I get a long error from Slurm saying srun: error: PMK_KVS_Barrier task count inconsistent (2 != 1) (plus a load of other output), and I am not sure how to explain this either...
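
To narrow down the first case, one could also print the communicator size and host name rather than just the rank (probe.py below is just a hypothetical file name for this snippet):

from mpi4py import MPI

comm = MPI.COMM_WORLD
# If the two processes were wired into one MPI job, this prints
# "rank 0 of 2" and "rank 1 of 2"; if each process came up as an
# independent singleton, both print "rank 0 of 1".
print(f"rank {comm.Get_rank()} of {comm.Get_size()} on {MPI.Get_processor_name()}")

mpirun -n 2 python probe.py

If each process reports a world size of 1, that would suggest each one initialized as its own singleton MPI_COMM_WORLD instead of being wired together by the launcher.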

Now, based on this Open MPI page, it seems these kinds of operations should be supported, at least for Open MPI:

Specifically, you can launch Open MPI's mpirun in an interactive SLURM allocation (via the salloc command) or you can submit a script to SLURM (via the sbatch command), or you can "directly" launch MPI executables via srun.
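
For reference, the "direct" launch route mentioned there would presumably look like the line below under Slurm. As I understand the Intel MPI documentation, srun launches additionally require pointing the library at Slurm's PMI via I_MPI_PMI_LIBRARY; the path shown is site-specific and only illustrative:

I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so srun -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"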

Maybe the Intel MPI implementation I was using just doesn't have the same support and is not designed to be used directly in a Slurm environment(?), but I am still wondering: what is it about the way mpirun and Slurm (salloc) interact that produces this behavior? Why would it print two 0s in the first case, and what are the inconsistent task counts it complains about in the second?
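
In case it helps with diagnosing this: Intel MPI's Hydra launcher can be asked to show how it is bootstrapping the job, e.g. with I_MPI_HYDRA_DEBUG=on (and the bootstrap mechanism can be forced with I_MPI_HYDRA_BOOTSTRAP), assuming the version in use supports these variables:

I_MPI_HYDRA_DEBUG=on mpirun -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"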

Masculine answered 12/7, 2018 at 7:42

Comments (5):
Actually, a whole variety of errors can be produced... changing to --nodes=2 in the salloc and running the same mpirun produces a BAD TERMINATION error from Intel MPI, using mpiexec instead of mpirun produces srun: error: PMK_KVS_Barrier duplicate request from task 0, and the list probably goes on. Am I just not understanding how mpirun / Slurm should be used? – Masculine

Note that Intel MPI is based on MPICH (and not Open MPI). – Bibliology

@GillesGouaillardet I have heard Open MPI described as an "MPICH-compatible library", though, so I would expect the behavior to be mostly similar? – Masculine

Open MPI and MPICH both implement the same MPI standard. That only means the same code can be built, without any changes, against the library of your choice. That being said, there is no binary compatibility, and you cannot mix mpirun from one implementation with the library of another. – Bibliology

@GillesGouaillardet Yeah, this makes sense; this is why I said maybe Intel MPI just doesn't have the same support. I am still wondering, though, how to explain those outputs. – Masculine
