Python multiprocessing within MPI

I have a python script that I've written using the multiprocessing module, for faster execution. The calculation is embarrassingly parallel, so the efficiency scales with the number of processors. Now, I'd like to use this within an MPI program, which manages an MCMC calculation across multiple computers. That MPI program calls system() to invoke the python script. However, I'm finding that when it is called this way, the efficiency gain from python multiprocessing vanishes.

How can I get my python script to retain the speed gains from multiprocessing when called from MPI?

Here is a simple example, which is analogous to the much more complicated codes I want to use but displays the same general behavior. I wrote an executable python script called junk.py:

#!/usr/bin/python
import multiprocessing
import numpy as np

nproc = 3
nlen = 100000


def f(x):
    # CPU-bound busy work: enough arithmetic that each worker
    # should saturate one core for several seconds
    print x
    v = np.arange(nlen)
    result = 0.
    for i, y in enumerate(v):
        result += (x + v[i:]).sum()
    return result


def foo():
    # farm f out to a pool of nproc worker processes
    pool = multiprocessing.Pool(processes=nproc)
    xlist = range(2, 2 + nproc)
    print xlist
    result = pool.map(f, xlist)
    print result

if __name__ == '__main__':
    foo()

When I run this from the shell by itself, using top I can see three python processes, each taking 100% of the CPU on my 16-core machine.

node094:mpi[ 206 ] /usr/bin/time junk.py
[2, 3, 4]
2
3
4
[333343333400000.0, 333348333450000.0, 333353333500000.0]
62.68user 0.04system 0:21.11elapsed 297%CPU (0avgtext+0avgdata 16516maxresident)k
0inputs+0outputs (0major+11092minor)pagefaults 0swaps

However, if I invoke it with mpirun, each python process takes only 33% of the CPU, and overall it takes about three times as long to run. Calling with -np 2 or more launches more processes, but doesn't speed up the computation at all.

node094:mpi[ 208 ] /usr/bin/time mpirun -np 1 junk.py
[2, 3, 4]
2
3
4
[333343333400000.0, 333348333450000.0, 333353333500000.0]
61.63user 0.07system 1:01.91elapsed 99%CPU (0avgtext+0avgdata 16520maxresident)k
0inputs+8outputs (0major+13715minor)pagefaults 0swaps

(Additional notes: This is mpirun 1.8.1 and python 2.7.3 on Debian wheezy. I have heard system() is not always allowed within MPI programs, but it's been working for me for the last five years on this computer. For example, I have called pthread-based parallel code from system() within an MPI program, and it used 100% of the CPU for each thread, as desired. Also, in case you were going to suggest running the python script in serial and just calling it on more nodes... the MCMC calculation involves a fixed number of chains which need to move in a synchronized way, so the computation unfortunately can't be reorganized that way.)

Flitch asked 10/9, 2014 at 18:13

OpenMPI's mpirun, v1.7 and later, defaults to binding processes to cores - that is, when it launches the python junk.py process, it binds it to the core that it will run on. That's fine, and the right default behaviour for most MPI use cases. But here each MPI task is then forking more processes (through the multiprocessing package), and those forked processes inherit the binding state of their parent - so they're all bound to the same core, fighting amongst themselves. (The "P" column in top will show you they're all on the same processor.)

If you mpirun -np 2, you'll find two sets of three processes, each set bound to a different core, with the processes in each set contending amongst themselves.
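
If you want to see the inherited binding from inside the script itself, a small diagnostic along these lines (a rough sketch, Linux-only, written in the same Python 2 style as junk.py) reads Cpus_allowed_list from /proc/self/status for the parent and each pool worker; run it plain and then under mpirun and compare what each process reports:

#!/usr/bin/python
# Sketch: report each process's allowed CPUs by reading /proc/self/status
# (Linux-only). Under a core-binding mpirun, the parent and every pool
# worker inherit the same single-core mask; with --bind-to none they
# show the full CPU set instead.
import multiprocessing
import os


def report_affinity(tag):
    for line in open('/proc/self/status'):
        if line.startswith('Cpus_allowed_list'):
            print(tag + ' pid=' + str(os.getpid()) + ' ' + line.strip())


def worker(i):
    report_affinity('worker ' + str(i))
    return i

if __name__ == '__main__':
    report_affinity('parent')
    pool = multiprocessing.Pool(processes=3)
    pool.map(worker, range(3))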

With OpenMPI, you can avoid this by turning off binding,

mpirun -np 1 --bind-to none junk.py

or choosing some other binding which makes sense given the final geometry of your run. MPICH has similar options with hydra.
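
For instance (a sketch only; the flag spellings below are the ones from OpenMPI 1.8's mpirun and MPICH's hydra mpiexec as I remember them, so check them against your installed versions), binding each rank to a whole socket instead of a single core leaves its multiprocessing workers room to spread over that socket's cores:

mpirun -np 1 --bind-to socket junk.py

and the rough hydra equivalent of turning binding off would be

mpiexec -np 1 -bind-to none junk.py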

Note that fork()ing subprocesses under MPI isn't always safe or supported, particularly on clusters with InfiniBand interconnects, but OpenMPI's mpirun/mpiexec will warn you if it isn't safe.

Damiendamietta answered 10/9, 2014 at 18:38. Comments (4):
so the answer is to turn off multiprocessing in the script? – Slavocracy
Sorry, the answer to which part? Launching with --bind-to none will avoid the CPU problem. As to the issue with fork, OpenMPI's mpirun/mpiexec will let you know if you're running in situations where fork-ing isn't safe, and you can tackle that issue when/if it arises. – Damiendamietta
I've confirmed this solution works, both with the simple example and my real code. Thanks very much! I'm guessing this behavior of mpirun changed along with the binding options in OpenMPI version 1.7, otherwise I can't see why some of the codes I ran earlier would have worked. – Flitch
In clusters with InfiniBand interconnects you can make fork() safe by adding export RDMAV_FORK_SAFE=1 before you call mpirun. If said cluster is using huge pages as well, you may also need export RDMAV_HUGEPAGES_SAFE=1. – Mooneyham
