I have a python script that I've written using the multiprocessing module, for faster execution. The calculation is embarrassingly parallel, so the speedup scales with the number of processors. Now, I'd like to use this within an MPI program, which manages an MCMC calculation across multiple computers. The MPI code calls system() to invoke the python script. However, I'm finding that when it is called this way, the gain from using python multiprocessing vanishes.
How can I get my python script to retain the speed gains from multiprocessing when called from MPI?
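For context, the structure of the call is roughly like the sketch below. The real MPI driver is much more complicated; the mpi4py and os.system calls here are only stand-ins meant to show where the system() call to the python script sits inside the MPI program.

#!/usr/bin/python
# driver_sketch.py -- illustrative only: mpi4py and os.system stand in for
# the real MPI program's call to system().
import os
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each MPI rank shells out to the multiprocessing script (junk.py, shown
# below), just as the real code does via system().
os.system("./junk.py")

# The MCMC chains have to advance together, so all ranks synchronize here
# before the next step.
comm.Barrier()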
Here is a simple example, which is analogous to the much more complicated codes I want to use but displays the same general behavior. I write an executable python script called junk.py.
#!/usr/bin/python
import multiprocessing
import numpy as np

nproc = 3      # number of worker processes in the pool
nlen = 100000  # size of the array each worker sums over

def f(x):
    # embarrassingly parallel work item: a pile of numpy arithmetic
    print x
    v = np.arange(nlen)
    result = 0.
    for i, y in enumerate(v):
        result += (x + v[i:]).sum()
    return result

def foo():
    # farm the work items out to nproc worker processes
    pool = multiprocessing.Pool(processes=nproc)
    xlist = range(2, 2 + nproc)
    print xlist
    result = pool.map(f, xlist)
    print result

if __name__ == '__main__':
    foo()
When I run this from the shell by itself, "top" shows three python processes each taking 100% of a cpu on my 16-core machine.
node094:mpi[ 206 ] /usr/bin/time junk.py
[2, 3, 4]
2
3
4
[333343333400000.0, 333348333450000.0, 333353333500000.0]
62.68user 0.04system 0:21.11elapsed 297%CPU (0avgtext+0avgdata 16516maxresident)k
0inputs+0outputs (0major+11092minor)pagefaults 0swaps
However, if I invoke this with mpirun, each python process takes only about 33% of a cpu, and overall it takes roughly three times as long to run. Calling with -np 2 or more launches more processes, but doesn't speed up the computation at all.
node094:mpi[ 208 ] /usr/bin/time mpirun -np 1 junk.py
[2, 3, 4]
2
3
4
[333343333400000.0, 333348333450000.0, 333353333500000.0]
61.63user 0.07system 1:01.91elapsed 99%CPU (0avgtext+0avgdata 16520maxresident)k
0inputs+8outputs (0major+13715minor)pagefaults 0swaps
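In case it matters, here is a small sketch (not part of junk.py above) of how one could print the set of CPUs each process is allowed to run on, by reading Cpus_allowed_list from /proc/self/status (os.sched_getaffinity would be the natural way, but it is python 3 only):

#!/usr/bin/python
# affinity_check.py -- sketch only, not part of junk.py; prints which CPUs
# the current process is allowed to run on (Linux-specific).
import os

def allowed_cpus():
    # /proc/self/status contains a line such as "Cpus_allowed_list:  0-15"
    with open('/proc/self/status') as fh:
        for line in fh:
            if line.startswith('Cpus_allowed_list'):
                return line.split(':', 1)[1].strip()
    return 'unknown'

if __name__ == '__main__':
    print 'pid %d allowed cpus: %s' % (os.getpid(), allowed_cpus())

If the mpirun run reported only one core (or a small subset) while the direct run reported all 16, that would at least be consistent with the three workers piling onto a single core's worth of cpu.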
(Additional notes: This is mpirun 1.8.1 and python 2.7.3 on Debian wheezy. I have heard that system() is not always allowed within MPI programs, but it has been working for me for the last five years on this computer. For example, I have called a pthread-based parallel code via system() from within an MPI program, and it used 100% of a cpu for each thread, as desired. Also, in case you were going to suggest running the python script in serial and just calling it on more nodes: the MCMC calculation involves a fixed number of chains that need to move in a synchronized way, so the computation unfortunately can't be reorganized that way.)