mpi4py: Communicating between spawned processes

I have one process running a program called t1.py which spawns 3 other processes, all of which run t2.py. I want to broadcast a value from the spawned process with a rank of 0 to the two other spawned processes. However, when bcast is called, the program blocks. Any idea why this happens? And how do I fix it?

t1.py

from mpi4py import MPI
import sys

sub_comm = MPI.COMM_SELF.Spawn(sys.executable, args=['t2.py'], maxprocs=3)
print('hi')

t2.py

from mpi4py import MPI

comm = MPI.Comm.Get_parent()

print('ho ', comm.Get_rank())
a = comm.bcast(comm.Get_rank(), root=0)
print(a)

output

hi
ho  2
ho  0
ho  1
Prohibitionist answered 5/11, 2016 at 15:33 Comment(3)
I faintly remember that MPISpawn returns an Inter-Communicator in contrast to the Intra-Communicators that we are used to (or the other way round). You can however convert one type to the other. Please consult the MPI standard about this. - Drake
Do you ever call MPI_Finalize()? In all processes? - Drake
I guess the parent process must take part in the broadcast. - Drake

If you just want the children to talk to each other, you can use MPI.COMM_WORLD:

a = MPI.COMM_WORLD.bcast(MPI.COMM_WORLD.Get_rank(), root=0)

By printing MPI.COMM_WORLD.Get_rank() and MPI.COMM_WORLD.Get_size() in t2.py, you can check that the children's MPI.COMM_WORLD is limited to the children.

Now, let's look at why comm.bcast(...) blocks when comm is obtained from comm = MPI.Comm.Get_parent(). Judging by its size and ranks, this communicator looks very similar to MPI.COMM_WORLD. But comm is in fact very different: it is an intercommunicator, the channel through which a parent talks to its children. Collective communications can be used on it, but all processes, both the parent and its children, must call the function. In the question, the parent never calls bcast(), so the children wait forever.

Please read the MPI standard carefully, in particular sections 5.2.2 and 5.2.3 about Intercommunicator Collective Operations. For bcast() on an intercommunicator, the root argument specifies both the direction (parent to child or child to parent) and the sending process. In the sending group, the sender passes MPI.ROOT and any other process passes MPI.PROC_NULL; in the receiving group, every process passes the rank of the sender within the sending group.

Lastly, an intracommunicator can be built from an intercommunicator with Merge() (corresponding to MPI_Intercomm_merge()). In this intracommunicator, parents and children no longer belong to two different groups: they are processes characterized by their unique rank, as usual.
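To make the root-argument rule concrete, here is a minimal plain-Python sketch (no MPI needed) of which root value each process would pass to bcast() on an intercommunicator. The function name intercomm_bcast_root and the string stand-ins ROOT and PROC_NULL are hypothetical, chosen only for illustration; in real code the values are MPI.ROOT and MPI.PROC_NULL from mpi4py.

```python
# Stand-ins for mpi4py's MPI.ROOT and MPI.PROC_NULL (assumption: plain strings
# are used here only so the sketch runs without MPI installed).
ROOT = "MPI.ROOT"
PROC_NULL = "MPI.PROC_NULL"


def intercomm_bcast_root(in_sender_group, my_rank, sender_rank):
    """Return the root argument one process passes to an intercommunicator bcast.

    in_sender_group: True if this process is in the same group as the sender.
    my_rank:         this process's rank within its own group.
    sender_rank:     the sender's rank within the sender's group.
    """
    if in_sender_group:
        # The sender itself passes ROOT; its group peers sit the broadcast out.
        return ROOT if my_rank == sender_rank else PROC_NULL
    # Receivers name the sender by its rank in the remote group.
    return sender_rank


# Parent (alone in its group) broadcasts to three children.
parent_to_children = {
    "parent": intercomm_bcast_root(True, 0, 0),
    "children": [intercomm_bcast_root(False, r, 0) for r in range(3)],
}

# Child 1 broadcasts to the parent.
child_to_parent = {
    "parent": intercomm_bcast_root(False, 0, 1),
    "children": [intercomm_bcast_root(True, r, 1) for r in range(3)],
}

print(parent_to_children)
print(child_to_parent)
```

The first case corresponds to the parent calling sub_comm.bcast(val, MPI.ROOT) while each child calls comm.bcast(..., root=0). The second case shows the reverse direction, where the two non-sending children still have to make the call, passing MPI.PROC_NULL.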

Here are the modified versions of t1.py and t2.py, where a bcast() on an intercommunicator is performed. The intercommunicator is then merged with Merge(), and a bcast() on the resulting intracommunicator is called as usual.

t1.py

from mpi4py import MPI
import sys

sub_comm = MPI.COMM_SELF.Spawn(sys.executable, args=['t2.py'], maxprocs=3)

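# Intercommunicator broadcast: the sending parent passes MPI.ROOT.
val = 42
sub_comm.bcast(val, MPI.ROOT)

# Merge into an intracommunicator (MPI_Intercomm_merge in C).
# high=False orders the parent before the children.
common_comm = sub_comm.Merge(False)
print('parent in common_comm ', common_comm.Get_rank(), ' of  ', common_comm.Get_size())

val = 13
c = common_comm.bcast(val, root=0)
print("value from rank 0 in common_comm", c)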

t2.py

from mpi4py import MPI

comm = MPI.Comm.Get_parent()

print('ho ', comm.Get_rank(), ' of  ', comm.Get_size(), ' ', MPI.COMM_WORLD.Get_rank(), ' of  ', MPI.COMM_WORLD.Get_size())
# Broadcast among the children only.
a = MPI.COMM_WORLD.bcast(MPI.COMM_WORLD.Get_rank(), root=0)
print("value from other child", a)

# Intercommunicator broadcast: root=0 is the parent's rank in the remote group.
print("comm.Is_inter", comm.Is_inter())
b = comm.bcast(comm.Get_rank(), root=0)
print("value from parent", b)

# Merge into an intracommunicator; high=True orders the children after the parent.
common_comm = comm.Merge(True)
print("common_comm.Is_inter", common_comm.Is_inter())
print('common_comm ', common_comm.Get_rank(), ' of  ', common_comm.Get_size())

c = common_comm.bcast(0, root=0)
print("value from rank 0 in common_comm", c)
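
The rank layout after Merge() can be sketched in plain Python (no MPI needed). This relies on the MPI rule that the group passing high=False is ordered before the group passing high=True, and additionally assumes that each group keeps its internal rank order. The helper name merged_ranks is hypothetical.

```python
def merged_ranks(low_size, high_size):
    """Model rank assignment in the intracommunicator returned by Merge().

    Returns a dict mapping (group, local_rank) -> merged rank, where group is
    "low" for processes that passed high=False and "high" for high=True.
    Assumption: each group keeps its internal rank order after the merge.
    """
    ranks = {}
    for r in range(low_size):
        ranks[("low", r)] = r
    for r in range(high_size):
        ranks[("high", r)] = low_size + r
    return ranks


# One parent calling Merge(False), three children calling Merge(True).
layout = merged_ranks(low_size=1, high_size=3)
print(layout)
```

Under these assumptions the parent is rank 0 of common_comm, so the final bcast(..., root=0) delivers the parent's value 13 to every child, and common_comm.Get_size() reports 4.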
Dread answered 5/11, 2016 at 21:35
