For nearest-neighbour halo swaps, one of the most efficient implementations is usually a set of MPI_Sendrecv calls, two per dimension of the decomposition:
Half-step one - Transfer of data in the positive direction: each rank receives from the rank on its left into its left halo and sends data to the rank on its right
    +-+-+---------+-+-+     +-+-+---------+-+-+     +-+-+---------+-+-+
--> |R| | (i,j-1) |S| | --> |R| |  (i,j)  |S| | --> |R| | (i,j+1) |S| | -->
    +-+-+---------+-+-+     +-+-+---------+-+-+     +-+-+---------+-+-+
(S designates the part of the local data being communicated while R designates the halo into which data is being received; (i,j) are the coordinates of the rank in the process grid)
Half-step two - Transfer of data in the negative direction: each rank receives from the rank on its right into its right halo and sends data to the rank on its left
    +-+-+---------+-+-+     +-+-+---------+-+-+     +-+-+---------+-+-+
<-- |X|S| (i,j-1) | |R| <-- |X|S|  (i,j)  | |R| <-- |X|S| (i,j+1) | |R| <--
    +-+-+---------+-+-+     +-+-+---------+-+-+     +-+-+---------+-+-+
(X is the part of the halo region that has already been populated in the previous half-step)
Most switched networks support multiple simultaneous bi-directional (full duplex) communications and the latency of the whole exchange is
Both half-steps are repeated once for each dimension of the domain decomposition.
The process is simplified further in version 3.0 of the MPI standard, which introduces the so-called neighbourhood collective communications. The whole multidimensional halo swap can be performed using a single call to MPI_Neighbor_alltoallw.