It might look trivial, but what you're asking for here is actually quite complex for distributed-memory models such as MPI...
In a shared-memory environment such as OpenMP, this would be trivially solved by defining a shared counter that every thread increments atomically and then checks: if its value equals the number of threads, then all threads have passed the point, the current thread is the last one, and it takes care of the printing.
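For illustration, a minimal OpenMP sketch of that idea could look like this (it assumes OpenMP 3.1 or later for the atomic capture clause):

#include <omp.h>
#include <iostream>

int main() {
    int counter = 0;
    #pragma omp parallel
    {
        int myValue;
        // atomically increment the shared counter and capture the new value
        #pragma omp atomic capture
        myValue = ++counter;
        // the thread that sees the final value is the last one to pass the point
        if ( myValue == omp_get_num_threads() ) {
            std::cout << "Thread #" << omp_get_thread_num()
                      << " was the last to pass" << std::endl;
        }
    }
    return 0;
}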
In a distributed environment, defining and updating such a shared variable is much harder, since each process might run on a remote machine. To still allow for it, MPI has offered memory windows and one-sided communication since MPI-2.0. However, even with those, it wasn't possible to properly implement an atomic counter increment while also reliably retrieving its value. Only with MPI 3.0 and the introduction of the MPI_Fetch_and_op() function did this become possible. Here is an example implementation:
#include <mpi.h>
#include <iostream>

int main( int argc, char *argv[] ) {
    // initialisation and inquiry of rank and size
    MPI_Init( &argc, &argv );
    int rank, size;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    // creation of the "shared" counter on the process of rank 0
    int *addr = 0, winSz = 0;
    if ( rank == 0 ) {
        winSz = sizeof( int );
        MPI_Alloc_mem( winSz, MPI_INFO_NULL, &addr );
        *addr = 1; // initialised to 1 since MPI_Fetch_and_op returns the value *before* the increment
    }
    // every process exposes its (possibly empty) window; only rank 0 actually holds the counter
    MPI_Win win;
    MPI_Win_create( addr, winSz, sizeof( int ), MPI_INFO_NULL, MPI_COMM_WORLD, &win );

    // atomic increment of the counter on rank 0
    int counter, one = 1;
    MPI_Win_lock( MPI_LOCK_EXCLUSIVE, 0, 0, win );
    MPI_Fetch_and_op( &one, &counter, MPI_INT, 0, 0, MPI_SUM, win );
    MPI_Win_unlock( 0, win );

    // checking the value of the counter: the last process to update it does the printing
    if ( counter == size ) {
        std::cout << "Process #" << rank << " did the last update" << std::endl;
    }

    // cleaning up
    MPI_Win_free( &win );
    if ( rank == 0 ) {
        MPI_Free_mem( addr );
    }
    MPI_Finalize();

    return 0;
}
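With a typical MPI 3.0 installation, this can be compiled and run roughly as follows (the compiler wrapper and launcher names vary between MPI distributions, and counter.cpp is just an assumed file name):

mpicxx counter.cpp -o counter    # or mpic++, depending on the MPI distribution
mpirun -np 4 ./counter           # exactly one of the 4 processes should report the last update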
As you can see, this is quite lengthy and complex for such a seemingly trivial request. Moreover, it requires MPI 3.0 support.
Unfortunately, Boost.MPI, which seems to be your target, only "supports the majority of functionality in MPI 1.1". So if you really want this functionality, you'll have to resort to some plain MPI programming.
A simpler alternative, suggested in the comments: put an MPI_Barrier after the std::cout loop, and have if (rank == 0) { std::cout << std::endl; } right after the barrier. Only do this if you absolutely have to, though, as needlessly introducing synchronization is a great way to ensure that your code doesn't scale. – Monodrama
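For completeness, a minimal sketch of that barrier-based alternative could look like the following (the per-rank printing is only assumed here as a stand-in for whatever output your loop actually produces):

#include <mpi.h>
#include <iostream>

int main( int argc, char *argv[] ) {
    MPI_Init( &argc, &argv );
    int rank, size;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    // each process does its printing (placeholder for the original loop body)
    std::cout << "Process #" << rank << " done" << std::endl;

    // wait until every process has passed the printing point
    MPI_Barrier( MPI_COMM_WORLD );

    // one designated process finishes the output after the barrier
    if ( rank == 0 ) {
        std::cout << std::endl;
    }

    MPI_Finalize();
    return 0;
}

Note that this only guarantees the final newline is printed after every process has reached the barrier; it does not order the per-rank output itself.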