cout slowest processor MPI

I am writing a program using MPI. Each processor executes a for loop:

#include <boost/mpi.hpp>
#include <iostream>

int main(int argc, char** argv) {
  boost::mpi::environment env(argc, argv);

  for( int i=0; i<10; ++i ) {
    std::cout << "Index " << i << std::endl << std::flush;
  }
}

Is there a way to make the cout only happen on the last processor to hit index i? Or to set a flag so that a line is only executed on the last processor to get to it?

Crying asked 17/9, 2015 at 14:25 Comment(3)
I don't know how to make it only print out the last one, but another way to get the same effect would be to place an MPI_Barrier after the std::cout loop, and have if(rank == 0) { std::cout << std::endl; } right after the barrier (a rough sketch of this is given below the comments). Only do this if you absolutely have to, though, as needlessly introducing synchronization points is a great way to ensure that your code doesn't scale.Monodrama
Thanks! Yeah, I thought of this solution too, but I'm really just trying to print out some indicator of where the code is ... it's not necessary for the computation to work. Each iteration takes a while and I need to run a lot of them, and I like having some notion of how far into the simulation I am.Crying
I don't know if there is a more elegant/efficient way, but could you initialize an atomic counter to 0 before running this and have each thread increment the counter when it completes that iteration? Then each thread can look at the counter to see if it is the last one and cout if it is.Perception
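
A rough sketch of the barrier suggestion from the first comment, written in plain MPI rather than Boost.MPI (illustrative only; the message printed by rank 0 after the barrier is just a placeholder):

#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < 10; ++i) {
        std::cout << "Index " << i << std::endl << std::flush;
    }

    // wait here until every rank has finished its loop ...
    MPI_Barrier(MPI_COMM_WORLD);

    // ... then let a single rank mark the point where all ranks have arrived
    if (rank == 0) {
        std::cout << "All ranks passed the loop" << std::endl;
    }

    MPI_Finalize();
    return 0;
}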

It might look trivial, but what you are asking for here is actually quite complex for distributed-memory models such as MPI...

In a shared-memory environment such as OpenMP, this would be trivially solved by defining a shared counter that is incremented atomically by all threads and then checked to see whether its value corresponds to the number of threads. If it does, all threads have already passed the point, and the current thread, being the last one, takes care of the printing.
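
For comparison, here is a minimal sketch of that shared-counter idea in OpenMP (not part of the original answer; it assumes OpenMP 3.1 or later for the atomic capture form):

#include <iostream>
#include <omp.h>

int main() {
    int finished = 0;  // counter shared by all threads

    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();

        // ... the per-thread work would go here ...

        int my_count;
        #pragma omp atomic capture
        my_count = ++finished;  // atomically increment and read back the counter

        // only the last thread to reach this point sees my_count == nthreads
        if (my_count == nthreads) {
            std::cout << "Thread " << omp_get_thread_num()
                      << " was the last one" << std::endl;
        }
    }
    return 0;
}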

In a distributed environment, defining and updating such a shared variable is much harder, since each process might run on a remote machine. To allow for it anyway, MPI has provided memory windows and one-sided communications since MPI-2.0. However, even with those it was not possible to atomically increment a counter while also reliably getting back its value. Only with MPI 3.0 and the introduction of the MPI_Fetch_and_op() function did this become possible. Here is an example implementation:

#include <mpi.h>
#include <iostream>

int main( int argc, char *argv[] ) {

    // initialisation and inquiring of rank and size
    MPI_Init( &argc, &argv);

    int rank, size;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    // creation of the "shared" counter on process of rank 0
    int *addr = 0, winSz = 0;
    if ( rank == 0 ) {
        winSz = sizeof( int );
        MPI_Alloc_mem( winSz, MPI_INFO_NULL, &addr );
        *addr = 1; // initialised to 1 since MPI_Fetch_and_op returns value *before* increment
    }
    MPI_Win win;
    MPI_Win_create( addr, winSz, sizeof( int ), MPI_INFO_NULL, MPI_COMM_WORLD, &win );

    // atomic incrementation of the counter
    int counter, one = 1;
    MPI_Win_lock( MPI_LOCK_EXCLUSIVE, 0, 0, win );
    MPI_Fetch_and_op( &one, &counter, MPI_INT, 0, 0, MPI_SUM, win );
    MPI_Win_unlock( 0, win );

    // check the value of the counter: only the last process to update it sees counter == size
    if ( counter == size ) {
        std::cout << "Process #" << rank << " did the last update" << std::endl;
    }

    // cleaning up
    MPI_Win_free( &win );
    if ( rank == 0 ) {
        MPI_Free_mem( addr );
    }
    MPI_Finalize();

    return 0;
}

As you can see, this is quite lengthy and complex for such a seemingly trivial request. Moreover, it requires MPI 3.0 support.

Unfortunately, Boost.MPI, which seems to be your target, only "supports the majority of functionality in MPI 1.1". So if you really want this functionality, you'll have to resort to plain MPI programming.

Milinda answered 18/9, 2015 at 7:22 Comment(1)
This example is very valuable because it is hard to find a simple example for MPI_Fetch_and_op.Intine
