Is MPI_Reduce blocking (or a natural barrier)?

I have the C++ code snippet below, which estimates pi using the classic Monte Carlo technique.

    srand48((unsigned)time(0) + my_rank);

    for (int i = 0; i < part_points; i++)
    {
            double x = drand48();
            double y = drand48();
            if (x * x + y * y < 1) { ++count; }
    }

    // Note: MPI_DOUBLE must match the declared type of count and total_hits;
    // if they are integer counters, MPI_INT or MPI_LONG is the matching type.
    MPI_Reduce(&count, &total_hits, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);

    if(my_rank == root)
    {
            pi = 4*(total_hits/(double)total_points);

            cout << "Calculated pi: "  <<  pi << " in " << end_time-start_time <<  endl;
    }

I am just wondering if the MPI_Barrier call is necessary. Does MPI_Reduce make sure that the body of the if statement won't be executed before the reduce operation has completely finished? Hope I was clear. Thanks

Rankle answered 14/2, 2012 at 21:27 Comment(0)

Yes, all collective communication calls (MPI_Reduce, MPI_Scatter, MPI_Gather, etc.) are blocking. There's no need for the barrier.

Xenocrates answered 14/2, 2012 at 22:12 Comment(2)
The MPI standard allows for early exit of participating processes. The only collective call that guarantees synchronisation is MPI_Barrier. – Allerus
And I don't agree with @HD189733b: if the root finishes its computation early, it has to sit and wait in the reduce until everyone has contributed. There is zero possibility of a crash or incorrect result. – Voltmer

Ask yourself if that barrier is needed. Suppose you are not the root: you call MPI_Reduce, which sends off your data. Is there any reason to sit and wait until the root has the result? Answer: no, so you don't need the barrier.

Suppose you're the root. You issue the reduce call. Semantically you are now forced to sit and wait until the result is fully assembled. So why the barrier? Again, no barrier call is needed.

In general, you almost never need a barrier because you don't care about temporal synchronization. The semantics guarantee that your local state is correct after the reduce call.

Voltmer answered 2/1, 2018 at 18:48 Comment(0)

Blocking, yes; a barrier, no. It can be very important to call MPI_Barrier() alongside MPI_Reduce() when executing in a tight loop. Without the barrier, the receive buffers of the reducing process can eventually fill up and the application will abort, because the other participating processes only need to send and continue, while the reducing process has to receive and reduce. The above code does not need the barrier if my_rank == root == 0 (which is probably true). Anyway, MPI_Reduce() does not perform a barrier or any other form of synchronization. AFAIK even MPI_Allreduce() isn't guaranteed to synchronize (at least not by the MPI standard).

Ethno answered 11/3, 2013 at 16:57 Comment(2)
As this answer appears to contradict the selected answer and there's no evidence that it has been voted down, can someone comment on whether this is actually wrong? – Fallfish
This answer is only half-correct: the receive buffers might run full, but most MPI libraries implement flow-control mechanisms that prevent that from happening. – Allerus
