None of the barriers are needed!
MPI_Gather is a blocking operation, that is, the outputs are available after the call completes. That does not imply a barrier, because non-root ranks are allowed to, but not guaranteed to, complete before the root or the other ranks start their operation. However, it is perfectly safe to access global on the MASTER_ID rank and to reuse localdata on any rank after the local call completes.
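For illustration, here is a minimal self-contained sketch in C - not the asker's actual code. It reuses the names from the question, assuming MASTER_ID is the root rank, localdata is the per-rank contribution, and global is the receive buffer on the root:

```c
/* Minimal sketch: gather one int per rank onto MASTER_ID without any barriers.
 * MASTER_ID, localdata, and global mirror the names used in the question. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MASTER_ID 0

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int localdata = rank * rank;            /* per-rank contribution */
    int *global = NULL;
    if (rank == MASTER_ID)
        global = malloc(size * sizeof(int)); /* receive buffer only on the root */

    /* Blocking collective: when it returns, localdata may be reused on
     * every rank, and global is fully populated on MASTER_ID. */
    MPI_Gather(&localdata, 1, MPI_INT,
               global, 1, MPI_INT,
               MASTER_ID, MPI_COMM_WORLD);

    localdata = -1;                          /* safe on every rank, no barrier needed */
    if (rank == MASTER_ID) {
        for (int i = 0; i < size; ++i)
            printf("global[%d] = %d\n", i, global[i]);
        free(global);
    }

    MPI_Finalize();
    return 0;
}
```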
Synchronization with message-based MPI is different from shared-memory OpenMP. For blocking communication, usually no explicit synchronization is necessary - the result is guaranteed to be available after the call completes.
Synchronization of sorts is necessary for non-blocking communication, but that is done via MPI_Test/MPI_Wait on specific requests - barriers might even provide a false sense of correctness if you tried to substitute an MPI_Wait with an MPI_Barrier. With one-sided communication, it gets more complicated and barriers can play a role.
In fact, you only rarely need a barrier at all; avoid them so as not to introduce unnecessary synchronization.
Edit: Given the other, contradicting answers, here is the citation from the standard (MPI 3.1, Section 5.1), emphasis mine.
Collective operations can (but are not required to) complete as soon
as the caller’s participation in the collective communication is
finished. A blocking operation is complete as soon as the call
returns. A nonblocking (immediate) call requires a separate completion
call (cf. Section 3.7). The completion of a collective operation
indicates that the caller is free to modify locations in the
communication buffer. It does not indicate that other processes in the
group have completed or even started the operation (unless otherwise
implied by the description of the operation). Thus, a collective
communication operation may, or may not, have the effect of
synchronizing all calling processes. This statement excludes, of
course, the barrier operation.
To address the recent edit: No, data sizes have no impact on correctness in this case. Data sizes in MPI sometimes have an impact on whether an incorrect MPI program deadlocks or not, as in the sketch below.
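A classic illustration of this, not taken from the question: both ranks sending before receiving is erroneous, yet it may appear to work for small messages because the implementation buffers them eagerly, while large messages deadlock.

```c
/* Erroneous exchange whose behavior depends on message size: both ranks
 * call MPI_Send first. Small messages may be buffered ("eager" protocol)
 * and the program seems to work; large messages typically deadlock. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                  /* try 1 vs. 1 << 20 elements */
    int *sendbuf = calloc(n, sizeof(int));
    int *recvbuf = calloc(n, sizeof(int));
    int other = 1 - rank;                   /* assumes exactly 2 ranks */

    /* Incorrect regardless of size: use MPI_Sendrecv or non-blocking
     * calls to make this exchange correct for any message size. */
    MPI_Send(sendbuf, n, MPI_INT, other, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, n, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```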