MPI_ERR_TRUNCATE: On Broadcast

I have an int that I intend to broadcast from the root process (rank == FIELD, where FIELD is 0).

int winner;

if (rank == FIELD) {
    winner = something;
}

MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&winner, 1, MPI_INT, FIELD, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
if (rank != FIELD) {
    cout << rank << " informed that winner is " << winner << endl;
}

But I get the following error:

[JM:6892] *** An error occurred in MPI_Bcast
[JM:6892] *** on communicator MPI_COMM_WORLD
[JM:6892] *** MPI_ERR_TRUNCATE: message truncated
[JM:6892] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

I found that I can avoid the error by increasing the count passed to MPI_Bcast:

MPI_Bcast(&winner, NUMPROCS, MPI_INT, FIELD, MPI_COMM_WORLD);

Here NUMPROCS is the number of running processes (actually, it seems a count of 2 is enough). Then it runs, but gives unexpected output:

1 informed that winner is 103
2 informed that winner is 103
3 informed that winner is 103
5 informed that winner is 103
4 informed that winner is 103

When I print winner with cout, it should be -1.

Riordan answered 8/11, 2012 at 14:8 Comment(3)
I don't have any problems with the code as written; how do you define FIELD? Can you post more code, and are you absolutely sure that it's this broadcast that is causing the problem? - Naji
@JonathanDursi, here it is: gist.github.com/4039617 - Riordan
I hope that after three edits of my answer, the root cause of your problem is now readable and understandable :) - Pallaton

There is an error early in your code:

if (rank == FIELD) {
   // randomly place ball, then broadcast to players
   ballPos[0] = rand() % 128;
   ballPos[1] = rand() % 64;
   MPI_Bcast(ballPos, 2, MPI_INT, FIELD, MPI_COMM_WORLD);
}

This is a very common mistake. MPI_Bcast is a collective operation, so it must be called by all processes in the communicator in order to complete. In your case the first broadcast is called only by the root and not by the other processes in MPI_COMM_WORLD, so it interferes with the next broadcast operation, namely the one inside the loop. On the non-root ranks, that second broadcast actually receives the message sent by the first one (two int elements) into a buffer that holds just one int, hence the truncation error.

In Open MPI, every broadcast internally uses the same message tag values, so different broadcasts can interfere with each other if they are not issued in sequence. This is compliant with the (old) MPI standard: in MPI-2.2 one cannot have more than one outstanding collective operation (in MPI-3.0 one can have several outstanding non-blocking collective operations). You should rewrite the code as:

if (rank == FIELD) {
   // randomly place ball, then broadcast to players
   ballPos[0] = rand() % 128;
   ballPos[1] = rand() % 64;
}
MPI_Bcast(ballPos, 2, MPI_INT, FIELD, MPI_COMM_WORLD);
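
To illustrate the mismatch described above, here is a minimal toy model in plain C++. It is only a sketch and does not use MPI: `ToyRank`, `toy_bcast_send`, `toy_bcast_recv` and the three scenario functions are hypothetical names invented for this illustration, the per-rank queue is an assumed simplification of in-order message matching (not Open MPI's actual implementation), and the ball position values are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <vector>

// Toy model only: this is NOT MPI and not how Open MPI is implemented.
// It mimics one property: broadcasts on a communicator are matched in the
// order they were issued, not by the line of code that issued them.
using Message = std::vector<int>;

struct ToyRank {
    std::deque<Message> inbox;  // broadcast payloads waiting to be received
};

// Root side of a broadcast: deliver the payload to every non-root rank.
void toy_bcast_send(std::vector<ToyRank>& ranks, std::size_t root,
                    const Message& payload) {
    assert(root < ranks.size());
    for (std::size_t r = 0; r < ranks.size(); ++r) {
        if (r != root) ranks[r].inbox.push_back(payload);
    }
}

// Non-root side of a broadcast: take the oldest pending payload.
// Throws std::length_error if it does not fit into `count` elements,
// which plays the role of MPI_ERR_TRUNCATE in this model.
Message toy_bcast_recv(ToyRank& rank, std::size_t count) {
    if (rank.inbox.empty()) throw std::logic_error("no pending broadcast");
    Message msg = rank.inbox.front();
    rank.inbox.pop_front();
    if (msg.size() > count) throw std::length_error("message truncated");
    return msg;
}

// Buggy pattern: only the root takes part in the ballPos broadcast, so the
// first broadcast a non-root rank joins is the 1-int "winner" one.
bool buggy_scenario_truncates() {
    std::vector<ToyRank> ranks(3);
    toy_bcast_send(ranks, 0, {64, 32});  // ballPos, sent by the root only
    toy_bcast_send(ranks, 0, {-1});      // winner
    try {
        toy_bcast_recv(ranks[1], 1);     // expects winner, is handed ballPos
    } catch (const std::length_error&) {
        return true;
    }
    return false;
}

// Workaround from the question: a count of 2 hides the error, but the
// value received is ballPos[0], not the winner.
int enlarged_buffer_scenario() {
    std::vector<ToyRank> ranks(3);
    toy_bcast_send(ranks, 0, {64, 32});
    toy_bcast_send(ranks, 0, {-1});
    return toy_bcast_recv(ranks[1], 2)[0];
}

// Fixed pattern: every rank takes part in both broadcasts, in order.
int fixed_scenario_winner() {
    std::vector<ToyRank> ranks(3);
    toy_bcast_send(ranks, 0, {64, 32});
    toy_bcast_send(ranks, 0, {-1});
    toy_bcast_recv(ranks[1], 2);            // matches ballPos
    return toy_bcast_recv(ranks[1], 1)[0];  // matches winner
}
```

In this model, `buggy_scenario_truncates()` reports the truncation, `enlarged_buffer_scenario()` returns the first ball coordinate instead of the winner (consistent with, though not proof of, the 103 seen in the question), and `fixed_scenario_winner()` returns -1.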
Pallaton answered 8/11, 2012 at 16:2 Comment(0)
