MPI hangs on MPI_Send for large messages

There is a simple program in c++ / mpi (mpich2), which sends an array of type double. If the size of the array more than 9000, then during the call MPI_Send my programm hangs. If array is smaller than 9000 (8000, for example) programm works fine. Source code is bellow:

main.cpp

using namespace std;

Cube** cubes;
int cubesLen;

double* InitVector(int N) {
   double* x = new double[N];
   for (int i = 0; i < N; i++) {
       x[i] = i + 1;
   }
   return x;
}

void CreateCubes() {
    cubes = new Cube*[12];
    cubesLen = 12;
    for (int i = 0; i < 12; i++) {
       cubes[i] = new Cube(9000);
    }
}

void SendSimpleData(int size, int rank) {
    Cube* cube = cubes[0];
    int nodeDest = rank + 1;
    if (nodeDest > size - 1) {
        nodeDest = 1;
    }

    double* coefImOut = (double *) malloc(sizeof (double)*cube->coefficentsImLength);
    cout << "Before send" << endl;
    int count = cube->coefficentsImLength;
    MPI_Send(coefImOut, count, MPI_DOUBLE, nodeDest, 0, MPI_COMM_WORLD);
    cout << "After send" << endl;
    free(coefImOut);

    MPI_Status status;
    double *coefIm = (double *) malloc(sizeof(double)*count);

    int nodeFrom = rank - 1;
    if (nodeFrom < 1) {
        nodeFrom = size - 1;
    }

    MPI_Recv(coefIm, count, MPI_DOUBLE, nodeFrom, 0, MPI_COMM_WORLD, &status);
    free(coefIm);
}

int main(int argc, char *argv[]) {
    int size, rank;
    const int root = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    CreateCubes();

    if (rank != root) {
         SendSimpleData(size, rank);
    }

    MPI_Finalize();
    return 0;
}

class Cube

 class Cube {
 public:
    Cube(int size);
    Cube(const Cube& orig);
    virtual ~Cube();

    int Id() { return id; } 
    void Id(int id) { this->id = id; }

    int coefficentsImLength;
    double* coefficentsIm;

private:
    int id;
};

Cube::Cube(int size) {
    this->coefficentsImLength = size;

    coefficentsIm = new double[size];
    for (int i = 0; i < size; i++) {
        coefficentsIm[i] = 1;
    }
}

Cube::Cube(const Cube& orig) {
}

Cube::~Cube() {
    delete[] coefficentsIm;
}

The program runs on 4 processes:

mpiexec -n 4 ./myApp1

Any ideas?

The details of the Cube class aren't relevant here: consider a simpler version

#include <mpi.h>
#include <cstdlib>

using namespace std;

int main(int argc, char *argv[]) {
    int size, rank;
    const int root = 0;

    int datasize = atoi(argv[1]);

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != root) {
        int nodeDest = rank + 1;
        if (nodeDest > size - 1) {
            nodeDest = 1;
        }
        int nodeFrom = rank - 1;
        if (nodeFrom < 1) {
            nodeFrom = size - 1;
        }

        MPI_Status status;
        int *data = new int[datasize];
        for (int i=0; i<datasize; i++)
            data[i] = rank;

        cout << "Before send" << endl;
        MPI_Send(&data, datasize, MPI_INT, nodeDest, 0, MPI_COMM_WORLD);
        cout << "After send" << endl;
        MPI_Recv(&data, datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &status);

        delete [] data;

    }

    MPI_Finalize();
    return 0;
}

where running gives

$ mpirun -np 4 ./send 1
Before send
After send
Before send
After send
Before send
After send
$ mpirun -np 4 ./send 65000
Before send
Before send
Before send

If in DDT you looked at the message queue window, you'd see everyone is sending, and no one is receiving, and you have a classic deadlock.

MPI_Send's semantics, wierdly, aren't well defined, but it is allowed to block until "the receive has been posted". MPI_Ssend is clearer in this regard; it will always block until the receive has been posted. Details about the different send modes can be seen here.

The reason it worked for smaller messages is an accident of the implementation; for "small enough" messages (for your case, it looks to be <64kB), your MPI_Send implementation uses an "eager send" protocol and doesn't block on the receive; for larger messages, where it isn't necessarily safe just to keep buffered copies of the message kicking around in memory, the Send waits for the matching receive (which it is always allowed to do anyway).

There's a few things you could do to avoid this; all you have to do is make sure not everyone is calling a blocking MPI_Send at the same time. You could (say) have even processors send first, then receive, and odd processors receive first, and then send. You could use nonblocking communications (Isend/Irecv/Waitall). But the simplest solution in this case is to use MPI_Sendrecv, which is a blocking (Send + Recv), rather than a blocking send plus a blocking receive. The send and receive will execute concurrently, and the function will block until both are complete. So this works

#include <mpi.h>
#include <cstdlib>

using namespace std;

int main(int argc, char *argv[]) {
    int size, rank;
    const int root = 0;

    int datasize = atoi(argv[1]);

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != root) {
        int nodeDest = rank + 1;
        if (nodeDest > size - 1) {
            nodeDest = 1;
        }
        int nodeFrom = rank - 1;
        if (nodeFrom < 1) {
            nodeFrom = size - 1;
        }

        MPI_Status status;
        int *outdata = new int[datasize];
        int *indata  = new int[datasize];
        for (int i=0; i<datasize; i++)
            outdata[i] = rank;

        cout << "Before sendrecv" << endl;
        MPI_Sendrecv(outdata, datasize, MPI_INT, nodeDest, 0,
                     indata, datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &status);
        cout << "After sendrecv" << endl;

        delete [] outdata;
        delete [] indata;
    }

    MPI_Finalize();
    return 0;
}

Running gives

$ mpirun -np 4 ./send 65000
Before sendrecv
Before sendrecv
Before sendrecv
After sendrecv
After sendrecv
After sendrecv

Recommended topics

Hot tags