Is there a limit for the message size in mpi using boost::mpi?

I'm currently writing a simulation using boost::mpi on top of Open MPI, and everything works great. However, once I scale up the system and therefore have to send larger std::vectors, I get errors.

I've reduced the issue to the following problem:

#include <boost/mpi.hpp>
#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <vector>
namespace mpi = boost::mpi;

int main() {
    mpi::environment env;
    mpi::communicator world;

    std::vector<char> a;
    std::vector<char> b;
    if (world.rank() == 0) {
        for (size_t i = 1; i < 1E10; i *= 2) {
            a.resize(i);
            std::cout << "a " << a.size();
            world.isend(0, 0, a);
            world.recv(0, 0, b);
            std::cout << "\tB " << b.size() << std::endl;
        }
    }
    return 0;
}

prints out:

a 1 B 1
a 2 B 2
a 4 B 4
....
a 16384 B 16384
a 32768 B 32768
a 65536 B 65536
a 131072    B 0
a 262144    B 0
a 524288    B 0
a 1048576   B 0
a 2097152   B 0

I'm aware that there is a limit on MPI message size, but 65 kB seems a little low to me. Is there a way of sending larger messages?

Lueck answered 15/1, 2015 at 14:58 Comment(7)
According to this you should not even be close to the max. message size. No idea what's going wrong here though. - Suzettesuzi
What happens if you change isend to send? It could be that the non-blocking send is causing an issue. - Archducal
@Archducal: If I change the isend to send, it just stops (blocks) after writing the "a 65536 B 65536" line. - Lueck
@tk - can you query the status that is returned by recv? That might point you in a direction. - Archducal
@Archducal OK, I tried that: status.error() always returns 0. - Lueck
With it stopping at 65K, the only other thing I can think of is that there is some sort of thread-local storage going on. - Archducal
Although this is not a correct MPI program (you are not waiting on or testing the request returned by isend()), it must be a bug in boost.mpi. And yes, it is supposed to block when isend() is replaced by send() and the message size is above the internal eager limit. - Sentence

The limit on the message size is the same as for MPI_Send: the element count is passed as an int, so it is bounded by INT_MAX.
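If a buffer ever does exceed that bound, one common workaround is to send it in several pieces. The following is only a sketch of the bookkeeping: plan_chunks is a hypothetical helper, not part of Boost.MPI or MPI, and the actual send and receive calls for each piece are left out.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Boost.MPI or MPI): split `total` elements
// into (offset, count) pieces of at most `max_count` elements each, so that
// every piece has a count that fits in an int.
std::vector<std::pair<std::size_t, int>>
plan_chunks(std::size_t total, int max_count = INT_MAX) {
    std::vector<std::pair<std::size_t, int>> chunks;
    if (max_count <= 0) return chunks;  // nothing sensible to plan
    std::size_t offset = 0;
    while (offset < total) {
        const std::size_t remaining = total - offset;
        const int count = static_cast<int>(
            std::min<std::size_t>(remaining, static_cast<std::size_t>(max_count)));
        chunks.emplace_back(offset, count);
        offset += static_cast<std::size_t>(count);
    }
    return chunks;
}
```

Each (offset, count) pair would then describe one send of count elements starting at a.data() + offset, with the receiver posting matching receives in the same order.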

The issue is that you are not waiting for the isend to finish before resizing the vector a in the next iteration. Resizing can reallocate the vector's storage, so the pending isend may read memory that has already been freed. Note that a is passed to boost::mpi by reference, so you must not modify it until the isend operation has completed.

If you run your program with valgrind, you will see invalid reads as soon as i = 131072.
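The reallocation itself can be observed without MPI. The sketch below uses a hypothetical helper, resize_moves_buffer, that records the buffer address before and after a resize. Whenever the new size exceeds the current capacity, the vector has to move to new storage, and any pointer handed out earlier (for example to a pending non-blocking send) is left dangling.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration: returns true if resizing `v` to
// `new_size` moved its storage to a different address. The old address is
// captured as an integer before the resize, so no dangling pointer is ever
// dereferenced or compared.
bool resize_moves_buffer(std::vector<char>& v, std::size_t new_size) {
    const auto before = reinterpret_cast<std::uintptr_t>(v.data());
    v.resize(new_size);
    const auto after = reinterpret_cast<std::uintptr_t>(v.data());
    return before != after;
}
```

Resizing within the existing capacity (for example after a reserve()) leaves the address unchanged, which is why the bug may stay hidden in some runs.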

The reason your program works up to 65536 bytes is that Open MPI sends messages directly (eagerly) if they are smaller than the component's btl_eager_limit. For the self component (sending to the same process), this happens to be 128*1024 bytes. Since boost::serialization adds the size of the std::vector to the byte stream, you exceed this eager limit as soon as you use 128*1024 = 131072 as your input size.
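The arithmetic behind this can be written out as a small check. This is only an illustration: fits_eager is a hypothetical function, and the overhead argument stands in for the bytes the serialization adds, whose exact number is an assumption here.

```cpp
#include <cstddef>

// Hypothetical check for illustration: true if `payload` bytes plus
// `overhead` bytes of serialization framing stay strictly below
// `eager_limit`. The default limit is the 128*1024 value quoted for the
// self component; `overhead` is an assumed value, not a measured one.
constexpr bool fits_eager(std::size_t payload, std::size_t overhead,
                          std::size_t eager_limit = 128 * 1024) {
    return payload + overhead < eager_limit;
}
```

With an assumed overhead of 8 bytes, a payload of 65536 bytes stays under the limit, while a payload of 131072 bytes does not.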

To fix your code, save the boost::mpi::request return value from isend() and then add wait() to the end of the loop:

#include <boost/mpi.hpp>
#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <vector>
namespace mpi = boost::mpi;

int main() {
    mpi::environment env;
    mpi::communicator world;

    std::vector<char> a;
    std::vector<char> b;
    if (world.rank() == 0) {
        for (size_t i = 1; i < 1E9; i *= 2) {
            a.resize(i);
            std::cout << "a " << a.size();
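            // Keep the request so the send can be completed before a is touched again.
            mpi::request req = world.isend(0, 0, a);
            world.recv(0, 0, b);
            std::cout << "\tB " << b.size() << std::endl;
            // Block until the isend has finished; only then is it safe to resize a.
            req.wait();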
        }
    }
    return 0;
}
Pillow answered 7/3, 2015 at 22:18 Comment(0)
