The MPI-3 standard introduces shared-memory windows that can be read and written by all processes sharing the memory, without calls to the MPI library. While there are examples of one-sided communication using shared or non-shared memory, I did not find much information about how to use shared memory correctly with direct access.
I ended up doing something like this, which works well, but I am wondering whether the MPI standard guarantees that it will always work.
// initialization:
MPI_Comm comm_shared;
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, i_mpi, MPI_INFO_NULL, &comm_shared); // i_mpi: this process's rank, used as the ordering key

// allocation
const int N_WIN = 10;
const int mem_size = 1000*1000;  // size in bytes passed to MPI_Win_allocate_shared
double* mem[N_WIN];
MPI_Win win[N_WIN];
for (int i = 0; i < N_WIN; i++) {  // I need several buffers.
    MPI_Win_allocate_shared(mem_size, sizeof(double), MPI_INFO_NULL, comm_shared, &mem[i], &win[i]);
    MPI_Win_lock_all(0, win[i]);
}

while (1) {
    MPI_Barrier(comm_shared);
    ... // write anywhere on shared memory
    MPI_Barrier(comm_shared);
    ... // read on shared memory written by other processes
}

// deallocation
for (int i = 0; i < N_WIN; i++) {
    MPI_Win_unlock_all(win[i]);
    MPI_Win_free(&win[i]);
}
Here, I ensure synchronization by using MPI_Barrier() and assume the hardware keeps the memory view consistent. Furthermore, because I have several shared windows, a single call to MPI_Barrier() seems more efficient than calling MPI_Win_fence() on every shared-memory window.
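For reference, the per-window alternative I am comparing against would look roughly like this (as far as I understand, it would have to replace the lock_all/barrier scheme entirely, since a fence epoch and a passive-target lock_all epoch should not be open on the same window at the same time):

// Rough sketch of the per-window, active-target alternative:
for (int i = 0; i < N_WIN; i++)
    MPI_Win_fence(0, win[i]);   // open an epoch on every shared window
... // write/read the shared memory
for (int i = 0; i < N_WIN; i++)
    MPI_Win_fence(0, win[i]);   // close the epochs, making writes visible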
It seems to work well on my x86 laptops and servers. But is this a valid/correct MPI program? Is there a more efficient method of achieving the same thing?
#pragma omp barrier followed by MPI_Barrier(comm_shared); and another #pragma omp barrier might do the trick? (If I understood correctly, #pragma omp barrier is also a memory barrier). – Bullroarer
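A rough sketch of what that suggestion might look like inside a hybrid MPI+OpenMP parallel region (assuming MPI was initialized with at least MPI_THREAD_FUNNELED, so that only the master thread calls MPI):

#pragma omp barrier             // implies a flush: each thread's writes become visible
#pragma omp master
    MPI_Barrier(comm_shared);   // one thread per process synchronizes across processes
#pragma omp barrier             // master has no implied barrier, so the other threads wait here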