How can I pass long and/or unsigned integers to MPI arguments?

Assume that I have a very large array which I wish to send or receive with MPI (v1). In order to index this array, I use an unsigned long integer.

Now, all MPI function calls I have seen use int types for their "count" arguments, such as in this example:

MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

But what if, in my implementation, I need to send/receive an array larger than the maximum number an int can hold? The compiler, naturally, gives me an "invalid conversion" error when I try to feed an unsigned long to the "count" argument. I thought about doing a cast, but I am worried that this would truncate my variable, so I am at a loss as to what to do.

Inchon answered 21/4, 2014 at 16:35 Comment(2)
You might just have to break the packet into multiple smaller packets.Oblate
See https://mcmap.net/q/2035095/-communicate-data-with-count-value-close-to-int_max/2189128Grose

Doing a cast is not the solution as it will simply truncate the long count. There are two obstacles to overcome here - an easy one and a hard one.

The easy obstacle is the int type for the count argument. You can get past it simply by creating a contiguous type of smaller size and then send the data as multiples of the new datatype. An example code follows:

// Data to send
int data[1000];

// Create a contiguous datatype of 100 ints
MPI_Datatype dt100;
MPI_Type_contiguous(100, MPI_INT, &dt100);
MPI_Type_commit(&dt100);

// Send the data as 10 elements of the new type
MPI_Send(data, 10, dt100, ...);

Since the count argument of MPI_Type_contiguous is int, with this technique you can send up to (2^31 - 1)^2 = 2^62 - 2^32 + 1 elements. If this is not enough, you can create a new contiguous datatype from the dt100 datatype, e.g.:

// Create a contiguous datatype of 100 dt100's (effectively 100x100 elements)
MPI_Datatype dt10000;
MPI_Type_contiguous(100, dt100, &dt10000);
MPI_Type_commit(&dt10000);

If your original data size is not a multiple of the size of the new datatype, you can create a structure datatype whose first block consists of int(data_size / cont_type_length) elements of the contiguous datatype and whose second block consists of data_size % cont_type_length elements of the primitive datatype. Example follows:

// Data to send
int data[260];

// Create a structure type
MPI_Datatype dt260;

int blklens[2];
MPI_Datatype oldtypes[2];
MPI_Aint offsets[2];

blklens[0] = 2; // That's int(260 / 100)
offsets[0] = 0;
oldtypes[0] = dt100;

blklens[1] = 60; // That's 260 % 100
offsets[1] = blklens[0] * 100L * sizeof(int); // Offsets are in BYTES!
oldtypes[1] = MPI_INT;

MPI_Type_create_struct(2, blklens, offsets, oldtypes, &dt260);
MPI_Type_commit(&dt260);

// Send the data
MPI_Send(data, 1, dt260, ...);

MPI_Aint is an integer type large enough to hold offsets that do not fit in an int on LP64 systems. Note that the receiver must construct the same datatype and use it similarly in the MPI_Recv call. Receiving an amount that is not a whole multiple of the contiguous datatype is a bit problematic, though.
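
For completeness, here is a minimal sketch of the matching receive side under the same illustrative numbers (260 ints, blocks of 100); source and tag are placeholders for whatever your application uses:

// Receive buffer
int recvbuf[260];

// Rebuild the same datatypes on the receiving rank
MPI_Datatype dt100, dt260;
MPI_Type_contiguous(100, MPI_INT, &dt100);
MPI_Type_commit(&dt100);

int blklens[2] = { 2, 60 };
MPI_Aint offsets[2] = { 0, (MPI_Aint)(2 * 100L * sizeof(int)) };
MPI_Datatype oldtypes[2] = { dt100, MPI_INT };
MPI_Type_create_struct(2, blklens, offsets, oldtypes, &dt260);
MPI_Type_commit(&dt260);

// Receive one element of the structure type (placeholder source/tag)
int source = 0, tag = 0;
MPI_Status status;
MPI_Recv(recvbuf, 1, dt260, source, tag, MPI_COMM_WORLD, &status);

// Free the datatypes once they are no longer needed
MPI_Type_free(&dt260);
MPI_Type_free(&dt100);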

That's the easy obstacle. The not-so-easy one comes when your MPI implementation does not use long counts internally. In that case MPI would usually crash, send only part of the data, or do something else weird. Such an MPI implementation can be broken even without constructing a special datatype, simply by sending INT_MAX elements of type MPI_INT, since the total message size would be (2^31 - 1) * 4 = 2^33 - 4 bytes. If that is the case, your only escape is to split the message manually and send/receive it in a loop.
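
A minimal sketch of such a manual split on the sender side might look like the following; the function name send_large and the 64M-element (256 MiB) chunk size are just illustrative choices:

#include <stddef.h>
#include <mpi.h>

// Send `total` ints from `data` to rank `dest` in chunks that each fit in an
// int count and stay well below typical per-message size limits.
void send_large(int *data, size_t total, int dest, int tag, MPI_Comm comm)
{
    const size_t CHUNK = 64 * 1024 * 1024;  // 64M ints = 256 MiB per message
    size_t sent = 0;

    while (sent < total) {
        size_t remaining = total - sent;
        int count = (int)(remaining < CHUNK ? remaining : CHUNK);
        MPI_Send(data + sent, count, MPI_INT, dest, tag, comm);
        sent += (size_t)count;
    }
}

The receiver runs the mirror-image loop with MPI_Recv and the same chunk size, which works as long as both ranks agree on the total element count beforehand.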

Souffle answered 21/4, 2014 at 20:7 Comment(2)
just to comment regarding "it's 2014", with Intel MPI for example, message sizes are still limited to 2GB, even if the number of elements fits in a 4-byte int (software.intel.com/en-us/forums/topic/361060 and software.intel.com/en-us/forums/topic/505683). I don't know about other implementations.Shetler
That's just the effect of automated writing in the evening. We experienced recently problems with some software on our cluster that started having problems when the user switched from Open MPI to Intel MPI and the message size was the culprit.Souffle

A quick/hacky solution is to do a static_cast<int>() of your counter in the sender and do the reverse cast in the receiver. However, I think a better solution is to make a struct that contains the pointer and the count with the correct types and follow the advice of this answer to create your own custom datatype to pass around using MPI_Type_create_struct.

Stealthy answered 21/4, 2014 at 16:39 Comment(6)
I don't see how either of your ideas will result in the correct amount of data being transferred. Size of message isn't an opaque value.Sestos
Thanks for your answer, but I don't quite see how I can do this with a struct either. The main problem is that the struct also requires a "count" variable. So yes, I could pack the entire array range I need into a struct and then send only that "one" struct. But in creating the struct I face the same problem, which is that MPI won't accept my unsigned long arguments. Could you maybe help me understand what you had in mind?Inchon
@MarkAnderson When you send the struct the count is the number of those structs you send. In your case it would be a single struct so the count is 1.Stealthy
@BenVoigt The count parameter doesn't specify the amount of data being sent directly, it is the number of elements contained in the buffer. See mpich.org/static/docs/v3.1/www3/MPI_Recv.html, the size of the datatype being sent is a separate parameter.Stealthy
@Stealthy that I know. But in creating the struct itself you must give the count of elements inside the struct, so this only pushes the problem one step back. The problem is not the size of the datatype; the problem is precisely the number of elements contained in the buffer, since my buffer may contain a huge number of elements.Inchon
@MarkAnderson right I forgot about the distributed filesystem aspect, the comment to the OP is probably the best way to go.Stealthy
