What's the benefit of MPI Datatype?

The MPI basic datatypes correspond to the datatypes of the host language, except for MPI_BYTE and MPI_PACKED. My question is: what's the benefit of using those MPI basic datatypes? Or, equivalently, why is it bad to just use the host language's datatypes?

I read a tutorial by William Gropp et al. On slide 31, "Why Datatypes", it says:

  • Since all data is labeled by type, an MPI implementation can support communication between processes on machines with very different memory representations and lengths of elementary datatypes (heterogeneous communication).
  • Specifying application-oriented layout of data in memory
    • reduces memory-to-memory copies in the implementation
    • allows the use of special hardware (scatter/gather) when available

(http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiintro/ppframe.htm)

I don't grasp the explanation. First, if the elementary datatypes are different, I don't see how using MPI datatypes resolves the difference, since the basic MPI datatypes correspond to the basic datatypes of the host language (the elementary datatypes). Second, why does this application-oriented layout of data in memory have the two benefits mentioned?

Any answer that addresses my original questions will be accepted. Any answer that reconciles my questions with William Gropp's explanation will also be accepted.

Rivulet answered 27/9, 2013 at 19:41 Comment(0)

The short answer is that this system adds a level of strong-typing to MPI.

The long answer is that the purpose of the MPI datatypes is to tell the MPI functions what they're working with. So, for example, if you send an int from a little-endian machine to a big-endian one, MPI can do the byte-order conversion for you. Another, more everyday benefit is that MPI knows how big an MPI_DOUBLE is, so you don't need sizeof expressions scattered everywhere.

Note that the MPI datatypes are tags, not actual datatypes. In other words, you use

double d;

NOT

MPI_DOUBLE d;
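
A minimal sketch of how the tag is actually used in a call to MPI_Send (this assumes an MPI installation; the destination rank and message tag here are illustrative):

```c
#include <mpi.h>

/* Send one double to the process with rank `dest`. The buffer is a
   plain C double; MPI_DOUBLE is merely a tag telling MPI_Send how to
   interpret the bytes it finds at &d. */
void send_value(double d, int dest)
{
    MPI_Send(&d, 1, MPI_DOUBLE, dest, /* tag */ 0, MPI_COMM_WORLD);
}
```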
Vankirk answered 27/9, 2013 at 21:1 Comment(6)
I don't get the idea of the last example. Let's say d is a variable that will be sent from one process to another. When we define d, we certainly use MPI_DOUBLE d, not double d, right?Rivulet
You use double d in your program. Look at the prototype of, let's say, MPI_Send. Notice it takes a void* as the buffer to send. It also asks for an MPI_Datatype to tell it what is in the buffer. THAT'S where you put MPI_DOUBLE.Vankirk
I see what you mean. In other words, we can use either double d; or MPI_DOUBLE d; in the program (if the compiler doesn't complain). Do you have any insight into the two benefits, namely reducing memory-to-memory copies and allowing the use of special hardware?Rivulet
You cannot use MPI_DOUBLE d;. MPI_DOUBLE is not a C or C++ type. It's only for internal MPI usage. In some MPI implementations the datatypes are simple indices (numbers), in others they're structs. You can't use them like regular types. That would be like saying 17 d; where 17 is your type. It makes no sense.Vankirk
Yes, you are right. I was confusing the host programming language's types with MPI's types.Rivulet
Also, operations like MPI_Reduce need to know what exactly it is supposed to sum.Carhart

First, if elementary datatypes are different, I don't see why using MPI datatypes can resolve the difference since the basic MPI datatypes correspond to basic datatype of host language (elementary datatypes).

Because a given MPI datatype does not need to refer to the same elementary type on two different machines. MPI_INT could be an int on one machine and a long on another. This is especially relevant in C and C++, since the standards don't specify exact byte sizes for the various integral types, so an int may in fact have more bits on one machine than on the other.

Second, why this application-oriented layout of data in memory has the two benefits mentioned?

Look at the arguments of MPI_Send(). It receives a void* to the start of the data and the number of elements to send. It assumes that the elements are lined up contiguously in memory, one after the other, and are all of the same type. In all but the luckiest of cases, this will not be true in your application. Even if you just have a simple array of structs (whose fields are not all the same type), the only way to send those structs without user-defined MPI datatypes would be to copy the first field of each struct into a separate array, send it, then copy the second field of each struct into a different array, send it, and so forth. Derived MPI datatypes let you pull the data directly from where it sits, without rearranging or copying it.

I'm not sure what the second point is supposed to refer to, though.

Lamblike answered 27/9, 2013 at 23:31 Comment(0)