What are some scenarios for which MPI is a better fit than MapReduce?
As far as I understand, MPI gives me much more control over how exactly different nodes in the cluster will communicate.

In MapReduce/Hadoop, each node does some computation, exchanges data with other nodes, and then collates its partition of results. Seems simple, but since you can iterate the process, even algorithms like K-means or PageRank fit the model quite well. On a distributed file system with locality of scheduling, the performance is apparently good. In comparison, MPI gives me explicit control over how nodes send messages to each other.
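To make the "iterate the process" point concrete, here is a plain-Python sketch (not Hadoop code; the function and 1-D points are invented for illustration) of one K-means iteration expressed as a map phase and a reduce phase. Iterating means submitting one such job per iteration, feeding the previous job's centroids back in:

```python
from collections import defaultdict

def kmeans_iteration(points, centroids):
    # Map: emit (index of nearest centroid, point) for each input point.
    mapped = defaultdict(list)
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        mapped[nearest].append(p)
    # Reduce: for each key, average the assigned points into a new centroid.
    return [sum(ps) / len(ps) for _, ps in sorted(mapped.items())]
```

The map calls are independent per point (so they parallelize over input splits), and the only "communication" is the shuffle that groups points by centroid before the reduce.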

Can anyone describe a cluster programming scenario where the more general MPI model is an obvious advantage over the simpler MapReduce model?

Married answered 7/10, 2009 at 9:22 Comment(0)
Almost any scientific code -- finite differences, finite elements, etc. Which leads to the somewhat circular answer that any distributed program which doesn't map easily onto MapReduce is better implemented with the more general MPI model. Not sure that's much help to you, I'll downvote this answer right after I post it.
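The communication pattern in question is the halo (ghost-cell) exchange of domain decomposition. Below is a plain-Python simulation of that pattern (not real MPI; the function name and layout are invented): each "rank" owns a slice of a 1-D grid plus one ghost cell at each end, and in actual MPI each ghost-cell copy would be an MPI_Sendrecv between neighbouring ranks:

```python
def jacobi_step(subdomains):
    # Each subdomain is [left ghost, interior..., right ghost]; the outermost
    # ghosts of the first and last rank hold fixed physical boundary values.
    # Halo exchange: copy edge interior values into neighbours' ghost cells
    # (mutates the ghost slots in place, as a receive would).
    for r in range(len(subdomains) - 1):
        subdomains[r][-1] = subdomains[r + 1][1]   # right ghost <- neighbour's first interior cell
        subdomains[r + 1][0] = subdomains[r][-2]   # neighbour's left ghost <- our last interior cell
    # Compute: each rank updates its interior cells independently (Jacobi average).
    return [
        [d[0]] + [(d[i - 1] + d[i + 1]) / 2 for i in range(1, len(d) - 1)] + [d[-1]]
        for d in subdomains
    ]
```

After every sweep each rank needs exactly two small messages, one per neighbour, and then computes on its own data; there is no global shuffle or job restart, which is what makes this awkward to express as MapReduce.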

Boulevard answered 7/10, 2009 at 11:25 Comment(6)
Thanks, Mark (no need to downvote). Do you mean that iterative algorithms are more efficient in MPI, since in MapReduce they have to be implemented with a sequence of jobs? Apparently, MapReduce has acceptable performance at least for some iterative algorithms.Married
Not really. I was thinking of computations such as finite difference solvers, in which individual processes (on individual processors) compute over part of the total domain, then exchange halo information, then carry on computing. I find it difficult to see how this would map to MapReduce.Boulevard
In MapReduce, it is implemented by multiple jobs. Each MapReduce job is of the form: compute results, then exchange them. Multiple jobs can implement multiple "exchanges". With locality of scheduling, the next iteration of jobs is scheduled so that each task reads the data that was written to the local node by a task in the previous job, so the cost of multiple rounds of jobs is reduced.Married
Hmmm, I'll have to look a bit more closely at MapReduce. However, one source of performance reduction with MapReduce may be the strict sequencing of computation with communication; with MPI we try very hard (usually without much success) to overlap them.Boulevard
Iterative algorithms are fine with a MapReduce framework, for example "run this job on the previous job results until a condition is met or we decide to give up". There are job control schemes for Hadoop which abstract this away behind a query language. What the map-reduce paradigm doesn't do is communicate between nodes - no "start reducing when enough results are found among all mappers". So yes, no overlapping or skipping one un-needed mapper because another found what was needed.Byer
Overlapping communication with computation is mostly a myth. Expensive networks can do it (they use DMA), but normally the CPU is involved with packing buffers. We don't yet have nonblocking collectives (though this might go into MPI-3) which is the use case where a lot of computation could be meaningfully performed. MPI is a much more general and higher performance model, MapReduce offers a convenient abstraction with better fault tolerance for use cases where the "parallel" part of the algorithm is almost trivial.Bushwa
Although this question has been answered, I would like to add/reiterate one very important point.

MPI is best suited for problems that require a lot of interprocess communication.

When data becomes large (petabytes, anyone?) and there is little interprocess communication, MPI becomes a pain: the processes will spend all their time shipping data around (bandwidth becomes the limiting factor) while your CPUs sit idle. Perhaps an even bigger problem is reading all that data in the first place.

This is the fundamental reason for having something like Hadoop: the data itself also has to be distributed, hence the Hadoop Distributed File System!

In short, MPI is good for task parallelism and Hadoop is good for data parallelism.
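The canonical data-parallel job is word count, sketched here in plain Python (not Hadoop code; names invented for illustration). Mappers run independently on local splits of the input, and the only network traffic is the shuffle that merges per-split counts by key:

```python
from collections import Counter

def word_count(splits):
    # Map phase: each split is processed independently, ideally on the
    # node that already stores it, so no input data crosses the network.
    mapped = [Counter(split.split()) for split in splits]
    # Shuffle + reduce: merge per-split counts by key.
    total = Counter()
    for c in mapped:
        total.update(c)
    return dict(total)
```

Because the per-split work dominates and only the small count dictionaries move between nodes, the job scales with the number of data nodes rather than with network bandwidth.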

Shortening answered 6/1, 2010 at 15:32 Comment(1)
This is somewhat inaccurate. A primary focus of MPI is domain decomposition algorithms, a highly data-parallel domain, but with some communication between subdomains. Data can be stored locally with MPI as well. MPI is good when the communication pattern has some locality and any time you need low-latency reductions. MapReduce/Hadoop is good when fault tolerance is more important than absolute performance.Bushwa
The best answer that I could come up with is that MPI is better than MapReduce in two cases:

  1. For short tasks rather than batch processing. For example, MapReduce cannot be used to respond to individual queries - each job is expected to take minutes. I think that in MPI, you can build a query response system where machines send messages to each other to route the query and generate the answer.

  2. For jobs where nodes need to communicate more than iterated MapReduce jobs support, but not so much that communication overhead makes the computation impractical. I am not sure how often such cases occur in practice, though.
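As an example of communication that sits between "one shuffle per job" and "impractically chatty", here is a plain-Python sketch of a binary-tree reduction, the pattern behind MPI's low-latency collectives (a toy: list indices stand in for ranks, and each combine step would be one send/receive pair between two ranks):

```python
def tree_reduce(values, op):
    # Combine P values in about log2(P) communication rounds instead of
    # funnelling everything through a single reducer.
    vals = list(values)
    step = 1
    while step < len(vals):
        for i in range(0, len(vals), 2 * step):
            if i + step < len(vals):
                # "Rank" i receives from "rank" i + step and combines.
                vals[i] = op(vals[i], vals[i + step])
        step *= 2
    return vals[0]
```

With iterated MapReduce, each of those rounds would be a whole job with its scheduling and I/O overhead; in MPI it is a few microsecond-scale messages.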

Married answered 11/10, 2009 at 23:51 Comment(1)
MapReduce tasks can take milliseconds too; there is no requirement that they must take minutes.Colville
I expect that MPI beats MapReduce easily when the task is iterating over a data set whose size is comparable to the processor cache, and when communication with other tasks is frequently required. Lots of scientific domain-decomposition parallelization approaches fit this pattern. If MapReduce imposes strictly sequenced processing and communication, or the ending and restarting of processes, then the computational performance benefit of dealing with a cache-sized problem is lost.

Chela answered 30/6, 2011 at 7:14 Comment(0)
When the computation and data you are using have irregular behavior that mostly translates to many message passings between objects, or when you need low-level hardware access, e.g. RDMA, then MPI is better. In some answers here the latency of tasks or the memory consistency model gets mentioned; frameworks like Spark, or actor models like Akka, have shown that they can compete with MPI. Finally, one should consider that MPI has the benefit of having been, for years, the main basis for developing the libraries needed for scientific computations (these are the most important parts missing from new frameworks using DAG/MapReduce models).

All in all, I think the benefits that MapReduce/DAG models bring to the table, like dynamic resource managers and fault-tolerant computation, will make them feasible for scientific computing groups.

Conductivity answered 16/7, 2014 at 17:32 Comment(0)