Difference between Fork/Join and Map/Reduce

Asked 29/3, 2010 at 13:34 Answered 9/9, 2013 at 20:22

What is the key difference between Fork/Join and Map/Reduce?

Do they differ in the kind of decomposition and distribution (data vs. computation)?

Fugal answered 29/3, 2010 at 13:34 Comment(0)

One key difference is that F-J seems to be designed to work on a single Java VM, while M-R is explicitly designed to work on a large cluster of machines. These are very different scenarios.

F-J offers facilities to partition a task into several subtasks, in a recursive-looking fashion; more tiers, possibility of 'inter-fork' communication at this stage, much more traditional programming. Does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core.

M-R only does one big split, with the mapped splits not talking between each other at all, and then reduces everything together. A single tier, no inter-split communication until reduce, and massively scalable. Great for taking advantage of your share of the cloud.

Sacking answered 29/3, 2010 at 14:28 Comment(3)

More specifically, F-J allows workers to steal subtasks from each others' queues. This is not possible if the worker threads are on different machines (and thus do not have shared memory.) – Zeitler 21/1, 2011 at 12:24

According to the MapReduce Wikipedia entry, M-R is not necessarily restricted to a single tier of forked tasks. – Basifixed 7/3, 2013 at 1:35

what's the difference between fork/join & mapreduce outside the context of Java? – Cageling 15/1, 2017 at 22:9

There is a whole scientific paper on the subject, Comparing Fork/Join and MapReduce.

The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.

What they find is basically that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single node architectures. On the other hand, MapReduce has significant startup latency (tens of seconds), but scales well for much larger inputs (>100MB) on a compute cluster.

But there is a lot more to read there if you're up for it.

Poker answered 9/9, 2013 at 20:22 Comment(0)

Recommended topics

Hot tags