Difference between Fork/Join and Map/Reduce
Asked Answered
F

2

54

What is the key difference between Fork/Join and Map/Reduce?

Do they differ in the kind of decomposition and distribution (data vs. computation)?

Fugal answered 29/3, 2010 at 13:34 Comment(0)
S
47

One key difference is that F-J seems to be designed to work on a single Java VM, while M-R is explicitly designed to work on a large cluster of machines. These are very different scenarios.

F-J offers facilities to partition a task into several subtasks, in a recursive-looking fashion; more tiers, possibility of 'inter-fork' communication at this stage, much more traditional programming. Does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core.

M-R only does one big split, with the mapped splits not talking between each other at all, and then reduces everything together. A single tier, no inter-split communication until reduce, and massively scalable. Great for taking advantage of your share of the cloud.

Sacking answered 29/3, 2010 at 14:28 Comment(3)
More specifically, F-J allows workers to steal subtasks from each others' queues. This is not possible if the worker threads are on different machines (and thus do not have shared memory.)Zeitler
According to the MapReduce Wikipedia entry, M-R is not necessarily restricted to a single tier of forked tasks.Basifixed
what's the difference between fork/join & mapreduce outside the context of Java?Cageling
P
20

There is a whole scientific paper on the subject, Comparing Fork/Join and MapReduce.

The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.

What they find is basically that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single node architectures. On the other hand, MapReduce has significant startup latency (tens of seconds), but scales well for much larger inputs (>100MB) on a compute cluster.

But there is a lot more to read there if you're up for it.

Poker answered 9/9, 2013 at 20:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.