What does the shuffling phase actually do?
What does the shuffling phase actually do?


Possibility - A

Since shuffling is the process of bringing the mapper output to the reducer input, it just sends specific keys from the mappers to particular reducers, based on the code written in the partitioner.

E.g. the output of mapper 1 is {a,1} {b,1}

and the output of mapper 2 is {a,1} {b,1}

In my partitioner, I have written that all keys starting with 'a' go to reducer 1 and all keys starting with 'b' go to reducer 2, so the output would be:

reducer 1: {a,1}{a,1}

reducer 2: {b,1}{b,1}
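
The routing in this example can be sketched as a toy simulation in plain Python. This is not Hadoop code: `partition` and `route` are hypothetical names, and the first-letter rule is just the one described above.

```python
from collections import defaultdict


def partition(key: str) -> int:
    """Hypothetical rule from the example: 'a' keys -> reducer 1, 'b' keys -> reducer 2."""
    return {"a": 1, "b": 2}[key[0]]


def route(mapper_outputs):
    """Send every (key, value) pair to the reducer chosen by partition()."""
    reducers = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            reducers[partition(key)].append((key, value))
    return dict(reducers)


mapper_1 = [("a", 1), ("b", 1)]
mapper_2 = [("a", 1), ("b", 1)]

routed = route([mapper_1, mapper_2])
print(routed)  # {1: [('a', 1), ('a', 1)], 2: [('b', 1), ('b', 1)]}
```

Note that nothing here groups the values yet; each reducer simply receives a flat list of pairs.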


Possibility - B

Or, along with the above process, does it also group the keys?

In that case the output would be:

reducer 1: {a,[1,1]}

reducer 2: {b,[1,1]}
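
To make the difference between A and B concrete, here is a minimal sketch of how grouping can follow from sorting. It is only an illustration in plain Python (the helper name `group_sorted` is made up), not a description of Hadoop's internals: once the pairs are sorted by key, equal keys sit next to each other, so one pass is enough to collect their values.

```python
from itertools import groupby
from operator import itemgetter


def group_sorted(pairs):
    """Sort pairs by key, then collapse each run of equal keys into (key, [values])."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in run])
        for key, run in groupby(ordered, key=itemgetter(0))
    ]


reducer_1_input = [("a", 1), ("a", 1)]
reducer_2_input = [("b", 1), ("b", 1)]

print(group_sorted(reducer_1_input))  # [('a', [1, 1])]
print(group_sorted(reducer_2_input))  # [('b', [1, 1])]
```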


In my opinion it should be A, because grouping of keys must take place after sorting: sorting is done only so that the reducer can easily tell where one key ends and the next key starts. If that is right, when does the grouping of keys actually happen? Please elaborate.

Ic answered 4/6, 2017 at 6:5 Comment(0)

Mappers and Reducers are not separate machines but just separate code. Both the mapping code and the reducing code run on the same set of machines in the cluster.


So, after all machines in the cluster have run the mapper code, the results are:

  1. Binned locally on each node (consider it a "local grouping"); and
  2. Shuffled/redistributed across all nodes in the cluster.

Consider step 2 a "global grouping", because it is done in such a way that all values belonging to one key go to the single node assigned to that key.

Now the nodes run the Reducer code on the (key, value) pairs residing in their memory.
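
The two steps and the final reduce can be sketched as a small single-process simulation. Everything here is an assumption made for illustration: the function names are invented, the key-to-node rule (sum of character codes modulo the node count) is arbitrary, and the reducer is assumed to sum the values as in a word count.

```python
from collections import defaultdict


def local_bin(pairs):
    """Step 1: group values by key on a single node ("local grouping")."""
    bins = defaultdict(list)
    for key, value in pairs:
        bins[key].append(value)
    return dict(bins)


def shuffle(per_node_bins, num_nodes):
    """Step 2: move all values of a key to its one assigned node ("global grouping")."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for bins in per_node_bins:
        for key, values in bins.items():
            # Hypothetical, deterministic assignment rule for this sketch only.
            target = sum(map(ord, key)) % num_nodes
            nodes[target][key].extend(values)
    return [dict(node) for node in nodes]


def reduce_node(grouped):
    """Assumed reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Mapper output on two nodes, as in the question's example.
node_outputs = [[("a", 1), ("b", 1)], [("a", 1), ("b", 1)]]

binned = [local_bin(pairs) for pairs in node_outputs]
shuffled = shuffle(binned, num_nodes=2)
results = [reduce_node(node) for node in shuffled]

print(shuffled)  # [{'b': [1, 1]}, {'a': [1, 1]}]
print(results)   # [{'b': 2}, {'a': 2}]
```

With this rule 'b' lands on node 0 and 'a' on node 1; the point is only that each key ends up on exactly one node before the reducer runs.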

Drape answered 10/4, 2020 at 0:21 Comment(0)
