What does the shuffling phase actually do?

Possibility - A

Since shuffling is the process of bringing the mapper output to the reducer input, it just moves specific keys from the mappers to particular reducers, based on the code written in the partitioner.

E.g. the o/p of mapper 1 is {a,1} {b,1}

the o/p of mapper 2 is {a,1} {b,1}

and in my partitioner I have written that all keys starting with 'a' go to reducer 1 and all keys starting with 'b' go to reducer 2, so the input to the reducers would be:

reducer 1: {a,1}{a,1}

reducer 2: {b,1}{b,1}
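
To make Possibility A concrete, here is a minimal sketch in plain Python (not the Hadoop API). The function names `partition` and `shuffle_without_grouping` are hypothetical, and the partition rule is just the one from my example; it only routes pairs to reducers and does no grouping.

```python
from collections import defaultdict

def partition(key):
    # Hypothetical rule from the example above:
    # keys starting with 'a' go to reducer 1, everything else to reducer 2.
    return 1 if key.startswith("a") else 2

def shuffle_without_grouping(mapper_outputs):
    # Route every (key, value) pair to its reducer; pairs stay ungrouped.
    reducers = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            reducers[partition(key)].append((key, value))
    return dict(reducers)

mapper1 = [("a", 1), ("b", 1)]
mapper2 = [("a", 1), ("b", 1)]
routed = shuffle_without_grouping([mapper1, mapper2])
print(routed)
```

Under this reading, each reducer still sees one separate pair per mapper record.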

Possibility - B

Or, along with the above process, does it also group the keys?

So the input to the reducers would be:

reducer 1: {a,[1,1]}

reducer 2: {b,[1,1]}
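
As a rough illustration of what grouping would mean here, the sketch below (plain Python, not the Hadoop API; `group_sorted_pairs` is a hypothetical helper) sorts the pairs a reducer received by key and then collapses each run of equal keys into one key with a list of values. It assumes grouping relies on the pairs already being sorted, which is exactly the point I am unsure about.

```python
from itertools import groupby
from operator import itemgetter

def group_sorted_pairs(pairs):
    # Sort by key first; groupby only merges *adjacent* equal keys,
    # so the sort is what makes key boundaries easy to detect.
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in run])
        for key, run in groupby(ordered, key=itemgetter(0))
    ]

reducer1_input = [("a", 1), ("a", 1)]
reducer2_input = [("b", 1), ("b", 1)]
grouped1 = group_sorted_pairs(reducer1_input)
grouped2 = group_sorted_pairs(reducer2_input)
print(grouped1, grouped2)
```

If a single reducer received several distinct keys, the same helper would emit one entry per key, each with its own list of values.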

In my opinion it should be A: grouping of keys must take place after sorting, because sorting is done only so that the reducer can easily tell where one key ends and the next begins. If that is right, when does the grouping of keys actually happen? Please elaborate.

