I'm running a Hadoop job (via Hive, actually) that is supposed to deduplicate lines across many text files. In the reduce step, it chooses the most recently timestamped record for each key.
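For illustration, here's a minimal sketch of the kind of reducer logic I mean (the class name, and the assumption that each value is a tab-separated `timestamp<TAB>payload` string, are just placeholders, not my actual code):

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LatestRecordReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Text latest = null;
        long latestTs = Long.MIN_VALUE;
        // Assume each value looks like "<timestamp>\t<payload>";
        // keep only the newest record for this key.
        for (Text value : values) {
            String[] parts = value.toString().split("\t", 2);
            long ts = Long.parseLong(parts[0]);
            if (ts > latestTs) {
                latestTs = ts;
                latest = new Text(value); // copy: Hadoop reuses value objects
            }
        }
        if (latest != null) {
            context.write(key, latest);
        }
    }
}
```

This only works as intended if every record for a given key actually arrives at the same reduce() call, which is exactly what I'm asking about.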
Does Hadoop guarantee that every record with the same key emitted by the map step will go to a single reducer, even when many reducers are running across the cluster?
I'm worried that during the shuffle the mapper output might get split in the middle of a set of records that share the same key, so that one reducer sees only part of the group.
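For context, my understanding (paraphrased from the Hadoop source, so please correct me if I've got it wrong) is that the default HashPartitioner routes records like this, which would imply same key means same reducer:

```java
import org.apache.hadoop.mapreduce.Partitioner;

public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Same key => same hashCode => same partition, regardless of
        // which mapper emitted the record.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Is that the whole story, or are there cases (e.g. speculative execution, custom partitioners, Hive's own plan) where records for one key can end up in different reducers?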