I want help understanding this algorithm. I've pasted the algorithm explanation first, followed by my doubts.
Algorithm (for calculating the overlap between record pairs):
Given a user-defined parameter K, the file DR (*format: record_id, data*) is split into K nearly equi-sized chunks, such that the data of a document Di falls into the i/K-th chunk.
We overrode Hadoop's partitioning function, which maps a key emitted by the mapper to a reducer instance. Every key (i,j) is mapped to a reducer in the j/K-th group.
The special key (i,*) and its associated value, i.e., the document's data, are replicated at most K times, so that the full content of the document can be delivered to every reducer. Each reducer in a group thus needs to recover and load into memory only one chunk of the DR file, whose size can be made arbitrarily small by varying K. The overlap can then be calculated. This is achieved at the cost of replicating the documents delivered through the MapReduce framework.
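To check whether I am reading the partitioning step correctly, this is roughly how I picture the overridden partitioner. Everything concrete below is my own assumption, not from the explanation above: the key encoded as the text "i,j" (or "i,*,g" for the replicated special key, with g the target group), contiguous chunks of about |DR|/K records, each group owning numPartitions/K consecutive reducer slots, and the reducer within a group chosen by i modulo the group size.

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// My mental model of the overridden partitioner, not the actual code.
public class ChunkGroupPartitioner extends Partitioner<Text, Text> {

    private static final int K = 4;            // user-defined number of chunks / reducer groups
    private static final int CHUNK_SIZE = 25;  // ~ |DR| / K; must match how DR was split into chunks

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        String[] parts = key.toString().split(",");
        int i = Integer.parseInt(parts[0]);
        int reducersPerGroup = numPartitions / K;  // assumes numPartitions is a multiple of K

        int group;
        if ("*".equals(parts[1])) {
            // Special key "i,*,g": the mapper emitted K copies of Di's data,
            // one per target group g, so the full document reaches every group.
            group = Integer.parseInt(parts[2]);
        } else {
            // Regular key "i,j": route it to the group holding j's chunk of DR,
            // which is how I read "a reducer in the j/K-th group".
            int j = Integer.parseInt(parts[1]);
            group = j / CHUNK_SIZE;
        }
        // Pick one reducer inside the group; using i keeps all keys for
        // document i on the same reducer within that group (my assumption).
        return group * reducersPerGroup + (i % reducersPerGroup);
    }
}
```

If this picture is wrong, especially the relationship between groups and physical reducer slots, that is exactly what the doubts below are about.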
Doubts:
I have made some assumptions:
Statement: Every key (i,j) is mapped to a reducer in the j/K-th group. My assumption: K reduce nodes are present, and the key is mapped to the j/K-th reduce node.
Doubt: Are some reduce nodes grouped together? Say, are nodes 0, 1, and 2 grouped as Group 0?
Statement: The document's data are replicated at most K times, so that the full content of the document can be delivered to every reducer.
Does that mean K equals the number of reducer nodes? If not, then aren't we wasting compute nodes by leaving some of them unused?
Main doubt: Is K equal to the number of reducer nodes?
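To make the main doubt concrete, here are the two layouts I am picturing (all numbers are made up by me):

```java
public final class GroupLayoutExample {
    public static void main(String[] args) {
        int numReducers = 12; // total reducer slots configured for the job (made-up number)
        int k = 4;            // the user-defined K

        // Picture 1: K < numReducers, so each group spans numReducers / K reducers:
        // group 0 = reducers {0,1,2}, group 1 = {3,4,5}, group 2 = {6,7,8}, group 3 = {9,10,11}.
        int reducersPerGroup = numReducers / k;
        System.out.println("reducers per group = " + reducersPerGroup); // 3

        // Picture 2: K must equal numReducers, so every "group" is a single reducer,
        // and choosing K smaller than numReducers would leave some reducers idle.
        // Which of these two pictures is the right one?
    }
}
```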
Hoping for responses!
Thanks!