Hadoop - Produce multiple values for a single key

I was able to modify the Hadoop WordCount program to suit my requirements. However, I now have a situation wherein I need to emit three values for the same key. Let's say my input file is as follows:

A Uppercase 1 firstnumber  I  romannumber a lowercase
B Uppercase 2 secondnumber II romannumber b lowercase

Currently, my MapReduce program does something like the following, where A is the key and 1 is the value:

A 1

I need my MapReduce job to produce something like this:

A 1 I a 

I can do this with three separate programs, as below, and produce the output:

A 1
A I
A a

However, I want to do it in a single program. Basically, from my map function I want to do this:

context.write(key,value1);
context.write(key,value2);
context.write(key,value3);

Is there any way I can do it in the same program rather than writing three different programs?

EDIT:

Let me provide a clearer example. I need to do something like this:

A uppercase 1 firstnumber  1.0 floatnumber str stringchecking
A uppercase 2 secondnumber 2.0 floatnumber ing stringchecking

My final output would be:

A 3 3.0 string

Here, 3 is the sum of the two integers, 3.0 is the sum of the two floats, and string is the concatenation of the two strings.
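As a sanity check of the expected result, here is a minimal plain-Java sketch (no Hadoop dependency) that simulates the map, shuffle, and reduce steps in memory for the two sample lines. It assumes the fields are whitespace-separated and that the integer, float, and string always sit at positions 2, 4, and 6. The class `SimulatedJob` and its `run` method are hypothetical names for illustration, and unlike a real shuffle, this keeps values in input order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the MapReduce job described above.
public class SimulatedJob {

    public static List<String> run(List<String> lines) {
        // "Map" + "shuffle": parse each line and group its three values by key.
        Map<String, List<String[]>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            grouped.computeIfAbsent(f[0], k -> new ArrayList<>())
                   .add(new String[] {f[2], f[4], f[6]});
        }

        // "Reduce": sum the ints, sum the floats, concatenate the strings.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : grouped.entrySet()) {
            int intSum = 0;
            float floatSum = 0f;
            StringBuilder concat = new StringBuilder();
            for (String[] v : e.getValue()) {
                intSum += Integer.parseInt(v[0]);
                floatSum += Float.parseFloat(v[1]);
                concat.append(v[2]);
            }
            output.add(e.getKey() + "\t" + intSum + "\t" + floatSum + "\t" + concat);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "A uppercase 1 firstnumber  1.0 floatnumber str stringchecking",
            "A uppercase 2 secondnumber 2.0 floatnumber ing stringchecking");
        // Prints A, 3, 3.0 and string separated by tabs.
        run(input).forEach(System.out::println);
    }
}
```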

Execration answered 20/6, 2013 at 16:1
What's wrong with doing what you just proposed? You can definitely emit multiple key/value pairs per map(). - Rici
Won't the values get confused in the reduce function? Won't it mix them together and produce some clumsy output? - Execration
Also, what if my formats are different? For example, "a" is a character and "1" is an integer. Should I then set two map output value classes (setMapOutputValueClass)? - Execration
Is it always going to be 3 values per key? You can create a custom Writable, or use an ArrayWritable, to define a value that is composed of 3 different values. - Rici
Yes. It will always be 3 values per key. - Execration
What if I need to do some calculations on these 3 values? In that case, how can I represent them as a Writable? - Execration
Let me put together an answer, and I can edit it as needed. Just one question: will it always be in the order int, string, string? - Rici

First, you'll need a composite Writable that holds all three of your values:

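import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableUtils;

public class CompositeWritable implements Writable {
    int val1 = 0;
    float val2 = 0;
    String val3 = "";

    // Hadoop needs a no-arg constructor to instantiate the class before calling readFields().
    public CompositeWritable() {}

    public CompositeWritable(int val1, float val2, String val3) {
        this.val1 = val1;
        this.val2 = val2;
        this.val3 = val3;
    }

    // Must read the fields in exactly the order write() wrote them.
    @Override
    public void readFields(DataInput in) throws IOException {
        val1 = in.readInt();
        val2 = in.readFloat();
        val3 = WritableUtils.readString(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(val1);
        out.writeFloat(val2);
        WritableUtils.writeString(out, val3);
    }

    // Folds another value into this one: sums the numbers, concatenates the strings.
    public void merge(CompositeWritable other) {
        this.val1 += other.val1;
        this.val2 += other.val2;
        this.val3 += other.val3;
    }

    @Override
    public String toString() {
        return this.val1 + "\t" + this.val2 + "\t" + this.val3;
    }
}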
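To see why `readFields` must mirror `write`, here is a Hadoop-free sketch that serializes the same three fields with plain `java.io` streams and reads them back. `CompositeRoundTrip` is a hypothetical stand-in, not Hadoop's class. It writes the string as a 4-byte length followed by UTF-8 bytes, an assumption modelled on the length-prefixed layout described in the comments below, and it does not handle a null string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for CompositeWritable using only java.io.
public class CompositeRoundTrip {
    int val1;
    float val2;
    String val3 = "";

    void write(DataOutput out) throws IOException {
        out.writeInt(val1);
        out.writeFloat(val2);
        // Assumed layout: int length, then the UTF-8 bytes (val3 must not be null).
        byte[] bytes = val3.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    void readFields(DataInput in) throws IOException {
        // Same order as write(): int, float, then the length-prefixed string.
        val1 = in.readInt();
        val2 = in.readFloat();
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        val3 = new String(bytes, StandardCharsets.UTF_8);
    }

    // Serializes src into a byte buffer and deserializes it into a fresh instance.
    public static CompositeRoundTrip roundTrip(CompositeRoundTrip src) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        src.write(new DataOutputStream(buffer));
        CompositeRoundTrip copy = new CompositeRoundTrip();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        CompositeRoundTrip a = new CompositeRoundTrip();
        a.val1 = 1;
        a.val2 = 1.0f;
        a.val3 = "str";
        CompositeRoundTrip b = roundTrip(a);
        System.out.println(b.val1 + "\t" + b.val2 + "\t" + b.val3);
    }
}
```

Swapping, say, the `readInt()` and `readFloat()` calls in `readFields` would silently reinterpret the bytes, which is why the two methods have to stay in lockstep.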

Then, in your reducer, you'd do something like this:

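// Inside a class that extends Reducer<Text, CompositeWritable, Text, CompositeWritable>
@Override
public void reduce(Text key, Iterable<CompositeWritable> values, Context ctx) throws IOException, InterruptedException {

    // Start from an empty accumulator (a local must be initialized before merge() is called on it).
    CompositeWritable out = new CompositeWritable();

    for (CompositeWritable next : values)
    {
        // Hadoop may reuse the same object for each value, but merge() copies
        // the field values, so nothing is lost between iterations.
        out.merge(next);
    }

    ctx.write(key, out);
}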
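One thing to keep in mind with `merge`: Hadoop does not guarantee the order in which a key's values reach the reducer (unless you set up a secondary sort), so the sums are stable but the concatenated string may come out as "string" or "ingstr". This plain-Java sketch uses a hypothetical `Composite` stand-in with the same merge logic and folds the two sample values in both orders, starting from a fresh accumulator as in the reduce loop above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical Hadoop-free stand-in for CompositeWritable's merge logic.
public class MergeOrderDemo {

    static final class Composite {
        int val1;
        float val2;
        String val3;

        Composite(int val1, float val2, String val3) {
            this.val1 = val1;
            this.val2 = val2;
            this.val3 = val3;
        }

        // Same field-by-field merge as CompositeWritable.merge.
        void merge(Composite other) {
            val1 += other.val1;
            val2 += other.val2;
            val3 += other.val3;
        }

        @Override
        public String toString() {
            return val1 + "\t" + val2 + "\t" + val3;
        }
    }

    // Mirrors the reduce loop: empty accumulator, then merge each value in turn.
    static String reduce(List<Composite> values) {
        Composite out = new Composite(0, 0f, "");
        for (Composite next : values) {
            out.merge(next);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Composite first = new Composite(1, 1.0f, "str");
        Composite second = new Composite(2, 2.0f, "ing");
        System.out.println(reduce(Arrays.asList(first, second))); // 3, 3.0, string (tab-separated)
        System.out.println(reduce(Arrays.asList(second, first))); // 3, 3.0, ingstr (tab-separated)
    }
}
```

If the concatenation order matters, one option is to carry an ordering field in the composite value and sort inside the reducer before concatenating.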

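Your mapper simply emits one CompositeWritable per input record (one context.write() per map() call), so the job needs only a single map output value class: job.setMapOutputValueClass(CompositeWritable.class), plus job.setOutputValueClass(CompositeWritable.class) since the reducer emits it too.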

I haven't tried to compile this, but the general idea is there.
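On the mapper side, note that the sample lines contain a double space ("firstnumber  1.0"), so splitting on a single space yields an empty token and shifts every later field; splitting on `\\s+` avoids that. Below is a plain-Java sketch of the parsing a mapper might do before building its one composite value per record. `RecordParser.parseRecord` is a hypothetical helper, and the field positions 0, 2, 4, 6 are taken from the sample input.

```java
import java.util.Arrays;

// Hypothetical parsing helper a mapper could call once per input line.
public class RecordParser {

    // Returns {key, intField, floatField, stringField}.
    static String[] parseRecord(String line) {
        // "\\s+" treats any run of spaces/tabs as a single separator.
        String[] f = line.trim().split("\\s+");
        if (f.length < 7) {
            throw new IllegalArgumentException("Expected at least 7 fields: " + line);
        }
        return new String[] {f[0], f[2], f[4], f[6]};
    }

    public static void main(String[] args) {
        String line = "A uppercase 1 firstnumber  1.0 floatnumber str stringchecking";

        // Naive single-space split: the double space leaves an empty string at index 4.
        String[] naive = line.split(" ");
        System.out.println("naive[4] = '" + naive[4] + "'"); // naive[4] = ''

        String[] parsed = parseRecord(line);
        System.out.println(Arrays.toString(parsed)); // [A, 1, 1.0, str]
        // In the real mapper these would become one value, e.g.
        // new CompositeWritable(Integer.parseInt(parsed[1]), Float.parseFloat(parsed[2]), parsed[3])
    }
}
```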

Rici answered 20/6, 2013 at 16:46
Just curious, can you use the Text type for val3 instead of String? - Taps
@Taps I don't see why not. It would just be val3.readFields(in); instead of val3 = WritableUtils.readString(in); (with val3 initialized to new Text() and written with val3.write(out)). You can also use Text.readString(in), which returns a String. - Rici
Great! So DataInput and DataOutput only read/write integers and floats? - Taps
@Taps Yes, primitive types. You can also read/write byte arrays, which is how strings are stored. They are length-encoded: the first 4 bytes (an int) give the length of the string, i.e. the number of bytes to read from the stream. - Rici
@climbage could you please help with writing the mapper for this helpful piece of code? How would it output one CompositeWritable per map? I'm using something like context.write(new Text(line[0]), new CustomWritable(Integer.parseInt(line[2]), Float.parseFloat(line[4]), line[6])); in the Mapper, but it seems to be incorrect, as the mapper outputs data in the format K,V1 V2 V3, which would prevent the reducer from handling such values. Please help. - Et
@Et The whole point of the CompositeWritable is that you can represent multiple values as a single value. What do you mean by "seems"? Did you try it? - Rici
@climbage Yes, I tried running an MRUnit test for the mapper. It gives output in the format K,V1 V2 V3. Please assist. Thanks - Et
What do you want it to do? - Rici
