Hadoop PIG Max of Tuple

How do I find the MAX of a tuple in Pig?

My data looks like this:

A,20
B,10
C,40
D,5

My script:

data = LOAD 'myData.txt' USING PigStorage(',') AS (key, value);
grouped = GROUP data ALL;
maxKey = FOREACH grouped GENERATE MAX(data.value);
DUMP maxKey;

This returns 40, but I want the full key-value pair: C,40. Any ideas?

Apologete answered 27/12, 2012 at 14:6 Comment(0)

This works with Pig 0.10.0:

data = LOAD 'myData.txt' USING PigStorage(',') AS (key, value: long);
A = GROUP data ALL;
B = FOREACH A GENERATE MAX(data.value) AS val;
C = FILTER data BY value == (long)B.val;
DUMP C;
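
If you would rather not rely on the scalar shortcut (a reference like B.val only works because B holds exactly one row), the same filter can probably be written as a join. This is only a sketch against the same myData.txt; the aliases grouped, max_val, joined and result are names made up for illustration.

```pig
-- Load with an explicit schema so that value is compared numerically.
data = LOAD 'myData.txt' USING PigStorage(',') AS (key: chararray, value: long);

-- Collapse everything into one group and compute the maximum value.
grouped = GROUP data ALL;
max_val = FOREACH grouped GENERATE MAX(data.value) AS val;

-- Keep only the rows whose value equals the maximum.
joined = JOIN data BY value, max_val BY val;

-- Drop the duplicated max column and keep the original key-value pair.
result = FOREACH joined GENERATE data::key AS key, data::value AS value;
DUMP result;
```

Like the FILTER version, this keeps every row whose value equals the maximum, so ties produce more than one output row.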
Truc answered 27/12, 2012 at 15:32 Comment(2)
Just a heads-up: while computing 'C', data should be filtered by B.val, not C.val. – Nela
I'd like to second @Zibi. The last declaration should be C = FILTER data BY value == (long)B.val;, not C = FILTER data BY value == (long)C.val;. Thanks for the solution @Truc Schmaljohann. That worked for me on Pig 0.10. – Strangulation

Try this:

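-- Load the data with an explicit schema so that value sorts numerically.
data = LOAD 'myData.txt' USING PigStorage(',') AS (key: chararray, value: int);
-- Sort by value, largest first, and keep only the first row.
sorted = ORDER data BY value DESC;
limited = LIMIT sorted 1;
-- Project both fields to get the full key-value pair.
projected = FOREACH limited GENERATE key, value;
DUMP projected;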
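
ORDER sorts the whole relation just to keep one row. A possible alternative is the built-in TOP function, which picks the top n tuples of a bag by a given column. The following is a sketch under the same assumptions about myData.txt; the aliases grouped, top_rows and best are illustrative names, not anything from the question.

```pig
data = LOAD 'myData.txt' USING PigStorage(',') AS (key: chararray, value: int);

-- Put all rows into a single bag so TOP can see them together.
grouped = GROUP data ALL;

best = FOREACH grouped {
    -- TOP(n, column, bag): the column index is zero-based, so 1 refers to value.
    top_rows = TOP(1, 1, data);
    -- FLATTEN turns the one-tuple bag back into a plain (key, value) row.
    GENERATE FLATTEN(top_rows);
};
DUMP best;
```

Note that both this and the ORDER/LIMIT version return a single row even when several keys share the maximum value; if you need all of them, filter on the maximum instead.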
Defluxion answered 1/2, 2013 at 13:37 Comment(0)
