Pig Order By Query

grunt> dump jn;

(k1,k4,10)
(k1,k5,15)
(k2,k4,9)
(k3,k4,16)

grunt> jn = group jn by $1;
grunt> dump jn;


(k4,{(k1,k4,10),(k2,k4,9),(k3,k4,16)})
(k5,{(k1,k5,15)})

Now, from here I want the following output:

(k4,{(k3,k4,16),(k1,k4,10)})
(k5,{(k1,k5,15)})

Basically, I want to sort on the numbers (10, 9, 16) and select the top 2 for every row.
How do I do it?

Florist answered 3/2, 2012 at 7:18 Comment(0)

This is similar to this question and you could use a Nested FOREACH, e.g.:

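-- Assumes $2 is loaded as a numeric type; declare a schema in LOAD if it is not,
-- otherwise the values are compared as bytes rather than as numbers.
A = LOAD 'data';
jn = GROUP A BY $1;
B = FOREACH jn {
  -- DESC puts the largest values first, so LIMIT keeps the top 2
  sorted = ORDER A BY $2 DESC;
  lim = LIMIT sorted 2;
  -- emit the group key along with the limited bag
  GENERATE group, lim;
};
DUMP B;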
Doreathadoreen answered 3/2, 2012 at 18:4 Comment(3)
You can also just use the TOP() function instead of ORDER and LIMIT. It's in the piggybank for Pig < 0.8 and built in for >= 0.8. - Callaway
I have a similar problem. I am using TOP(), but with a top-N of 2 and the input {10,5,5,1,2} I expect my output to be {10,5,5}, while it is actually {10,5}. How could I solve this? - Decanal
Shouldn't the sort order be DESC, since the top 2 are requested? - Unruffled
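
For reference, the TOP() suggestion from the comments could look roughly like the sketch below. It assumes Pig 0.8 or later, where TOP is built in, and it assumes a schema for the input: the field names f1, f2 and n are hypothetical and not from the original post.

```pig
-- Hypothetical schema: f1, f2 and n are assumed names for the three columns.
A = LOAD 'data' AS (f1:chararray, f2:chararray, n:int);
jn = GROUP A BY f2;
-- TOP(topN, column, relation): column is a 0-based index, so 2 refers to n.
B = FOREACH jn GENERATE group, TOP(2, 2, A);
DUMP B;
```

TOP returns at most topN tuples per group, so tied values beyond that count are dropped, which matches the {10,5} result described in the comments. The order of tuples inside the returned bag should not be relied on; if the order matters, the nested ORDER and LIMIT approach is the safer choice.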
