GROUP_CONCAT equivalent in Pig?

I'm trying to do this in Pig: I'm looking for an equivalent of MySQL's group_concat().

For example, my table has three fields (userid, clickcount, pagenumber):

155 | 2 | 12
155 | 3 | 133
155 | 1 | 144
156 | 6 | 1
156 | 7 | 5

The desired output is:

155| 2,3,1 | 12,133,144

156| 6,7 | 1,5

How can I achieve this in Pig?

Jetsam answered 13/9, 2013 at 7:2 Comment(0)
grouped = GROUP table BY userid;
X = FOREACH grouped GENERATE group as userid,
                             table.clickcount as clicksbag,
                             table.pagenumber as pagenumberbag;

Now X will be:

{(155,{(2),(3),(1)},{(12),(133),(144)}),
 (156,{(6),(7)},{(1),(5)})}

Now you need to use the built-in UDF BagToTuple:

-- "output" is a reserved word in Pig Latin, so the alias is named result
result = FOREACH X GENERATE userid,
                            BagToTuple(clicksbag) as clickcounts,
                            BagToTuple(pagenumberbag) as pagenumbers;

result should now contain what you want. You can also fold this step into the first FOREACH:

result = FOREACH grouped GENERATE group as userid,
                                  BagToTuple(table.clickcount) as clickcounts,
                                  BagToTuple(table.pagenumber) as pagenumbers;
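
BagToTuple yields a tuple such as (2,3,1) rather than the comma-joined string shown in the question. If a string is what you need, the built-in BagToString (also Pig 0.11 or later) may be closer to group_concat(). A minimal sketch, assuming a hypothetical input file clicks.txt with one userid|clickcount|pagenumber record per line and no padding spaces; the alias names clicks, by_user, and joined are made up for illustration:

```pig
-- Hypothetical input path and schema; adjust to your own data.
clicks = LOAD 'clicks.txt' USING PigStorage('|')
         AS (userid:int, clickcount:int, pagenumber:int);

by_user = GROUP clicks BY userid;

-- BagToString joins the values of a bag with the given delimiter
-- (the default delimiter is '_', so ',' is passed explicitly).
joined = FOREACH by_user GENERATE
             group AS userid,
             BagToString(clicks.clickcount, ',') AS clickcounts,
             BagToString(clicks.pagenumber, ',') AS pagenumbers;

DUMP joined;
```

Keep in mind that Pig does not guarantee the order of tuples inside a bag, so the two joined strings are not guaranteed to line up position by position unless you sort the bag in a nested FOREACH first.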
Kasandrakasevich answered 13/9, 2013 at 8:59 Comment(2)
I was using version 0.10. I believe BagToTuple is only available from version 0.11 onwards. Thanks, Raze2dust. - Jetsam
So did you upgrade or implement BagToTuple yourself? - Kasandrakasevich
