What is the difference between GROUP and COGROUP in PIG?

I understood that GROUP didn't work with multiple relations, which is why Pig has COGROUP. However, when I checked today, GROUP worked for me with two relations. I am using Pig 0.12.0. My commands and their output are as follows.

grunt> grpvar = GROUP C by $2, B by $2;
grunt> cogrpvar = COGROUP C by $2, B by $2;
grunt> describe grpvar;

grpvar: {group: chararray,C: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)},B: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)}}

grunt> describe cogrpvar;

cogrpvar: {group: chararray,C: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)},B: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)}}

Is GROUP expected to work like this? What is the difference between GROUP and COGROUP?

Allbee answered 30/7, 2014 at 4:9

Yes, GROUP is supposed to work like that.

According to the documentation (http://pig.apache.org/docs/r0.12.0/basic.html#group):

Note: The GROUP and COGROUP operators are identical. Both operators work with one or more relations. For readability GROUP is used in statements involving one relation and COGROUP is used in statements involving two or more relations. You can COGROUP up to but no more than 127 relations at a time.

So the distinction is purely for readability; there is no functional difference between the two.
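
To illustrate the convention the documentation describes, here is a minimal sketch. The file names, the comma delimiter, and the LOAD statements are assumptions for illustration only; the schema is copied from the DESCRIBE output in the question, and `drug` is the field the question refers to as `$2`.

```pig
-- Hypothetical input files; the schema mirrors the DESCRIBE output in the question.
B = LOAD 'b.txt' USING PigStorage(',')
    AS (pid:int, pname:chararray, drug:chararray, gender:chararray, tot_amt:int);
C = LOAD 'c.txt' USING PigStorage(',')
    AS (pid:int, pname:chararray, drug:chararray, gender:chararray, tot_amt:int);

-- One relation: GROUP is the conventional spelling.
by_drug = GROUP B BY drug;

-- Two relations: COGROUP is the conventional spelling ...
cogrpvar = COGROUP C BY drug, B BY drug;

-- ... but GROUP is accepted here as well.
grpvar = GROUP C BY drug, B BY drug;

-- Compare the schemas of the two multi-relation results.
DESCRIBE cogrpvar;
DESCRIBE grpvar;
```

Given the question's own output, both DESCRIBE statements should report the same schema: a `group` key plus one bag per input relation.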

Rational answered 30/7, 2014 at 7:59 Comment(0)
