What is the distinction between an 'outer bag' and an 'inner bag' in Pig Latin?

The documentation uses the terms 'inner bag' and 'outer bag' extensively (for example: http://pig.apache.org/docs/r0.11.1/basic.html ), yet I haven't been able to pin down a precise definition that separates the two.

For example, these questions are all closely related:

  • If I give you a bag 'foo', what would you need to know to label foo as an 'inner bag' vs. an 'outer bag'?
  • Is any bag that is not the outermost bag an 'inner bag'?
  • Are the labels 'inner' and 'outer' always mutually exclusive?
  • In Pig Latin, are all bags relations, or is only the outermost bag a relation (so that inner bags are not relations)?

To create an example for discussion:

grunt> dump A;      
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)


grunt> W1 = GROUP A   ALL;         
grunt> W2 = GROUP W1  ALL;
grunt> W3 = GROUP W2  ALL;
grunt> W4 = GROUP W3  ALL;

grunt> describe W4;
W4: {group: chararray,W3: {(group: chararray,W2: {(group: chararray,W1: {(group: chararray,A: {(f1: int,f2: int,f3: int)})})})}}


grunt> illustrate W4;
(1,2,3)
---------------------------------------------------
| A     | f1:int      | f2:int      | f3:int      | 
---------------------------------------------------
|       | 1           | 2           | 3           | 
|       | 8           | 3           | 4           | 
---------------------------------------------------
------------------------------------------------------------------------------------------------
| W1     | group:chararray      | A:bag{:tuple(f1:int,f2:int,f3:int)}                          | 
------------------------------------------------------------------------------------------------
|        | all                  | {(1, 2, 3), (8, 3, 4)}                                       | 
------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------
| W2     | group:chararray      | W1:bag{:tuple(group:chararray,A:bag{:tuple(f1:int,f2:int,f3:int)})}                                         | 
-----------------------------------------------------------------------------------------------------------------------------------------------
|        | all                  | {(all, {(1, 2, 3), (8, 3, 4)})}                                                                             | 
-----------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| W3     | group:chararray      | W2:bag{:tuple(group:chararray,W1:bag{:tuple(group:chararray,A:bag{:tuple(f1:int,f2:int,f3:int)})})}                                                        | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|        | all                  | {(all, {(all, {(1, 2, 3), (8, 3, 4)})})}                                                                                                                   | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| W4     | group:chararray      | W3:bag{:tuple(group:chararray,W2:bag{:tuple(group:chararray,W1:bag{:tuple(group:chararray,A:bag{:tuple(f1:int,f2:int,f3:int)})})})}                                                                       | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|        | all                  | {(all, {(all, {(all, {(1, 2, 3), (8, 3, 4)})})})}                                                                                                                                                         | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

grunt> dump W4;
(all,{(all,{(all,{(all,{(1,2,3),(4,2,1),(8,3,4),(4,3,3)})})})})

Among the bags W1, W2, W3, and W4, which is inner and which is outer?

Dodgson answered 8/10, 2013 at 1:27 Comment(0)

The outer bag is actually relation A. This is a little weird, but it'll become clear once you know what an inner bag is. Let's just look at W1, for readability, since having the nested bags does not change the answer.

Schema and output for W1:

W1: {group:chararray, A:bag{:tuple(f1:int,f2:int,f3:int)}}
(all,{(1, 2, 3), (8, 3, 4)})

We can see there is a field in W1 named A, which is a bag. This is an inner bag because the bag is a field in the relation.
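To make the "bag as a field" point concrete, here is a minimal sketch. It assumes A is loaded from a hypothetical tab-delimited file named 'data' that holds the four tuples from the question. The alias C and the field name n are illustrative only.

```pig
-- 'data' is a hypothetical input path; the schema matches the question's relation A.
A = LOAD 'data' AS (f1:int, f2:int, f3:int);

-- W1 is a relation (an outer bag) holding a single tuple: (all, {...}).
W1 = GROUP A ALL;

-- Inside FOREACH, the name A refers to the bag-valued field of W1 (the inner bag),
-- so it can be handed to an aggregate function such as COUNT.
C = FOREACH W1 GENERATE group, COUNT(A) AS n;
DUMP C;
```

With the four tuples shown in the question, this should print (all,4).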

Remember that bags are just unordered collections of tuples, and we can see one in the output for W1. Now, look at the output of relation A:

(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)

Pig does not guarantee the order of these tuples (unless you use ORDER or similar). So, if you think about it, relation A is really just an unordered collection of tuples. This is an outer bag.
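As a rough illustration of how the two kinds of bag relate, FLATTEN can un-nest the inner bag so that its tuples become top-level tuples of a new relation. The input path 'data' and the alias B are assumptions made for this sketch.

```pig
-- 'data' is a hypothetical input path; the schema matches the question's relation A.
A = LOAD 'data' AS (f1:int, f2:int, f3:int);
W1 = GROUP A ALL;

-- FLATTEN un-nests the inner bag A: each tuple in it becomes a tuple of relation B.
B = FOREACH W1 GENERATE FLATTEN(A);
DESCRIBE B;
DUMP B;
```

B should then contain the same tuples as the original relation A, in no guaranteed order.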

You can find some examples of this here.

Arrowwood answered 8/10, 2013 at 9:13 Comment(1)
This helped, thanks. I think I'm now getting it: the bag that is contained in no other bag is 'the outer bag', and it also happens to be 'the relation'. If it contains any bags, then each of those is 'an inner bag' (and not 'the outer bag'). - Dodgson
