Suppose I have data like this:
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
DUMP A;
(1,2,3)
(4,2,1)
Then I cross-join A with itself, but the result is identical to A:
B = CROSS A, A;
DUMP B;
(1,2,3)
(4,2,1)
Why is the second A optimized out of the query?
Info: Pig version 0.11
== UPDATE ==
If I sort A first and cross A with the sorted relation:
C = ORDER A BY a1;
D = CROSS A, C;
This gives the correct cross-join.
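For reference, here is the workaround as one script, with the four tuples a full cross product of the two rows should contain. This is only a sketch: the row order in the comment is illustrative, since CROSS does not guarantee any ordering. The second variant (loading the same file under a second alias, A2, a name I made up here) is an assumption on my part based on the usual advice for self-joins in Pig; I have not confirmed it on 0.11.

```pig
-- Variant 1: the workaround from the update (sort, then cross).
A = LOAD 'data' AS (a1:int, a2:int, a3:int);
C = ORDER A BY a1;
D = CROSS A, C;
DUMP D;
-- Expected tuples, in some order:
-- (1,2,3,1,2,3)
-- (1,2,3,4,2,1)
-- (4,2,1,1,2,3)
-- (4,2,1,4,2,1)

-- Variant 2 (assumption, untested on 0.11): load the same data
-- under a second alias so the two inputs are distinct relations.
A2 = LOAD 'data' AS (a1:int, a2:int, a3:int);
E = CROSS A, A2;
DUMP E;
```

If variant 2 also returns four tuples, that would suggest the problem is specific to using the same alias twice in CROSS rather than to the data itself.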