Check if an element is present in a bag?

How can I check in Pig whether a bag contains an element?

For example, given a bag of chararrays, how can I check whether a particular token is present?

Thriller asked 15/10, 2014 at 19:09

In Apache Pig you can use statements nested inside a FOREACH block (see Pig Basics). Here is an example adapted from the documentation, where B is a relation and A is a bag field of each tuple in B:

X = FOREACH B {
        -- keep only the tuples of bag A whose first field is 'xyz'
        S = FILTER A BY $0 == 'xyz';
        GENERATE COUNT(S.$0);
}

Instead of COUNT, you can use IsEmpty together with the bincond (?:) operator:

X = FOREACH B {
        S = FILTER A BY $0 == 'xyz';
        -- the bincond expression must be wrapped in parentheses
        GENERATE (IsEmpty(S) ? 'xyz NOT PRESENT' : 'xyz PRESENT') AS present, A;
}

Or, to keep only the bags that contain the value:

X = FOREACH B {
        S = FILTER A BY $0 == 'xyz';
        -- project the bag field A; the relation B itself cannot be projected here
        GENERATE A, S;
}
F = FILTER X BY NOT IsEmpty(S);
R = FOREACH F GENERATE A;
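The snippets above assume a relation B whose tuples already carry a bag A. A minimal end-to-end sketch, assuming a hypothetical tab-separated input file tokens.tsv with (doc_id, token) rows, builds the bags with GROUP and then applies both checks. The file name, field names and aliases are illustrative assumptions, not part of the original answer:

```pig
-- Hypothetical input: tokens.tsv, one tab-separated (doc_id, token) pair per line.
pairs = LOAD 'tokens.tsv' USING PigStorage('\t') AS (doc_id:chararray, token:chararray);

-- One tuple per doc_id; its field 'pairs' is the bag of input tuples for that id.
grouped = GROUP pairs BY doc_id;

-- Variant 1: flag every bag according to whether it contains 'xyz'.
flagged = FOREACH grouped {
        hits = FILTER pairs BY token == 'xyz';
        GENERATE group AS doc_id, (IsEmpty(hits) ? 'xyz NOT PRESENT' : 'xyz PRESENT') AS present;
}

-- Variant 2: keep only the bags that contain 'xyz', together with the bag itself.
with_hits = FOREACH grouped {
        hits = FILTER pairs BY token == 'xyz';
        GENERATE group AS doc_id, pairs AS tokens, hits;
}
containing_xyz = FILTER with_hits BY NOT IsEmpty(hits);
result = FOREACH containing_xyz GENERATE doc_id, tokens;

DUMP flagged;
DUMP result;
```

Saved as a script, this can be tried in local mode with pig -x local. Both variants do the membership test inside the nested FOREACH, so no join is involved.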

This avoids a costly self-join, since each extra join typically adds another MapReduce job.

Rectrix answered 17/10, 2014 at 6:30
In Pig 0.15 you can't project B from a nested expression. This did not work for me: X = FOREACH B { S = FILTER A BY 'xyz'; GENERATE B, S; } - Phagocyte
