Using IN clause with PIG FILTER

6

5

Does Pig support an IN clause?

filtered = FILTER bba BY reason not in ('a','b','c','d');

Or should I split it up into multiple ORs?

Thanks!

Presbyterian answered 24/8, 2011 at 16:45 Comment(0)
C
2

I didn't find it in any of the samples in the documentation.

You can get by using AND/OR/NOT.
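
As a minimal sketch of that workaround, the NOT IN filter from the question can be spelled out as a chain of inequality tests joined with AND. The relation `bba` and the field `reason` come from the question; nothing else is assumed.

```pig
-- "reason NOT IN ('a','b','c','d')" written out by hand:
-- a row is kept only if reason differs from every listed value.
filtered = FILTER bba BY (reason != 'a')
                     AND (reason != 'b')
                     AND (reason != 'c')
                     AND (reason != 'd');
```

The positive form (keeping only the listed values) would use `==` tests joined with OR instead.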

Cantrip answered 24/8, 2011 at 16:58 Comment(3)
@Presbyterian BTW: I did a project on Pig Latin in college and the documentation, at least at the time, was HORRIBLE. I hope it is better now. - Cantrip
Not fully into Pig, was just looking to correct some existing code. So I don't know, yet :) - Presbyterian
Link takes you to a 404 page. Well played! :) - Albertinealbertite
B
6

You can use the UDF below from Apache DataFu instead. It saves you from writing a lot of ORs.

https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/util/InUDF.java
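
A rough sketch of how that UDF might be used for the filter in the question. The jar file name is a placeholder, and the class name is taken from the linked source file (InUDF.java); check both against the DataFu version you actually have, since packaging and naming may differ between releases.

```pig
-- Assumption: the DataFu jar is available locally; the file name is a placeholder.
REGISTER datafu.jar;

-- Class name as it appears in the linked source file.
DEFINE In datafu.pig.util.InUDF();

-- First argument is the field to test, the rest are the allowed values.
-- bba and reason are the relation and field from the question.
filtered = FILTER bba BY NOT In(reason, 'a', 'b', 'c', 'd');
```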

Barfly answered 21/3, 2014 at 18:20 Comment(1)
Although Pig now has a built-in IN operator, it can lead to a stack overflow at compilation for large sets. The DataFu InUDF seems to be more robust. - Ascribe
A
3

Pig 0.12 added an IN operator: http://www.edureka.co/blog/operators-in-apache-pig-diagnostic-operators/ (see the release notes at the bottom of the page). I haven't located it in the official docs, apart from a bare mention in the release notes.

Ascribe answered 9/9, 2014 at 7:0 Comment(1)
Here's a link to it in the official docs. - Margerymarget
P
1

No, Pig doesn't support an IN clause. I had a similar situation. You can use the AND operator with the FILTER keyword as a workaround, like this:

A = LOAD 'source.txt' AS (user:chararray, age:chararray);

B = FILTER A BY ($1 matches 'tapan') AND ($1 matches 'superman');

However, if the number of values to filter on is large, you can create a relation that contains all of the keywords and join against it, so that only the rows with a matching value are kept. Hope this helps.
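
The join idea could look roughly like the sketch below. The file `keywords.txt` (one value per line) and the aliases `K`, `J`, `L`, `U`, `in_rows`, and `not_in_rows` are hypothetical names introduced for illustration; `source.txt` and its schema are the ones used above, and the default tab-delimited loader is assumed.

```pig
A = LOAD 'source.txt' AS (user:chararray, age:chararray);

-- Hypothetical lookup relation holding the values to match against.
K_raw = LOAD 'keywords.txt' AS (keyword:chararray);
K = DISTINCT K_raw;   -- avoid duplicated output rows if a keyword repeats

-- IN: an inner join keeps only rows whose user appears in K.
J = JOIN A BY user, K BY keyword;
in_rows = FOREACH J GENERATE A::user AS user, A::age AS age;

-- NOT IN: a left outer join, then keep rows that found no match.
L = JOIN A BY user LEFT OUTER, K BY keyword;
U = FILTER L BY K::keyword IS NULL;
not_in_rows = FOREACH U GENERATE A::user AS user, A::age AS age;
```

This trades a long boolean expression for an extra join, which is probably only worth it when the value list is large or lives in a file anyway.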

Plicate answered 15/5, 2012 at 8:43 Comment(1)
Wouldn't this filter out everything, since you are looking to get the first field to match Tapan and to match superman at the same time? - Hazlett
T
1

We can use the IN clause as follows:

A = FILTER alias_name BY col_name IN (val1, val2,...,valn);

DUMP A;
Tisbee answered 12/2, 2017 at 10:15 Comment(0)
D
1

You can do it like this:

X = FILTER bba BY NOT reason IN ('a','b','c','d');

more info

Disheveled answered 7/12, 2017 at 8:52 Comment(0)
