Filter a string on the basis of a word

I have a Pig job in which I need to filter the data by finding a word in it.

Here is the snippet:

A = LOAD '/home/user/filename' USING PigStorage(',');
B = FOREACH A GENERATE $27,$38;
C = FILTER B BY ( $1 ==  '*Word*');
STORE C INTO '/home/user/out1' USING PigStorage();

The error is on the third line, where C is computed. I have also tried:

C = FILTER B BY $1 MATCHES '*WORD*'  

and also:

C = FILTER B BY $1 MATCHES '\\w+WORD\\w+'  
Dashpot answered 16/9, 2011 at 13:58 Comment(1)
"." matches any character (may or may not match line terminators); "*" means zero or more times. See docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/… - Seif

MATCHES uses regular expressions. You should use ... MATCHES '.*WORD.*' instead.

There is an example here that finds the word 'apache'.
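
Applied to the script from the question, the suggestion might look like the sketch below. The explicit (chararray) cast is an assumption, added because the LOAD declares no schema; it may not be needed in every Pig version. The match is case-sensitive, so the word in the pattern has to be spelled the way it appears in the data ('Word' versus 'WORD').

```pig
-- Same pipeline as in the question; only the FILTER line changes.
A = LOAD '/home/user/filename' USING PigStorage(',');
B = FOREACH A GENERATE $27, $38;
-- After the projection, $1 in B is the original field $38.
-- The cast to chararray is an assumption (no schema was declared on LOAD).
-- '.*' on both sides allows any text before and after the word.
C = FILTER B BY ((chararray)$1 MATCHES '.*Word.*');
STORE C INTO '/home/user/out1' USING PigStorage();
```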

Congregationalism answered 17/9, 2011 at 2:21 Comment(1)
That is the correct syntax, but I wonder why we need to add the .* around the word. Why is MATCHES 'WORD' not working? - Specter
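
On the question in the comment: as far as I know, MATCHES follows the semantics of Java's String.matches, where the pattern has to match the entire value, not just part of it. The sketch below compares the patterns tried in this thread. The input file 'input.txt', the field name f, and the relation names are hypothetical, chosen only for illustration.

```pig
-- Hypothetical input: one chararray field per line.
B = LOAD 'input.txt' AS (f:chararray);

-- Keeps only rows where f is exactly "WORD" and nothing else.
X1 = FILTER B BY f MATCHES 'WORD';

-- Keeps rows where f contains "WORD" anywhere; '.*' absorbs the rest of the value.
X2 = FILTER B BY f MATCHES '.*WORD.*';

-- Keeps rows where "WORD" has one or more word characters directly on each
-- side and nothing else, so "aWORDb" passes but "some WORD here" does not
-- (a space is not matched by \w).
X3 = FILTER B BY f MATCHES '\\w+WORD\\w+';

-- '*WORD*' is not a valid Java regular expression: the leading '*' has
-- nothing to repeat, so it is rejected as a dangling metacharacter.
```

In short, 'WORD' on its own only succeeds when the whole field equals WORD, which is why the surrounding .* is needed for a "contains" test.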
