Convert "3" to 3 with PigLatin
Asked Answered
K

3

6

I read in a csv-file that contains fields with numbers like that: "3". Can I convert this fields from "3" to 3 with PigLatin? I need it to use the SUM() - Function.

Thanks for your help!

Kalliekallista answered 8/12, 2010 at 16:4 Comment(3)
Areway ouyay uresay it'sway otnay "eethray"? (I know what you are actually referring to, I'm just messing around :P)Embowed
I wonder if Pig Latin (the "language", not the Hadoop subproject) is primarily an American thing, or primarily something that native English speakers know about? Looks like Christoph is in Germany. Anyway, Christoph, Pig Latin is a play language in which English words are altered by moving the intial sound to the end of the word and adding "ay".Bookbindery
haha, yes I'm sure it's not "three" :-DKalliekallista
K
9

What about just removing the " with REPLACE?

For example:

data =
    LOAD 'data.txt' AS (num:CHARARRAY);

numbers =
    FOREACH data
    GENERATE
        (INT) REPLACE(num, '\\"', '');

Then you can GROUP and SUM.

One advantage is that you can cast the returned string directly to a number (no need to deal with bags). REGEX_EXTRACT could be used to do the same too.

Kesha answered 11/1, 2011 at 19:17 Comment(1)
I had to do a filter where I need to pick all the values which was greater than 1, I did this: inputData = FILTER inputData BY (INT) REPLACE((chararray)value#'val', '\\"', '')>1;. Looks good?Germanophobe
B
6

The TOKENIZE function will split a string on various characters considered to be word separators, one of which is a quote mark. So if you tokenize "3" and take the middle item, it should be just 3.

Bookbindery answered 8/12, 2010 at 16:13 Comment(0)
P
3

You could write a UDF that strips the quotes around it OR use JacobM's approach.

However, afterwards, you should cast the chararray '3' to an int: (int)$1 or (int)myvalue. This way you can use sum.

http://pig.apache.org/docs/r0.5.0/piglatin_reference.html#Cast+Operators

Pyriform answered 8/12, 2010 at 19:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.