I'm reading a CSV file whose numeric fields are quoted, like "3". Can I convert these fields from "3" to 3 in Pig Latin? I need real numbers so that I can use the SUM() function.
Thanks for your help!
What about just removing the " characters with REPLACE? For example:
```pig
data =
    LOAD 'data.txt' AS (num:CHARARRAY);
numbers =
    FOREACH data
    GENERATE
        (INT) REPLACE(num, '\\"', '');
```
Then you can GROUP and SUM.
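A minimal end-to-end sketch of that GROUP and SUM step. It assumes a hypothetical two-column file `data.csv` with a key in the first column and a quoted number in the second; the file name and the `key`/`num` field names are placeholders, not anything from the question.

```pig
-- Assumption: each line looks like  a,"3"  (hypothetical layout)
data = LOAD 'data.csv' USING PigStorage(',') AS (key:chararray, num:chararray);

-- Strip the double quotes and cast what remains to int
numbers = FOREACH data GENERATE key, (INT) REPLACE(num, '\\"', '') AS n;

-- One group per key; SUM adds up the int values in each group
by_key = GROUP numbers BY key;
sums = FOREACH by_key GENERATE group AS key, SUM(numbers.n) AS total;

DUMP sums;
```

For a single grand total instead of per-key totals, `GROUP numbers ALL` puts every record into one group. Note that plain PigStorage does not understand CSV quoting, so a field that itself contains a comma would be split.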
One advantage of this approach is that REPLACE returns a plain chararray, which you can cast directly to a number (there is no bag to deal with, unlike with TOKENIZE). REGEX_EXTRACT could be used the same way.
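A sketch of the REGEX_EXTRACT variant, assuming the same single-column `data.txt` as above and non-negative integers only (the pattern would need adjusting for signs or decimals):

```pig
data = LOAD 'data.txt' AS (num:chararray);

-- REGEX_EXTRACT(string, regex, index) returns capture group 1, i.e. the digits
-- without the surrounding quotes; it returns null if the pattern does not match
numbers = FOREACH data GENERATE (INT) REGEX_EXTRACT(num, '(\\d+)', 1) AS n;
```

Rows that don't match end up as null, and SUM ignores nulls, so malformed values are silently skipped rather than failing the job.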
`inputData = FILTER inputData BY (INT) REPLACE((chararray)value#'val', '\\"', '') > 1;` Looks good? – Germanophobe

The TOKENIZE function splits a string on several characters it treats as word separators, one of which is the double quote. So if you tokenize "3", the quotes are dropped and the resulting bag should hold just the token 3.
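A sketch of the TOKENIZE route, again assuming a single-column `data.txt`. Because TOKENIZE returns a bag of tuples, FLATTEN is used here to turn each token into its own row before casting:

```pig
data = LOAD 'data.txt' AS (num:chararray);

-- TOKENIZE('"3"') produces a bag like {(3)}; FLATTEN un-nests it to one row per token
tokens = FOREACH data GENERATE FLATTEN(TOKENIZE(num)) AS tok;

-- Each token is a chararray, so cast it before summing
numbers = FOREACH tokens GENERATE (INT) tok AS n;
```

This only behaves like a quote-stripper when the field contains a single token; a value with embedded spaces or commas would be split into several rows.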
You could write a UDF that strips the surrounding quotes, or use JacobM's approach.
Either way, you should afterwards cast the chararray '3' to an int, e.g. (int)$1 or (int)myvalue. That way you can use SUM.
http://pig.apache.org/docs/r0.5.0/piglatin_reference.html#Cast+Operators