Pig non-aggregated warnings output location?
Pig: 0.8.1-cdh3u2
Hadoop: 0.20.2-cdh3u0

I'm debugging FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings, but I can't seem to get the individual warnings printed anywhere. Disabling aggregation via the -w or aggregate.warnings=false switch removes the summary messages, BUT it removes the actual warnings too, so I can't see which type conversion failed.

There's nothing written in Pig's log for this run, AND I can't locate any logs containing the individual warnings. Did I miss something obvious, or does it simply not work?

Ophiology answered 14/12, 2011 at 19:58 Comment(4)
I look forward to the answer to this question. I typically find the record manually. Anchovy
I already have close to 100 million records, adding half a million every day, with more than 300 columns in each row. And these are decimal numbers. Without tool support that's worse than looking for a needle in a field of haystacks. Ophiology
The only thing I can think of is to load your data as a chararray into Pig, then write a UDF that tries to convert it. If an exception is thrown, return the item (don't return anything otherwise); see the sketch after these comments. Anchovy
Thanks for the suggestion. The case is a little different: I do filter out data noise and all the nonsense, so the data type is guaranteed to be a double Pig type. Pig still complains, though, and I want it to tell me exactly which value is at fault (by using the -w switch). But it doesn't print the output I expect. Ophiology
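A minimal sketch of the chararray-plus-UDF approach suggested in the comments above. This is not from the thread: the class name, package, and field handling are assumptions, and it presumes the suspect column has been loaded as a chararray.

// Hypothetical UDF: returns the raw chararray value only when it cannot be
// parsed as a double, so filtering on a non-null result leaves exactly the
// offending values.
package com.example.pigdebug;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class FailedDoubleConversion extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // nothing to check
        }
        String raw = input.get(0).toString(); // the field, loaded as chararray
        try {
            Double.parseDouble(raw); // roughly what the chararray-to-double cast attempts
            return null;             // converts cleanly, not interesting
        } catch (NumberFormatException e) {
            return raw;              // the value the conversion would choke on
        }
    }
}

Register the jar, DEFINE the function, and keep only the rows where it returns a non-null value to see exactly which values fail.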

Hadoop job logs are recorded locally on each compute node, so you first need to set up your Hadoop cluster manager to collect the log files onto the distributed file system so that you can analyse them. If you use Hadoop on Demand (http://hadoop.apache.org/docs/r0.17.0/hod.html) you should be able to do that by specifying something like:

log-destination-uri = hdfs://host123:45678/user/hod/logs

See the HOD documentation at http://hadoop.apache.org/docs/r0.17.0/hod_user_guide.html#Collecting+and+Viewing+Hadoop+Logs

After you have the logs on HDFS, you can run a simple Pig query to find the offending conversion. Something like the following should do the trick:

-- load the collected log lines, split on ']'
a1 = LOAD '*.log' USING PigStorage(']');
-- keep lines whose second field matches the type-conversion warning
-- (MATCHES tests the whole field against the pattern, not a substring)
a2 = FILTER a1 BY ($1 MATCHES ' WARN.*Unable to interpret value.*');
DUMP a2;
Shellback answered 14/12, 2012 at 15:54 Comment(0)

It's difficult to find which data or value is causing the issue, but at least you can find which column is creating it. Once you find the column, you can use a dynamic invoker, which may help you with the type conversion.

How to use a dynamic invoker (here your_relation and column_name stand in for your own alias and column):
DEFINE ConvertToDouble InvokeForDouble('java.lang.Double.parseDouble', 'String');

converted = FOREACH your_relation GENERATE ConvertToDouble(column_name);

Simson answered 15/5, 2015 at 16:40 Comment(0)
