If a simple GROUP BY script in Pig, running over terabytes of data, gets stuck at, say, 70%, what can be done to diagnose the problem?
There are several methods to debug a Pig script. A simple one is to execute the script one relation at a time and verify each result. The following operators are useful for debugging a Pig script.
DUMP - Use the DUMP operator to run (execute) Pig Latin statements and display the results to your screen.
ILLUSTRATE - Use the ILLUSTRATE operator to review how data is transformed through a sequence of Pig Latin statements. ILLUSTRATE allows you to test your programs on small datasets and get faster turnaround times.
EXPLAIN - Use the EXPLAIN operator to review the logical, physical, and map reduce execution plans that are used to compute the specified relationship.
DESCRIBE - Use the DESCRIBE operator to view the schema of a relation. You can view outer relations as well as relations defined in a nested FOREACH statement.
More detail about these commands is available on this link. Also refer to "Developing and testing a Pig script" for more detail.
If you want to debug the whole script during execution, add the code below at the top of your script:
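As a minimal sketch of how the operators above might be applied to a GROUP BY script, the snippet below checks the schema, the sample data flow, and the execution plans before running anything at full scale. The input path, field names, and relation names (events, grouped, totals, preview) are hypothetical and only serve as an illustration; substitute your own.

```pig
-- Hypothetical input path and schema; replace with your own.
events = LOAD 'input/events.tsv' USING PigStorage('\t')
         AS (user_id:chararray, amount:double);

-- Check that the schema is what you expect before grouping.
DESCRIBE events;

grouped = GROUP events BY user_id;
totals  = FOREACH grouped GENERATE group AS user_id,
                                   SUM(events.amount) AS total;

-- View the schema of the final relation.
DESCRIBE totals;

-- Walk a small sample of the data through each step.
ILLUSTRATE totals;

-- Print the logical, physical, and MapReduce plans without running the job.
EXPLAIN totals;

-- Execute and print only a few rows rather than the full result.
preview = LIMIT totals 10;
DUMP preview;
```

On terabytes of input, DUMP on the full relation would run the entire job and print every row, so limiting the output (or pointing LOAD at a small sample file first) keeps the turnaround short.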
-- set the debug mode on
SET debug 'on'
-- set a name for your job
SET job.name 'my job'
This runs your script in debug mode. More detail about the SET command is available on this link.
When you say the script is stuck at 70%, I assume you mean the MR job is 70% complete.
It's best to look at the MR and YARN logs (and, if needed, the HDFS logs) at that point for more information about what MR/YARN is doing. On clusters managed by Cloudera Manager, logs can typically be found under /var/log/hadoop-mapreduce and /var/log/hadoop-hdfs. You may need to examine logs from multiple nodes in the cluster where YARN NodeManagers are running.
If your script is stuck on a Pig issue (i.e. an issue in Pig code, not MR/HDFS code), it is useful to increase the log4j logging level in Pig. For example, pig -d DEBUG is the command-line option that sets the logging level to DEBUG.