How can I debug a Pig script?

While running a simple GROUP BY script in Pig over terabytes of data, the script got stuck at around 70%. What can be done to diagnose the problem?

Astrograph answered 12/5, 2015 at 18:14

There are several methods to debug a Pig script. A simple one is to execute the script step by step, one relation at a time, and verify each result. The following commands are useful for debugging a Pig script:

DUMP - Use the DUMP operator to run (execute) Pig Latin statements and display the results on your screen.

ILLUSTRATE - Use the ILLUSTRATE operator to review how data is transformed through a sequence of Pig Latin statements. ILLUSTRATE allows you to test your programs on small datasets and get faster turnaround times.

EXPLAIN - Use the EXPLAIN operator to review the logical, physical, and MapReduce execution plans that are used to compute the specified relation.

DESCRIBE - Use the DESCRIBE operator to view the schema of a relation. You can view outer relations as well as relations defined in a nested FOREACH statement.
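As a rough sketch of the step-by-step approach, the Pig Latin below applies each of these operators to a GROUP BY similar to the one in the question. The input path sample/events.tsv and its (user_id, bytes) schema are hypothetical placeholders; the statements are meant to be entered one at a time in the Grunt shell (for example, started in local mode with pig -x local) against a small sample of the real data.

```pig
-- Hypothetical sample input: tab-separated (user_id, bytes) records.
raw = LOAD 'sample/events.tsv' USING PigStorage('\t') AS (user_id:chararray, bytes:long);

-- DESCRIBE prints the schema Pig has for a relation; no job is run.
DESCRIBE raw;

grouped = GROUP raw BY user_id;
DESCRIBE grouped;

-- The bag inside each group is named after the input relation (raw).
counts = FOREACH grouped GENERATE group AS user_id, COUNT(raw) AS n, SUM(raw.bytes) AS total_bytes;
DESCRIBE counts;

-- ILLUSTRATE runs the statements on a small sample of rows and prints every intermediate relation.
ILLUSTRATE counts;

-- EXPLAIN prints the logical, physical and execution plans without running the job.
EXPLAIN counts;

-- DUMP executes the pipeline; LIMIT keeps the printed output short.
preview = LIMIT counts 20;
DUMP preview;
```

Checking DESCRIBE, ILLUSTRATE and EXPLAIN output before launching a full DUMP or STORE keeps the feedback loop short, because none of them has to wait for a terabyte-scale job to finish.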

More detail about these commands is available in the linked Pig documentation. For more detail, also refer to "Developing and Testing a Pig Script".

To debug the whole script during execution, add the following lines at the top of your script:

-- turn debug mode on (sets the logging level to DEBUG)
SET debug 'on';
-- set a name for the job so it is easy to find on the cluster
SET job.name 'my job';

This runs your script in debug mode. More detail about the SET command is available in the linked Pig documentation.

Branny answered 12/5, 2015 at 18:39

When you say the script is stuck at 70%, I assume you mean the MapReduce (MR) job is 70% complete.

It's best to look at the MR and YARN logs (and, if needed, the HDFS logs) at that point for more information about what MR/YARN is doing. In Cloudera Manager managed clusters, the logs can typically be found under /var/log/hadoop-mapreduce and /var/log/hadoop-hdfs. You may need to examine logs on multiple nodes in the cluster, wherever YARN NodeManagers are running.

If your script is stuck because of a Pig issue (i.e., an issue in Pig code, not in MR/HDFS code), it is useful to increase Pig's log4j logging level. For example, the command-line option pig -d DEBUG sets the logging level to DEBUG.

Sampan answered 25/2, 2016 at 19:53
