Debugging in PIG UDF
I am new to Hadoop/PIG. I have a basic question.

Do we have a logging facility in Pig UDFs? I have written a UDF that I need to verify; I want to log certain statements to check the flow. Is there a logging facility available? If yes, where are the Pig logs located?

Fennec answered 12/6, 2012 at 21:17 Comment(0)
Assuming your UDF extends EvalFunc, you can use the logger returned from EvalFunc.getLogger(). The log output should be visible in the associated map/reduce task that Pig executes (if the job runs as more than a single stage, you'll have to pick through the tasks to find the associated log entries).
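
As a rough sketch of how that can look (the class name and the uppercasing logic are just placeholders, not from the question; getLogger() is the method EvalFunc provides):

    import java.io.IOException;
    import org.apache.commons.logging.Log;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class UpperCase extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            // getLogger() comes from EvalFunc; its output ends up in the logs of
            // the map or reduce task that runs the UDF, not the daemon logs.
            Log log = getLogger();
            if (input == null || input.size() == 0) {
                log.warn("Received an empty tuple, returning null");
                return null;
            }
            log.info("Processing tuple: " + input);
            return ((String) input.get(0)).toUpperCase();
        }
    }

Once the UDF's jar is registered in the script, those info/warn lines should show up in the per-task logs discussed in the comments below.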

Darciedarcy answered 12/6, 2012 at 23:45 Comment(13)
So the logs will end up in the map/reduce task log file? Could I direct my log statements to a separate file instead? - Fennec
Yes, they will. You could, but then you'd have to go to each tasktracker to view/collect them. I guess you could try configuring a remote logger (logging to a DB, for example). - Darciedarcy
I don't know for sure, but you could try the PigLogger - that might send things back to the client. - Darciedarcy
Sorry for such a naive question, but I have used it in the following way: PigLogger pigLogger = this.getPigLogger(); pigLogger.warn(object, String, enum); Am I missing anything here, or is that all there is to the EvalFunc logger? I cannot see anything other than warn. Don't we have debug, info, error? - Fennec
I tried using this.getLogger().info(String); should this show up in the tasktracker log? I cannot see any log output for it. - Fennec
The PigLogger is for aggregating warning messages, so yes, it only has the warn method; see the sketch after this comment thread (it's a hack, but do the messages propagate to the client shell with PigLogger?). And yes, getLogger().info(String) should show up in the task logs - not the tasktracker logs, but the actual logs for the map or reduce task that is executing your UDF. - Darciedarcy
I am sorry, but could you please tell me where I should be configuring these task logs? Where can I find them? The only logs I am aware of are the ones in the hadoop/logs directory - the datanode/tasktracker/namenode/secondarynamenode/jobtracker logs. - Fennec
They are not in the standard log directory; they are attached to each job. Find the Pig job in the JobTracker web UI, drill down to an individual map or reduce task, and then view the logs. - Darciedarcy
After clicking on the map/reduce job I am redirected to the Task page, where all tasks are listed. When I click on one, I see the task logs. Is this where they will be stored? Apologies for needing such minute detail. - Fennec
books.google.com/… - Figure 5-4 shows the link you're looking for (Task Logs column, click the All link). - Darciedarcy
I used both PigLogger pigLogger = this.getPigLogger(); pigLogger.warn(object, String, enum); and this.getLogger().info(String), and I cannot see either of them in the task logs. I clicked on All as you described above (thanks for that). What am I missing here? Please help. - Fennec
Did the Pig script execute more than a single MR job? I think you should try @ihadanny's idea of local mode. - Darciedarcy
Thanks Chris, I can see the log statements now. I ran a small sample program and I can see the logs in the task logs. Thanks a lot. - Fennec
Perhaps obvious, but I advise debugging your UDF in local mode before deploying it to a cluster/pseudo-cluster. That way you can debug it right inside your IDE (Eclipse in my case), which is easier than log-debugging.

Uncommon answered 18/6, 2012 at 8:51 Comment(2)
Is there a site or some steps I can follow to get started with Pig in Eclipse? - Fennec
I don't know about a site with steps, but it's simple enough: put the hadoop-core and pig dependencies in your Maven POM, and then work with org.apache.pig.PigServer. Try pigServer.registerScript(resource.getInputStream(), pigScriptParams, null); and then PigStats stats = pigServer.store("final_output", pigScriptParams.get("output_folder"), pigStoreFunc).getStatistics(); (a fuller sketch follows below). - Uncommon
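
Expanding on that comment, a local-mode driver could look roughly like this (the script path, parameter names, and the final_output alias are assumptions made to keep the example self-contained):

    import java.io.FileInputStream;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;
    import org.apache.pig.backend.executionengine.ExecJob;
    import org.apache.pig.tools.pigstats.PigStats;

    public class LocalPigRunner {
        public static void main(String[] args) throws Exception {
            // ExecType.LOCAL runs the whole script inside the current JVM, so
            // breakpoints set inside the UDF are hit directly in the IDE.
            PigServer pigServer = new PigServer(ExecType.LOCAL);

            Map<String, String> pigScriptParams = new HashMap<String, String>();
            pigScriptParams.put("input_folder", "src/test/resources/input");
            pigScriptParams.put("output_folder", "target/pig-output");

            // Parameter substitution fills the $input_folder / $output_folder
            // placeholders in the script before it is parsed.
            pigServer.registerScript(
                    new FileInputStream("src/main/pig/my_script.pig"),
                    pigScriptParams, null);

            // "final_output" must match an alias defined in the script.
            ExecJob job = pigServer.store("final_output",
                    pigScriptParams.get("output_folder"));
            PigStats stats = job.getStatistics();
            System.out.println("Job successful: " + stats.isSuccessful());
        }
    }

Because everything runs in a single JVM, stepping through the UDF with the IDE debugger works like debugging any other Java code.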
