Connection Error in Apache Pig

I am running Apache Pig 0.11.1 with Hadoop 2.0.5.

Most simple jobs that I run in Pig work perfectly fine.

However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get these connection errors:

2013-07-29 13:24:08,591 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server 
2013-07-29 11:57:29,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-07-29 11:57:30,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-07-29 11:57:31,422 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-29 13:24:18,597 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 13:24:18,598 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.IOException

The strange thing is that after these errors keep appearing for about 2 minutes, they stop, and the correct output shows up at the bottom.

So Hadoop is running fine and computing the proper output. The problem is just these connection errors that keep popping up.

The LIMIT operator always gets this error. It happens in both MapReduce mode and local mode. The GROUP BY operator works fine on small datasets.
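For reference, even a tiny job of this shape is the kind of thing I mean (the input path is just a placeholder):

# placeholder HDFS path; -x mapreduce can be swapped for -x local
pig -x mapreduce -e "A = LOAD '/tmp/input.txt' AS (line:chararray); B = LIMIT A 10; DUMP B;"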

One thing I have noticed is that whenever this error appears, the job had created and run multiple JAR files during its execution. However, after a few minutes of these messages popping up, the correct output finally appears.

Any suggestions on how to get rid of these messages?

Conlon answered 29/7, 2013 at 17:42 Comment(5)
Is your namenode local? If not, it's trying to access it at 0.0.0.0. It might not be picking up the namenode location from core-site.xml, or your /etc/hosts file is messed up. – Eyetooth
The namenode is local. core-site.xml and /etc/hosts seem to be configured properly, because most of my other Pig/Hadoop jobs work the way they should. Plus, the correct job output appears after the connection errors display for a few minutes, so I think the problem is something else. – Conlon
@AndyBotelho It's probably worth checking the job history server's logs. – Fulvous
@Andy Botelho Can you let me know: 1) how many nodes your cluster has, 2) which Linux distro you are using, and 3) the size of your data? – Flagging
Yes, the problem was that the job history server was not running. All we had to do to fix it was run the command mr-jobhistory-daemon.sh start historyserver, which starts the job history server. Now if we enter 'jps', we can see that the JobHistoryServer is running, and my Pig jobs no longer waste time trying to connect to the server. – Conlon

Yes, the problem was that the job history server was not running.

All we had to do to fix it was run this command:

mr-jobhistory-daemon.sh start historyserver

This command starts up the job history server. Now if we enter 'jps', we can see that the JobHistoryServer is running and my Pig jobs no longer waste time trying to connect to the server.
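In other words, a quick way to confirm it is up (the PID will of course differ on your machine):

jps | grep JobHistoryServer
# prints something like: 12345 JobHistoryServer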

Conlon answered 1/8, 2013 at 14:41 Comment(6)
This was very helpful. With Pig 0.13 and Hadoop 2.3.0 it will not finish successfully at all; it just retries 10 times and then starts all over again. – Accusal
This is a perfect answer! – Deemster
FYI, this file is in the sbin directory in my version. – Contractile
Go to the bin directory of Hive and run this command from there (/home/hadoop/hive/bin). – Boat
Worked for me, thanks a lot. For me the mr-jobhistory-daemon.sh file was inside the sbin directory. – Relict
How do you handle this scenario in an enterprise-level cluster where the history daemon is running on another cluster? I am seeing the issue specifically for very long-running jobs. – Fredel

I think this problem is related to a Hadoop mapred-site.xml configuration issue. The History Server runs on localhost by default, so you need to add your configured host:

<property>
 <name>mapreduce.jobhistory.address</name>
 <value>host:port</value>
</property>

then run this command:

mr-jobhistory-daemon.sh start historyserver
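A quick sanity check afterwards, assuming you kept the common default port 10020 (adjust to whatever host:port you configured above):

# the JobHistoryServer process should be listed and listening on the configured port
jps | grep JobHistoryServer
netstat -tlnp 2>/dev/null | grep 10020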
Prickly answered 21/7, 2014 at 11:25 Comment(0)

I am using Hadoop 2.6.0, so I had to run

$ mr-jobhistory-daemon.sh --config /usr/local/hadoop/etc start historyserver

where /usr/local/hadoop/etc is my HADOOP_CONF_DIR.
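If you would rather not pass --config every time, exporting the variable first should be equivalent (a sketch assuming the same layout as above):

# --config just sets HADOOP_CONF_DIR, so this does the same thing
export HADOOP_CONF_DIR=/usr/local/hadoop/etc
mr-jobhistory-daemon.sh start historyserver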

Morganica answered 10/3, 2016 at 11:35 Comment(0)

I am using Hadoop 2.2.0. This problem was due to the History Server not running. I had to start it, using the following command:

[root@localhost ~]$ /usr/lib/hadoop-2.2.0/sbin/mr-jobhistory-daemon.sh start historyserver
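If the daemon exits again right away, its startup log usually says why; with an install prefix like the one above it normally lands in the logs directory (the exact file name pattern may vary by version):

# locate and inspect the history server log under the Hadoop install
ls /usr/lib/hadoop-2.2.0/logs/ | grep -i historyserver
tail -n 50 /usr/lib/hadoop-2.2.0/logs/*historyserver*.log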

Lagniappe answered 10/1, 2018 at 12:35 Comment(0)
