Pig keeps trying to connect to job history server (and fails)

I'm running a Pig job that fails to connect to the Hadoop job history server.

The job (usually one with a GROUP BY) runs for a while and then starts logging messages like:

2015-04-21 19:05:22,825 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-04-21 19:05:26,721 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-04-21 19:05:29,721 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

It then keeps retrying the connection for a while. Sometimes it eventually proceeds with the job; other times it throws this exception:

2015-04-21 19:05:55,822 [main] WARN  org.apache.pig.tools.pigstats.mapreduce.MRJobStats - Unable to get job counters
java.io.IOException: java.io.IOException: java.net.NoRouteToHostException: No Route to Host from  cluster-01/10.10.10.11 to 0.0.0.0:10020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
    at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getCounters(HadoopShims.java:132)
    at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addCounters(MRJobStats.java:284)
    at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:235)
    at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)

I found this question here, but in my case the job history server is started. If I run netstat, I find:

tcp        0      0 0.0.0.0:10020           0.0.0.0:*               LISTEN      12073/java       off (0.00/0/0)

Where 12073 is ...

12073 pts/4    Sl     0:07 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dproc_historyserver -Xmx1000m -Djava.library.path=/data/hadoop/hadoop/lib -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/hadoop/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop/logs -Dhadoop.log.file=mapred-hadoop-historyserver-cluster-01.log -Dhadoop.root.logger=INFO,RFA -Dmapred.jobsummary.logger=INFO,JSA -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer

I tried opening port 10020 in case it was a firewall issue:

ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:10020

... but no luck.
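To narrow down whether this is a network problem rather than a Pig problem, a quick reachability check from a worker node can help. This is just a sketch that mimics the TCP connect Pig's IPC client performs; the host name `cm` is a placeholder for whatever machine actually runs the Job History Server:

```python
import socket

def jhs_reachable(host, port=10020, timeout=3.0):
    """Attempt a plain TCP connection to the Job History Server,
    roughly what Pig's IPC client does before its retry loop."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", "no route to host", and timeouts.
        return False

if __name__ == "__main__":
    # Replace "cm" with your JHS host; 0.0.0.0 in the logs means the
    # client was handed the default bind address, which is not routable.
    print(jhs_reachable("cm"))
```

If this returns `False` from the worker nodes but `True` on the JHS machine itself, the problem is the advertised address (the `0.0.0.0` in the log) or the network, not the server.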

After a few minutes, some of the tasks just arbitrarily continue to the next part.

I'm using Hadoop 2.3 and Pig 0.14.

My question is:

1) What are the possible reasons why Pig cannot connect to the job history server (JHS) given that the JHS is running on the same port that Pig looks for it?

... or failing that ...

2) Is there any way to just tell Pig to stop trying to connect to the JHS and continue with the task?

Simian answered 21/4, 2015 at 22:46

Most Hadoop installation/configuration guides neglect to mention configuring the Job History Server, but Pig in particular relies on it. Notably, the default (local) settings for the JHS won't work in a multi-node cluster.

The solution was to add the hostname of the server to the configuration in mapred-site.xml, so that the JHS could be accessed from the other machines. (In my version of the file these lines had to be added as new; there were no previous settings.)

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>cm:10020</value>
  <description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
</property>
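To sanity-check that the file really contains the property before restarting anything, a small sketch like this can read it back (the config path is an assumption for your install; Hadoop's `*-site.xml` files all share this `<configuration>/<property>` layout):

```python
import xml.etree.ElementTree as ET

def read_property(conf_path, name):
    """Return the value of a named property from a Hadoop *-site.xml file,
    or None if the property is absent."""
    root = ET.parse(conf_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

if __name__ == "__main__":
    # Path is an assumption; adjust to your HADOOP_CONF_DIR.
    print(read_property("/data/hadoop/hadoop/etc/hadoop/mapred-site.xml",
                        "mapreduce.jobhistory.address"))
```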

Then restart the job history server:

mr-jobhistory-daemon.sh stop historyserver
mr-jobhistory-daemon.sh start historyserver

If you get a bind exception (port in use), the stop didn't work. Either:

  1. Use ps ax | grep -e JobHistory to find the process, kill it manually with kill -9 [pid], and then run the start command above again; or

  2. Use a different port in the configuration.
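Before restarting, you can check whether anything is still holding the port. A minimal sketch: try to bind the port yourself, and treat a bind failure as "in use" (the same condition that produces the bind exception on startup):

```python
import socket

def port_in_use(port, host="0.0.0.0"):
    """Try to bind the port; if the bind fails, another process
    (e.g. a leftover JobHistoryServer) is still listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR lets us ignore sockets lingering in TIME_WAIT,
        # but a live LISTEN socket will still make bind() fail.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return False
        except OSError:
            return True

if __name__ == "__main__":
    print(port_in_use(10020))
```

If this reports the port as in use after `stop historyserver`, fall back to option 1 above.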

Pig should pick up the new settings automatically. Run a Pig script and hope for the best.

Simian answered 22/4, 2015 at 15:12

Start the history server from the Hadoop bin directory:

bin$ ./mr-jobhistory-daemon.sh start historyserver

Then run Pig:

$ pig
Nutter answered 19/5, 2017 at 12:14

Configure mapreduce.jobhistory.address in hadoop/etc/hadoop/mapred-site.xml, then start the daemon:

mapred --daemon start historyserver
Nummary answered 15/5, 2019 at 10:57

The solution was that the history server was not running; starting it fixed the problem:

[user@vm9 sbin]$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/user/hadoop-2.7.7/logs/mapred-user-historyserver-vm9.out
[user@vm9 sbin]$ jps
5683 NameNode
6309 NodeManager
5974 SecondaryNameNode
8075 RunJar
6204 ResourceManager
8509 JobHistoryServer
5821 DataNode
8542 Jps
[user@vm9 sbin]$ 

Now Pig runs properly: it connects to the job history server, and the dump command works fine.

Etamine answered 6/11, 2019 at 20:1
