Hadoop namenode rejecting connections!? What am I doing wrong?
My configuration:

Server-class machine cluster (4 machines), each with RHEL, 8GB RAM, and a quad-core processor. I set up machine 'B1' as the master and the rest of them as slaves (B2, B3, B4). I kicked off start-dfs.sh, and the NameNode came up on port 53410 on B1. The rest of the nodes are not able to connect to B1 on 53410!

Here's what I did so far:

  1. Tried "telnet B1 53410" from B2, B3, and B4: connection refused.
  2. Tried ssh to B1 from B2, B3, and B4, and vice versa: no problem, works fine.
  3. Changed 53410 to 55410 and restarted dfs: same issue, connection refused on this port too.
  4. Disabled the firewall on B1 ("service iptables stop") and tried connecting from B2, B3, B4: telnet still fails.
  5. Disabled the firewall on all nodes and tried again: still fails to connect to 53410.
  6. Checked that ftp worked from B2, B3, B4 to B1, stopped the ftp service ("service vsftpd stop"), and brought up dfs on the standard ftp port (21): the NameNode comes up, but the rest of the nodes still fail. I can't even telnet to the ftp port from B2, B3, B4.
  7. "telnet localhost 53410" works fine on B1.

All nodes are reachable from one another, and every /etc/hosts file is set up with the correct IP address mappings. So I am pretty much clueless at this point. Why on earth would the NameNode reject connections? Is there a setting in the Hadoop conf that I should be aware of to allow external clients to connect remotely on the NameNode port?
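One way to narrow this down (a sketch, assuming the master is B1 and the port is 53410 as above; substitute your own names) is to check which address the NameNode actually bound to, and what the master's hostname resolves to:

```shell
# On B1: which local address is port 53410 bound to?
# A 127.0.0.1:53410 entry means only local clients can connect;
# you want the LAN IP or 0.0.0.0 there.
ss -tln 2>/dev/null | awk 'NR==1 || /:53410/' || true

# What does the master's own hostname resolve to? It should be the
# LAN address, not a loopback address.
getent hosts "$(hostname)" || true
```

If the first command shows the port bound only to 127.0.0.1, the hostname resolution shown by the second is usually the culprit.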

Tehee answered 15/9, 2011 at 6:45 Comment(1)
could you post the error message? – Spiller
Fixed this: it was an incorrect entry in my /etc/hosts. All nodes were connecting to the master on the loopback address.

Tehee answered 15/9, 2011 at 13:21 Comment(1)
Was your host set to 127.0.1.1 by chance? That seems to be a common Hadoop config issue. – Paulettepauley
The previous answers were not clear to me. Basically, each Hadoop server (DataNode or NameNode) binds its listening socket to the IP address that its hostname resolves to.

Say you have three boxes (box1, box2, box3); the /etc/hosts file on box1 should look like this:

127.0.0.1 localhost
192.168.10.1 box1
192.168.10.2 box2
192.168.10.3 box3

Instead of:

127.0.0.1 localhost box1
192.168.10.2 box2
192.168.10.3 box3
# incorrect: box1 will listen exclusively on 127.0.0.1
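To illustrate the point, here is a small self-contained sketch (the hosts entries are the hypothetical ones from this answer, written to a scratch file) showing how the bad file makes box1 resolve to the loopback address:

```shell
# Write the *incorrect* hosts file to a scratch location.
cat > /tmp/bad_hosts <<'EOF'
127.0.0.1 localhost box1
192.168.10.2 box2
192.168.10.3 box3
EOF

# Look up box1 the way the resolver would: first matching line wins.
addr=$(awk '/(^|[ \t])box1([ \t]|$)/ {print $1; exit}' /tmp/bad_hosts)
echo "box1 resolves to: $addr"   # 127.0.0.1, so the daemon binds loopback only
```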
Frodi answered 22/2, 2012 at 20:10 Comment(2)
I had exactly this problem. I could telnet to the port on the master node, but not from the slave node. I knew it wasn't a firewall issue. Then I finally found this post and my troubles were over. – Kipkipling
Note that the symptom of this problem was: "File jobtracker.info could only be replicated to 0 nodes, instead of 1". The post [#9987533 does not list failing to contact the namenode as a possible cause of this error. – Pylon
In conf/core-site.xml, try changing

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>

from localhost to your machine name.
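A sketch of that change ("box1" stands in for your master's real hostname, the scratch path is illustrative, and the real file lives in your Hadoop conf directory):

```shell
# Scratch copy of the relevant core-site.xml fragment.
cat > /tmp/core-site-fragment.xml <<'EOF'
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
EOF

# Point fs.default.name at the machine name instead of localhost.
sed -i 's|hdfs://localhost:|hdfs://box1:|' /tmp/core-site-fragment.xml
grep '<value>' /tmp/core-site-fragment.xml   # shows hdfs://box1:54310
```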

Glasser answered 15/9, 2011 at 8:17 Comment(0)
Set the right file permissions on the DataNode directory:

chmod 755 /home/svenkata/hadoop/datanode/
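A quick way to sanity-check the result (using a scratch directory here; the real path is whatever dfs.data.dir points at):

```shell
# Create a stand-in datanode directory and apply the same mode.
mkdir -p /tmp/datanode
chmod 755 /tmp/datanode
stat -c '%a' /tmp/datanode   # prints 755 (rwxr-xr-x)
```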
Skip answered 18/2, 2014 at 3:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.