There are 0 datanode(s) running and no node(s) are excluded in this operation
I have set up a multi-node Hadoop cluster. The NameNode and SecondaryNameNode run on the same machine, and the cluster has only one DataNode. All the nodes run on Amazon EC2 machines.

Following are the configuration files on the master node:

masters
54.68.218.192 (public IP of the master node)

slaves
54.68.169.62 (public IP of the slave node)

core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>

These are the configuration files on the datanode:

core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://54.68.218.192:10001</value>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>54.68.218.192:10002</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>

Running jps on the NameNode gives the following:

5696 NameNode
6504 Jps
5905 SecondaryNameNode
6040 ResourceManager

and jps on the DataNode:

2883 DataNode
3496 Jps
3381 NodeManager

which to me seems right.

Now when I try to run a put command:

hadoop fs -put count_inputfile /test/input/

It gives me the following error:

put: File /count_inputfile._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

The log on the datanode says the following:

hadoop-datanode log:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server:      54.68.218.192/54.68.218.192:10001. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

yarn-nodemanager log:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

The NameNode web UI (port 50070) shows that there are 0 live nodes and 0 dead nodes, and that DFS used is 100%.

I have also disabled IPv6.

On a few websites I found out that I should also edit the /etc/hosts file. I have edited it as well, and it looks like this:

127.0.0.1 localhost
172.31.25.151 ip-172-31-25-151.us-west-2.compute.internal
172.31.25.152 ip-172-31-25-152.us-west-2.compute.internal

Why am I still getting the error?

Ambiguous answered 24/10, 2014 at 9:47 Comment(2)
fs.default.name is an old name for the setting; use fs.defaultFS instead. Also try using the master node's hostname or IP address instead of localhost. – Bracey
Installed telnet on the master node and tried to telnet the slave on port 9866; it failed. Adding port 9866 to the firewall solved it for me: telnet hadoop-slave-01 9866, then sudo firewall-cmd --add-port=9866/tcp --permanent and sudo firewall-cmd --reload. – Poulos
40

Two things worked for me:

STEP 1: stop Hadoop and clean the temp files as hduser

sudo rm -R /tmp/*

Also, you may need to delete and recreate /app/hadoop/tmp (mostly needed when I changed the Hadoop version from 2.2.0 to 2.7.0):

sudo rm -r /app/hadoop/tmp
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp

STEP 2: format the namenode

hdfs namenode -format

Now I can see the DataNode:

hduser@prayagupd:~$ jps
19135 NameNode
20497 Jps
19477 DataNode
20447 NodeManager
19902 SecondaryNameNode
20106 ResourceManager
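
To confirm that the DataNode has actually registered with the NameNode (jps only shows the process is alive), a quick check is the cluster report, assuming the Hadoop binaries are on your PATH:

hdfs dfsadmin -report

It should list one live datanode; if it reports 0, the datanode process is up but has not registered.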
Expectant answered 1/5, 2015 at 3:58 Comment(1)
@prayagupd: I have followed all your steps and the jps result shows the datanode. BUT the problem is that the put command shows the same error as the OP. Also, the namenode web UI shows a Live Nodes count of 0 (zero). Any help is much appreciated. Thanks – Delenadeleon
21

I had the same problem after an improper shutdown of the node. I also checked in the UI that the datanode is not listed.

It works now, after deleting the files from the datanode folder and restarting the services:

stop-all.sh

rm -rf /usr/local/hadoop_store/hdfs/datanode/*

start-all.sh

Schalles answered 24/1, 2016 at 5:51 Comment(2)
Worked for me. My folder was different though: rm -rf /tmp/hadoop-anshul/dfs/data/* – Canine
Thanks! This worked for me, although my path was also different! – Aerophagia
6

@Learner,
I had this problem of datanodes not showing in the NameNode's web UI. I solved it with these steps on Hadoop 2.4.1.

Do this on all the nodes (master and slaves):

1. Remove all temporary files (by default in /tmp): sudo rm -R /tmp/*.
2. Try connecting to all the nodes through ssh using ssh username@host, and add the keys on your master using ssh-copy-id -i ~/.ssh/id_rsa.pub username@host to give the master passwordless access to the slaves (not doing so may be why connections are refused); see the sketch after this list.
3. Format the namenode using hadoop namenode -format and try restarting the daemons.
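
A minimal sketch of step 2, where username@slave-host is a placeholder for each of your actual accounts and hosts:

# on the master: generate a key pair if one does not exist yet
ssh-keygen -t rsa
# copy the public key to each slave (repeat per host)
ssh-copy-id -i ~/.ssh/id_rsa.pub username@slave-host
# verify: this should print the slave's hostname without asking for a password
ssh username@slave-host hostname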

Diapophysis answered 30/10, 2014 at 17:40 Comment(0)
5

In my situation, the firewalld service was running with its default configuration, which does not allow communication between the nodes. My Hadoop cluster was a test cluster, so I just stopped the service. If your servers are in production, you should allow the Hadoop ports through firewalld instead of running:

service firewalld stop
chkconfig firewalld off
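
For reference, a sketch of opening the ports instead; the port numbers below assume common Hadoop 2.x defaults plus the NameNode RPC port used in this question, so adjust them to your own *-site.xml settings:

# NameNode RPC and web UI (assumed: 9000 from this question, 50070 default)
sudo firewall-cmd --permanent --add-port=9000/tcp
sudo firewall-cmd --permanent --add-port=50070/tcp
# DataNode data transfer, IPC, and web UI (Hadoop 2.x defaults)
sudo firewall-cmd --permanent --add-port=50010/tcp
sudo firewall-cmd --permanent --add-port=50020/tcp
sudo firewall-cmd --permanent --add-port=50075/tcp
sudo firewall-cmd --reload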
Placeeda answered 8/1, 2017 at 19:50 Comment(0)
3

I got the same error. In my case it was due to a bad configuration of the hosts files. First I modified the hosts file of the master node, adding the IPs of the slaves; then on each DataNode I modified the hosts file to list the IPs of the NameNode and of the other slaves.

Something like this:

adilazh1@master:~$ sudo cat /etc/hosts
[sudo] password for adilazh1:
127.0.0.1       localhost
192.168.56.100  master

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.56.101  slave1
192.168.56.102  slave2

Example of slave1's hosts file:

127.0.0.1       localhost
192.168.56.101  slave1

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.56.100  master
192.168.56.102  slave2
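
Once the hosts files are in place, it is worth verifying that each node can resolve and reach the others; the hostnames below match the example above, and port 9000 assumes the NameNode RPC port from this question:

# from each slave
ping -c 1 master
telnet master 9000    # should connect rather than time out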
Magog answered 13/4, 2020 at 13:27 Comment(0)
2

I had the same error. I did not have permission to the HDFS file system, so I gave my user permission:

chmod 777 /usr/local/hadoop_store/hdfs/namenode
chmod 777 /usr/local/hadoop_store/hdfs/datanode
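
Note that chmod 777 opens the directories to every user. A tighter alternative, assuming your daemons run as user hduser in group hadoop (as elsewhere on this page), is to hand over ownership instead:

sudo chown -R hduser:hadoop /usr/local/hadoop_store/hdfs
sudo chmod -R 750 /usr/local/hadoop_store/hdfs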
Tetanic answered 3/6, 2017 at 16:13 Comment(0)
2

The value of the fs.default.name property in core-site.xml, on both the master and the slave machines, must point to the master machine. So it will be something like this:

<property>
     <name>fs.default.name</name>
     <value>hdfs://master:9000</value>
</property>

where master is the hostname in the /etc/hosts file pointing to the master node.
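
Note that on Hadoop 2.x and later the property is named fs.defaultFS (fs.default.name still works but is deprecated), so the modern equivalent is:

<property>
     <name>fs.defaultFS</name>
     <value>hdfs://master:9000</value>
</property>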

Grewitz answered 23/10, 2017 at 20:3 Comment(0)
1

It is probably because the cluster IDs of the datanodes and the namenode do not match. The cluster ID can be seen in the VERSION file found on both the namenode and the datanodes.

This happens when you format your namenode and then restart the cluster, but the datanodes still try to connect using the previous clusterID. To connect successfully you need the correct IP address and also a matching cluster ID on the nodes.
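
For example, with the storage directories from this question's hdfs-site.xml (adjust the paths to your own dfs.*.dir settings), the two IDs can be compared directly:

# on the namenode
grep clusterID /usr/local/hadoop_store/hdfs/namenode/current/VERSION
# on the datanode
grep clusterID /usr/local/hadoop_store/hdfs/datanode/current/VERSION

The two clusterID values must match for the datanode to register.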

So try reformatting the namenode and datanodes or just configure the datanodes and namenode on newly created folders.

That should solve your problem.

Deleting the files from the datanode's current folder will also remove the old VERSION file; a new VERSION file will be requested when the datanode reconnects to the namenode.

For example, if your datanode directory in the configuration is /hadoop2/datanode:

$ rm -rvf /hadoop2/datanode/*

Then restart the services. If you do reformat your namenode, do it before this step. Each time you reformat the namenode it gets a new, randomly generated ID, which will not match the old ID stored on your datanodes.

So follow this sequence every time:

If you format the namenode, then delete the contents of the datanode directory (or configure the datanode on a newly created directory), and then start your namenode and the datanodes.

Parachute answered 25/1, 2016 at 4:22 Comment(0)
0

In my situation, I was missing the necessary properties in hdfs-site.xml (Hadoop 3.0.0, installed via Homebrew on macOS). The file:/// is not a typo.

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/Cellar/hadoop/hdfs/namenode</value>
</property>

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/Cellar/hadoop/hdfs/datanode</value>
</property>
Blabber answered 10/5, 2018 at 14:57 Comment(0)
0

As in @mustafacanturk's solution, disabling the firewall worked for me. I thought the datanodes had started because they showed up when running jps, but when trying to upload files I received the message "0 nodes running". In fact, not even the web interface (http://nn1:50070) was working, because of the firewall. I had disabled the firewall when installing Hadoop, but for some reason it was up again. Nevertheless, sometimes cleaning or recreating the temp folders (hadoop.tmp.dir), or even the dfs.data.dir and dfs.namenode.name.dir folders, and reformatting the namenode was the solution.
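
A quick way to check whether the firewall came back up (which command applies depends on your distribution; these assume firewalld or ufw respectively):

sudo systemctl status firewalld    # RHEL/CentOS
sudo ufw status                    # Ubuntu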

Archuleta answered 12/12, 2018 at 6:23 Comment(0)
0

I faced the same issue on my single-node cluster.

I did the steps below to resolve the issue:
1. Checked the datanode log under the logs directory and found that the namenode clusterId and the datanode clusterId were different.
2. Emptied the datanode directory:
rm -rvf /hadoop/hdfs/datanode/*
3. stop-all.sh
4. hdfs namenode -format
5. start-all.sh
6. jps
27200 NodeManager
26129 NameNode
26595 SecondaryNameNode
5539 GradleDaemon
2355 Main
2693 GradleDaemon
27389 Jps
26846 ResourceManager
26334 DataNode

This worked for me.

Pridemore answered 11/8, 2019 at 15:31 Comment(0)
-1

Have you tried clearing the /tmp folder?

Before cleanup, the datanode did not come up:

86528 SecondaryNameNode
87719 Jps
86198 NameNode
78968 RunJar
79515 RunJar
63964 RunNiFi
63981 NiFi

After cleanup

sudo rm -rf /tmp/*

it worked for me:

89200 Jps
88859 DataNode
Kero answered 30/10, 2018 at 17:21 Comment(0)
-2

Maybe the firewall service hasn't been stopped.

Nan answered 23/3, 2019 at 12:40 Comment(1)
This is insufficient as an answer. – Abuttal
-4

1) Stop all services first using the command stop-all.sh

2) Delete all files inside the datanode directory: rm -rf /usr/local/hadoop_store/hdfs/datanode/*

3) Then start all services using the command start-all.sh

You can check whether all of your services are running with the jps command.

Hope this works!

Stagemanage answered 9/2, 2018 at 11:21 Comment(0)
