Writing to HDFS could only be replicated to 0 nodes instead of minReplication (=1)

I have 3 DataNodes running, and while running a job I am getting the error given below:

java.io.IOException: File /user/ashsshar/olhcache/loaderMap9b663bd9 could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1325)

This error mainly appears when the DataNode instances have run out of space or when the DataNodes are not running. I tried restarting the DataNodes, but I am still getting the same error.

dfsadmin -report on my cluster nodes clearly shows that plenty of space is available.
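
For reference, this is the check I mean (a minimal example, assuming the hdfs client is on the PATH); each live DataNode is listed with its remaining capacity:

hdfs dfsadmin -report                        # full report: live DataNodes, capacity, DFS Remaining
hdfs dfsadmin -report | grep -i remaining    # quick view of the remaining space only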

I am not sure why this is happening.

Drinkable answered 22/3, 2013 at 13:29 Comment(3)
Do you have the right file permissions on this file? – Mitrewort
Make sure the dfs.datanode.address port is open. I had a similar error happen to me, and it turned out that of the several ports I needed to open, I had neglected 50010. – Soapwort
Thanks @MarkW, that was my mistake too. Care to add this as an answer? – Argentine

1. Stop all Hadoop daemons:

for x in `cd /etc/init.d ; ls hadoop*` ; do sudo service $x stop ; done

2. Remove all files from /var/lib/hadoop-hdfs/cache/hdfs/dfs/name, e.g.:

sudo rm -r /var/lib/hadoop-hdfs/cache/

3. Format the NameNode:

sudo -u hdfs hdfs namenode -format

4. Start all Hadoop daemons:

for x in `cd /etc/init.d ; ls hadoop*` ; do sudo service $x start ; done

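Taken together, the steps above amount to something like the following sketch (paths and init-script names assume a CDH-style packaged install, as in this answer). Note that formatting the NameNode discards the existing HDFS namespace, so any data already stored in HDFS is lost:

# WARNING: this wipes the NameNode metadata; all existing HDFS data becomes unreachable.
for x in `cd /etc/init.d ; ls hadoop*` ; do sudo service $x stop ; done   # stop every Hadoop daemon
sudo rm -r /var/lib/hadoop-hdfs/cache/                                    # clear the cached NameNode state
sudo -u hdfs hdfs namenode -format                                        # re-create an empty namespace
for x in `cd /etc/init.d ; ls hadoop*` ; do sudo service $x start ; done  # start the daemons again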

Madge answered 5/7, 2014 at 7:6 Comment(2)
I ran into the same problem; would you please explain why I should do this to solve the problem, and whether the data would be lost? – Hettiehetty
This ain't no solution. -1 – Reserpine

I had the same issue; I was running very low on disk space. Freeing up disk space solved it.
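
A quick way to check (a minimal sketch; the path in the second command is an example, use whatever directory the first command prints):

hdfs getconf -confKey dfs.datanode.data.dir   # where the DataNode stores its blocks
df -h /var/lib/hadoop-hdfs                    # example path; substitute the directory printed above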

Hatteras answered 6/4, 2015 at 18:4 Comment(1)
Thanks for this! My one-node system was misconfigured to run from an incorrect partition, and it simply didn't have the capacity to hold yet another file. – Holeproof

  1. Check whether your DataNode is running, using the jps command (see the example after this list).
  2. If it is not running, wait some time and retry.
  3. If it is running, I think you have to re-format your DataNode.
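
A minimal check, assuming the JDK's jps tool is on the PATH; on a healthy DataNode host the output should include a DataNode process:

jps    # lists local JVMs; expect to see DataNode (and NameNode on the master) among them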
Furuncle answered 14/12, 2013 at 8:59 Comment(0)

What I usually do when this happens is go to the tmp/hadoop-username/dfs/ directory and manually delete the data and name folders (assuming you are running in a Linux environment).

Then format the DFS by calling bin/hadoop namenode -format (make sure that you answer with a capital Y when you are asked whether you want to format; if you are not asked, re-run the command).

You can then start Hadoop again by calling bin/start-all.sh.
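
Put together, that looks roughly like this (a sketch only; "username" is a placeholder, the tmp location depends on your hadoop.tmp.dir setting, and formatting erases the existing HDFS metadata):

rm -rf /tmp/hadoop-username/dfs/data /tmp/hadoop-username/dfs/name   # delete the stale data and name folders
bin/hadoop namenode -format                                          # answer with a capital Y when prompted
bin/start-all.sh                                                     # start Hadoop again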

Chancey answered 22/3, 2013 at 13:46 Comment(1)
This is the only solution to the OP's question that worked for me. I was trying to follow the example in the link on my MacBook (OS X Mountain Lion 10.8.5), but could not see the DataNode being created after start-all.sh until I deleted the data, name, and namesecondary folders as mentioned above. Thank you! – Chagall

I had this problem and I solved it as below:

  1. Find where your DataNode and NameNode metadata/data are saved; if you cannot find it, simply run this command on a Mac to locate it (it is in a folder called "tmp"):

    find /usr/local/Cellar/ -name "tmp";

    The find command works like this: find <"directory"> -name <"any string clue for that directory or file">

  2. After finding that folder, cd into it: /usr/local/Cellar//hadoop/hdfs/tmp

    Then cd into dfs.

    Then, using the ls command, you will see that the data and name directories are located there.

  3. Using the remove command, remove them both:

    rm -R data and rm -R name

  4. Go to the Hadoop folder and stop everything if you have not already done so:

    sbin/stop-dfs.sh

  5. Exit from the server or localhost.

  6. Log into the server again: ssh <"server name">

  7. Start the DFS:

    sbin/start-dfs.sh

  8. Format the NameNode to be sure:

    bin/hdfs namenode -format

  9. You can now use hdfs commands to upload your data into DFS and run MapReduce jobs (see the condensed sketch below).
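
Condensed into a single pass, the steps above look roughly like this (a sketch only; the Homebrew paths are placeholders from this particular setup and will differ on other machines):

cd /usr/local/Cellar/hadoop/hdfs/tmp/dfs   # wherever the find command above pointed you
rm -R data name                            # remove the stale DataNode and NameNode state
cd /usr/local/Cellar/hadoop                # back to the Hadoop home directory (placeholder path)
sbin/stop-dfs.sh                           # stop HDFS if it is still running
sbin/start-dfs.sh                          # after logging back in, start HDFS again
bin/hdfs namenode -format                  # format the NameNode to be sure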

Edea answered 20/9, 2017 at 3:39 Comment(0)

In my case, this issue was resolved by opening firewall port 50010 on the DataNodes, for example as shown below.
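
A sketch with firewalld (iptables or a cloud security group would need the equivalent rule):

sudo firewall-cmd --zone=public --permanent --add-port=50010/tcp   # 50010 is the dfs.datanode.address port mentioned above
sudo firewall-cmd --reload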

Corregidor answered 28/2, 2019 at 10:27 Comment(2)
Can you be more specific: which protocol should I use, and what is the name of the program...? – Snodgrass
Thanks. I got the same error message as the OP even though all my DataNodes were healthy. It turned out that the master could not connect to those DataNodes on port 50010. – Bogy

A very simple fix for the same issue on Windows 8.1.
I used Windows 8.1 and Hadoop 2.7.2, and did the following to overcome this issue.

  1. When I started hdfs namenode -format, I noticed there was a lock on my directory. [screenshot: HadoopNameNode]
  2. Once I deleted the full folder [screenshots: folder location, full folder delete], I ran hdfs namenode -format again.
  3. After performing the above two steps, I could successfully place my required files in HDFS. I used the start-all.cmd command to start YARN and the NameNode.
Panjandrum answered 17/6, 2016 at 13:58 Comment(1)
Could you please explain your steps in more detail? – Snodgrass

In my case, dfs.datanode.du.reserved in hdfs-site.xml was too large, and the NameNode was also handing out the private IP address of the DataNode, so clients could not route to it properly. The fix for the private IP was to switch the Docker container to host networking and put the hostname in the host properties of the config files.
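
Two quick checks corresponding to the above (a sketch only; the image name is a placeholder, not from the original answer):

hdfs getconf -confKey dfs.datanode.du.reserved   # if this is close to the disk size, HDFS sees no usable space
docker run --network host my-datanode-image      # host networking, so the DataNode advertises a routable address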

This related Stack Overflow question on the replication issue goes over other possibilities.

Club answered 15/1, 2021 at 16:51 Comment(0)

The answers saying to open port 50010 might only apply to an older version of Hadoop. I'm using Hadoop 3.3.4, and the port you should open to fix this error is 9866. You need to open this port on all of the Hadoop DataNodes. Here's a snippet you can use on RHEL 8:

sudo firewall-cmd --zone=public --permanent --add-port 9866/tcp
sudo firewall-cmd --reload
Lannielanning answered 26/10, 2022 at 2:29 Comment(0)
