Datanode process not running in Hadoop

I set up and configured a multi-node Hadoop cluster using this tutorial.

When I type in the start-all.sh command, it shows all the processes initializing properly as follows:

starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-namenode-jawwadtest1.out
jawwadtest1: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest1.out
jawwadtest2: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest2.out
jawwadtest1: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-secondarynamenode-jawwadtest1.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-jobtracker-jawwadtest1.out
jawwadtest1: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest1.out
jawwadtest2: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest2.out

However, when I type the jps command, I get the following output:

31057 NameNode
4001 RunJar
6182 RunJar
31328 SecondaryNameNode
31411 JobTracker
32119 Jps
31560 TaskTracker

As you can see, there's no DataNode process running. I tried configuring a single-node cluster but ran into the same problem. Does anyone have an idea what could be going wrong here? Are there any configuration files not mentioned in the tutorial that I may have overlooked? I am new to Hadoop and a bit lost, so any help would be greatly appreciated.

EDIT: hadoop-root-datanode-jawwadtest1.log:

STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/$
************************************************************/
2012-08-09 23:07:30,717 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loa$
2012-08-09 23:07:30,734 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:30,735 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:30,736 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:31,018 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:31,024 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:32,366 INFO org.apache.hadoop.ipc.Client: Retrying connect to $
2012-08-09 23:07:37,949 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: $
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(Data$
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransition$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNo$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNod$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode($
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataN$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1$

2012-08-09 23:07:37,951 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: S$
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at jawwadtest1/198.101.220.90
************************************************************/
Jabot answered 9/8, 2012 at 17:57 Comment(8)
On which node is this jps run?Funda
Can you look in the /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest1.log file on the jawwadtest1 node and see if there are any error messages you can post back?Fictitious
Razvan, this is all on the master node.Jabot
Chris, Oops my bad! I've added them to the end of the question.Jabot
Did you eventually solve this?Northrup
I've got the same issue right now, has this been solved?Heliostat
Yes, sorry I forgot I hadn't selected a correct answerJabot
I had a similar problem which was being caused because I had installed Hadoop as root. Using chown to bring all the files back to my user, and then running the steps below in the accepted answer, fixed this issue for me.Gen

You need to do something like this:

  • bin/stop-all.sh (or stop-dfs.sh and stop-yarn.sh in the 2.x series)
  • rm -Rf /app/tmp/hadoop-your-username/*
  • bin/hadoop namenode -format (or bin/hdfs namenode -format in the 2.x series)

The solution was taken from http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-troubleshooting/. Basically, it consists of restarting from scratch, so make sure you won't lose data by formatting HDFS.
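For reference, here is a minimal shell sketch of the whole reset, assuming hadoop.tmp.dir points to /app/tmp/hadoop-your-username as above and that there is no data in HDFS you need to keep:

bin/stop-all.sh                          # 1.x; on 2.x use sbin/stop-dfs.sh and sbin/stop-yarn.sh
rm -Rf /app/tmp/hadoop-your-username/*   # wipe the NameNode/DataNode storage under hadoop.tmp.dir
bin/hadoop namenode -format              # 1.x; on 2.x use bin/hdfs namenode -format
bin/start-all.sh                         # bring the daemons back up and verify with jps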

Pupillary answered 12/8, 2012 at 17:29 Comment(5)
Indeed. That worked on multi-node! I deleted my tmp directory (as set in core-site.xml) on ALL nodes (master/slaves), formatted all nodes, and it worked like a charm!Ela
@gilts In your example, can I assume that you changed the hadoop.tmp.dir value in core-site.xml to /app/tmp? In other words, is it a good pattern to point hadoop.tmp.dir to myproj/tmp?Seal
Though the location of the files is different on a Homebrew installation, this approach worked for me. Thank you.Tourneur
On Mac the default temp directory would be at /tmp/hadoop-{username} (2nd step) if you don't have hadoop.tmp.dir configured.Kt
I ran into this problem after a reboot. In my case it was enough to execute bin/stop-yarn.sh, bin/stop-dfs.sh and then bin/start-dfs.sh, bin/start-yarn.sh.Wouldbe

I ran into the same issue. I had created an hdfs folder '/home/username/hdfs' with sub-directories name, data, and tmp, which were referenced in the config XML files under hadoop/conf.

When I started Hadoop and ran jps, I couldn't find the DataNode, so I tried to start it manually using bin/hadoop datanode. From the error message I then realized it had a permissions issue accessing dfs.data.dir=/home/username/hdfs/data/, which was referenced in one of the Hadoop config files. All I had to do was stop Hadoop, delete the contents of the /home/username/hdfs/tmp/* directory, run chmod -R 755 /home/username/hdfs/, and then start Hadoop again. After that I could find the DataNode!
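A short sketch of that sequence, assuming the same /home/username/hdfs layout described above:

bin/stop-all.sh                     # stop the daemons first
rm -rf /home/username/hdfs/tmp/*    # clear only the tmp contents
chmod -R 755 /home/username/hdfs/   # dfs.data.dir must not be group/world-writable
bin/start-all.sh                    # restart and confirm the DataNode with jps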

Ava answered 11/10, 2013 at 19:5 Comment(1)
I had the same issue. For some reason after switching from 1 node to 3, the datanode folder became 700 instead of 755Compensatory

I faced a similar issue while running the DataNode. The following steps were useful.

  1. In the [hadoop_directory]/sbin directory, use ./stop-all.sh to stop all the running services.
  2. Remove the tmp dir using rm -r [hadoop_directory]/tmp (the path configured in [hadoop_directory]/etc/hadoop/core-site.xml).
  3. sudo mkdir [hadoop_directory]/tmp (make a new tmp directory).
  4. Go to the */hadoop_store/hdfs directory where you created namenode and datanode as sub-directories (the paths configured in [hadoop_directory]/etc/hadoop/hdfs-site.xml). Use

    rm -r namenode

    rm -r datanode

  5. In the */hadoop_store/hdfs directory, use

    sudo mkdir namenode

    sudo mkdir datanode

    In case of a permission issue, use

    chmod -R 755 namenode

    chmod -R 755 datanode

  6. In [hadoop_directory]/bin, use

    hadoop namenode -format (to format your namenode)

  7. In the [hadoop_directory]/sbin directory, use ./start-all.sh or ./start-dfs.sh to start the services.
  8. Use jps to check the running services. (A consolidated sketch of the whole sequence follows below.)
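Put together, a rough sketch of that sequence, assuming [hadoop_directory] is your Hadoop install root and */hadoop_store/hdfs holds the namenode and datanode sub-directories as described above:

cd [hadoop_directory]
sbin/stop-all.sh                      # stop all running services
rm -r tmp && sudo mkdir tmp           # recreate the tmp dir from core-site.xml
cd /path/to/hadoop_store/hdfs         # adjust to the paths in hdfs-site.xml
rm -r namenode datanode
sudo mkdir namenode datanode
chmod -R 755 namenode datanode        # only needed if permissions are off
cd [hadoop_directory]
bin/hadoop namenode -format           # reformat the namenode
sbin/start-dfs.sh                     # or sbin/start-all.sh
jps                                   # DataNode should now be listed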
Chryselephantine answered 23/1, 2017 at 14:42 Comment(3)
Worked for me too! used mkdir instead of sudo mkdirTransverse
Thanks, Works for me. AlsoCarnap
Using chmod -R 755 datanode made the permissions on my datanode match my namenode, which I thought was wonky, and fixed things for me. In addition to Anirban's instructions, don't forget to check contents and permissions for both your tmpdata and your dfsdata directories, too. Assuming this is a fresh install that isn't quite working yet and you don't have any data to lose, essentially, for all the directories you created during install, check their permissions, nuke any contents, reformat them, then restart the two services. I'm not sure where my install failed, but this did the trick.Gallenz

Delete the datanode directory under your Hadoop folder, then rerun start-all.sh.

Olivas answered 13/10, 2018 at 17:47 Comment(0)

I was having the same problem running a single-node pseudo-distributed instance. Couldn't figure out how to solve it, but a quick workaround is to manually start a DataNode with
hadoop-x.x.x/bin/hadoop datanode

Northrup answered 14/8, 2012 at 22:20 Comment(0)

You need to follow 3 steps.

(1) Go to the logs and check the most recent log (in hadoop-2.6.0/logs/hadoop-user-datanode-ubuntu.log).

If the error is like:

java.io.IOException: Incompatible clusterIDs in /home/kutty/work/hadoop2data/dfs/data: namenode clusterID = CID-c41df580-e197-4db6-a02a-a62b71463089; datanode clusterID = CID-a5f4ba24-3a56-4125-9137-fa77c5bb07b1

i.e. the namenode clusterID and the datanode clusterID are not identical.

(2) Copy the namenode clusterID, which is CID-c41df580-e197-4db6-a02a-a62b71463089 in the error above.

(3) Replace the datanode clusterID with the namenode clusterID in hadoopdata/dfs/data/current/VERSION:

clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089

Restart Hadoop. The DataNode will now run.
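A quick sketch of the check and fix, assuming the directory layout from the error above (/home/kutty/work/hadoop2data/dfs/name for the namenode and /home/kutty/work/hadoop2data/dfs/data for the datanode):

grep clusterID /home/kutty/work/hadoop2data/dfs/name/current/VERSION   # namenode side
grep clusterID /home/kutty/work/hadoop2data/dfs/data/current/VERSION   # datanode side

# if they differ, copy the namenode value into the datanode VERSION file,
# e.g. with GNU sed (back the file up first):
sed -i 's/^clusterID=.*/clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089/' \
    /home/kutty/work/hadoop2data/dfs/data/current/VERSION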

Tantalus answered 9/7, 2017 at 18:51 Comment(0)

Follow these steps and your datanode will start again.

  1. Stop dfs.
  2. Open hdfs-site.xml
  3. Remove the data.dir and name.dir properties from hdfs-site.xml and format the namenode again.
  4. Then remove the hadoopdata directory, add the data.dir and name.dir properties back to hdfs-site.xml, and format the namenode again.
  5. Then start dfs again.
Ammonium answered 28/8, 2015 at 7:25 Comment(0)

Stop all the services (./stop-all.sh), then clear the HDFS tmp directory on all the master and slave nodes. Don't forget the slaves.

Format the namenode (hadoop namenode -format).

Now start the services on the namenode: ./bin/start-all.sh

This is what got the datanode service running for me.

Primarily answered 29/7, 2016 at 3:10 Comment(0)
  1. Stop the dfs and yarn first.
  2. Remove the datanode and namenode directories as specified in the core-site.xml file.
  3. Re-create the directories.
  4. Then re-start the dfs and the yarn as follows.

    start-dfs.sh

    start-yarn.sh

    mr-jobhistory-daemon.sh start historyserver

Hope this works fine.

Metal answered 4/5, 2017 at 12:14 Comment(0)

Delete the files under $hadoop_User/dfsdata and $hadoop_User/tmpdata, then run:

hdfs namenode -format

Finally, run:

start-all.sh

That should solve the problem.

Lockwood answered 30/7, 2020 at 15:51 Comment(0)

Please check whether the tmp directory property points to a valid directory in core-site.xml:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/data/tmp</value>
</property>

If the directory is misconfigured, the datanode process will not start properly.
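A quick sanity check, assuming the value shown above (/home/hduser/data/tmp) and that the daemons run as hduser (adjust user, group, and paths to your setup):

grep -A 1 hadoop.tmp.dir $HADOOP_HOME/conf/core-site.xml   # conf/ on 1.x, etc/hadoop/ on 2.x
ls -ld /home/hduser/data/tmp                               # the directory must exist...
sudo mkdir -p /home/hduser/data/tmp                        # ...create it if it is missing
sudo chown -R hduser:hadoop /home/hduser/data/tmp          # ...and be owned by the Hadoop user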

Durward answered 18/3, 2014 at 22:15 Comment(0)

Run the commands below in order:

  1. stop-all.sh (stops all the Hadoop processes)
  2. rm -r /usr/local/hadoop/tmp/ (the Hadoop tmp directory you configured in hadoop/conf/core-site.xml)
  3. sudo mkdir /usr/local/hadoop/tmp (recreate the same directory)
  4. hadoop namenode -format (format your namenode)
  5. start-all.sh (starts all the Hadoop processes)
  6. jps (shows the running processes)
Chloechloette answered 20/11, 2014 at 13:0 Comment(0)

Step 1: Run stop-all.sh

Step 2: Go to this path:

cd /usr/local/hadoop/bin

Step 3: Run the command hadoop datanode

Now the DataNode works.

Scattering answered 6/9, 2016 at 12:53 Comment(0)

Error in the datanode log file:

$ more /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.log

Shows:

java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop_tmp/hdfs/datanode: namenode clusterID = CID-e4c3fed0-c2ce-4d8b-8bf3-c6388689eb82; datanode clusterID = CID-2fcfefc7-c931-4cda-8f89-1a67346a9b7c

Solution: stop your cluster, issue the command below, and then start your cluster again.

sudo rm -rf  /usr/local/hadoop_tmp/hdfs/datanode/*
Myosotis answered 10/8, 2017 at 10:30 Comment(0)

Check whether the hadoop.tmp.dir property in core-site.xml is correctly set. If you set it, navigate to that directory and remove or empty it. If you didn't set it, navigate to its default folder, /tmp/hadoop-${user.name}, and likewise remove or empty it.

Eunuchoidism answered 2/2, 2018 at 11:0 Comment(0)

In the case of macOS (pseudo-distributed mode):

Open a terminal.

  1. Stop dfs: sbin/stop-all.sh
  2. cd /tmp
  3. rm -rf hadoop*
  4. Navigate to the Hadoop directory and format HDFS: bin/hdfs namenode -format
  5. sbin/start-dfs.sh
Hy answered 29/10, 2018 at 12:23 Comment(0)

Try this

  1. stop-all.sh
  2. vi hdfs-site.xml
  3. Change the value given for the property dfs.data.dir.
  4. Format the namenode.
  5. start-all.sh
Bali answered 21/8, 2013 at 11:26 Comment(0)

I found the details of the issue in the log file, as below: "Invalid directory in dfs.data.dir: Incorrect permission for /home/hdfs/dnman1, expected: rwxr-xr-x, while actual: rwxrwxr-x". From there I identified that the permissions on my datanode folder were too permissive. I corrected them to 755 and it started working.

Macroclimate answered 5/1, 2014 at 8:39 Comment(0)

Instead of deleting everything under the "hadoop tmp dir", you can set another one. For example, if your core-site.xml has this property:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/data/tmp</value>
</property>

You can change this to:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/data/tmp2</value>
</property>

Then scp core-site.xml to each node, run "hadoop namenode -format", and restart Hadoop.
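A rough sketch of that rollout, assuming the slave hostnames are listed in conf/slaves and the install lives at /usr/local/hadoop on every node (adjust to your layout):

for node in $(cat /usr/local/hadoop/conf/slaves); do
    scp /usr/local/hadoop/conf/core-site.xml "$node":/usr/local/hadoop/conf/
done
/usr/local/hadoop/bin/hadoop namenode -format   # initializes storage under the new tmp dir
/usr/local/hadoop/bin/start-all.sh              # restart and check with jps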

Cyrille answered 30/5, 2014 at 7:15 Comment(0)

This is for newer versions of Hadoop (I am running 2.4.0).

  • In this case, stop the cluster: sbin/stop-all.sh
  • Then go to /etc/hadoop for the config files.

In the file hdfs-site.xml, look for the directory paths corresponding to dfs.namenode.name.dir and dfs.datanode.data.dir.

  • Delete both the directories recursively (rm -r).
  • Now format the namenode via bin/hadoop namenode -format
  • And finally sbin/start-all.sh

Hope this helps.

Pegmatite answered 30/7, 2014 at 16:16 Comment(0)

You need to check the namespaceID in these two files:

/app/hadoop/tmp/dfs/data/current/VERSION and /app/hadoop/tmp/dfs/name/current/VERSION

The datanode will run if and only if its namespaceID is the same as the namenode's namespaceID.

If they are different, copy the namenode's namespaceID into the datanode's VERSION file using vi or gedit, save it, and re-run the daemons; it will work perfectly.
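As a quick check, assuming the /app/hadoop/tmp layout above:

grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION   # value on the namenode side
grep namespaceID /app/hadoop/tmp/dfs/data/current/VERSION   # must match; edit this file if it doesn't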

Okinawa answered 23/9, 2014 at 7:8 Comment(0)

If formatting the tmp directory is not working, then try this:

  1. First stop all the entities like the namenode, datanode, etc. (you will have some script or command to do that).
  2. Format the tmp directory.
  3. Go to /var/cache/hadoop-hdfs/hdfs/dfs/ and delete all the contents of the directory manually.
  4. Now format your namenode again.
  5. Start all the entities, then use the jps command to confirm that the datanode has started.
  6. Now run whichever application you have.

Hope this helps.

Catalyst answered 14/11, 2014 at 5:10 Comment(0)
  1. I configured hadoop.tmp.dir in conf/core-site.xml

  2. I configured dfs.data.dir in conf/hdfs-site.xml

  3. I configured dfs.name.dir in conf/hdfs-site.xml

  4. Deleted everything under "/tmp/hadoop-/" directory

  5. Changed file permissions from 777 to 755 for directory listed under dfs.data.dir

    And the data node started working.

Loralorain answered 13/4, 2015 at 23:57 Comment(0)

Even after removing and remaking the directories, the datanode wasn't starting. So I started it manually using bin/hadoop datanode; it did not reach any conclusion. I opened another terminal under the same username, ran jps, and it showed me the running datanode process. It works, but I just have to keep the unfinished terminal open on the side.

Sendal answered 28/4, 2017 at 5:49 Comment(0)

Follow these steps and your datanode will start again.

1) Stop dfs.
2) Open hdfs-site.xml.
3) Remove the data.dir and name.dir properties from hdfs-site.xml and format the namenode again.

4) Then start dfs again.

Fantasm answered 25/9, 2017 at 23:36 Comment(0)

Got the same error. Tried to start and stop dfs several times, cleared all directories that are mentioned in previous answers, but nothing helped.

The issue was resolved only after rebooting the OS and configuring Hadoop from scratch. (Configuring Hadoop from scratch without rebooting didn't work.)

Ludwick answered 21/12, 2017 at 9:56 Comment(0)

Once, when I was not able to find the datanode using jps, I deleted the current folder in the Hadoop install directory (/opt/hadoop-2.7.0/hadoop_data/dfs/data), then restarted Hadoop using start-all.sh and checked with jps.

This time I could find the datanode, and the current folder was created again.

Fatness answered 1/8, 2018 at 12:16 Comment(1)
How is this different from earlier answers?Korman

I applied a mix of the configurations above, and it worked for me.
First >>
Stop all Hadoop services using ${HADOOP_HOME}/sbin/stop-all.sh

Second >>
Check mapred-site.xml, which is located at ${HADOOP_HOME}/etc/hadoop/mapred-site.xml, and change localhost to master.

Third >>
Remove the temporary folder created by Hadoop:
rm -rf /path/to/your/hadoop/temp/folder

Fourth >>
Add recursive permissions on the temp folder:
sudo chmod -R 777 /path/to/your/hadoop/temp/folder

Fifth >>
Now start all the services again, and first check that every service, including the datanode, is running.

Congratulant answered 14/3, 2019 at 17:24 Comment(0)
    mv /usr/local/hadoop_store/hdfs/datanode /usr/local/hadoop_store/hdfs/datanode.backup

    mkdir /usr/local/hadoop_store/hdfs/datanode

    hadoop datanode OR start-all.sh

    jps
Hultin answered 3/8, 2016 at 12:50 Comment(0)
