Hadoop Error - All data nodes are aborting

I am using Hadoop version 2.3.0. Sometimes when I execute a MapReduce job, the error below is displayed.

14/08/10 12:14:59 INFO mapreduce.Job: Task Id : attempt_1407694955806_0002_m_000780_0, Status : FAILED
Error: java.io.IOException: All datanodes 192.168.30.2:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)


When I try to check the log files for these failed tasks, the log folder for the task is empty.

I am not able to understand the reason behind this error. Could someone please let me know how to resolve this issue? Thanks for your help.

Gregorygregrory asked 10/8, 2014 at 19:23

You seem to be hitting the open file handle limit for your user. This is a pretty common issue and can usually be cleared by increasing the ulimit values (the default is often 1024, which is easily exhausted by jobs that write many output files, like yours).
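
If that is the case, you can check and raise the per-user limit. Below is a rough sketch; the user names (hdfs, mapred) and the value 65536 are example assumptions, so adjust them to whichever users actually run the DataNode and your jobs:

    # check the current open-file limit for the current user
    ulimit -n

    # raise it persistently in /etc/security/limits.conf (example values)
    hdfs    -    nofile    65536
    mapred  -    nofile    65536

After changing the limits, log the user out and back in (or restart the Hadoop daemons) so the new values take effect.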

Nuncle answered 11/8, 2014 at 10:13

Setting spark.shuffle.service.enabled to true resolved this issue for me.

spark.dynamicAllocation.enabled allows Spark to assign executors to tasks dynamically. When spark.shuffle.service.enabled is set to false, the external shuffle service is disabled and shuffle data is stored only on the executors. When an executor is reassigned or removed, its data is lost, and the exception

java.io.IOException: All datanodes are bad.

is thrown when that data is requested.
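
For reference, these are standard Spark properties; a minimal sketch of how you might set them in spark-defaults.conf (or pass them with --conf on spark-submit):

    spark.dynamicAllocation.enabled    true
    spark.shuffle.service.enabled      true

Note that on YARN the external shuffle service also has to be configured and running on each NodeManager for spark.shuffle.service.enabled to take effect.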

Tocology answered 2/5, 2019 at 15:44
