How to resolve 'file could only be replicated to 0 nodes, instead of 1' in Hadoop?

I have a simple Hadoop job that crawls websites and caches them in HDFS. The mapper checks whether a URL already exists in HDFS; if so, it uses the cached copy, otherwise it downloads the page and saves it to HDFS.

If a network error (404, etc.) is encountered while downloading the page, the URL is skipped entirely and never written to HDFS. Whenever I run the job over a small list of ~1000 websites, I always seem to hit the error below, which crashes the job repeatedly in my pseudo-distributed installation. What could be the problem?

I'm running Hadoop 0.20.2-cdh3u3.

org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/raj/cache/9b4edc6adab6f81d5bbb84fdabb82ac0 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1520)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:665)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
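
For reference, the caching step in the mapper is along these lines. This is a simplified sketch of the flow described above, not the actual code; the class and method names are illustrative:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CachingFetcher {
        private final FileSystem fs;

        public CachingFetcher(Configuration conf) throws IOException {
            this.fs = FileSystem.get(conf);
        }

        // Return the page contents: from the HDFS cache if it is already there,
        // otherwise by downloading it over HTTP.
        public InputStream openPage(String url, Path cachePath) throws IOException {
            if (fs.exists(cachePath)) {
                return fs.open(cachePath);       // cache hit: read back from HDFS
            }
            return new URL(url).openStream();    // cache miss: fetch the page
        }
    }
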
Importation answered 3/4, 2012 at 4:16 Comment(1)
Why don't you simply set "dfs.replication" to 1 in hdfs-site.xml? As you are using pseudo-distributed mode, replication of the data is of no concern to you. – Coleridge

The problem was a FileSystem stream in the mapper that was opened to cache input to the file system but never closed.
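
A minimal sketch of that kind of fix, assuming the mapper copies the downloaded page into HDFS itself (class and method names here are hypothetical): close both the HDFS output stream and the page input stream in a finally block, so every map call releases its streams even when the copy fails.

    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class PageCacheWriter {
        private final FileSystem fs;
        private final Configuration conf;

        public PageCacheWriter(Configuration conf) throws IOException {
            this.conf = conf;
            this.fs = FileSystem.get(conf);
        }

        // Copy the downloaded page into the HDFS cache, always closing both streams.
        public void cachePage(Path cachePath, InputStream pageStream) throws IOException {
            FSDataOutputStream out = null;
            try {
                out = fs.create(cachePath);
                // close=false: closing is handled explicitly in the finally block below
                IOUtils.copyBytes(pageStream, out, conf, false);
            } finally {
                IOUtils.closeStream(out);          // null-safe
                IOUtils.closeStream(pageStream);
            }
        }
    }

Leaking an open stream on every map call can exhaust datanode resources (file descriptors and xceiver threads), which can then surface as the "replicated to 0 nodes" error above.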

Importation answered 13/4, 2012 at 10:12 Comment(0)

Looking at the sources, you have probably run out of space on your local machine (or VM). This exception is thrown when the system cannot find enough nodes for replication. The class responsible for selecting the nodes is ReplicationTargetChooser.

http://javasourcecode.org/html/open-source/hadoop/hadoop-0.20.203.0/org/apache/hadoop/hdfs/server/namenode/ReplicationTargetChooser.java.html

Its main method is chooseTarget (line 67).
Diving into the code, you will get to the isGoodTarget method, which also checks whether there is enough space on the node (line 404).
If you enable debug logging, you will probably see the relevant message.
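
For example, raising the namenode's logging to DEBUG in conf/log4j.properties should surface the chooser's reasons for rejecting a node. The logger name below is an assumption based on the 0.20 sources linked above (ReplicationTargetChooser logging through the FSNamesystem logger); adjust it if your build differs:

    # Sketch for conf/log4j.properties on the namenode. Assumption: the block
    # placement code logs through FSNamesystem, as in the 0.20 sources above.
    log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem=DEBUG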

Ait answered 3/4, 2012 at 6:12 Comment(0)

Please check the namenode logs, matching the timestamps. If there is an indication of problems with IPC, you are likely running out of "xcievers". In my case, setting dfs.datanode.max.xcievers in hdfs-site.xml to a larger value, e.g. 4096 or 8192, fixed that particular problem for me.
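
For reference, the setting goes into hdfs-site.xml on the datanodes and takes effect after a datanode restart; 4096 is just the value mentioned above:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>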

Pagandom answered 18/6, 2012 at 17:34 Comment(0)
