Nutch in Windows: Failed to set permissions of path

I'm trying to use Solr with Nutch on a Windows machine and I'm getting the following error:

Exception in thread "main" java.io.IOException: Failed to set permissions of path: c:\temp\mapred\staging\admin-1654213299\.staging to 0700

From a lot of threads I learned that Hadoop, which seems to be used by Nutch, does some chmod magic that works on Unix machines but not on Windows.

This problem has existed for more than a year now. I found one thread where the offending code line is shown and a fix proposed. Am I really the only one who has this problem? Is everyone else creating a custom build in order to run Nutch on Windows? Or is there some option to disable the Hadoop stuff, or another solution? Maybe a crawler other than Nutch?

Here's the console output, including the stack trace:

    admin@WIN-G1BPD00JH42 /cygdrive/c/solr/apache-nutch-1.6
    $ bin/nutch crawl urls -dir crawl -depth 3 -topN 5 -solr http://localhost:8080/solr-4.1.0
    cygpath: can't convert empty path
    crawl started in: crawl
    rootUrlDir = urls
    threads = 10
    depth = 3
    solrUrl=http://localhost:8080/solr-4.1.0
    topN = 5
    Injector: starting at 2013-03-03 17:43:15
    Injector: crawlDb: crawl/crawldb
    Injector: urlDir: urls
    Injector: Converting injected urls to crawl db entries.
    Exception in thread "main" java.io.IOException: Failed to set permissions of path: c:\temp\mapred\staging\admin-1654213299\.staging to 0700
        at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
        at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Inheritor answered 3/3, 2013 at 16:53 Comment(2)
Have you tried to run it as administrator? – Fontainebleau
This answer helped me to overcome the exception: https://mcmap.net/q/1422110/-failed-to-set-permissions-of-path-tmp – Fulviah

It took me a while to get this working, but here's the solution that works on Nutch 1.7.

  1. Download Hadoop Core 0.20.2 from the Maven repository.
  2. Replace $NUTCH_HOME/lib/hadoop-core-1.2.0.jar with the downloaded file, renaming it to the same name (a sketch of the commands follows).
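
A minimal sketch of the swap from a Cygwin shell, assuming the downloaded jar is named hadoop-core-0.20.2.jar and sits in the current directory:

    # Back up the Hadoop jar that ships with Nutch 1.7
    mv $NUTCH_HOME/lib/hadoop-core-1.2.0.jar $NUTCH_HOME/lib/hadoop-core-1.2.0.jar.bak

    # Drop in the 0.20.2 jar under the name Nutch expects
    cp hadoop-core-0.20.2.jar $NUTCH_HOME/lib/hadoop-core-1.2.0.jar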

That should be it.

Explanation

This issue is caused by Hadoop, since it assumes you're running on Unix and abides by its file permission rules. The issue was actually resolved in 2011, but Nutch didn't update the Hadoop version it uses. The relevant fixes are here and here.

Browder answered 7/2, 2014 at 20:49 Comment(0)

We are using Nutch too, but it is not supported for running on Windows. On Cygwin, our 1.4 version had similar problems to the ones you're seeing, also around mapreduce.

We solved it by using a VM (VirtualBox) with Ubuntu and a shared directory between Windows and Linux, so we can develop and build on Windows and run Nutch (crawling) on Linux.
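
For reference, mounting the shared directory inside the Ubuntu guest looks something like this (the share name and mount point are assumptions, and the VirtualBox Guest Additions must be installed):

    # Mount a VirtualBox shared folder named "nutch" (name assumed) in the guest
    sudo mkdir -p /mnt/nutch
    sudo mount -t vboxsf nutch /mnt/nutch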

Fontainebleau answered 5/3, 2013 at 20:50 Comment(0)

I have Nutch running on Windows, no custom build. It's been a long time since I used it, though. One thing that took me a while to catch is that you need to run Cygwin as a Windows administrator to get the necessary rights.

Putandtake answered 4/3, 2013 at 8:59 Comment(3)
Thanks for the hint, but still the same behaviour. Which Nutch version are you using? – Inheritor
I had it working with Hadoop 0.19 and Nutch 1.0/1.1/1.2 on Windows Vista. – Putandtake
Try to do the same manually with Cygwin, to be absolutely sure. – Putandtake

I suggest a different approach. Check this link out. It explains how to swallow the error on Windows without requiring you to downgrade Hadoop or rebuild Nutch. I tested it on Nutch 2.1, but it applies to other versions as well. I also made a simple .bat for starting the crawler and indexer, but it is meant for Nutch 2.x and might not be applicable to Nutch 1.x.

For the sake of posterity, the approach entails:

  1. Making a custom LocalFileSystem implementation:

    package com.conga.services.hadoop.patch.HADOOP_7682;

    import java.io.IOException;

    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class WinLocalFileSystem extends LocalFileSystem {

        public WinLocalFileSystem() {
            super();
            System.err.println("Patch for HADOOP-7682: "+
                "Instantiating workaround file system");
        }

        /**
         * Delegates to <code>super.mkdirs(Path)</code> and separately calls
         * <code>this.setPermission(Path,FsPermission)</code>.
         */
        @Override
        public boolean mkdirs(Path path, FsPermission permission)
                throws IOException {
            boolean result=super.mkdirs(path);
            this.setPermission(path,permission);
            return result;
        }

        /**
         * Ignores the IOException that Hadoop throws on Windows when
         * attempting to set POSIX permissions.
         */
        @Override
        public void setPermission(Path path, FsPermission permission)
                throws IOException {
            try {
                super.setPermission(path,permission);
            }
            catch (IOException e) {
                System.err.println("Patch for HADOOP-7682: "+
                    "Ignoring IOException setting permission for path \""+path+
                    "\": "+e.getMessage());
            }
        }
    }
    
  2. Compiling it and placing the JAR under ${HADOOP_HOME}/lib (see the command sketch after this list)

  3. And then registering it by modifying ${HADOOP_HOME}/conf/core-site.xml:

    <property>
      <name>fs.file.impl</name>
      <value>com.conga.services.hadoop.patch.HADOOP_7682.WinLocalFileSystem</value>
      <description>Enables patch for issue HADOOP-7682 on Windows</description>
    </property>
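
For step 2, a minimal sketch of the compile-and-package commands, assuming the source file sits in the current directory and is compiled against the Hadoop core jar (jar paths, versions, and output names here are illustrative):

    # Compile the patch class against the Hadoop core jar (path and version assumed)
    mkdir -p classes
    javac -cp ${HADOOP_HOME}/hadoop-core-1.2.0.jar -d classes WinLocalFileSystem.java

    # Package it and place the JAR under ${HADOOP_HOME}/lib
    jar cf hadoop-7682-patch.jar -C classes .
    cp hadoop-7682-patch.jar ${HADOOP_HOME}/lib/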

Kinchen answered 18/2, 2014 at 15:19 Comment(1)
Works perfectly for Nutch 1. – Mandel

You have to change the project dependencies hadoop-core and hadoop-tools. I'm using version 0.20.2 and it works fine.
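
For example, with a Maven build the change would look something like the following (assuming your project resolves Hadoop from Maven Central; adapt for Ivy if that's what your build uses):

    <!-- Pin both Hadoop artifacts to 0.20.2, as the answer suggests -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-tools</artifactId>
      <version>0.20.2</version>
    </dependency>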

Eroto answered 3/3, 2014 at 13:49 Comment(0)
