Hi, I am trying to run Apache Nutch 1.2 on Amazon's EMR.
To do this I specify an input directory on S3. I get the following error:
Fetcher: java.lang.IllegalArgumentException: This file system object (hdfs://ip-11-202-55-144.ec2.internal:9000) does not support access to the request path 's3n://crawlResults2/segments/20110823155002/crawl_fetch' You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.
I understand the difference between FileSystem.get(uri, conf) and FileSystem.get(conf). If I were writing this myself I would call FileSystem.get(uri, conf); however, I am trying to use existing Nutch code.
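
To illustrate the distinction as I understand it, here is a minimal Java sketch. It is a simplified model of the path check, not Hadoop's actual code: the class FsSchemeCheck and the method supports are hypothetical names, and the matching rule (same scheme and authority) is my assumption about why the HDFS object rejects the s3n:// path.

```java
import java.net.URI;

public class FsSchemeCheck {
    // Simplified model (assumption, not Hadoop source): a file system object
    // accepts a path only if the path has no scheme, or if its scheme and
    // authority match the file system's own URI.
    public static boolean supports(String fsUri, String path) {
        URI fs = URI.create(fsUri);
        URI p = URI.create(path);
        if (p.getScheme() == null) {
            return true; // scheme-less paths are resolved against this file system
        }
        if (!p.getScheme().equalsIgnoreCase(fs.getScheme())) {
            return false; // e.g. an s3n:// path handed to an hdfs:// file system
        }
        String authority = p.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fs.getAuthority());
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://ip-11-202-55-144.ec2.internal:9000";
        String segment = "s3n://crawlResults2/segments/20110823155002/crawl_fetch";

        // Models FileSystem.get(conf): the default (HDFS) object is asked about an S3 path.
        System.out.println(supports(defaultFs, segment));             // false

        // Models FileSystem.get(uri, conf): the object is chosen from the path's own URI.
        System.out.println(supports("s3n://crawlResults2", segment)); // true
    }
}
```

In real Hadoop code the second case corresponds to FileSystem.get(path.toUri(), conf) or path.getFileSystem(conf), which is the call I would expect the Nutch code to need here.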
I asked this question, and someone told me that I needed to modify hadoop-site.xml to include the following properties: fs.default.name, fs.s3.awsAccessKeyId, fs.s3.awsSecretAccessKey. I updated these properties in core-site.xml (hadoop-site.xml does not exist), but that didn't make a difference. Does anyone have any other ideas?
Thanks for the help.