When I run Nutch 1.10 with the following command, assuming that TestCrawl2 did not previously exist and needs to be created:
sudo -E bin/crawl -i -D solr.server.url=http://localhost:8983/solr/TestCrawlCore2 urls/ TestCrawl2/ 20
I receive the following error during indexing:
Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/apache-nutch-1.10/TestCrawl2/linkdb/current
The linkdb directory exists, but it does not contain the 'current' directory. The directory is owned by root, so there should be no permissions issues. Because the process exited with an error, the linkdb directory contains .locked and ..locked.crc files. If I run the command again, these lock files cause it to exit in the same place. Delete the TestCrawl2 directory, rinse, repeat.
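To make the state described above easy to check between runs, here is a minimal shell sketch. It assumes the crawl directory layout shown in the error message (a linkdb subdirectory that should contain current). The function name check_linkdb is hypothetical and is not part of Nutch.

```shell
#!/bin/sh
# check_linkdb is a hypothetical helper, not a Nutch command.
# It reports whether <crawl_dir>/linkdb/current exists and lists
# any leftover .locked / ..locked.crc files in the linkdb directory.
check_linkdb() {
    crawl_dir="$1"
    linkdb="$crawl_dir/linkdb"

    if [ -d "$linkdb/current" ]; then
        echo "current: present"
    else
        echo "current: missing"
    fi

    # Lock files left behind by a run that exited with an error.
    for f in "$linkdb/.locked" "$linkdb/..locked.crc"; do
        if [ -e "$f" ]; then
            echo "stale lock: $f"
        fi
    done
}
```

Running `check_linkdb TestCrawl2` from /opt/apache-nutch-1.10 after the failed crawl should, under these assumptions, report that current is missing and list both lock files.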
Note that the Nutch and Solr installations themselves have run previously without problems in a TestCrawl instance. It is only now, when I try a new one, that I run into problems. Any suggestions on troubleshooting this issue?