nutch 1.10 input path does not exist /linkdb/current
When I run nutch 1.10 with the following command (assuming that TestCrawl2 did not previously exist and needs to be created):

sudo -E bin/crawl -i -D solr.server.url=http://localhost:8983/solr/TestCrawlCore2 urls/ TestCrawl2/ 20

I receive an error on indexing that claims:

Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/apache-nutch-1.10/TestCrawl2/linkdb/current

The linkdb directory exists, but does not contain the 'current' directory. The directory is owned by root, so there should be no permissions issues. Because the process exited with an error, the linkdb directory contains .locked and ..locked.crc files. If I run the command again, these lock files cause it to exit in the same place. Delete the TestCrawl2 directory, rinse, repeat.
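As a workaround for the stale locks, the lock files can be removed by hand instead of deleting the whole crawl directory each time. This is only a sketch of that cleanup (it simulates the leftover state in a temp directory; the real paths would be under /opt/apache-nutch-1.10/TestCrawl2), not an official Nutch recovery procedure:

```shell
#!/bin/sh
# Simulate the state left behind by the failed run (for illustration only):
CRAWL_DIR=$(mktemp -d)/TestCrawl2
mkdir -p "$CRAWL_DIR/linkdb"
touch "$CRAWL_DIR/linkdb/.locked" "$CRAWL_DIR/linkdb/..locked.crc"

# Remove only the stale lock files so a re-run does not abort immediately:
rm -f "$CRAWL_DIR/linkdb/.locked" "$CRAWL_DIR/linkdb/..locked.crc"

ls -A "$CRAWL_DIR/linkdb"
```

Note this only clears the lock; if linkdb/current is genuinely missing, the underlying cause (see the answer below the question on the original page) still has to be fixed.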

Note that the nutch and solr installations themselves have previously run without problems in a TestCrawl instance; it's only now that I'm trying a new one that I'm having problems. Any suggestions on troubleshooting this issue?

Orphism answered 3/11, 2015 at 20:44 Comment(0)

Ok, it seems as though I have run into a version of this problem:

https://issues.apache.org/jira/browse/NUTCH-2041

This is a result of the crawl script not being aware of changes to ignore_external_links (the db.ignore.external.links property) in my nutch-site.xml file.
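For reference, this is what the setting looked like in my conf/nutch-site.xml (the property name shown is the standard Nutch one; the description text is paraphrased, not copied from the Nutch defaults):

```xml
<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
  <description>If true, outlinks leading to hosts other than the
  page's own host are discarded, so no external links are followed.</description>
</property>
```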

I am trying to crawl several sites and was hoping to keep my life simple by ignoring external links and leaving regex-urlfilter.txt alone (just using +.)

Now it looks like I'll have to change ignore_external_links back to false and add a regex filter for each of my URLs. Hopefully a nutch 1.11 release will arrive soon; it looks like this is fixed there.
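Until then, the per-site filters in conf/regex-urlfilter.txt would look roughly like this, replacing the catch-all +. rule (the hosts below are placeholders for my actual crawl targets):

```
# accept only the crawl targets (example hosts; substitute your own)
+^https?://([a-z0-9-]+\.)*site-one\.example/
+^https?://([a-z0-9-]+\.)*site-two\.example/
# reject everything else
-.
```

The order matters: Nutch applies the first matching rule, so the final -. line only rejects URLs that no earlier +^… rule accepted.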

Orphism answered 9/11, 2015 at 21:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.