Apache Nutch - Problems with Paths

I am trying to set up Apache Nutch to crawl URLs, following this guide. Because the guide is older (it targets 1.x, and I am using 2.3), I have adjusted the directory structure where necessary. However, when I try to run a crawl, I get this error:

root@IndiStage:~# /usr/local/nutch/framework/apache-nutch-2.3/src/bin/crawl urls FirstCrawl 2
No SOLRURL specified. Skipping indexing.
Injecting seed URLs
/usr/local/nutch/framework/apache-nutch-2.3/src/bin/nutch inject urls -crawlId FirstCrawl
Error: Could not find or load main class org.apache.nutch.crawl.InjectorJob
Error running:
  /usr/local/nutch/framework/apache-nutch-2.3/src/bin/nutch inject urls -crawlId FirstCrawl
Failed with exit value 1.
root@IndiStage:~#

Being new to Ubuntu (14.04), I am finding it hard to manage the directory structure and paths here.

InjectorJob is in /usr/local/nutch/framework/apache-nutch-2.3/src/java/org/apache/nutch/crawl

JAVA_HOME is set to /usr/lib/jvm/java-7-openjdk-amd64

Reachmedown answered 15/11, 2015 at 8:50

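Make sure you have already compiled the Nutch source code (for example, by running ant runtime in ${APACHE_NUTCH_HOME}). Then run the crawl command from ${APACHE_NUTCH_HOME}/runtime/local (or ${APACHE_NUTCH_HOME}/runtime/deploy/bin) rather than from src/bin. The src tree contains only the .java sources, so the scripts there cannot find the compiled InjectorJob class.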
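
A minimal sketch of that check, assuming the build is done with ant runtime and that the local runtime keeps its scripts in runtime/local/bin. The helper name nutch_crawl_script is hypothetical and not part of Nutch; the paths in the usage comments are the ones from the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print the path of the built crawl
# script, or fail with a hint if the runtime has not been built yet.
nutch_crawl_script() {
    nutch_home="$1"
    script="$nutch_home/runtime/local/bin/crawl"
    if [ -x "$script" ]; then
        printf '%s\n' "$script"
    else
        echo "No built runtime under $nutch_home; run 'ant runtime' there first." >&2
        return 1
    fi
}

# Example usage (assumes ant is installed and on the PATH):
#   cd /usr/local/nutch/framework/apache-nutch-2.3 && ant runtime
#   "$(nutch_crawl_script /usr/local/nutch/framework/apache-nutch-2.3)" urls FirstCrawl 2
```

The point of resolving the script this way is that the crawl is always started from the built runtime, never from src/bin.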

Hope this helps,

Le Quoc Do

Anchorage answered 11/3, 2016 at 19:48
