I am new to Nutch. I need to crawl the web (say, a few hundred web pages), read the crawled data and do some analysis.
I followed the tutorial at https://wiki.apache.org/nutch/NutchTutorial (and integrated Solr, since I may need to search the text in the future) and ran the crawl with a few seed URLs.
Now I can't find the text/HTML data on my local machine. Where is the crawled data stored, and what is the best way to read it in text format?
Versions
- apache-nutch-1.9
- solr-4.10.4