Have you indexed nutch crawl results using elasticsearch before?

Has anyone had any luck writing a custom indexer for Nutch that indexes the crawl results with Elasticsearch? Or do you know of any that already exist?

Marlomarlon answered 15/5, 2011 at 23:58 Comment(0)
H
2

I haven't done it myself, but it is definitely doable. You would need to piggyback on the Solr indexer code (src/java/org/apache/nutch/indexer/solr) and adapt it to ElasticSearch. It would be a nice contribution to Nutch, by the way.

Hostetter answered 25/5, 2011 at 15:22 Comment(1)
That's the approach I've taken. I have written my own Elasticsearch indexer and my own crawl process as well. - Marlomarlon
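To give a rough idea of what "adapting the Solr code" involves: where the Solr indexer sends documents to Solr, an Elasticsearch version has to send them to Elasticsearch instead, for example through the `_bulk` endpoint, which takes newline-delimited JSON (one action line, then one source line per document). The sketch below only builds that request body from plain maps. It is not taken from Nutch or from any of the projects in this thread; the class name `BulkPayloadSketch`, the helper methods, the field names `url` and `title`, and the choice of the URL as document id are all assumptions made for illustration. Older Elasticsearch releases also expected a `_type` in the action line.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Elasticsearch.
public class BulkPayloadSketch {

    // Escape the characters that must be escaped inside a JSON string.
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Serialize a flat map of string fields as a single-line JSON object.
    static String toJson(Map<String, String> fields) {
        StringBuilder out = new StringBuilder("{");
        Iterator<Map.Entry<String, String>> it = fields.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            out.append('"').append(jsonEscape(e.getKey())).append("\":\"")
               .append(jsonEscape(e.getValue())).append('"');
            if (it.hasNext()) {
                out.append(',');
            }
        }
        return out.append('}').toString();
    }

    // Build a _bulk request body: an "index" action line followed by the
    // document source, each terminated by a newline.
    // Assumption: every document has a "url" field, used here as its id.
    static String bulkBody(String index, List<Map<String, String>> docs) {
        StringBuilder body = new StringBuilder();
        for (Map<String, String> doc : docs) {
            body.append("{\"index\":{\"_index\":\"").append(jsonEscape(index))
                .append("\",\"_id\":\"").append(jsonEscape(doc.get("url")))
                .append("\"}}\n");
            body.append(toJson(doc)).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/");
        doc.put("title", "Example page");
        // The resulting text would be POSTed to the cluster's /_bulk endpoint.
        System.out.print(bulkBody("nutch", Arrays.asList(doc)));
    }
}
```

A real indexer would take the field values from the Nutch document instead of a hand-built map and would send the body over HTTP or through an Elasticsearch client; both steps are left out here.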
I
10

I wrote an ElasticSearch plugin that mocks the Solr API. With this plugin and the standard Nutch Solr indexer, you can easily send crawled data into ElasticSearch. The plugin, along with an example of how to use it with Nutch, can be found on GitHub:

https://github.com/mattweber/elasticsearch-mocksolrplugin

Inkster answered 9/2, 2012 at 20:29 Comment(0)
A
3

I know that Nutch will be adding pluggable backends, and I'm glad to see it. In the meantime I needed to integrate Elasticsearch with Nutch 1.3, so I piggybacked off the Solr indexer code (src/java/org/apache/nutch/indexer/solr). The code is posted here:

https://github.com/ctjmorgan/nutch-elasticsearch-indexer

Anecdotic answered 21/11, 2011 at 13:52 Comment(1)
I am new to Java, so I don't know how to create a package on Ubuntu and then rebuild it. I have installed Nutch at /home/peter/nutch/, but I don't know where to copy the Ivy files and the Java files. Also, what settings have to be added to the Ivy files? - Leasia
C
0

Time goes by, and Nutch is now well integrated with ElasticSearch. Here is a nice tutorial.

Chalone answered 15/1, 2016 at 9:3 Comment(0)
