Apache Nutch and Solr integration

I've tried to follow the Nutch tutorial, but I'm having a problem with the schema.xml file.

I was told to copy the Nutch-provided schema into my project, essentially this:

cp ${NUTCH_RUNTIME_HOME}/conf/schema.xml ${APACHE_SOLR_HOME}/example/solr/conf/

I have deployed Solr in Tomcat, and the error I get when I go to the Solr dashboard is:

collection1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Plugin init failure for [schema.xml] fieldType "text": 
Plugin init failure for [schema.xml] analyzer/filter:
Error loading class 'solr.EnglishPorterFilterFactory'

This relates to the following element in the schema.xml file (I can comment it out, but I'm not sure how important it is yet):

<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>

I have edited my solrconfig.xml to try to include a range of jar files that come with Solr, specifically:

<lib path="/etc/solr/collection1/libs/dist/solr-core-4.2.1.jar" />
<lib path="/etc/solr/collection1/libs/dist/solr-analysis-extras-4.2.1.jar" />

But I don't think they contain the missing class "solr.EnglishPorterFilterFactory".

Does anyone have any idea why this might not be working, or whether I have missed something? I'm not a Java developer btw, so no doubt it will be something simple :)

UPDATE: After finding out that the schema referenced some old classes, I had another look in the Nutch conf directory, and it looks like there is a ${NUTCH_RUNTIME_HOME}/conf/schema-solr4.xml file which seems to work.

Not 100% sure if this is correct but hey...

Madura answered 11/4, 2013 at 10:2 Comment(0)

Looks like EnglishPorterFilterFactory is no longer around in 4.x. See the note in its 3.6.0 documentation:

Deprecated.
  Use SnowballPorterFilterFactory with language="English" instead

A lot of deprecated classes were removed in 4.0. I'd do what it says; see the documentation for SnowballPorterFilterFactory.
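
As a rough sketch, the replacement for the failing element in schema.xml might look like the line below. It assumes you want to keep the protected="protwords.txt" attribute from the original element, and that protwords.txt is present in the core's conf directory; adjust or drop that attribute if not.

```xml
<!-- Sketch only: replaces the removed solr.EnglishPorterFilterFactory.
     Assumes protwords.txt exists alongside schema.xml in the conf directory. -->
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
```

Any other fieldType in the schema that references EnglishPorterFilterFactory would need the same change, and Solr (or the core) has to be reloaded afterwards.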

Ineligible answered 11/4, 2013 at 15:30 Comment(1)
Also using "${NUTCH_RUNTIME_HOME}/conf/schema-solr4.xml" instead of the old config. - Madura
