How to configure SOLR so users can make prefix search by default?

Asked 21/9, 2011 at 7:59 Answered 22/2, 2012 at 19:20

I am using SOLR 3.2. My application issues search queries on SOLR instance, for a text field type. How can i make SOLR to return results like "book", "bookshelf", "bookasd" so on, when user issues a query like "book". Should i append "*" characters to the query string manually or is there a setting in SOLR so it will do prefix searches on the field by default?

This is the schema.xml section for text field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenat0All="1" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

Lamaism answered 21/9, 2011 at 7:59 Comment(1)

Have you found an answer yet? – Bookout 4/1, 2012 at 16:31

There are several ways to do this, but performance wise you might want to use EdgeNgramFilterFacortory

Pinnati answered 21/9, 2011 at 11:20 Comment(0)

I had the same requirement on a project. I had to implement Suggestion. What i did was defining this suggester fieldType

<fieldType class="solr.TextField" name="suggester">
    <analyzer  type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    </analyzer>
    <analyzer  type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

I used ShingleFilterFactory because I needed to get suggestion composed of one ore more words.

Then I used faceting queries to get suggestions.

Facet.Limit=10

Facet.Prefix="book"

Facet.Field="Suggester" //this is the field with fieldType="suggester" in which I saved the data

I know it uses facet results but maybe it solves your problem.

If my or Jayendra Patil's answer doesn't provide you a solution you can also take a look at EdgeNGramFilterFactory

Unreasonable answered 21/9, 2011 at 9:4 Comment(0)

You would either have to do the handling on the client side by appending the wildcard characters at the end of the search terms.

The impact :-

Wildcard queries have a performance impact
Wildcard queries do not undergo analysis. So the query time analysis won't be applied to you search terms

The other option is to implement custom query parser with the handling you need.

Steady answered 21/9, 2011 at 8:13 Comment(0)

I'm sure you figured this out by now, but just so there's an answer here:

I handled this by taking the last term and putting an OR with the last term plus a wildcard, e.g. "my favorite book" becomes "my+favorite+(book OR book*)", and would return "my favorite bookshelf". You probably want to do some processing on the input anyway (escaping, etc).

If you are specifically looking for the text typed to match the beginning of the result, then edge n-grams are the way to go, but from reading your question it didn't seem you were really asking for that.

Tournament answered 22/2, 2012 at 19:20 Comment(0)

Recommended topics

Hot tags