Is there a way to boost the original term more while using Solr synonyms?

For example, I have the synonyms laptop,netbook,notebook in index_synonyms.txt.

When a user searches for netbook, I want to boost the original term more than the terms added by synonym expansion. Is there a way to specify this in SynonymFilterFactory? For example, by emitting the original term twice so that its TF is higher.

Alinaaline answered 18/5, 2012 at 22:44 Comment(0)

As far as I know, there is no way to do this with the existing SynonymFilterFactory. But here is a trick you can use to get this behavior.

Let's say your field is called title. Create another field that is a copy of it, say title_synonyms. Make sure SynonymFilterFactory appears only in the analyzer chain of title_synonyms (you can do this by using different field types for the two fields, say text and text_synonyms). Search both fields, but give title a higher boost than title_synonyms. A document that contains the original term then matches in both fields, while a document that contains only a synonym matches only in the lower-boosted title_synonyms.

Here are sample field type definitions:

    <fieldType name="text" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>

    <fieldType name="text_synonyms" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>

And here are sample field definitions:

    <field name="title" type="text" stored="false"
           required="true" multiValued="true"/>
    <field name="title_synonyms" type="text_synonyms" stored="false"
           required="true" multiValued="true"/>

Copy the title field to title_synonyms:

    <copyField source="title" dest="title_synonyms"/>

If you are using dismax, you can give different boosts to these fields like so:

    <str name="qf">title^10 title_synonyms^1</str>
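
For context, here is a minimal sketch of where that qf line might live in solrconfig.xml. The handler name /select and the choice to put these values under defaults are assumptions for illustration, not part of the setup described above, so adapt them to your own handler.

```xml
<!-- Hypothetical handler: the name and the use of "defaults" are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the dismax query parser so that qf is honored -->
    <str name="defType">dismax</str>
    <!-- The original text field is boosted 10x relative to the synonym-expanded copy -->
    <str name="qf">title^10 title_synonyms^1</str>
  </lst>
</requestHandler>
```

The same parameters can also be passed per request (defType=dismax and qf=title^10 title_synonyms^1) if you would rather not change the handler defaults.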
Bonnette answered 19/5, 2012 at 5:25 Comment(4)
Really nice idea! But in my case I have about 10 fields where synonyms are required, so I will do this only if there are no other workarounds (Solr patches, etc.). - Alinaaline
If you are using the same synonyms file for all those fields, you can copy all of them into one common synonyms field; you don't need a separate synonyms field for each field. - Bonnette
But I use fine-grained weights for all fields, so a synonym match in title is more important than a synonym match in description, etc. - Alinaaline
Multi-word searches are problematic with query-time synonyms. See the SynonymFilterFactory documentation. - Tameratamerlane
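
For the per-field weighting raised in the comments, one possible sketch is to pair each searched field with its own synonyms copy and weight each pair separately. This assumes a description field already exists in the schema. The description_synonyms field and all boost values are hypothetical; only title, title_synonyms, and the text_synonyms type come from the answer.

```xml
<!-- Hypothetical: one synonyms copy per searched field -->
<field name="description_synonyms" type="text_synonyms" stored="false" multiValued="true"/>
<copyField source="description" dest="description_synonyms"/>

<!-- Illustrative boosts: each original field outranks its own synonyms copy,
     and title outranks description -->
<str name="qf">title^10 title_synonyms^5 description^4 description_synonyms^2</str>
```

The cost is one extra indexed field per searched field, which is the trade-off against the single shared synonyms field suggested above.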
