Solr - case-insensitive search do not work
Asked Answered
B

2

5

I want to apply case-insensitive search for field myfield in solr.

I googled a bit for that , and i found that , i need to apply LowerCaseFilterFactory to Field Type and field should be of solr.TextFeild.

I applied that in my schema.xml and re-index the data, then also my search seems to be case-sensitive.

Below is search that i perform.

http://localhost:8080/solr/select?q=myfield:"cloud university"&hl=on&hl.snippets=99&hl.fl=myfield

Below is definition for field type

 <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

and below is my field definition

 <field name="myfield" type="text_en_splitting" indexed="true" stored="true" />

Not sure , what is wrong with this. Please help me to resolve this.

Thanks

EDIT

Debug Query

<lst name="debug">
    <str name="rawquerystring">
        "cloud university" AND guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <str name="querystring">
        "cloud university" AND guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <str name="parsedquery">
        +PhraseQuery(CC:"cloud univers") +guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <str name="parsedquery_toString">
        +CC:"cloud univers" +guid:268406b6-db65-49da-848a-c59248f170db
    </str>
    <lst name="explain">
        <str name="KSYS_20120805_1100">
            12.572915 = (MATCH) sum of: 0.03595598 = weight(CC:"cloud univers" in 1560524), product of: 0.51819557 = queryWeight(CC:"cloud univers"), product of: 8.881522 = idf(CC: cloud=4798 univers=625207) 0.05834536 = queryNorm 0.06938689 = fieldWeight(CC:"cloud univers" in 1560524), product of: 1.0 = tf(phraseFreq=1.0) 8.881522 = idf(CC: cloud=4798 univers=625207) 0.0078125 = fieldNorm(field=CC, doc=1560524) 12.536959 = (MATCH) weight(guid:268406b6-db65-49da-848a-c59248f170db in 1560524), product of: 0.85526216 = queryWeight(guid:268406b6-db65-49da-848a-c59248f170db), product of: 14.658615 = idf(docFreq=1, maxDocs=1709587) 0.05834536 = queryNorm 14.658615 = (MATCH) fieldWeight(guid:268406b6-db65-49da-848a-c59248f170db in 1560524), product of: 1.0 = tf(termFreq(guid:268406b6-db65-49da-848a-c59248f170db)=1) 14.658615 = idf(docFreq=1, maxDocs=1709587) 1.0 = fieldNorm(field=guid, doc=1560524)
        </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
        <double name="time">60.0</double>
        <lst name="prepare">
            <double name="time">1.0</double>
            <lst name="org.apache.solr.handler.component.QueryComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.FacetComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.HighlightComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.StatsComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.DebugComponent">
                <double name="time">0.0</double>
            </lst>
        </lst>
        <lst name="process">
            <double name="time">59.0</double>
            <lst name="org.apache.solr.handler.component.QueryComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.FacetComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.HighlightComponent">
                <double name="time">57.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.StatsComponent">
                <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.DebugComponent">
                <double name="time">2.0</double>
            </lst>
        </lst>
    </lst>
</lst>
Batman answered 22/8, 2012 at 10:29 Comment(14)
The configuration is correct. Did you reload the cores after the changes to the schema xml.Aleishaalejandra
yes @Jayendra, i reload then after changesBatman
can you add debugQuery=on in the url and check the debug information as to what the query looks like.Aleishaalejandra
@Aleishaalejandra thanks for your suggestion , i edited my post and added debug query results . please suggest me on that.Batman
Sorry .. I dont see the edited postAleishaalejandra
sorry ... delayed in editing.Batman
i didn't get that, i searched with cloud university and solr phrase query results in cloud univers. Is that can be reason ??Batman
the query is correct and getting applied. university is transformed to univers cause of stemming filter. However what is guid in query for ?? Probably its not matching thats why no results.Aleishaalejandra
guid is unique id for each document... i applied it , just ti minimize search result.Batman
probably its not matching then. Try without it and you would get search results.Aleishaalejandra
let us continue this discussion in chatBatman
no... @Aleishaalejandra , it doesn't work by removing guid also... :( , and yes don't know if it can be useful or not. but i'll like to tell you that my search term is as 'ClOUD UNIVERSITY' in myfieldBatman
what do you mean by: "then also my search seems to be case-sensitive."??? What is the result you get, and what is the expected result ?Ruddy
@Dorin, in my field one term is like as ClOUD UNIVERSITY (all caps just one l in small case) , if i do search with cloud university then it's not returning me this record.Batman
S
6

You should put solr.LowerCaseFilterFactory before the word delimiter because caps in the middle of lower caps or vice versa triggers the word delimiter

Stainless answered 27/8, 2012 at 9:57 Comment(0)
R
1

I recommend you should use Analysis tool and see how the expression is indexed and how the expression is searched. http://localhost:8983/solr/admin/analysis.jsp?highlight=on

I think there might be a problem with the WordDelimiterFilterFactory ( it is different in query and in index ), but this is just a guess.

Select in the tool field type text_en_splitting and enter at field value index ClOUD UNIVERSITY and at field value query cloud university. Also select Verbose output and see what you get.

Ruddy answered 27/8, 2012 at 9:48 Comment(1)
Thanks @Dorin, i sourly try this out and let you know. :)Batman

© 2022 - 2024 — McMap. All rights reserved.