Does the Sitecore 7 ContentSearch API remove stop words from queries?

2

7

I've found that searches containing 'of', 'and', 'the', etc. do not return results because Lucene has removed stop words. So if I search for an item with the title "Aftermath of the first world war", I get zero results.

But if I strip 'of' and 'the', so that I am searching for "aftermath first world war", I get the expected document back.

Does the ContentSearch API remove stop words from queries? Is this something that can be configured in Lucene? Or should I remove these stop words myself before building my query?
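A minimal sketch of that last option, stripping stop words from the search text before the query is built. The class name `QueryStopWords`, the method `Strip`, and the three-word list are hypothetical and only for illustration; the list would need to match whatever stop words the index analyzer actually removes. It uses no Sitecore or Lucene types, only plain string handling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryStopWords
{
    // Hypothetical list for illustration; it should mirror the analyzer's stop words.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "of", "and", "the" };

    // Splits the raw search text on whitespace and drops every term found in StopWords.
    public static string Strip(string query)
    {
        if (string.IsNullOrWhiteSpace(query))
        {
            return string.Empty;
        }

        var terms = query
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Where(term => !StopWords.Contains(term));

        return string.Join(" ", terms);
    }
}
```

With this list, `QueryStopWords.Strip("Aftermath of the first world war")` returns `"Aftermath first world war"`, which is the form of the query that returned the expected document.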

Thanks, Adam

Graptolite answered 5/2, 2014 at 17:12 Comment(0)
H
2

You can configure the Sitecore standard analyzer to accept your own custom set of stop words. Create a text file with the stop words (one stop word per line) and then make the config changes below in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file:

<param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
  <param desc="defaultAnalyzer" type="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net">
    <param hint="version">Lucene_30</param>
    <param desc="stopWords" type="System.IO.FileInfo, mscorlib">
      <param hint="fileName">[FULL_PATH_TO_SITECORE_ROOT_FOLDER]\Data\indexes\stopwords.txt</param>
    </param>
  </param>
</param>

Further reading: I have written a blog post about this issue that might be of help: http://blog.horizontalintegration.com/2014/03/19/sitecore-standard-analyzer-managing-you-own-stop-words-filter/

Hastate answered 19/3, 2014 at 16:17 Comment(3)
Links to webpages are not good answers even if they answer the question. The answer, including relevant code, should be included within the SO site. - Wangle
Novocaine88, thanks for the comment. Being a newbie to SO, this certainly helps. - Hastate
Is there a way to use a relative path instead of an absolute path to the stopwords.txt file? I tried something like this but it doesn't work: <param>$(dataFolder)/stopwords.txt</param> - Beare
C
1

I think this is the same problem as the one described in this blog.

Can you try to follow the steps from the blog post?

Another option is to create a custom analyzer and pass your stop word list to its constructor. Something like:

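using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;

public class CustomAnalyzer : StandardAnalyzer
{
    // Lucene.Net 3.0's StandardAnalyzer takes its stop words as an ISet<string>
    // rather than a Hashtable, so a HashSet<string> is used here.
    private static readonly ISet<string> StopWords = new HashSet<string>
    {
        "of",
        "stopword2"
    };

    // Passes the custom stop word set to the base StandardAnalyzer.
    public CustomAnalyzer() : base(Lucene.Net.Util.Version.LUCENE_30, StopWords)
    {
    }
}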

After adding the class, you need to change your config file so that it uses the custom analyzer. A nice blog post about analyzers can be found here. P.S.: I haven't tested this code, so I can't confirm that it really works.

Cardamom answered 5/2, 2014 at 19:9 Comment(0)
