I am using Solr for spell checking / query correction. I have added solr.PhoneticFilterFactory and solr.NGramFilterFactory to the fieldType to perform spell checking. It works, but the problem is that the query matches a large number of documents. I need only the most likely words/documents, in other words the words/documents nearest to the query.
Snippet of schema.xml:
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="1000"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>
Example: for the query "piece", I get a numFound (number of documents) of around 780. I need to reduce this count so that only the most likely documents are returned.
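To make the kind of cutoff I am after more concrete, here is a minimal client-side sketch. It assumes each returned document carries a `score` field (as when the request includes `fl=*,score`) and keeps only hits whose score is within a fraction of the best hit. The function name `keep_top_hits`, the `ratio` value of 0.8, and the sample hits are all hypothetical illustrations, not something Solr provides and not real results from my index.

```python
def keep_top_hits(docs, ratio=0.8):
    """Keep only docs whose score is at least `ratio` times the best score.

    `docs` is assumed to be a list of dicts with a numeric "score" key,
    shaped like the docs list of a Solr JSON response.
    """
    if not docs:
        return []
    best = max(d["score"] for d in docs)
    return [d for d in docs if d["score"] >= ratio * best]


# Made-up hits for illustration only (not actual Solr output).
hits = [
    {"id": "1", "word": "piece", "score": 9.2},
    {"id": "2", "word": "peace", "score": 8.1},
    {"id": "3", "word": "pie", "score": 3.4},
]

top = keep_top_hits(hits)
print([d["word"] for d in top])
```

With these made-up scores, the third hit falls below 80% of the best score and is dropped. What I would prefer is a way to get the same effect on the Solr side, so that the low-relevance matches are never returned in the first place.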
@MatsLindh:
I tried different phonetic encoders, but I think DoubleMetaphone is the best of them. Is there anything like a threshold with which I can get only the most popular terms/documents for the query? – Milkliveredpiece
? – Duodenary