Max edit distance and suggestion based on word frequency
I need a spell checker with the following specification:

  • Very scalable.
  • A configurable maximum edit distance for the suggested words.
  • Suggestions ranked by supplied word frequencies (most common word first).

I took a look at Hunspell:
I found the MAXDIFF parameter in the man page, but it doesn't seem to work as expected. Maybe I'm using it the wrong way.

file t.aff:

MAXDIFF 1 

file dico.dic:

5  
rouge  
vert  
bleu  
bleue  
orange  

C# code:

NHunspell.Hunspell h = new NHunspell.Hunspell("t.aff", "dico.dic");
List<string> s = h.Suggest("bleuue");

This returns the same suggestions whether t.aff is empty or not:

bleue
bleu
Harrar answered 2/5, 2011 at 13:51 Comment(0)

We decided to use Apache Solr, which exactly fulfills our needs.
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck

Harrar answered 9/1, 2012 at 21:23 Comment(0)

A MAXDIFF of 1 should return only a few suggestions, but it can still return more than one.

Even a MAXDIFF of 0 can give more than a single result, although it should lower the chance. It depends on the n-gram matching, since MAXDIFF tunes the n-gram-based suggestions rather than setting an edit-distance limit. Try a MAXDIFF of 0 for fewer results, but this still doesn't guarantee that you will get a single suggestion.

For your requirement to sort by the most frequent word, the Google n-gram corpus is publicly available and can serve as a source of word frequencies.
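
To illustrate how the two requirements could fit together, here is a minimal Python sketch (not Hunspell's or Solr's actual algorithm): it keeps only the dictionary words within a maximum Levenshtein distance and then sorts them by frequency. The function names and the frequency counts are made up for the example; real counts would come from a corpus such as the one mentioned above.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def suggest(word: str, frequencies: Dict[str, int], max_distance: int = 1) -> List[str]:
    """Words within max_distance of `word`, most frequent first."""
    scored = []
    for known, freq in frequencies.items():
        # A length gap larger than the limit can never be within the limit.
        if abs(len(known) - len(word)) > max_distance:
            continue
        distance = edit_distance(word, known)
        if distance <= max_distance:
            scored.append((known, freq, distance))
    # Highest frequency first; ties broken by smaller distance, then alphabetically.
    scored.sort(key=lambda item: (-item[1], item[2], item[0]))
    return [known for known, _, _ in scored]


# Hypothetical frequency counts for the words in dico.dic.
frequencies = {"rouge": 120, "vert": 80, "bleu": 300, "bleue": 150, "orange": 60}

print(suggest("bleuue", frequencies, max_distance=1))  # ['bleue']
print(suggest("bleuue", frequencies, max_distance=2))  # ['bleu', 'bleue']
```

This scans the whole dictionary for every query, so it does not meet the scalability requirement by itself; a large word list would need an index on top of it.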

Maddalena answered 6/11, 2011 at 7:34 Comment(0)
