I have a question about building a custom dictionary for hunspell. I'm using a general English dictionary and affix file right now. How can I add user-specified words to that dictionary for each of my users?
You can create your own word list and affix file for your language if one doesn't exist. For Papiamentu, Curaçao's native language, no such dictionary exists, and I had a hard time finding out how to create these files, so I am documenting it here: http://www.suares.com/index.php?page_id=25&news_id=233
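As a concrete illustration of the two files, here is a sketch that writes a toy Papiamentu dictionary with a single suffix rule. Everything in it is an assumption for illustration and is not taken from the linked write-up: the file names toy.aff and toy.dic, the flag letter N, and the two entries. The rule appends -nan (used here as a plural suffix) to any word marked /N.

```shell
#!/bin/sh
# Illustrative only: a toy Hunspell dictionary with one suffix rule.
# File names, the flag letter N and the words are hypothetical examples.

# Affix file:
#   SET UTF-8      - character encoding of the dictionary files
#   SFX N Y 1      - header: suffix class N, cross product allowed, 1 rule follows
#   SFX N 0 nan .  - strip nothing (0), append "nan", any word ending (.)
cat > toy.aff <<'EOF'
SET UTF-8
SFX N Y 1
SFX N 0 nan .
EOF

# Dictionary file: first line is the entry count, then one entry per line.
# "/N" marks a word as accepting the N suffix.
cat > toy.dic <<'EOF'
2
kas/N
buki/N
EOF
```

If hunspell is installed, `printf 'kasnan\n' | hunspell -l -d ./toy` should print nothing, because kasnan is generated from the kas/N entry.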
I'm trying to do the same but haven't found enough information to begin yet.
However, you may want to look at the hunspell(5) man page, "format of Hunspell dictionaries and affix files".
UPDATE
If you are working with .NET, you can download the Hunspell .NET port. Using it is fairly easy:
var bee = new Hunspell();
// Load the affix file and the dictionary file together
bee.Load("path_to_en_US.aff", "path_to_en_US.dic");
// Words added at runtime live only in this object's memory
bee.Add("my_custom_word1");
bee.Add("my_custom_word2");
var suggestions = bee.Suggest("misspel_word");
The secret to getting hunspell to work (at least for me) was to figure out which of the locations it searches are owned by me, and to put the custom dictionaries there. Also bear in mind that the dictionaries have a specific format, so you need to obey its rules.
Running hunspell -D will show you the search path. On macOS, mine includes /Users/scott/Library/Spelling, so I created that directory and put mine there. Let's say you want to call your dictionary mydict and your input data file of words is called dict.txt. We'll use the path I just showed.
First, copy the default .aff file. You will see where it is when you run hunspell -D as described above. For me, it's /Library/Spelling/en_US.aff, so:
cp /Library/Spelling/en_US.aff /Users/scott/Library/Spelling/mydict.aff
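Then, every time you update your input list (dict.txt), do this:
DICT=/Users/scott/Library/Spelling/mydict.dic
cd ~/doc/dict
# Sort and de-duplicate the word list
sort -u dict.txt > dict.in
# A .dic file starts with the word count on its own line (no file name)
wc -l < dict.in | tr -d ' ' > "$DICT"
cat dict.in >> "$DICT"
rm dict.in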
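To catch a malformed header after regenerating the file, a small check like the following may help. It is a sketch: check_dic is a hypothetical helper name, and it requires the header to match the entry count exactly. That is stricter than Hunspell needs, since hunspell(5) describes the first line as an approximate word count used to size the hash table, but it does flag stray text such as the file name that `wc -l dict.in` prints after the count.

```shell
#!/bin/sh
# Hypothetical helper: check that the first line of a .dic file is a
# number equal to the count of entries that follow it.
# Usage: check_dic path/to/mydict.dic
check_dic() {
    dic=$1
    header=$(head -n 1 "$dic")
    entries=$(tail -n +2 "$dic" | wc -l | tr -d ' ')

    # The header must be a plain non-negative integer (no spaces, no file name).
    case $header in
        ''|*[!0-9]*)
            echo "$dic: first line is not a word count: '$header'" >&2
            return 1
            ;;
    esac

    # Stricter than Hunspell requires: demand an exact match.
    if [ "$header" -ne "$entries" ]; then
        echo "$dic: header says $header, found $entries entries" >&2
        return 1
    fi
    return 0
}
```

For example, `check_dic /Users/scott/Library/Spelling/mydict.dic && echo ok` after each rebuild.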
To run hunspell, just specify both dictionaries. Because I want a list of misspellings, I use:
hunspell -l -d mydict,en_US <filename>
Alternatively, use the -p option; then you only need the list of sorted words: sort -u dict.txt > custom_words. Then hunspell -l -p custom_words will use the default dictionary but also accept the words from your custom_words file. No need to copy the .aff file. –
Lucre
I am implementing this type of feature as well. Once you've created the Hunspell object with an associated dictionary, you can add individual words to it.
Keep in mind, though, that these words are only available for as long as the Hunspell object is alive. Every time you create a new object, you will have to add all the user-defined words again.
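Because words added at runtime disappear with the object, one way to give each user their own words is to persist them in a per-user word list and load it as a personal dictionary with the hunspell CLI's -p option mentioned in the comment above. The sketch below assumes a directory layout (USER_DICT_DIR/<user>.dic) and the helper names add_user_word and check_for_user, all hypothetical. Personal dictionaries are plain word lists, without the count line a main .dic file needs. In a .NET program, the same per-user file could be read when the Hunspell object is created and each word passed to Add.

```shell
#!/bin/sh
# Hypothetical per-user word store: one plain word list per user.
# USER_DICT_DIR and the function names are illustrative assumptions.
USER_DICT_DIR=${USER_DICT_DIR:-./user_dicts}

# Append a word to a user's list, keeping the file sorted and duplicate-free.
add_user_word() {
    user=$1
    word=$2
    file="$USER_DICT_DIR/$user.dic"
    mkdir -p "$USER_DICT_DIR"
    touch "$file"
    # Merge the new word (from stdin, "-") with the existing list;
    # sort -o may name one of its own input files.
    printf '%s\n' "$word" | sort -u -o "$file" "$file" -
}

# List misspelled words in a file for a given user:
# the default en_US dictionary plus that user's personal word list.
check_for_user() {
    user=$1
    target=$2
    hunspell -l -d en_US -p "$USER_DICT_DIR/$user.dic" "$target"
}
```

Writing back with sort -u -o is safe here because POSIX allows sort's output file to be one of its inputs, so no temporary file is needed.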
Have a look at the OpenOffice.org documentation at http://www.openoffice.org/lingucomponent/, especially this document: http://www.openoffice.org/lingucomponent/dictionary.html
It's a good starting point.