Spell Checker for Python
O

12

70

I'm fairly new to Python and NLTK. I am working on an application that performs spell checking (it replaces an incorrectly spelled word with the correct one). I'm currently using the Enchant library through PyEnchant on Python 2.7, together with NLTK. The class below handles the correction/replacement.

import enchant
from nltk.metrics import edit_distance

class SpellingReplacer:
    def __init__(self, dict_name='en_GB', max_dist=2):
        self.spell_dict = enchant.Dict(dict_name)
        self.max_dist = max_dist

    def replace(self, word):
        # Correctly spelled words are returned unchanged.
        if self.spell_dict.check(word):
            return word
        suggestions = self.spell_dict.suggest(word)

        # Accept the top suggestion only if it is close enough to the input.
        if suggestions and edit_distance(word, suggestions[0]) <= self.max_dist:
            return suggestions[0]
        else:
            return word

I have written a function that takes a list of words, runs replace() on each one, and returns the list of corrected words.

def spell_check(word_list):
    # Build the replacer once instead of reloading the dictionary for every word.
    replacer = SpellingReplacer()
    checked_list = []
    for item in word_list:
        checked_list.append(replacer.replace(item))
    return checked_list

>>> word_list = ['car', 'colour']
>>> spell_check(word_list)
['car', 'color']

I don't really like this because it isn't very accurate, so I'm looking for a better way to check and replace misspelled words. I also need something that can pick up spelling mistakes like "caaaar". Are there better ways to perform spelling checks out there? If so, what are they? How does Google do it? Their spelling suggester is very good.

Any suggestions?

Origami answered 18/12, 2012 at 7:18 Comment(0)
G
63

You can use the autocorrect library to spell check in Python.
Example usage:

from autocorrect import Speller

spell = Speller(lang='en')

print(spell('caaaar'))
print(spell('mussage'))
print(spell('survice'))
print(spell('hte'))

Result:

caesar
message
service
the
Gt answered 16/1, 2018 at 11:48 Comment(6)
print(spell('Stanger things')) gives Stenger things - Camphene
This does not appear to be Python 3 compliant? spell = Speller(lang='en') throws TypeError: the JSON object must be str, not 'bytes' - Saw
This library is unfortunately not trustworthy. Out of 100 relatively common words, 6 were autocorrected to another word: sardine -> marine, stewardess -> stewards, snob -> snow, crutch -> clutch, pelt -> felt, toaster -> coaster - Mixtec
Which is better, pyspellchecker or autocorrect? - Phantasy
This is a pretty bad result. For example, caaaar should be interpreted as car by trimming the excess characters and rechecking the spelling. Mussage is phonetically more similar to massage than message, and so on, as other comments suggest. - Vries
Just to comment here, the corpus must be tiny; it doesn't even recognise "afterwards", so it's useless. - Roadrunner
L
40

I'd recommend starting by carefully reading this post by Peter Norvig. (I had to do something similar and I found it extremely useful.)

The following function, in particular, has the ideas you need to make your spell checker more sophisticated: splitting the misspelled word and then deleting, transposing, replacing, and inserting letters to 'correct' it.

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    # every way of cutting the word into a (left, right) pair
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts    = [a + c + b     for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

Note: The above is one snippet from Norvig's spelling corrector
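
To show how edits1 fits into a complete corrector, here is a minimal sketch in the style of Norvig's post. The tiny CORPUS string is a stand-in I made up for illustration; a real corrector would count words from a large text file. The helper names (known, edits2, correction) follow the structure of his post, but treat the details as a sketch rather than his exact code.

```python
import re
from collections import Counter

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

# Toy corpus (an assumption for illustration); use a large text file in practice.
CORPUS = "the car was parked near the card shop and the care home; message sent"
WORDS = Counter(re.findall(r'[a-z]+', CORPUS.lower()))

def edits1(word):
    """All strings that are one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """All strings that are two edits away from `word`."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def known(words):
    """The subset of `words` that appears in the corpus."""
    return {w for w in words if w in WORDS}

def correction(word):
    """Most frequent known word among the closest candidates."""
    candidates = (known([word]) or known(edits1(word))
                  or known(edits2(word)) or {word})
    return max(candidates, key=lambda w: WORDS[w])

print(correction('teh'))     # one transposition away from a known word
print(correction('mesage'))  # one insertion away from a known word
```

Candidates are tried in order of distance (0, then 1, then 2 edits), and word frequency only breaks ties within the same distance, so the quality of the result depends heavily on the corpus.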

And the good news is that you can incrementally add to and keep improving your spell-checker.

Hope that helps.

Lagting answered 18/12, 2012 at 17:13 Comment(1)
Here is an open-source, language-independent, trainable spell checker that outperforms Norvig's method and is available in several coding languages. - Gagman
B
32

The best ways to do spell checking in Python are SymSpell, a BK-tree, or Peter Norvig's method.

The fastest one is SymSpell.
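
For reference, the BK-tree idea can be sketched in a few lines of plain Python. This is my own illustrative sketch, not code from any particular library: the BKTree class, its method names, and the word list are all made up for the example. Each node stores a word, and children are keyed by their edit distance to that word, which lets a search skip whole subtrees using the triangle inequality.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

class BKTree:
    """Hypothetical minimal BK-tree; a node is (word, {distance: child_node})."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return sorted (distance, word) pairs within max_dist of `word`."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            node_word, children = stack.pop()
            d = self.distance(word, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            # Only subtrees whose key lies in [d - max_dist, d + max_dist] can match.
            for child_d, child in children.items():
                if d - max_dist <= child_d <= d + max_dist:
                    stack.append(child)
        return sorted(results)

tree = BKTree()
for w in ['car', 'card', 'care', 'cat', 'message', 'massage']:
    tree.add(w)
print(tree.search('caar', 1))
print(tree.search('mussage', 1))
```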

Method 1: pyspellchecker

This library is based on Peter Norvig's implementation.

pip install pyspellchecker

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

Method2: SymSpell Python

pip install -U symspellpy
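
To give an idea of why SymSpell is fast, here is a rough pure-Python sketch of its symmetric-delete idea. It does not use the symspellpy API: the DeleteIndex class, its methods, and the word counts are invented for illustration. It also verifies candidates with plain Levenshtein distance, so a transposition counts as two edits here. The dictionary is indexed once by every string you get from deleting up to max_dist characters; a lookup then only generates deletes of the input, with no inserts or replaces over the alphabet.

```python
from collections import defaultdict
from itertools import combinations

def deletes(word, max_dist):
    """All strings obtained by removing up to max_dist characters from word."""
    results = {word}
    for k in range(1, min(max_dist, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            results.add(''.join(word[i] for i in keep))
    return results

def levenshtein(a, b):
    """Edit distance used to verify candidates found through the delete index."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

class DeleteIndex:
    """Hypothetical minimal symmetric-delete index (not the symspellpy class)."""

    def __init__(self, word_counts, max_dist=2):
        self.max_dist = max_dist
        self.counts = dict(word_counts)
        self.index = defaultdict(set)
        # Precompute: map every delete variant back to its dictionary words.
        for word in self.counts:
            for d in deletes(word, max_dist):
                self.index[d].add(word)

    def lookup(self, term):
        """Dictionary words within max_dist, closest first, then most frequent."""
        candidates = set()
        for d in deletes(term, self.max_dist):
            candidates |= self.index.get(d, set())
        scored = [(levenshtein(term, w), -self.counts[w], w) for w in candidates]
        return [w for dist, _, w in sorted(scored) if dist <= self.max_dist]

# Made-up frequencies for the example.
index = DeleteIndex({'car': 50, 'card': 20, 'care': 10, 'message': 30})
print(index.lookup('caar'))
print(index.lookup('mesage'))
```

The trade-off is memory: the index grows with the number of delete variants per word, which is why the real library adds refinements such as prefix indexing.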

Bursarial answered 17/2, 2019 at 18:24 Comment(2)
At least for Python 3, indexer is deprecated, which currently breaks the pyspellchecker module. - Sheathing
pyspellchecker is very slow and strips punctuation (but does work on Python 3.6). - Saw
V
9

Maybe it is too late, but I am answering for future searches. To correct spelling mistakes, you first need to deal with words that are absurd or slang with repeated letters, such as caaaar or amazzzing, so those extra letters have to go first. English words usually have at most 2 repeated letters in a row (e.g., hello), so we remove the extra repetitions from each word and then check its spelling. To remove the extra letters, you can use Python's regular expression module (re).

Once this is done, use the pyspellchecker library to correct the spelling.
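
The repeated-letter step can be sketched with re.sub. The function name squeeze_repeats and its keep parameter are my own choices for illustration, not part of any library. It shortens any run of the same character to at most keep copies; the result still has to go through a spell checker afterwards.

```python
import re

def squeeze_repeats(word, keep=2):
    """Collapse any run of one character longer than `keep` down to `keep` copies."""
    # (.)\1{keep,} matches a character followed by at least `keep` more copies of it.
    pattern = r'(.)\1{%d,}' % keep
    return re.sub(pattern, lambda m: m.group(1) * keep, word)

print(squeeze_repeats('caaaar'))     # caar
print(squeeze_repeats('amazzzing'))  # amazzing
print(squeeze_repeats('hello'))      # hello
```

Note that caaaar becomes caar, not car: the remaining doubled letter is an ordinary one-edit misspelling that the spell checker is expected to fix.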

For implementation visit this link: https://rustyonrampage.github.io/text-mining/2017/11/28/spelling-correction-with-python-and-nltk.html

Visional answered 3/4, 2019 at 10:10 Comment(2)
Removing words with more than 2 repeated lettters is not a good idea. (Oh I just misspelled letters). - Merlenemerlin
I did not say anything about removing the whole word; I described removing the extra letters from the words. So, lettters becomes letters. Please read the answer again carefully. - Visional
A
4

In the terminal:

pip install gingerit

In code:

from gingerit.gingerit import GingerIt

text = input("Enter text to be corrected: ")
result = GingerIt().parse(text)
corrections = result['corrections']
correct_text = result['result']

print("Correct Text:", correct_text)
print()
print("CORRECTIONS")
for d in corrections:
    print("________________")
    print("Previous:", d['text'])
    print("Correction:", d['correct'])
    print("Definition:", d['definition'])
 
Adjoin answered 28/3, 2021 at 16:21 Comment(1)
Link: pypi.org/project/gingerit (91 stars) - Tbar
D
3

Try jamspell - it works pretty well for automatic spelling correction:

import jamspell

corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel('en.bin')

corrector.FixFragment('Some sentnec with error')
# u'Some sentence with error'

corrector.GetCandidates(['Some', 'sentnec', 'with', 'error'], 1)
# ('sentence', 'senate', 'scented', 'sentinel')
Devlen answered 28/8, 2020 at 23:0 Comment(2)
Did you use it on a Windows machine? - Bluegrass
Had issues installing on Mac: github.com/bakwc/JamSpell/issues/73#issuecomment-1152979889 . It appears to need a manual install of swig, which I don't really want to do. - Tbar
R
2

You can also try:

pip install textblob

from textblob import TextBlob
txt="machne learnig"
b = TextBlob(txt)
print("after spell correction: "+str(b.correct()))

after spell correction: machine learning

Refugee answered 30/11, 2021 at 2:47 Comment(2)
Link: pypi.org/project/textblob (thousands of stars). Note that this is a general NLP library. - Tbar
>>> textblob.TextBlob("Helo world. How are u doing today?").correct() TextBlob("Felo world. Now are u doing today?") Hmm, not really what I was after. - Tbar
K
1

Spell corrector:

You need to save a corpus (a word list) to your desktop; if you store it elsewhere, change the path in the code. I have also added a simple GUI using Tkinter. Note that this only tackles non-word errors!

try:
    from tkinter import Tk, Label, Entry, Button   # Python 3
except ImportError:
    from Tkinter import Tk, Label, Entry, Button   # Python 2


def min_edit_dist(word1, word2):
    len_1 = len(word1)
    len_2 = len(word2)
    # x[i][j] holds the edit distance between word1[:i] and word2[:j]
    x = [[0] * (len_2 + 1) for _ in range(len_1 + 1)]
    # base cases: distance to or from the empty string
    for i in range(len_1 + 1):
        x[i][0] = i
    for j in range(len_2 + 1):
        x[0][j] = j
    for i in range(1, len_1 + 1):
        for j in range(1, len_2 + 1):
            if word1[i-1] == word2[j-1]:
                x[i][j] = x[i-1][j-1]
            else:
                x[i][j] = min(x[i][j-1], x[i-1][j], x[i-1][j-1]) + 1
    # index explicitly so that empty words work too
    return x[len_1][len_2]


def retrieve_text():
    word1 = app_entry.get()
    # raw string so the backslashes are not treated as escapes
    path = r"C:\Documents and Settings\Owner\Desktop\Dictionary.txt"
    with open(path, 'r') as ffile:
        # strip the trailing newline so it does not count as an edit
        lines = [line.strip() for line in ffile]
    print("Suggestions coming right up count till 10")
    # print every dictionary word within edit distance 2 of the input
    for line in lines:
        if min_edit_dist(word1, line) <= 2:
            print(line)


if __name__ == "__main__":
    app_win = Tk()
    app_win.title("spell")
    app_label = Label(app_win, text="Enter the incorrect word")
    app_label.pack()
    app_entry = Entry(app_win)
    app_entry.pack()
    app_button = Button(app_win, text="Get Suggestions", command=retrieve_text)
    app_button.pack()
    # Initialize GUI loop
    app_win.mainloop()
Kathrinkathrine answered 18/12, 2012 at 7:18 Comment(0)
U
1

Use spell from the autocorrect package. You need to install the package first (I prefer Anaconda for this). It only works on single words, not sentences, so that's a limitation you are going to face.

from autocorrect import spell
print(spell('intrerpreter'))
# output: interpreter
Ubana answered 28/12, 2018 at 11:17 Comment(1)
See answer above for issues: https://mcmap.net/q/277969/-spell-checker-for-python - Tbar
D
1

pyspellchecker is one of the best solutions for this problem. The library is based on Peter Norvig's blog post. It uses a Levenshtein distance algorithm to find permutations within an edit distance of 2 from the original word. There are two ways to install this library. The official documentation highly recommends using pip.

  • install using pip
pip install pyspellchecker
  • install from source
git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python setup.py install

The following code is the example provided in the documentation:

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))
Daberath answered 10/9, 2020 at 15:30 Comment(0)
D
1

pip install scuse

from scuse import scuse

obj = scuse()

checkedspell = obj.wordf("spelling you want to check")

print(checkedspell)
Distillery answered 7/9, 2022 at 7:5 Comment(0)
S
0

Spark NLP is another option that I have used, and it works very well. A simple tutorial can be found here: https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/annotation/english/spell-check-ml-pipeline/Pretrained-SpellCheckML-Pipeline.ipynb

Sportsman answered 12/3, 2020 at 14:2 Comment(0)
