Using textblob or spaCy for spelling correction in French

I would like to correct the misspelled words in a French text. spaCy seems to be the most accurate and fastest package for this, but it is too complex for me. I tried textblob, but I couldn't get it to work with French words.

It works perfectly in English, but when I try the same thing in French, the misspelled words come back unchanged:

#english words 
from textblob import TextBlob
misspelled=["hapenning", "mornin", "windoow", "jaket"]
[str(TextBlob(word).correct()) for word in misspelled]

#french words
misspelled2=["resaissir", "matinnée", "plonbier", "tecnicien"]
[str(TextBlob(word).correct()) for word in misspelled2]

I get this:

#english:
['happening', 'morning', 'window', 'jacket']

#french:
['resaissir', 'matinnée', 'plonbier', 'tecnicien']
Sentence answered 4/11, 2019 at 13:47

textblob supports only English, which is why it returns the French words unchanged. To use it for French you need to install textblob-fr, but according to its official repository, textblob-fr doesn't support spell checking.
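To see why an English-only dictionary leaves French words untouched, here is a minimal sketch of a dictionary-based corrector in the style of Peter Norvig's spelling corrector, which TextBlob's documentation cites as the basis for correct(). The toy vocabularies and the function names (edits1, correct) are made up for illustration; this is not TextBlob's actual code.

```python
import string

# Hypothetical toy vocabularies (word -> frequency). Real correctors use
# large frequency lists built from a corpus in a single language.
ENGLISH = {"window": 50, "morning": 40, "jacket": 10}
FRENCH = {"plombier": 5, "technicien": 8, "matinée": 12}

# Letters tried for replacements and insertions (a small, incomplete set).
LETTERS = string.ascii_lowercase + "éèêàçù"

def edits1(word):
    """Return all strings one delete, swap, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)

def correct(word, vocab):
    """Return the most frequent known word within one edit, else word itself."""
    if word in vocab:
        return word
    candidates = [w for w in edits1(word) if w in vocab]
    return max(candidates, key=vocab.get) if candidates else word

print(correct("plonbier", ENGLISH))  # no English candidate, returned unchanged
print(correct("plonbier", FRENCH))   # the French vocabulary contains "plombier"
```

With only English words to choose from, no candidate is close enough to "plonbier", so the word comes back as it went in.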

Besides, spaCy doesn't support spell checking with its language models. There is a workaround using spacy_hunspell, which wraps hunspell (the spell checker used by LibreOffice and Mozilla Firefox), but it doesn't support French either.

So my recommendation is to use pyspellchecker, which supports English, French, German, Spanish and Portuguese. It can be installed easily via pip:

pip install pyspellchecker

English

Here is how to use it with English:

>>> from spellchecker import SpellChecker
>>>
>>> spell = SpellChecker()
>>> misspelled = ["hapenning", "mornin", "windoow", "jaket"]
>>> misspelled = spell.unknown(misspelled)
>>> for word in misspelled:
...     print(word, spell.correction(word))
jaket jacket
windoow window
mornin morning
hapenning happening

French

Here is how to use it with French. It's exactly the same as for English, except that you specify language='fr':

>>> from spellchecker import SpellChecker
>>>
>>> spell = SpellChecker(language='fr')
>>> misspelled = ["resaissir", "matinnée", "plonbier", "tecnicien"]
>>> misspelled = spell.unknown(misspelled)
>>> for word in misspelled:
...     print(word, spell.correction(word))
plonbier plombier
matinnée matinée
tecnicien technicien
resaissir ressaisir
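Since the question is about correcting a whole text rather than a list of words, here is a rough sketch of how the same two calls (unknown and correction) could be applied to a sentence. The helper names correct_text and demo are hypothetical, and the fallback assumes that correction() may return None when it finds no candidate, which depends on the pyspellchecker version.

```python
import re

def correct_text(text, spell):
    """Replace each unknown word in text with the checker's best guess.

    spell is any object offering unknown(words) and correction(word),
    as pyspellchecker's SpellChecker does.
    """
    def fix(match):
        word = match.group(0)
        if not spell.unknown([word]):
            return word                  # known word: keep it as written
        guess = spell.correction(word.lower())
        return guess if guess else word  # no suggestion: keep the original

    # [^\W\d_]+ matches runs of letters only, including accented ones;
    # punctuation, digits and spacing are left untouched.
    return re.sub(r"[^\W\d_]+", fix, text)

def demo():
    # Requires `pip install pyspellchecker`; imported here so that the
    # helper above does not depend on the package being installed.
    from spellchecker import SpellChecker
    spell = SpellChecker(language="fr")
    print(correct_text("le plonbier arrive dans la matinnée", spell))
```

Calling demo() runs it against the French dictionary. Note that corrected words come back in lowercase, and that elided forms such as "l'eau" are split at the apostrophe, so this is a starting point rather than a complete solution.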
Sontich answered 7/11, 2019 at 20:28
