Autocorrect Spell Checker

I have a TSV (tab-separated values) file that I need to spell-check for misspellings and run-together words (e.g. 'Iloveyou' instead of 'I love you').

I've installed Aspell on my machine and can run it through R using the aspell() function.

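files <- "train2.tsv"
res <- aspell(files)  # run Aspell over the file
str(res)              # one row per flagged word, with suggestions
summary(res)          # list of possibly misspelled words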

However, the output from running it in R is just a list of misspelled words and possible suggestions.

>  summary(res)
Possibly mis-spelled words:
 [1] "amant"        "contaneir"    "creat"        "ddition"      "EssaySet"     "EssayText"    "experiament"  "expireiment"  "expirement"  
[10] "Fipst"        "infomation"   "Inorder"      "measureing"   "mintued"      "neccisary"    "officialy"    "renuminering" "rinsen"      
[19] "sticlenx"     "sucessfully"  "tipe"         "vineager"     "vinigar"      "yar"   

>  str(res)
Classes ‘aspell’ and 'data.frame':      27 obs. of  5 variables:
 $ Original   : chr  "EssaySet" "EssayText" "expirement" "expireiment" ...
 $ File       : chr  "train2.tsv" "train2.tsv" "train2.tsv" "train2.tsv" ...
 $ Line       : int  1 1 3 3 3 3 3 3 6 6 ...
 $ Column     : int  4 27 27 108 132 222 226 280 120 156 ...
 $ Suggestions:List of 27
  ..$ : chr  "Essay Set" "Essay-Set" "Essayist" "Essays" ...
  ..$ : chr  "Essay Text" "Essay-Text" "Essayist" "Sedatest" ...
  ..$ : chr  "experiment" "excrement" "excitement" "experiments" ...
  ..$ : chr  "experiment" "experiments" "experimenter" "excrement" ...
  ..$ : chr  "Amandy" "am ant" "am-ant" "Amanda" ...
  ..$ : chr  "year" "ya" "Yard" "yard" ...

Is there a way to have aspell (or any other spellchecker) automatically correct misspelled words?

Unrivaled answered 7/7, 2012 at 6:0 Comment(0)

It looks like you can do the following:

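import aspell  # aspell-python bindings (documentation linked below)

s = aspell.Speller('lang', 'en')  # load the English dictionary

corrected = []
for word in text_to_check:  # text_to_check: assumed to be a list of words
    if s.check(word):
        corrected.append(word)
    else:
        suggestions = s.suggest(word)
        # Pick the first suggestion; keep the word if there are none.
        corrected.append(suggestions[0] if suggestions else word)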

From a quick glance at the documentation, that looks like what you would have to do to automatically apply the suggested spelling.

http://0x80.pl/proj/aspell-python/index-c.html
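To apply the same first-suggestion idea to running text rather than a word list, something like the following sketch might work. It is only an illustration: `autocorrect`, `KNOWN`, and `SUGGESTIONS` are hypothetical names, and the `check`/`suggest` functions here are stubs standing in for a real speller's methods. The sample suggestions are copied from the str(res) output in the question.

```python
import re

# Hypothetical stand-ins for a real speller, using words from the question.
KNOWN = {"the", "was", "in", "a", "essay", "set", "experiment"}
SUGGESTIONS = {
    "expirement": ["experiment", "excrement", "excitement"],
    "EssaySet": ["Essay Set", "Essay-Set", "Essayist"],
}

def check(word):
    """Return True if the word is spelled correctly (stub)."""
    return word.lower() in KNOWN

def suggest(word):
    """Return candidate corrections, best first (stub)."""
    return SUGGESTIONS.get(word, [])

# A "word" here is a run of letters and apostrophes; punctuation is untouched.
WORD = re.compile(r"[A-Za-z']+")

def autocorrect(text, check, suggest):
    """Replace each unknown word with its first suggestion, if it has one."""
    def fix(match):
        word = match.group(0)
        if check(word):
            return word
        candidates = suggest(word)
        return candidates[0] if candidates else word
    return WORD.sub(fix, text)

print(autocorrect("The expirement was in EssaySet.", check, suggest))
```

The first suggestion is not guaranteed to be the intended word, so it may be worth logging each replacement for review instead of overwriting the file directly.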

Edit: I realize you may not be looking for Python code, but this would be the easiest way to do it in Python, since the question was tagged with python. There is probably a more efficient method, but it's getting late and this came to mind first.
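One possible refinement for the TSV case: Aspell flagged the header names EssaySet and EssayText on line 1, so it may help to skip the header row and correct only the text column. The sketch below uses the standard csv module and caches each word's correction so a repeated misspelling is looked up once. The names `correct_tsv`, `correct_word`, and `FIXES`, the column index, and the sample rows are all assumptions for illustration; the lookup table stands in for a real speller.

```python
import csv
import io
import re
from functools import lru_cache

# Hypothetical lookup table standing in for a real speller's first suggestion.
FIXES = {"expirement": "experiment"}

@lru_cache(maxsize=None)
def correct_word(word):
    """Return the replacement for a word; cached so repeats are cheap."""
    return FIXES.get(word, word)

WORD = re.compile(r"[A-Za-z']+")

def correct_text(text):
    """Correct every word in a string, leaving punctuation alone."""
    return WORD.sub(lambda m: correct_word(m.group(0)), text)

def correct_tsv(src, dst, text_column, has_header=True):
    """Copy TSV rows from src to dst, correcting one column only."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for i, row in enumerate(reader):
        is_header = has_header and i == 0
        if not is_header and len(row) > text_column:
            row[text_column] = correct_text(row[text_column])
        writer.writerow(row)

# Made-up sample data; the real file's layout is not shown in the question.
sample = "Id\tEssaySet\tEssayText\n1\t1\tThe expirement used expirement cups.\n"
out = io.StringIO()
correct_tsv(io.StringIO(sample), out, text_column=2)
print(out.getvalue())
```

For a real file, the two StringIO objects would be replaced by file handles opened with newline="", and the csv quoting settings may need adjusting if the text column contains quote characters.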

Rebel answered 7/7, 2012 at 6:42 Comment(2)
I tagged python because I know it's got some good libraries for NLP and figured it would be a good backup if nothing came through in R. Thank you. - Unrivaled
OK, in that sense the above is a simple method of doing it; there is probably a hidden gem in the documentation that would do exactly what you need. - Rebel
