I have been trying to use a spell corrector to fix the addresses in one of my database tables, using Peter Norvig's spelling corrector (http://norvig.com/spell-correct.html) as a reference. The Address_mast table serves as the collection of correctly spelled strings, and I'm trying to correct the addresses in "customer_master" and update that table with the corrected strings. Address_mast looks like this:
ID Address
1 sonal plaza,harley road,sw-309012
2 rose apartment,kell road, juniper, la-293889
3 plot 16, queen's tower, subbden - 399081
4 cognizant plaza, abs road, ziggar - 500234
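Because Address_mast plays the role of the `big.txt` corpus in Norvig's article, the word counts can be built straight from those rows instead of from a file. This is a minimal sketch that assumes the four rows above are the whole reference set. They are hard-coded in place of the database query so it runs on its own, and `ADDRESS_MAST` is a hypothetical name.

```python
import re
from collections import Counter

# The Address_mast rows from the table above, hard-coded here in place of
# "select address from Address_mast" so the sketch runs without a database.
ADDRESS_MAST = [
    "sonal plaza,harley road,sw-309012",
    "rose apartment,kell road, juniper, la-293889",
    "plot 16, queen's tower, subbden - 399081",
    "cognizant plaza, abs road, ziggar - 500234",
]


def words(text):
    """Lowercase word tokens, using the same regex as Norvig's corrector."""
    return re.findall(r"\w+", text.lower())


# One Counter over every token of every reference address
WORDS = Counter(w for address in ADDRESS_MAST for w in words(address))

if __name__ == "__main__":
    print(WORDS.most_common(3))
```

Note that `\w` does not match the apostrophe, so "queen's" is counted as the two tokens "queen" and "s".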
The reference code only handles words that are at most two edits away from a known word, but I'm trying to extend it to three or even four edits, and at the same time write the corrected words back to the other table. Here is the table (customer_master) that contains the misspelled words and is to be updated with the corrected ones:
josely apartmt,kell road, juneeper, la-293889
zoonal plaza, harli road,sw-309012
plot 16, queen's tower, subbden - 399081
cognejantt pluza, abs road, triggar - 500234
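Generating every string three or four edits away grows by roughly another factor of 54n+25 per level (Norvig's count of `edits1` results for a word of length n). One alternative is to search in the opposite direction: compare each unknown word against the much smaller vocabulary built from Address_mast and keep the closest entry within a chosen maximum distance. The sketch below is not Norvig's method. It swaps in an optimal-string-alignment edit distance, and `osa_distance`, `correction` and `correct_address` are hypothetical helpers. It also hard-codes the sample rows from the two tables above. The cost per word is proportional to vocabulary size times word length squared, whatever `max_dist` is, and ties are broken by word frequency in the spirit of Norvig's `P()`.

```python
import re
from collections import Counter

# Sample rows copied from the two tables in the question (no database needed).
REFERENCE = [
    "sonal plaza,harley road,sw-309012",
    "rose apartment,kell road, juniper, la-293889",
    "plot 16, queen's tower, subbden - 399081",
    "cognizant plaza, abs road, ziggar - 500234",
]
MISSPELLED = [
    "josely apartmt,kell road, juneeper, la-293889",
    "zoonal plaza, harli road,sw-309012",
    "plot 16, queen's tower, subbden - 399081",
    "cognejantt pluza, abs road, triggar - 500234",
]
COUNTS = Counter(w for row in REFERENCE for w in re.findall(r"\w+", row.lower()))


def osa_distance(a, b):
    """Optimal string alignment distance: fewest deletes, inserts, replaces and
    adjacent transposes (the four edits1 operations) turning a into b,
    without editing any substring twice."""
    prev2 = []
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a[i-1]
                         cur[j - 1] + 1,      # insert b[j-1]
                         prev[j - 1] + cost)  # replace, or keep if equal
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # transpose
        prev2, prev = prev, cur
    return prev[-1]


def correction(word, counts, max_dist=2):
    """Closest known word within max_dist edits; ties go to the more frequent
    word. Returns the word unchanged if nothing is close enough."""
    if word in counts:
        return word
    best_key, best_word = None, word
    for known in counts:
        if abs(len(known) - len(word)) > max_dist:
            continue  # the length gap alone already exceeds max_dist
        d = osa_distance(word, known)
        key = (d, -counts[known])
        if d <= max_dist and (best_key is None or key < best_key):
            best_key, best_word = key, known
    return best_word


def correct_address(address, counts, max_dist=2):
    """Correct each word token, keeping commas, spaces and hyphens as they are."""
    return re.sub(r"\w+",
                  lambda m: correction(m.group(0).lower(), counts, max_dist),
                  address)


if __name__ == "__main__":
    for row in MISSPELLED:
        print(correct_address(row, COUNTS, max_dist=3))
```

Here "cognejantt" is three edits from "cognizant", so it is only corrected with `max_dist=3`. With the default of 2 it is left unchanged.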
Here is what I have tried:
import re
from collections import Counter

import pyodbc

# autocommit is a keyword argument of pyodbc.connect, not part of the ODBC string
cnxn = pyodbc.connect(
    'DRIVER={SQL Server};SERVER=localhost;DATABASE=DBM;UID=ADMIN;PWD=s@123',
    autocommit=True)
cursor = cnxn.cursor()

def words(text):
    "Split text into lowercase word tokens."
    return re.findall(r'\w+', text.lower())

# Build the word counts from the reference addresses instead of a file
cursor.execute("select address from Address_mast")
WORDS = Counter()
for row in cursor.fetchall():
    WORDS.update(words(row[0]))

def P(word, N=sum(WORDS.values())):
    "Probability of `word`."
    return WORDS[word] / N

def correction(word):
    "Most probable spelling correction for word."
    return max(candidates(word), key=P)

def candidates(word):
    "Generate possible spelling corrections for word."
    return (known([word]) or known(edits1(word)) or known(edits2(word))
            or known(edits3(word)) or known(edits4(word)) or [word])

def known(words):
    "The subset of `words` that appear in the dictionary of WORDS."
    return set(w for w in words if w in WORDS)

def edits1(word):
    "All edits that are one edit away from `word`."
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    "All edits that are two edits away from `word`."
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))

def edits3(word):
    "All edits that are three edits away from `word` (very slow)."
    return (e3 for e2 in edits2(word) for e3 in edits1(e2))

def edits4(word):
    "All edits that are four edits away from `word` (extremely slow)."
    return (e4 for e3 in edits3(word) for e4 in edits1(e3))

# Correct each word of every customer_master address and write it back.
# The rows are matched on the original Address text because the table's
# key column is not shown; use the real key column if there is one.
cursor.execute("select Address from customer_master")
k = 0
for row in cursor.fetchall():
    original = row[0]
    # re.sub replaces each word token and keeps commas, spaces and hyphens
    fixed = re.sub(r'\w+', lambda m: correction(m.group(0).lower()), original)
    # Parameterized query: values such as "queen's" need no manual quoting
    cursor.execute("update customer_master set Address = ? where Address = ?",
                   fixed, original)
    k += 1
print(str(k) + " Records Completed")
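A quick way to see why `edits3` and `edits4` make the script appear to hang is to count how many strings the generators produce for a single short word. This sketch copies Norvig's `edits1` so it runs on its own. `generated_counts` is a hypothetical helper added for illustration, and "plaza" is just an example word from the tables.

```python
def edits1(word):
    """Norvig's edits1: every string one delete/transpose/replace/insert away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def generated_counts(word):
    """Return (unique edits1 results, strings the edits2 generator yields,
    unique edits2 results) for `word`."""
    e1 = edits1(word)
    # edits2 yields every member of edits1(e) for every e in edits1(word)
    gen2 = sum(len(edits1(e)) for e in e1)
    e2 = set().union(*(edits1(e) for e in e1))
    return len(e1), gen2, len(e2)


if __name__ == "__main__":
    n1, gen2, n2 = generated_counts("plaza")
    print(f"edits1: {n1} unique; edits2: {gen2} generated, {n2} unique")
```

Each further level calls `edits1` on every string from the previous level, so `edits3` and `edits4` multiply the `edits2` work by a few hundred and then by a few hundred again. For words that are not in the dictionary, this happens once per word.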
With this code I'm unable to get the proper output. Any suggestions on what changes should be made? Thanks in advance.
functions in candidates(). Or in what way is your output improper? – Malatya