I have been trying to use a spell corrector on my database tables to correct the addresses in one table, based on the reference implementation at http://norvig.com/spell-correct.html. Using the Address_mast table as the collection of correct strings, I'm trying to correct the misspelled addresses and write the corrected strings back to "customer_master".
Address_mast

ID  Address
1   sonal plaza,harley road,sw-309012
2   rose apartment,kell road, juniper, la-293889
3   plot 16, queen's tower, subbden - 399081
4   cognizant plaza, abs road, ziggar - 500234
The reference code only handles words that are "two edits away from word", but I'm trying to extend it to three or four edits and, at the same time, write the corrected words to another table. Here is the table that contains the misspelled words and is to be updated with the corrected words:
Customer_master

Address_1
josely apartmt,kell road, juneeper, la-293889
zoonal plaza, harli road,sw-309012
plot 16, queen's tower, subbden - 399081
cognejantt pluza, abs road, triggar - 500234
Here is what I have tried:
import re
import pyodbc
import numpy as np
from collections import Counter

cnxn = pyodbc.connect('DRIVER={SQLServer};SERVER=localhost;DATABASE=DBM;UID=ADMIN;PWD=s@123;autocommit=True')
cursor = cnxn.cursor()
cursor.execute("select address as data from Address_mast")
data=[]
for row in cursor.fetchall():
    data.append(row[0])
data = np.array(data)

def words(text): return re.findall(r'\w+', text.lower())

WORDS = Counter(words(open('data').read()))

def P(word, N=sum(WORDS.values())):
    "Probability of `word`."
    return WORDS[word] / N

def correction(word):
    "Most probable spelling correction for word."
    return max(candidates(word), key=P)

def candidates(word):
    "Generate possible spelling corrections for word."
    return (known([word]) or known(edits1(word)) or known(edits2(word)) or known(edits3(word)) or known(edits4(word)) or [word])

def known(words):
    "The subset of `words` that appear in the dictionary of WORDS."
    return set(w for w in words if w in WORDS)

def edits1(word):
    "All edits that are one edit away from `word`."
    letters    = 'abcdefghijklmnopqrstuvwxyz'
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R)>1]
    replaces   = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts    = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    "All edits that are two edits away from `word`."
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))

def edits3(word):
    return (e3 for e2 in edits2(word) for e3 in edits1(e2))

def edits4(word):
    return (e4 for e3 in edits3(word) for e4 in edits1(e3))

sqlstr = ""
j=0
k=0
for i in data:
    sqlstr=" update customer_master set Address='"+correction(data)+"' where data="+correction(data)
    cursor.execute(sqlstr)
    j=j+1
    k=k+cursor.rowcount
cnxn.commit()
cursor.close()
cnxn.close()
print(str(k) +" Records Completed")
With this I am unable to get the proper output. Any suggestions on what changes should be made? Thanks in advance.
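A note on going to three or four edits: per Norvig's article, a single edits1 call on a word of length n already yields about 54n+25 strings, so chaining it three or four times becomes very slow. One possible alternative is to measure the edit distance from each misspelled word to every known word and pick the nearest one. The following is only a minimal sketch under some assumptions: the reference addresses are already in a Python list (here copied from the Address_mast sample), the names `REFERENCE`, `VOCAB`, `edit_distance`, `correct_word`, `correct_address` and `max_dist` are hypothetical, and it uses plain Levenshtein distance, which counts a transposition as two edits, unlike Norvig's edit set.

```python
import re
from collections import Counter

# Assumption: reference addresses copied from the Address_mast sample above.
REFERENCE = [
    "sonal plaza,harley road,sw-309012",
    "rose apartment,kell road, juniper, la-293889",
    "plot 16, queen's tower, subbden - 399081",
    "cognizant plaza, abs road, ziggar - 500234",
]

def words(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

# Known words and how often each occurs in the reference addresses.
VOCAB = Counter(w for address in REFERENCE for w in words(address))

def edit_distance(a, b):
    """Levenshtein distance: inserts, deletes and substitutions only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def correct_word(word, vocab=VOCAB, max_dist=3):
    """Return the nearest known word within max_dist edits, else the word itself."""
    if word in vocab or word.isdigit():
        return word
    # Nearest first; ties go to the more frequent word, then alphabetically.
    best = min(vocab, key=lambda w: (edit_distance(word, w), -vocab[w], w))
    return best if edit_distance(word, best) <= max_dist else word

def correct_address(address, vocab=VOCAB, max_dist=3):
    """Correct each word token in place, leaving punctuation and spacing alone."""
    return re.sub(r"\w+",
                  lambda m: correct_word(m.group().lower(), vocab, max_dist),
                  address)

print(correct_address("zoonal plaza, harli road,sw-309012"))
# sonal plaza, harley road,sw-309012
```

Note that the correction works word by word on a single address string; passing the whole array of addresses to the corrector at once would not work. A fixed `max_dist` of 3 or 4 is generous for short words, so scaling it with the word length may give fewer false matches.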
edits3 and edits4 functions in candidates(). Or in what way is your output improper? – Malatya
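Separately from the spelling logic, building the UPDATE statement by string concatenation will break on any address containing a quote (for example "queen's tower"), and the unquoted value in the WHERE clause is not valid SQL. Below is a minimal sketch of a parameterized update instead. It uses the standard-library sqlite3 module as a stand-in so that it runs without a server; pyodbc uses the same `?` parameter marker. The `id` key column on customer_master, the `update_addresses` helper and the toy `fix` function are assumptions for illustration, not the actual schema or corrector.

```python
import sqlite3

def update_addresses(conn, fix):
    """Rewrite each Address_1 with fix(address); return the number of rows changed."""
    cur = conn.cursor()
    cur.execute("SELECT id, Address_1 FROM customer_master")
    rows = cur.fetchall()  # read everything first, then update row by row
    changed = 0
    for row_id, address in rows:
        corrected = fix(address)
        if corrected != address:
            # Values are passed as parameters, so quotes in the data are safe.
            cur.execute(
                "UPDATE customer_master SET Address_1 = ? WHERE id = ?",
                (corrected, row_id),
            )
            changed += cur.rowcount
    conn.commit()
    return changed

# Assumption: an in-memory table with a hypothetical integer key column `id`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_master (id INTEGER PRIMARY KEY, Address_1 TEXT)")
conn.executemany(
    "INSERT INTO customer_master (Address_1) VALUES (?)",
    [("zoonal plaza, harli road,sw-309012",),
     ("plot 16, queen's tower, subbden - 399081",)],
)

# Toy stand-in for a real corrector: a fixed lookup of two misspellings.
FIXES = {"zoonal": "sonal", "harli": "harley"}

def fix(address):
    for wrong, right in FIXES.items():
        address = address.replace(wrong, right)
    return address

changed = update_addresses(conn, fix)
print(str(changed) + " Records Completed")
```

Matching each row by a key column rather than by the address text avoids having to compare against the misspelled string, but that only applies if customer_master actually has such a column.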