I have a set of concatenated words and I want to split them into arrays of their component words. For example:
split_word("acquirecustomerdata")
=> ['acquire', 'customer', 'data']
I found pyenchant, but it's not available for 64-bit Windows.
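For reference, the kind of dictionary check I was hoping to get from pyenchant looks roughly like this (just a sketch; it assumes an en_US dictionary is installed, which I can't get working on my machine):

import enchant  # pyenchant

d = enchant.Dict("en_US")        # assumes the en_US dictionary is available
d.check("customer")              # True for a real word
d.check("acquirecustomerdata")   # False, so the string still needs splitting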
Then I tried to split each string into substrings and compare them to WordNet to find an equivalent word. For example:
from nltk.corpus import wordnet as wn
from nltk.metrics import edit_distance

def split_word(word):
    result = list()
    while len(word) > 2:
        # Grow the prefix until it matches a WordNet synset name exactly.
        i = 1
        found = True
        while found and i < len(word):  # stop at the end of the word so we don't loop forever
            i = i + 1
            synsets = wn.synsets(word[:i])
            for s in synsets:
                if edit_distance(s.name().split('.')[0], word[:i]) == 0:
                    found = False
                    break
        result.append(word[:i])
        word = word[i:]
    print(result)
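What the edit_distance(...) == 0 test is really doing is checking whether the prefix appears verbatim as the head of some synset name, e.g. 'customer.n.01' -> 'customer'. Written more directly (same idea, just a sketch; is_wordnet_word is my own helper name):

from nltk.corpus import wordnet as wn

def is_wordnet_word(candidate):
    # True if the candidate string is itself the head word of some synset,
    # which is what the edit_distance(...) == 0 comparison above checks.
    return any(s.name().split('.')[0] == candidate for s in wn.synsets(candidate))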
But this solution is not reliable and takes too long, so I'm looking for your help.
Thank you
tome might come out of that. I'd say fix the data source that gave you concatenated words – Reardon