Suppose I have a list:
person_name = ['zakesh', 'oldman LLC', 'bikash', 'goldman LLC', 'zikash','rakesh']
I am trying to group the list so that strings with a high Levenshtein similarity ratio end up next to each other. To compute the ratio between two strings, I am using the Python package fuzzywuzzy.
Examples:
>>> from fuzzywuzzy import fuzz
>>> combined_list = ['rakesh', 'zakesh', 'bikash', 'zikash', 'goldman LLC', 'oldman LLC']
>>> fuzz.ratio('goldman LLC', 'oldman LLC')
95
>>> fuzz.ratio('rakesh', 'zakesh')
83
>>> fuzz.ratio('bikash', 'zikash')
83
>>>
My end goal:
My end goal is to group the words such that the similarity ratio between the words in a group is more than 80 percent.
My list should look something like this:
person_name = ['bikash', 'zikash', 'rakesh', 'zakesh', 'goldman LLC', 'oldman LLC']
This is because the similarity between `bikash` and `zikash` is very high, so they should be together.
Code:
I am trying to achieve this by sorting, with fuzz.ratio as the key function. The code below does not work (key expects a function of one argument, while fuzz.ratio compares two strings), but this is the angle I am approaching the problem from.
from fuzzywuzzy import fuzz
combined_list = ['rakesh', 'zakesh', 'bikash', 'zikash', 'goldman LLC', 'oldman LLC']
combined_list.sort(key=lambda x, y: fuzz.ratio(x, y))
print combined_list
Could anyone help me group the words so that the similarity ratio between the words in a group is more than 80 percent?
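One possible direction, as a rough sketch rather than a definitive solution: instead of sorting, walk the list once and put each name into the first group whose first member scores above the threshold. The helper names `similarity` and `group_similar` are made up for illustration, and `difflib.SequenceMatcher` from the standard library is used here as a stand-in for `fuzz.ratio` so the snippet runs without extra packages; swapping `fuzz.ratio` in for `similarity` should be possible. The grouping is greedy, so the result can depend on input order.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (hypothetical stand-in for fuzz.ratio)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def group_similar(names, threshold=80):
    """Greedily assign each name to the first group whose first member
    scores above the threshold; otherwise start a new group."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(group[0], name) > threshold:
                group.append(name)
                break
        else:
            # No existing group was similar enough.
            groups.append([name])
    return groups


person_name = ['zakesh', 'oldman LLC', 'bikash', 'goldman LLC', 'zikash', 'rakesh']
groups = group_similar(person_name)

# Flatten the groups back into one list with similar names adjacent.
flattened = [name for group in groups for name in group]
print(flattened)
```

Only the first member of each group is compared, which keeps the sketch short; comparing against every member, or using a proper clustering method, may be a better fit for larger or noisier lists.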
[...] Levenshtein distance between them is more than 80% – Paeon
[...] cmp to sort (and related functions). That has been removed from Python 3, but there is a workaround: see cmp_to_key. – Ogdoad
[...] Levenshtein distance between them is more than 80 percent? I was initially thinking about sorting, but if you could suggest any other approach I can try that – Paeon
[...] cmp_to_key is working for me. Could anyone suggest some approaches :/ :/ – Paeon
[...] group by function in python to achieve the result ? – Paeon