I have two lists: the first holds the interests of a user, and the second holds the keywords describing a book. I want to recommend the book to the user based on their list of interests. I am using the SequenceMatcher class from Python's difflib library to match similar words like "game", "games", "gaming", "gamer", etc. The ratio function gives me a number in the range [0, 1] stating how similar the two strings are. But I got stuck on one example, where I calculated the similarity between "looping" and "shooting". It comes out to be 0.6667, even though the two words are unrelated.
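For reference, here is a minimal sketch of where that number comes from. It assumes the documented definition of ratio() as 2 * M / T, where M is the number of matched characters and T is the combined length of both strings. The helper names similarity and matched_chars are made up for illustration and are not part of difflib.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return difflib's ratio for two strings (2 * M / T)."""
    return SequenceMatcher(None, a, b).ratio()


def matched_chars(a, b):
    """Sum the sizes of the matching blocks difflib found (the M above)."""
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    return sum(block.size for block in blocks)


pairs = [("game", "games"), ("game", "gaming"), ("looping", "shooting")]
for a, b in pairs:
    # Print the pair, the number of matched characters, and the ratio.
    print(a, b, matched_chars(a, b), round(similarity(a, b), 4))
```

"looping" and "shooting" share "oo" and "ing", so M = 5 and T = 15, which gives 2 * 5 / 15 = 0.6667. By the same measure "game" and "gaming" only share "gam", giving 0.6, so the unrelated pair scores higher than the related one. The ratio only counts shared characters and knows nothing about word stems.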
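from difflib import SequenceMatcher

# Count each interest at most once: stop at the first keyword that is similar enough.
for interest in self.interests:
    for keyword in keywords:
        s = SequenceMatcher(None, interest, keyword)
        match_freq = s.ratio()
        if match_freq >= self.limit:
            # print interest, keyword, match_freq
            final_score += 1
            break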
Is there any other way to perform this kind of matching in Python?
NLTK, as suggested by Xin and 2ero, is probably the best way to get the comparisons you want. My solution may be useful if you want to improve your code quickly without having to study NLTK, but it isn't based on semantics, so it may fail in some cases. – Mehta