Need a better understanding of Python fuzzywuzzy's fuzz.partial_ratio
I am using Python's fuzzywuzzy at enterprise scale to match pairs of strings. It works fine in most cases, but it gives unexpected results in the following scenario:

fuzz.partial_ratio('ja rule:mesmerize', 'ja rule feat. ashanti:mesmerize') returns 65

and

fuzz.partial_ratio('ja rule:mesmerize', 'jennifer lopez feat. ja rule:im real ') returns 67

Can anyone explain why the second pair scores higher than the first?

Any help/suggestion is greatly appreciated.

Asked by Afroasian on 13/12, 2018 at 5:37
Comment by Bighorn: Viewers might find this useful (from the developers of fuzzywuzzy): #31807195

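fuzzywuzzy scores strings with an edit-distance-based similarity ratio (python-Levenshtein's ratio when that package is installed, otherwise difflib.SequenceMatcher). This means every character is compared, including spaces and symbols such as ':'.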

partial_ratio does not compare the two strings as a whole. It compares the shorter string against substrings of the longer string that are the length of the shorter string, and returns the best of those scores.

In your case, the shorter string is 'ja rule:mesmerize', which is 17 characters long. When the strings are compared, the longer string is cut into pieces of that size.
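To see which pieces actually get compared, here is a minimal sketch that mirrors the windowing in fuzzywuzzy's fuzz.py: for each matching block found by difflib.SequenceMatcher, it lines up one window of len(shorter) characters in the longer string. The helper name candidate_windows is hypothetical, and when python-Levenshtein is installed fuzzywuzzy takes its matching blocks from that package instead, so the exact set of windows may differ slightly.

```python
from difflib import SequenceMatcher


def candidate_windows(s1: str, s2: str) -> list:
    """Hypothetical helper: the substrings of the longer string that a
    fuzzywuzzy-style partial_ratio would score against the shorter string."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    windows = []
    for block in SequenceMatcher(None, shorter, longer).get_matching_blocks():
        # Shift the window so this matching block lines up in both strings.
        long_start = max(block.b - block.a, 0)
        # Slicing past the end just truncates, so a window near the end of
        # the longer string can be shorter than len(shorter).
        windows.append(longer[long_start:long_start + len(shorter)])
    return windows


for other in ("ja rule feat. ashanti:mesmerize", "jennifer lopez feat. ja rule:im real "):
    for window in candidate_windows("ja rule:mesmerize", other):
        print(repr(window), len(window))
```

For the second pair, the window aligned with 'ja rule:' should be 'ja rule:im real ', which is only 16 characters because it runs into the end of the string.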

With this in mind, let's compare your outputs. The first long string has a space after 'ja rule', not a ':', but the second one has 'ja rule:'. There may be other factors, but this is likely the main reason for the difference.
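One way to check this more carefully is to redo the arithmetic, assuming the python-Levenshtein backend. That backend's ratio can be read as 2 * (matching characters) / (total length), where the matching characters form the longest common subsequence. Treat this reading as an assumption to check against your installed version. The best windows appear to be 'ashanti:mesmerize' (17 characters) for the first pair and 'ja rule:im real ' (16 characters, cut off by the end of the string) for the second. Both share 11 characters with 'ja rule:mesmerize': the first matches ':mesmerize' plus an 'a', and the second matches 'ja rule:' plus 'm', 'r', 'e'. So the scores come out as 2*11/34 ≈ 65 and 2*11/33 ≈ 67. The ':' helps the second pair line up 'ja rule:', but the shorter window is what tips the score. The function names below are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Assumed scoring: 2 * LCS / (len(a) + len(b)), i.e. an edit distance in
    which a substitution costs as much as a deletion plus an insertion."""
    total = len(a) + len(b)
    return 2 * lcs_length(a, b) / total if total else 1.0


query = "ja rule:mesmerize"
# Best-scoring windows of the two longer strings; the second is truncated to 16 chars.
for window in ("ashanti:mesmerize", "ja rule:im real "):
    matches = lcs_length(query, window)
    score = round(100 * indel_ratio(query, window))
    print(f"{window!r}: {matches} matches, length {len(window)}, score {score}")
```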

A more careful analysis would reveal more about the exact score. The implementation of partial_ratio can be found here: https://github.com/seatgeek/fuzzywuzzy/blob/master/fuzzywuzzy/fuzz.py#L34.

Answered by Puffball on 13/12, 2018 at 6:22
Comment by Afroasian: Thanks for the explanation!
Comment by Microelectronics: [Oct 2020] It does not cut the longer string. According to the GitHub link above, it just swaps them.
