How do I compare the similarity of person names using a metric? [closed]

I am working on a function that tolerates misspellings and aliases of person names. I have done some research and found that there are quite a few string-metric algorithms, as well as phonetic libraries.

I have tried several, and of those, Jaro-Winkler gives fairly good results, as shown below.

compareStrings("elon musk", "elon musk")     --> 1.0
compareStrings("elonmusk", "elon musk")      --> 0.98
compareStrings("elon mush", "elon musk")     --> 0.99
compareStrings("eln msuk", "elon musk")      --> 0.94
compareStrings("elon", "elon musk")          --> 0.89
compareStrings("musk", "elon musk")          --> 0.0  // This is bad, but I can fix that.
compareStrings("mr elon musk", "elon musk")  --> 0.81

The above uses the implementation from the Apache Commons library. I would like to know whether there is a better implementation or algorithm for this purpose. Any help is appreciated.
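A minimal sketch of one way to handle the "musk" case above: split both names into whitespace-separated tokens, score each token of the shorter name against its best-matching token in the longer name with Jaro-Winkler, and average the results. The class and method names (`NameSimilarity`, `nameSimilarity`) are hypothetical, and the Jaro-Winkler here is a plain textbook version (prefix scale 0.1, prefix capped at 4 characters, boost applied only when the Jaro score exceeds 0.7), not the Apache Commons one, so its scores will not match the numbers listed above.

```java
import java.util.Locale;

// Hypothetical helper, not part of Apache Commons.
final class NameSimilarity {

    // Textbook Jaro similarity in [0, 1].
    static double jaro(String a, String b) {
        if (a.equals(b)) return 1.0;
        int la = a.length(), lb = b.length();
        if (la == 0 || lb == 0) return 0.0;
        // Characters match only if they are equal and within this distance.
        int window = Math.max(0, Math.max(la, lb) / 2 - 1);
        boolean[] matchedA = new boolean[la];
        boolean[] matchedB = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(lb - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matchedB[j] && a.charAt(i) == b.charAt(j)) {
                    matchedA[i] = true;
                    matchedB[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < la; i++) {
            if (!matchedA[i]) continue;
            while (!matchedB[k]) k++;
            if (a.charAt(i) != b.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / la + m / lb + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    // Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
    static double jaroWinkler(String a, String b) {
        double j = jaro(a, b);
        if (j <= 0.7) return j; // assumption: Winkler's usual boost threshold
        int prefix = 0;
        int limit = Math.min(4, Math.min(a.length(), b.length()));
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    // Lower-cases and splits on whitespace; an empty name yields no tokens.
    static String[] tokens(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Averages, over the tokens of the shorter name, the best Jaro-Winkler
    // score against any token of the longer name.
    static double nameSimilarity(String x, String y) {
        String[] tx = tokens(x), ty = tokens(y);
        if (tx.length == 0 || ty.length == 0) return 0.0;
        String[] shorter = tx.length <= ty.length ? tx : ty;
        String[] longer = (shorter == tx) ? ty : tx;
        double sum = 0.0;
        for (String s : shorter) {
            double best = 0.0;
            for (String l : longer) best = Math.max(best, jaroWinkler(s, l));
            sum += best;
        }
        return sum / shorter.length;
    }
}
```

With this scheme both "musk" and "mr elon musk" score 1.0 against "elon musk", which may be too lenient for some uses; penalising unmatched tokens would be one possible refinement.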

Edit: @newuserua_ext @Trasher Thanks, I appreciate your time. I have gone through all the StackExchange Q&A related to this, and posted this question to focus specifically on person names.

Scirrhus answered 9/12, 2016 at 4:49 Comment(7)
When you downvote, please mention the reason. I posted this because I needed help and couldn't find anything better on the Internet. - Scirrhus
Check this out (overview section): github.com/tdebatty/java-string-similarity. Good luck! - Altheaalthee
@Thrasher Thank you for the link :) As I mentioned, my question is very particular to person names. - Scirrhus
"The Jaro–Winkler distance metric is designed and best suited for short strings such as person names, and to detect typos." - Altheaalthee
@Thrasher Thank god, someone finally understands. Exactly, I am looking for a better algorithm for validating person names. - Scirrhus
I found a paper about finding and matching personal names: Techniques and Practical Issues - Altheaalthee
I found something similar; maybe it will help you: #955610 - Spenserian

Consider Double Metaphone. We use it successfully to find "sounds-like" matches to names. You can find an implementation for Java in Apache Commons:

https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/DoubleMetaphone.html

Sinkage answered 9/12, 2016 at 7:12 Comment(0)

One possibility is the Levenshtein distance, which counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. It can be computed with dynamic programming in time proportional to the product of the two string lengths, but it is not really suitable for determining phonetic similarity.
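The dynamic-programming evaluation can be sketched as follows. This is an illustrative two-row version, not taken from any particular library; the class name `Levenshtein` and the method `distance` are hypothetical.

```java
// Hypothetical helper illustrating the standard dynamic-programming recurrence.
final class Levenshtein {

    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous prefix of a
        int[] curr = new int[b.length() + 1]; // distances for the current prefix of a
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // empty prefix of a -> j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // prefix of a -> empty string: i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insertion
                                 prev[j] + 1),      // deletion
                        prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `Levenshtein.distance("elon mush", "elon musk")` is 1 (one substitution). To compare names of different lengths, one common option is to divide the distance by the length of the longer string, which gives a normalised score.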

Viscera answered 9/12, 2016 at 7:16 Comment(0)