What is the best fuzzy matching algorithm (fuzzy logic, n-gram, Levenshtein, Soundex, ...) for processing more than 100,000 records in the least time?
Best Fuzzy Matching Algorithm? [closed]
I imagine that what @Mitch Wheat meant to say was that it is very hard to give a definitive answer to this question, because the best solution depends heavily on the characteristics of your input and your system architecture. As Tim mentioned in his answer, you ought to read up on the strengths and weaknesses of these algorithms and then test the ones that seem appropriate for yourself. –
Indemnity
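To follow that advice and test candidates on your own data, a minimal Python sketch of two of the algorithms named in the question may help: American Soundex (a phonetic code for names) and character-trigram (n-gram) similarity measured as Jaccard overlap of the trigram sets. The function names `soundex` and `trigram_similarity` are illustrative, not from any library, and the padding scheme used for trigrams (two leading spaces, one trailing) is just one common choice.

```python
# Soundex digit for each coded consonant; vowels, y, h and w get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")  # first letter's code suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two identical codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


def trigrams(s: str) -> set[str]:
    """Character trigrams of s, padded so word edges form their own trigrams."""
    padded = f"  {s.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets (1.0 means identical sets)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)  # padding guarantees a non-empty union


if __name__ == "__main__":
    for a, b in [("Smith", "Smyth"), ("Robert", "Rupert"), ("Ashcraft", "Ashcroft")]:
        print(a, b, soundex(a), soundex(b), round(trigram_similarity(a, b), 3))
```

Running both measures over a sample of pairs you already know to be matches or non-matches gives a quick sense of which one separates them better for your data. Soundex reduces a name to a short key you can index on, while trigram similarity gives a graded score.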
I suggest you read the articles by Navarro listed in the References section of the Wikipedia article "Approximate string matching". Basing your decision on actual research is always better than relying on suggestions from random strangers, especially if performance on a known set of records is important to you.
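One well-known way to speed up Levenshtein comparisons, when you only care about matches within a small edit distance, is to abandon the dynamic-programming table as soon as every cell in the current row exceeds the threshold; this is safe because the minimum of each row can never be smaller than that of the previous row. The sketch below assumes a `max_dist` cutoff chosen by you, and the function name `bounded_levenshtein` is hypothetical. It illustrates that cutoff idea only and is not one of the specific algorithms from Navarro's survey.

```python
def bounded_levenshtein(a: str, b: str, max_dist: int) -> int:
    """Levenshtein distance between a and b, or max_dist + 1 as soon as it
    is certain that the true distance exceeds max_dist."""
    if abs(len(a) - len(b)) > max_dist:
        return max_dist + 1  # the length gap alone already exceeds the budget
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        if min(curr) > max_dist:
            return max_dist + 1  # every alignment already costs too much
        prev = curr
    return prev[-1] if prev[-1] <= max_dist else max_dist + 1


if __name__ == "__main__":
    print(bounded_levenshtein("kitten", "sitting", 3))
    print(bounded_levenshtein("kitten", "sitting", 1))
```

The cutoff only makes each comparison cheaper; it does not reduce how many pairs are compared, so with 100,000 records it is usually combined with some way of restricting which pairs are compared at all.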
It depends heavily on your data. Some fields can be matched more reliably than others. For example, a postcode has a defined format, so it can be compared in a different way from ordinary free-text strings. People can be matched on initials plus date of birth (DOB), or on other combinations of fields.
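The field-specific approach can be sketched as "blocking": group records by keys built from structured fields, and run the expensive fuzzy comparison only within each group instead of across all of the roughly 5 × 10⁹ pairs that 100,000 records produce. Everything here is an assumption for illustration: the record layout (dictionaries with hypothetical `name`, `dob` and `postcode` keys), the postcode rule (strip whitespace and upper-case), the 0.8 threshold, and the use of the standard library's `difflib.SequenceMatcher` as the name-similarity measure.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalise_postcode(pc: str) -> str:
    # Hypothetical rule: compare postcodes on their compact upper-case form,
    # so "sw1a 1aa" and "SW1A1AA" fall into the same block.
    return "".join(pc.split()).upper()


def initials(name: str) -> str:
    return "".join(part[0].upper() for part in name.split())


def candidate_pairs(records, min_ratio=0.8):
    """Yield (i, j, ratio) for records sharing a block whose names are similar."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        # Two hypothetical blocking keys: exact postcode, and initials + DOB.
        blocks[("pc", normalise_postcode(rec["postcode"]))].append(idx)
        blocks[("id", initials(rec["name"]), rec["dob"])].append(idx)
    seen = set()  # a pair can share several blocks; compare it only once
    for members in blocks.values():
        for i, j in combinations(members, 2):  # members are in index order
            if (i, j) in seen:
                continue
            seen.add((i, j))
            ratio = SequenceMatcher(
                None, records[i]["name"].lower(), records[j]["name"].lower()
            ).ratio()
            if ratio >= min_ratio:
                yield i, j, ratio


if __name__ == "__main__":
    people = [
        {"name": "Jon Smith", "dob": "1980-01-02", "postcode": "sw1a 1aa"},
        {"name": "John Smith", "dob": "1980-01-02", "postcode": "SW1A1AA"},
        {"name": "Jane Doe", "dob": "1975-05-06", "postcode": "EC1A 1BB"},
    ]
    for i, j, ratio in candidate_pairs(people):
        print(people[i]["name"], "~", people[j]["name"], round(ratio, 3))
```

In this toy run only the two Smith records are ever compared, because the third record shares no block with them. The trade-off is that a true match whose blocking fields are both wrong (a mistyped postcode and a mistyped DOB) will never be compared, so the choice of keys matters as much as the choice of similarity measure.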