Soundex seems to be implemented in some DBMS's, but have there been any algorithmic improvements that are definitively better than the current implementation of Soundex?
Yes. As Wikipedia points out, there's Metaphone and Double Metaphone, NYSIIS and more.
Keep in mind that these only works for English, which has its own particular problems with its orthography. It's hardly needed for Spanish, and doesn't make sense for Chinese/Mandarin.
I don't know about "definitively better", but you might want to look at Metaphone (and its variants) and Caverphone. See, e.g., http://www.atomodo.com/code/double-metaphone where there's an implementation of "Double Metaphone" for use with MYSQL.
Metaphone 3 is the third generation of the Metaphone algorithm. It increases the accuracy of phonetic encoding from the 89% of Double Metaphone to 98%, as tested against a database of the most common English words, and names and non-English words familiar in North America. This produces an extremely reliable phonetic encoding for American pronunciations.
Metaphone 3 was designed and developed by Lawrence Philips, who designed and developed the original Metaphone and Double Metaphone algorithms.
© 2022 - 2024 — McMap. All rights reserved.