How to make an International Soundex?
The Soundex algorithm, for example, is optimized for English. Is there a more universal algorithm that would apply across large families of languages?
SOUNDEX is indeed English-oriented. Two alternatives that account for a wider variety of phonetic differences are Double Metaphone and NYSIIS.
Both encode into a much larger space of possible codes than SOUNDEX does. Double Metaphone, specifically, includes reductions expressly designed to handle alternate pronunciations from languages other than English.
I did a presentation on fuzzy string matching recently; the slides may be helpful.
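For reference, classic Soundex fits in a few lines, which makes its English bias easy to see: the digit groups below encode English consonant classes, and the vowel/h/w rules assume English spelling. A minimal sketch (my own illustration, not code from this answer):

```python
# Minimal classic Soundex: first letter kept, remaining consonants mapped
# to digit classes, vowels dropped, padded/truncated to 4 characters.
def soundex(word: str) -> str:
    groups = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
              "l": "4", "mn": "5", "r": "6"}

    def code(ch: str) -> str:
        for letters, digit in groups.items():
            if ch in letters:
                return digit
        return ""  # vowels and h/w/y carry no code

    word = word.lower()
    result = word[0].upper()
    prev = code(word[0])
    for ch in word[1:]:
        digit = code(ch)
        if digit and digit != prev:
            result += digit
        if ch not in "hw":  # h/w do not break a run of identical codes
            prev = digit
    return (result + "000")[:4]

print(soundex("Robert"))  # → "R163"
print(soundex("Rupert"))  # → "R163" (collides with "Robert" by design)
```

The hard-coded letter classes are exactly why it transfers poorly to other languages; Double Metaphone and NYSIIS replace them with larger context-sensitive rule sets.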
The link to your slides is broken (404) –
Belted
@John: new link seems to be asymmetrical-view.com/talks/#fuzzy-string-matching –
Gnostic
Thanks, I just updated it to point to the PDF in the related GitHub repo; hopefully that link stays more stable. –
Abysm
On slide 38, you're showing percentage similarities that are above 50%. I'm not saying it's wrong, but what formula are you using to calculate the similarity percentage from the edit distance? The formula I've seen,
1 / (1 + dist)
maxes out at 50% for inexact matches. I know your costs are variable, but 1 / 1.4 != 93%,
which is the number you show in your slide. Thanks! –
Veilleux
I may not have the version you do; for me, slide 38 is an edit-distance grid :( Which words are being compared in the one you're looking at? The distance formula I usually use is (max(len(a), len(b)) - num_edits) / max(len(a), len(b)). If you're looking at the Text Brew algorithm, it allows different costs for the various edits, but I'm pretty sure I used the same formula. There is sample code in the GitHub repo. If you can tell me what's on the slide in question, I can probably give a better answer, or email me and we'll figure it out. –
Abysm
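The formula in that reply can be made concrete with a small sketch: standard Levenshtein distance with unit edit costs (unlike the variable-cost Text Brew variant mentioned), normalized by the longer string's length. This is my own illustration, not the presenter's code:

```python
# Levenshtein edit distance via the classic two-row dynamic program.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Similarity as discussed above:
# (max(len(a), len(b)) - num_edits) / max(len(a), len(b))
def similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return (longest - levenshtein(a, b)) / longest

print(levenshtein("kitten", "sitting"))            # → 3
print(round(similarity("kitten", "sitting"), 3))   # → 0.571
```

Unlike 1 / (1 + dist), this normalization can exceed 50% for inexact matches, since a single edit in a long string still yields a high similarity.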