SQL Server's SoundEx function on non-Latin character sets?

Does the SOUNDEX function in SQL Server 2000 work on Asian character sets? I used it in a query and it does not appear to have worked properly, but I realize that could be because I can't read Chinese...

Furthermore, are there any other languages the function might have trouble with (Russian, for example)?

Thank you,
Frank

Musa answered 18/11, 2008 at 20:4 Comment(2)
One language you'll most likely have issues with is Arabic. Most folks who use SOUNDEX typically roll their own solution for that...Acetate
Thank you for the tip. Luckily we don't handle Arabic in this particular database... At least not yet.Musa
4

Soundex is fairly specific to English; it may or may not work well on other languages. One example comes from New Zealand, where Soundex was tried for patient name matching. Unfortunately, Pacific Island names did not work well with it: in many cases they hashed to the same small set of values, and a different algorithm had to be used.
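To illustrate why that happens, here is a minimal sketch of the classic American Soundex rules in Python. It is not SQL Server's exact implementation, which may differ in edge cases, and the sample names are picked only for illustration. Vowels carry no code, so vowel-heavy names collapse to the first letter plus zeros, and a string with no A-Z letters at all yields nothing useful.

```python
# Sketch of classic American Soundex (assumption: not SQL Server's exact behaviour).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    # Only the letters A-Z take part; everything else is dropped.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""  # nothing to encode, e.g. a name written in Chinese characters
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(c, "")  # vowels and Y have no code
        if code and code != prev:
            out.append(code)
        prev = code
    # Pad with zeros and cut to one letter plus three digits.
    return ("".join(out) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # both R163, the intended behaviour
print(soundex("Keoua"), soundex("Kaiea"))    # both K000, vowel-heavy names collide
print(repr(soundex("张伟")))                  # '' because no A-Z letters remain
```

Any two names that start with the same letter and otherwise contain only vowels, H, W or Y end up with the same code, which is the kind of collision described above.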

Your mileage may vary. On more recent versions of SQL Server (2005 and later) you could write a CLR function to implement a different algorithm.

Swithin answered 18/11, 2008 at 20:57 Comment(0)
2

By design it works best on English words written in the ASCII character set. I used it on a project in Romania, where I replaced the Romanian special characters with the ASCII characters that sound more or less the same. It is not perfect, but in my case it was a lot better than nothing.

I don't think you will have much success applying SOUNDEX to Asian character sets.
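The substitution step could look something like the Python sketch below. The mapping table is an assumption for illustration (the actual replacements used on that project are not given here), and `to_ascii` is a hypothetical helper name. The same idea could be expressed in T-SQL with nested REPLACE calls before the value is passed to SOUNDEX.

```python
# Assumed mapping from Romanian letters to similar-sounding ASCII letters.
# Both the comma-below and the legacy cedilla forms of s and t are covered.
ROMANIAN_TO_ASCII = str.maketrans({
    "\u0103": "a", "\u0102": "A",  # a with breve
    "\u00e2": "a", "\u00c2": "A",  # a with circumflex
    "\u00ee": "i", "\u00ce": "I",  # i with circumflex
    "\u0219": "s", "\u0218": "S",  # s with comma below
    "\u021b": "t", "\u021a": "T",  # t with comma below
    "\u015f": "s", "\u015e": "S",  # s with cedilla (legacy encoding)
    "\u0163": "t", "\u0162": "T",  # t with cedilla (legacy encoding)
})


def to_ascii(name):
    # Replace each mapped letter; every other character is left unchanged.
    return name.translate(ROMANIAN_TO_ASCII)


print(to_ascii("\u0218tef\u0103nescu"))  # Stefanescu
print(to_ascii("\u021a\u0103ranu"))      # Taranu
```

The folded value is what gets fed to the phonetic function; the original spelling should still be stored and displayed as entered.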

Dutiable answered 18/11, 2008 at 20:11 Comment(0)
2

I know that SOUNDEX in older versions of SQL Server ignored any non-English characters. I believe it didn't even handle Latin-1, let alone anything more exotic.

I never dealt with SOUNDEX much in SQL Server 2000; all I know for certain is that it does not handle Arabic correctly. This likely extends to other non-Latin character sets as well.

In any case, a Soundex-based algorithm is unlikely to yield acceptable results for non-English languages, even setting character set issues aside. Soundex was designed specifically to handle the English pronunciation of names (mostly those of Western European origin) and does not work particularly well outside that use. You would usually be better off researching one of the several Soundex variants, or an unrelated phonetic similarity algorithm, designed for the language(s) in question.

Anchises answered 18/11, 2008 at 20:44 Comment(0)
0

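You could use an algorithm like Levenshtein distance, which counts the single-character edits needed to turn one string into another rather than comparing how the strings sound. There are various implementations of it as user-defined functions that you can call within a SELECT statement.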
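For reference, here is a minimal Python sketch of the standard dynamic-programming form of Levenshtein distance. It is not one of the SQL user-defined functions mentioned above, only an illustration of what such a function computes. It compares characters directly, so it works the same way on any script.

```python
def levenshtein(a, b):
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep it
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("张伟", "张玮"))        # 1, a single character differs
```

A match is then usually defined by a threshold, for example accepting names whose distance is at most 1 or 2; the right cutoff depends on the data and is something to tune.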

Tight answered 24/8, 2021 at 17:54 Comment(0)
