Use SOUNDEX() word by word on SQL Server
Asked Answered
A

5

6

Here is my problem. For example I have a table Products that contains a field, Name:

Products
ID | Name | ..
1  | "USB Key 10Go"
2  | "Intel computer"
3  | "12 inches laptop computer"
...

I'm currently implementing a simple search engine (SQL Server and ASP .NET, C#) for an iPhone web-app and I would like to use the SOUNDEX() SQL Server function.

The thing is, I can't directly use SOUNDEX on the Name field. (This would be irrelevant since there are several words in the name.) I would like to apply the SOUNDEX function to each word from the Name field, and then see if any of them matches the researched keyword.

If someone has any clue how to do this, that would be awesome.

Archiplasm answered 17/12, 2009 at 17:41 Comment(3)
How many records are in your products table. A programming solution (i.e. parsing the name field into separate field and then applying soundex to each) is going to perform slowly in a query. You might do better to change your database structure, if that is a possibility.Syringa
yeah, if you're data set is large by any comparison this is going to take forever to run.Commission
Yes the database is large, that's what I'm currently doing (on the C# .NET code, using regexp to count the words number and add relevance by myself to the search results) but its very slow. I'm looking into FREETEXT right now but got some issue with it (see #1924306).Archiplasm
Z
4

Have you looked into the Full-Text Search feature in SQL Server? I know this is not exactly what you asked for. Its just that the SOUNDEX() function is used to find similar SOUNDING names (EX: SMITH and SMYTHE sound the same). In a search engine, however, how a word sounds is less important than the search words themselves. Full-Text Search also lets you use synonyms (allowing you to specify certain words that mean the same thing within your application's context), and have them automatically considered during your search.

Look at these pages for more information about Full Text Search in SQL Server:

Introduction to Full-Text Search

CONTAINS

CONTAINSTABLE

FREETEXT

FREETEXTTABLE

Zoophobia answered 17/12, 2009 at 18:14 Comment(2)
I've seen FREETEXT, the thing is that I have big issues installing it. When I tried to enable the Fulltext mode but I get an error: USE [AspDotNetStorefront] GO EXEC sp_fulltext_database 'enable' GO CREATE FULLTEXT CATALOG searchcatalog GO -> Msg 7609, Level 17, State 100, Line 1 Full-Text Search is not installed, or a full-text component cannot be loaded.Archiplasm
It appears the FULLTEXT service isn't installed and/or running. If you are running SQL Server Express, the FULLTEXT service isn't included unless you have the version that includes advanced services. Here's the link for 2005 (microsoft.com/downloads/…) and here is the link for 2008 (microsoft.com/express/sql/download). If you did install that version, then check if the FullText Search Service is running (configuration manager, under services).Zoophobia
D
5

Rather than use Soundex you might be better off computing the Levenshtein distance between the two strings. See the Wikipedia article on Levenshtein distance.

There's a TSQL implementation of the Levenshtein distance algorithm here.

Share and enjoy.


EDIT 03-May-2012

Since writing my original response I've learned that Oracle includes the Levenshtein distance and several other "string similarity" functions in the UTL_MATCH package, which I believe is a standard part of the database. Documentation here. Perhaps not directly related to the original post (which was for SQL Server) but perhaps useful as many shops use multiple databases.

Darbies answered 17/12, 2009 at 17:52 Comment(0)
Z
4

Have you looked into the Full-Text Search feature in SQL Server? I know this is not exactly what you asked for. Its just that the SOUNDEX() function is used to find similar SOUNDING names (EX: SMITH and SMYTHE sound the same). In a search engine, however, how a word sounds is less important than the search words themselves. Full-Text Search also lets you use synonyms (allowing you to specify certain words that mean the same thing within your application's context), and have them automatically considered during your search.

Look at these pages for more information about Full Text Search in SQL Server:

Introduction to Full-Text Search

CONTAINS

CONTAINSTABLE

FREETEXT

FREETEXTTABLE

Zoophobia answered 17/12, 2009 at 18:14 Comment(2)
I've seen FREETEXT, the thing is that I have big issues installing it. When I tried to enable the Fulltext mode but I get an error: USE [AspDotNetStorefront] GO EXEC sp_fulltext_database 'enable' GO CREATE FULLTEXT CATALOG searchcatalog GO -> Msg 7609, Level 17, State 100, Line 1 Full-Text Search is not installed, or a full-text component cannot be loaded.Archiplasm
It appears the FULLTEXT service isn't installed and/or running. If you are running SQL Server Express, the FULLTEXT service isn't included unless you have the version that includes advanced services. Here's the link for 2005 (microsoft.com/downloads/…) and here is the link for 2008 (microsoft.com/express/sql/download). If you did install that version, then check if the FullText Search Service is running (configuration manager, under services).Zoophobia
H
2

If you have to do it all in the RDBMS, a UDF would be the best if it's an option.

Otherwise, you could use this technique to at least soundex the first four words individually using PARSENAME:

From How do I split a string so I can access item x?:

PARSENAME(REPLACE('12 inches laptop computer', ' ', '.'), 1)  --return computer
PARSENAME(REPLACE('12 inches laptop computer', ' ', '.'), 2)  --return laptop
...

However: using PARSENAME in this way is a hack and a serious limitation is it only works for a max of 4 parts. If there are 5 or more words PARSENAME will return NULL, so you have to check for that with a conditional and degrade gracefully.

Here's a simplified example (again, without the NULL checks)

SELECT *
FROM Products 
WHERE SOUNDEX(search_input) = SOUNDEX(PARSENAME(REPLACE(Name, ' ', '.'), 4))
  OR SOUNDEX(search_input) = SOUNDEX(PARSENAME(REPLACE(Name, ' ', '.'), 3))
  OR SOUNDEX(search_input) = SOUNDEX(PARSENAME(REPLACE(Name, ' ', '.'), 2))
  OR SOUNDEX(search_input) = SOUNDEX(PARSENAME(REPLACE(Name, ' ', '.'), 1))
Haiduk answered 17/12, 2009 at 18:17 Comment(0)
I
1

The SOUNDEX may be a perfect fit for your purpuse, but please remember that it may not provide good results for anything else than british or american english spoken words! It may even be used on german phonetical spoken words, but it will not work well with ANYTHING else.

Infeasible answered 30/12, 2009 at 10:32 Comment(0)
S
1

You could try storing the metaphone of each word concatenated with hyphens. EG stored_metaphone field could contain something like '-AKTRF-SPLS-'. Then build a query like this:

$where = '(';
$search_sql = array();
$search_terms = explode(' ',$search);
foreach ($search_terms as $term) {
    $search_sql[] = "`stored_metaphone` LIKE '%-".metaphone($term)."-%'";
}
$where .= implode(' OR ',$search_sql);
$where .= ')';

NB this is only the WHERE part of the query.

As far as I know metaphone only works with English. The above sql is working rather well on a number of sites.

Sindysine answered 3/6, 2011 at 13:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.