Implement smart search / Fuzzy string comparison

I have a web page in an ASP.NET MVC application where customers search for suppliers. The suppliers enter their own details on the website. The client wants a "smart search" feature, so that customers can find suppliers even if the supplier's spelling is "slightly different" from what is typed into the search box.

I have no idea what the client's notion of "slightly different" is. I've been looking into implementing a custom Soundex algorithm, which converts a word into a short code (a letter followed by three digits) based on how it sounds. That code is then used for comparison.

For example, Zach and Zack encode to the same value. Are there any other options I could look into?
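For reference, here is a minimal sketch of the classic American Soundex encoding the question describes. It is written in Python for brevity, and the logic ports directly to C#. It keeps the first letter, maps the remaining consonants to digits, drops vowels, collapses adjacent repeats of the same code, and pads the result to four characters. A "custom" variant would mostly mean adjusting the `_CODES` table.

```python
# American Soundex: first letter + three digits describing the consonants.
_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word: str) -> str:
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's code still counts, so "Pfister" gives P236, not P123.
    prev = _CODES.get(first, "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code:
            if code != prev:  # collapse runs of the same code
                digits.append(code)
            prev = code
        elif c not in "HW":
            # Vowels (and Y) separate consonants, so a code may repeat after one.
            prev = ""
        # H and W are skipped without resetting prev.
    return (first + "".join(digits) + "000")[:4]
```

With this sketch, `soundex("Zach")` and `soundex("Zack")` both return `"Z200"`.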

Kalif asked 25/7, 2014 at 5:39
this link may help you: a link - Morisco

You can use Levenshtein distance combined with a 'tags' field on Suppliers in your database for 'smart search' style functionality.

It's pretty basic, but it works well for cases such as 'Zack/Zach'.
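Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Here is a minimal two-row sketch of the standard dynamic-programming computation, in Python for brevity (it ports directly to C#):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (Wagner-Fischer, keeping only two rows)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the row short
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]
```

`levenshtein("Zack", "Zach")` is 1 (one substitution). The comparison is case-sensitive, so lower-case both sides before comparing user input with stored names.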

Adding tags in your database allows you to handle situations where people may search for a supplier by their acronym or other colloquial name.
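One way to combine edit distance with tags, sketched in Python under several assumptions: the `Supplier` record, its `tags` list, the tokenization and the `allowed_edits` thresholds are all hypothetical choices, not something this answer prescribes. The query is compared against the whole name, each word of the name and each tag. A supplier matches if its closest candidate is within the allowed number of edits.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Supplier:
    """Hypothetical supplier record: a display name plus free-form tags
    (acronyms, nicknames) captured alongside it."""
    name: str
    tags: list[str] = field(default_factory=list)


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalize(text: str) -> list[str]:
    # Lower-case and keep only ASCII letters/digits, so "Zack's" -> ["zack", "s"].
    return re.findall(r"[a-z0-9]+", text.lower())


def allowed_edits(term: str) -> int:
    # Assumed policy: no typos in very short terms, more in longer ones.
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2


def search(query: str, suppliers: list[Supplier]) -> list[Supplier]:
    q = " ".join(normalize(query))
    if not q:
        return []
    hits = []
    for s in suppliers:
        words = normalize(s.name)
        candidates = [" ".join(words), *words]
        candidates += [" ".join(normalize(t)) for t in s.tags]
        best = min(levenshtein(q, c) for c in candidates)
        if best <= allowed_edits(q):
            hits.append((best, s))
    hits.sort(key=lambda h: (h[0], h[1].name))  # closest matches first
    return [s for _, s in hits]
```

Take the suppliers "Zack's Plumbing" and "International Business Machines" (the latter tagged "IBM"). Searching "Zach" finds the first through its name. Searching "IBM" finds the second only through its tag, which edit distance alone could not do. Scanning every row in memory is only reasonable for small tables.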

See "How to calculate distance similarity measure of given 2 strings?" and http://www.dotnetperls.com/levenshtein for implementation details.

Costotomy answered 25/7, 2014 at 6:28
Your suggestion helped me find this library on CodePlex that implements Levenshtein distance (and many other algorithms) and seems really easy to use. I'm going to give it a try :) - Kalif
Glad I could point you in the right direction. Nice find on the FuzzyString library. - Costotomy

What you need is an indexed search with a phonetic analysis filter.

Lucene.NET offers just that.

http://lucene.apache.org/core/4_0_0/analyzers-phonetic/org/apache/lucene/analysis/phonetic/PhoneticFilterFactory.html
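Conceptually, a phonetic filter is one stage of an analyzer chain. Text is tokenized and lower-cased, each token is also indexed under its phonetic code, and the query goes through the same chain. The toy Python sketch below illustrates that idea with a hand-rolled inverted index and Soundex as the encoder. It is not Lucene.NET's API, and `analyze` and `PhoneticIndex` are hypothetical names.

```python
import re
from collections import defaultdict

# Standard Soundex letter groups (compact version).
_CODES = {c: d for d, group in [("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                                ("4", "L"), ("5", "MN"), ("6", "R")] for c in group}


def soundex(word: str) -> str:
    word = word.upper()
    out, prev = word[0], _CODES.get(word[0], "")
    for c in word[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            out += code
        if c not in "HW":  # vowels reset prev to ""; H and W leave it unchanged
            prev = code
    return (out + "000")[:4]


def analyze(text: str):
    """Hypothetical analyzer chain: tokenize -> lower-case -> phonetic filter.
    Each token is emitted as-is and as its Soundex code (upper-case, so the
    two kinds of term cannot collide)."""
    for token in re.findall(r"[a-z]+", text.lower()):
        yield token
        yield soundex(token)


class PhoneticIndex:
    """Toy inverted index: term -> ids of the documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[int]:
        # The query passes through the same analyzer, so "Zach" and "Zack"
        # meet at the shared term Z200. Any shared term counts as a hit.
        hits = set()
        for term in analyze(query):
            hits |= self.postings.get(term, set())
        return sorted(hits)
```

After indexing "Zack's Plumbing" and "Smith & Sons", a search for "Zach" finds the first (both words encode to Z200) and "Smyth" finds the second (both encode to S530).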

How to perform Phonetic and Aproximative search in Lucene.net

See here for the .NET version of Phonetix:
http://sourceforge.net/projects/phonetixnet/

Here is some more info on how to implement it in C#:
lucene.net phonetic filter

You can also use a BeiderMorseEncoder, which is designed to handle many languages.

On the subject of finding similarly spelled words, why not use a fuzzy search instead?
how to do fuzzy search in Lucene.net in asp.net?
Lucene.net Fuzzy Phrase Search
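Unlike phonetic matching, fuzzy search compares spellings directly: a query term matches index terms within a small edit distance (newer Lucene versions cap fuzzy queries at two edits). The Python sketch below illustrates only that matching rule. It uses optimal string alignment distance, a Levenshtein variant in which swapping two adjacent letters counts as one edit. A real engine avoids comparing the query with every term. `fuzzy_terms` and its parameters are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein edits, plus swapping
    two adjacent characters counts as a single edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def fuzzy_terms(term: str, dictionary: list[str], max_edits: int = 2) -> list[str]:
    """Return dictionary terms within max_edits of term, closest first."""
    term = term.lower()
    scored = sorted((osa_distance(term, t), t) for t in dictionary)
    return [t for dist, t in scored if dist <= max_edits]
```

Under this metric the typo "Zakc" is one edit from "zack", whereas plain Levenshtein distance would count two.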

There are also many string-metric functions that you could expose in MS SQL Server through SQL CLR: http://anastasiosyal.com/post/2009/01/11/Beyond-SoundEx-Functions-for-Fuzzy-Searching-in-MS-SQL-Server
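Jaro-Winkler similarity is a widely used string metric for short strings such as names. It scores two strings from 0 to 1 by counting matching characters within a sliding window, then boosts pairs that share a common prefix. The sketch below is in Python for illustration only. In SQL Server the same logic would live in a C# SQL CLR function, and the defaults `p=0.1` and `max_prefix=4` are the commonly used values.

```python
def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # how far apart a match may be
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    k = transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    j = jaro(s1, s2)
    prefix = 0  # length of the shared prefix, up to max_prefix characters
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

With this sketch, "Zach" and "Zack" score about 0.88, and "MARTHA" and "MARHTA" about 0.96. The match threshold is a tuning decision for the client's data.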

Covin answered 25/7, 2014 at 5:47

© 2022 - 2024 — McMap. All rights reserved.