How to detect a typo in a product search and suggest possible corrections?

3

9

Given a very large database of product names, how would you detect possible typos in user searches and suggest corrections (much like the way Google presents them)?

E.g.

User enters "fork handels" and presses 'search'.

They get back

"No results. Did you mean 'fork handles'?"

Elijah answered 28/1, 2009 at 9:27 Comment(1)
I did type that at first, but thought it might have obscured the question a bit :) - Elijah
16

There are several approaches for this problem:

  1. Keep a table of the most popular misspellings in your database. (If you need a list of common misspellings: here)
  2. Use an algorithm based on edit distance. In information theory and computer science, the edit distance between two strings is the minimum number of operations required to transform one into the other. There are several algorithms that define or calculate this metric; see, for example, the Wikipedia article on Levenshtein distance.
  3. If you are using Lucene for full-text search, here is a nice article that shows how to implement the "Did you mean" feature.
  4. If you see the feature as simple spelling correction, here are some nice, very short implementations in several languages: How to Write a Spelling Corrector
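
To make point 2 concrete, here is a minimal Python sketch of Levenshtein distance used for a "Did you mean" lookup. The function names (`levenshtein`, `suggest`), the `max_distance` threshold, and the sample product list are all made up for illustration. Scanning every name like this is linear in the catalogue size, so for a very large database you would probably want an index in front of it; treat this as a sketch of the idea, not a production design.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def suggest(query, product_names, max_distance=2):
    """Return the closest product name within max_distance edits, or None.
    max_distance is an assumed cut-off, not a recommended value."""
    best_name, best_distance = None, max_distance + 1
    for name in product_names:
        distance = levenshtein(query.lower(), name.lower())
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name


# Hypothetical catalogue, just for the demo.
products = ["fork handles", "four candles", "garden fork"]
match = suggest("fork handels", products)
if match is not None:
    print(f"No results. Did you mean '{match}'?")
```

Note that plain Levenshtein counts a swapped pair of letters ("el" vs "le") as two edits, which is why the threshold here is 2 rather than 1.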
Poindexter answered 28/1, 2009 at 9:43 Comment(1)
Since the link in point 3 is dead: in case somebody wants to refer to it, the article is available in the Oracle Community Archive, or you can look it up in the Wayback Machine. - Simile
3

You could use a phonetic algorithm, such as Soundex, to find matches that sound similar.

PostgreSQL has a module named fuzzystrmatch, with the docs showing examples of using Soundex, Levenshtein, Metaphone, and Double Metaphone.
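
If you are not on PostgreSQL, the same idea can be tried in a few lines of Python. The sketch below is a hand-written version of classic American Soundex, not the fuzzystrmatch implementation; `soundex` and `phonetic_key` are names chosen here for illustration, and keying a multi-word search by joining the per-word codes is an assumption about how you might apply it to product names.

```python
# Letter groups for classic American Soundex; vowels, y, h and w get no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a word ('' if it has no a-z letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    last_code = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != last_code:
            result += code
            if len(result) == 4:
                break
        last_code = code  # a vowel resets this, so repeats across vowels count
    return result.ljust(4, "0")


def phonetic_key(phrase: str) -> str:
    """Hypothetical helper: one Soundex code per word, joined by spaces."""
    return " ".join(soundex(word) for word in phrase.split())


# The typo and the intended product name collapse to the same key.
print(phonetic_key("fork handels"))
print(phonetic_key("fork handles"))
```

You could store `phonetic_key(name)` alongside each product and look up the key of a failed search to get candidate corrections.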

Urethrectomy answered 28/1, 2009 at 9:37 Comment(0)
1

I'm sure I read that Google keeps a list of what users retype when they get no results. You could keep a similar mapping from failed queries to the retyped ones (say, only when the retyped string begins with the same letter).
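
A rough Python sketch of that mapping might look like the following. Everything here is hypothetical: the `RetypeLog` class, the in-memory storage, and the `min_count` threshold are just one way to try the idea, and a real site would persist the counts somewhere.

```python
from collections import Counter, defaultdict


class RetypeLog:
    """Counts what users type next after a search that returned no results."""

    def __init__(self, min_count=2):
        # failed query -> Counter of the queries users retyped afterwards
        self.corrections = defaultdict(Counter)
        # assumed threshold: ignore retypes seen fewer times than this
        self.min_count = min_count

    def record(self, failed_query, retyped_query):
        failed = failed_query.strip().lower()
        retyped = retyped_query.strip().lower()
        # Same-first-letter heuristic, to skip unrelated follow-up searches.
        if failed and retyped and failed != retyped and failed[0] == retyped[0]:
            self.corrections[failed][retyped] += 1

    def suggest(self, failed_query):
        candidates = self.corrections.get(failed_query.strip().lower())
        if not candidates:
            return None
        retyped, count = candidates.most_common(1)[0]
        return retyped if count >= self.min_count else None


log = RetypeLog()
log.record("fork handels", "fork handles")
log.record("fork handels", "fork handles")
log.record("fork handels", "garden fork")  # ignored: different first letter
print(log.suggest("fork handels"))
```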

Lora answered 28/1, 2009 at 9:40 Comment(1)
That is a good idea, though I suspect that it works for Google thanks in part to the sheer, unimaginable volume of requests that they process. A lower-traffic site may struggle to build a useful-sized database. - Elijah
