Thinking Sphinx fuzzy search?

I am implementing Sphinx search in my Rails application.
I want to search with fuzzy matching turned on: it should tolerate spelling mistakes, e.g. if I enter the search query "charactaristics", it should search for "characteristics".

How should I implement this?

Baumgardner asked 19/5, 2011 at 9:51

Sphinx doesn't natively handle spelling mistakes: it doesn't care whether words are spelled correctly or not, it just indexes them and matches them.

There are two ways around this. Either use thinking-sphinx-raspell to catch users' spelling errors when they search, and offer them the choice to search again with a corrected query (much like Google does); or use the soundex or metaphone morphologies, so that words are indexed in a way that accounts for how they sound. Search on this page for "stemming" and you'll find the relevant section. Also have a read of Sphinx's documentation on the matter.
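In Sphinx itself the phonetic option is an index setting (`morphology = soundex` or `morphology = metaphone`). To see why it helps with a typo like the one in the question, here is a minimal sketch of classic four-character American Soundex in plain Ruby. It is an illustration only, not Sphinx's internal code, and Sphinx may produce different codes; `SOUNDEX_GROUPS`, `soundex_digit` and `soundex` are hypothetical names.

```ruby
# Classic American Soundex, written from scratch for illustration.
# Assumption: this approximates, but is NOT, Sphinx's `morphology = soundex`.
SOUNDEX_GROUPS = {
  "bfpv" => "1", "cgjkqsxz" => "2", "dt" => "3",
  "l" => "4", "mn" => "5", "r" => "6"
}.freeze

# Returns the digit for a consonant, or nil for vowels, y, h and w.
def soundex_digit(ch)
  SOUNDEX_GROUPS.each { |letters, digit| return digit if letters.include?(ch) }
  nil
end

def soundex(word)
  letters = word.downcase.gsub(/[^a-z]/, "")
  return "" if letters.empty?

  code = letters[0].upcase
  last = soundex_digit(letters[0])
  letters[1..-1].each_char do |ch|
    digit = soundex_digit(ch)
    if digit
      # Adjacent letters with the same digit are coded only once.
      code << digit unless digit == last
      last = digit
    elsif ch != "h" && ch != "w"
      # Vowels (and y) separate repeated digits; h and w do not.
      last = nil
    end
    break if code.length == 4
  end
  code.ljust(4, "0") # pad short codes with zeros
end

puts soundex("characteristics") # C623
puts soundex("charactaristics") # C623
```

Both spellings collapse to `C623`, so the typo disappears. The flip side is that unrelated words collide as well: with this sketch `soundex("led")` and `soundex("lid")` are both `L300`, which is why phonetic matching trades precision for recall.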

I've no idea how reliable either option would be; personally, I'd opt for the first one.
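The first option boils down to a "did you mean?" step: compare each query word against words known to be valid and propose the closest one. raspell does this against a real dictionary; the sketch below swaps in plain Levenshtein edit distance against a small, hypothetical vocabulary (for example, your own product names), so it is not the thinking-sphinx-raspell API. `levenshtein`, `suggest`, `corrected_query` and `max_distance` are illustrative names.

```ruby
# Levenshtein edit distance: the minimum number of single-character
# insertions, deletions and substitutions that turn a into b.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: closest vocabulary term within max_distance edits, else nil.
def suggest(word, vocabulary, max_distance: 2)
  w = word.downcase
  best = vocabulary.min_by { |term| levenshtein(w, term.downcase) }
  best && levenshtein(w, best.downcase) <= max_distance ? best : nil
end

# Rewrites a query word by word, keeping words with no close match unchanged.
def corrected_query(query, vocabulary)
  query.split.map { |word| suggest(word, vocabulary) || word }.join(" ")
end

vocabulary = %w[characteristics lamp led]
puts corrected_query("charactaristics lamp", vocabulary) # characteristics lamp
```

A fixed `max_distance` is crude for short tokens such as abbreviations, where two edits can reach an unrelated word; scaling the threshold with word length is one possible refinement.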

Larry answered 20/5, 2011 at 1:28
Baumgardner commented: Thanks Pat. I thought of using raspell, but it doesn't suit my requirements. I am reading email content and searching for possible product names ordered through the email, so I have no way to offer the user corrected options. Also, raspell sometimes replaces abbreviated names with irrelevant alternatives, e.g. led (LED) replaced with lid. I tried soundex and metaphone too; they improved results for me, but they are not accurate.

By default, Sphinx ignores wildcard searching with the asterisk character. You can turn it on, though (wildcards only apply to prefix/infix indexes, so min_prefix_len or min_infix_len must be set as well):

development:
  enable_star: true
  # ... repeat for other environments

See the Wildcard/Star Syntax section of http://pat.github.io/thinking-sphinx/advanced_config.html.

Landgrave answered 19/5, 2011 at 16:56

Yes, Sphinx generally uses the extended match modes.

The following matching modes are available:

SPH_MATCH_ALL, matches all query words (default mode);
SPH_MATCH_ANY, matches any of the query words;
SPH_MATCH_PHRASE, matches query as a phrase, requiring perfect match;
SPH_MATCH_BOOLEAN, matches query as a boolean expression (see Section 5.2, “Boolean query syntax”);
SPH_MATCH_EXTENDED, matches query as an expression in Sphinx internal query language (see Section 5.3, “Extended query syntax”);
SPH_MATCH_EXTENDED2, an alias for SPH_MATCH_EXTENDED;
SPH_MATCH_FULLSCAN, matches query, forcibly using the "full scan" mode. NB: any query terms will be ignored; filters, filter-ranges and grouping will still be applied, but no text matching is done.

SPH_MATCH_EXTENDED2 was used during the 0.9.8 and 0.9.9 development cycles, when the internal matching engine was being rewritten (for the sake of additional functionality and better performance). By the 0.9.9 release, the older version had been removed, and SPH_MATCH_EXTENDED and SPH_MATCH_EXTENDED2 are now just aliases.
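To make the difference between the first three modes concrete, here is a toy, in-memory sketch of how SPH_MATCH_ALL, SPH_MATCH_ANY and SPH_MATCH_PHRASE decide whether a single document matches. It only mimics the rules listed above: it does not talk to searchd, ignores ranking, and assumes naive lowercase word tokenization. The method `matches_in_mode?` and its mode symbols are chosen for this sketch.

```ruby
# Toy matcher mimicking three Sphinx match modes on one document.
# Assumption: simple lowercase word tokenization; no stemming, no ranking.
def matches_in_mode?(mode, query, text)
  terms = query.downcase.split
  words = text.downcase.scan(/\w+/)
  return false if terms.empty?

  case mode
  when :all    then terms.all? { |t| words.include?(t) }          # like SPH_MATCH_ALL
  when :any    then terms.any? { |t| words.include?(t) }          # like SPH_MATCH_ANY
  when :phrase then words.each_cons(terms.length).include?(terms) # like SPH_MATCH_PHRASE
  else raise ArgumentError, "mode not covered by this sketch: #{mode.inspect}"
  end
end

doc = "Red LED lamp with dimmer"
puts matches_in_mode?(:all, "led red", doc)    # true
puts matches_in_mode?(:phrase, "led red", doc) # false
puts matches_in_mode?(:any, "blue led", doc)   # true
```

ALL ignores word order while PHRASE does not. None of these modes tolerates misspellings; they only differ in which of the exactly spelled terms must appear, and how.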

enable_star

Enables star-syntax (or wildcard syntax) when searching through prefix/infix indexes. Optional, default is 0 (do not use wildcard syntax), for compatibility with 0.9.7. Known values are 0 and 1.

For example, assume that the index was built with infixes and that enable_star is 1. Searching should work as follows:

"abcdef" query will match only those documents that contain the exact "abcdef" word in them.
"abc*" query will match those documents that contain any words starting with "abc" (including the documents which contain the exact "abc" word only);
"*cde*" query will match those documents that contain any words which have "cde" characters in any part of the word (including the documents which contain the exact "cde" word only).
"*def" query will match those documents that contain any words ending with "def" (including the documents that contain the exact "def" word only).

Example:

enable_star = 1
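The four wildcard cases above can be mimicked with one anchored regular expression per query term. This is a plain-Ruby sketch of the matching rules only, assuming the index already stores the needed prefixes/infixes; `star_match?` and `document_matches?` are hypothetical helpers, not part of Sphinx or Thinking Sphinx.

```ruby
# Turns a Sphinx-style star pattern into an anchored regex:
# "abc*" -> prefix, "*cde*" -> infix, "*def" -> suffix, no star -> exact word.
def star_match?(pattern, word)
  return word == pattern unless pattern.include?("*")

  parts = pattern.split("*", -1).map { |part| Regexp.escape(part) }
  Regexp.new("\\A#{parts.join('.*')}\\z").match?(word)
end

# A document matches if any of its words matches the pattern.
def document_matches?(pattern, text)
  text.downcase.scan(/\w+/).any? { |word| star_match?(pattern.downcase, word) }
end

puts document_matches?("abc*", "xyz abcdef") # true
puts document_matches?("abcdef", "xyz abc")  # false
```

Wildcards only help when the typo falls in the part covered by the star: "charact*" matches both spellings from the question, while "charactaristics" on its own still matches nothing.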

Jdavie answered 10/8, 2013 at 7:55
