How to implement fuzzy search with SQLite's FTS3?
Asked Answered
F

2

12

I have already integrated search based on the official Android documentation and I'm using the following SQLite schema and query:

CREATE VIRTUAL TABLE Search USING FTS3 (
    _id,
    name,
    location
);

select * from Search where name MATCH ?
-- where ? is the user typed exact "query"
-- or if it doesn't have spaces or stars I append a star to search prefix: "query*"

I'm wondering how can I extend it? to allow the following:

Say I have some items named:

  • My Fancy Item
  • My Secret Item
  • Item #1
  • Your Fancy Item

When the user types blah in the search box the search results would show:

  • my
    • My Fancy Item
    • My Secret Item
  • mfi
    • My Fancy Item
  • fan item, fanit, fit
    • My Fancy Item
    • Your Fancy Item
  • it, item, im, itm
    • My Fancy Item
    • My Secret Item
    • Item #1
    • Your Fancy Item

The results should be ranked based on how good the match is, for example if the letters are farther away they should rank lower than an exact match, like for mfi: "My Fancy Item" should rank last and "MFI thingy" should rank first (if there was such an item).

Note: my min SDK is API level 10, which means it has to work SQLite 3.6.22.

Similar functionality can be found mostly in IDEs:

Fricative answered 19/12, 2014 at 10:36 Comment(10)
There is not 1 way to implement fuzzy search. Your requirements are much too vague to come up with a proper answer.Papst
*l*i*k*e*t*h*i*s* ... but i do worry about efficiency of this solutionPaeon
@popovitsj User types anything and if it matches some part of the name it shows the result.Fricative
If it's that simple than @Paeon is right, but that's hardly useful. It will give a lot of 'unexpected' matches from the user's perspective.Papst
If that's a requirement you should edit it into your question.Papst
@popovitsj if you know a better strategy then please put it as an answer (without re-implementing FTS3 in Java), I'm open to anything that's better than case insensitive "prefix/contains" matching.Fricative
well, i think that there is no build-in rank function ... it is hard to add new functions to SQLite on android(the only way is to recompile - like sqlcipher)Paeon
You can use sim metric for fuzzy search, useful for limited numbers of entry Link: sourceforge.net/projects/simmetricsKweiyang
Did you find a solution/answer to this?Frigg
@Frigg see my new answerFricative
B
4

SQLite's FTS allows searches only for entire words, or for word prefixes.

There is no built-in functionality for fuzzy searches like this. (And the Android database API does not allow you to add custom virtual table implementations.)

Berber answered 19/12, 2014 at 10:47 Comment(1)
Any idea for alternatives?Fricative
F
1

I went with relaxing my criteria to search all word beginnings:

private static String fixQuery(String query) {
    return query.trim().replaceAll("\\s+", "*") + "*";
}

it works pretty well. Not typo-resistant, but it feels natural when I'm using it.

Fricative answered 14/7, 2015 at 17:14 Comment(5)
Can you explain a bit what this does?Frigg
@Frigg "search all word beginnings", \\s+ regex means "any contiguous whitespace" so ab cd ef will become ab*cd*ef* and see the docs what * meansFricative
Thanks. I think that's what an article of a Python-implemented fuzzy search does. I'm using Laravel, and found I could simply use scope queries for this. If that doesn't work, I'll come back to your solution. THank you.Frigg
I think * can only appear at the end of search term it can't appear in the middleEsposito
@MohammadYahia I know it works in the middle as well, I'm using this in my inventory app: fixQuery("but saf") -> "but*saf*" matches "Random Buttons and Safety Pins".Fricative

© 2022 - 2024 — McMap. All rights reserved.