How to implement sublime text like fuzzy search?
Asked Answered
K

7

28

How can i implement a sublime-like fuzzy search on select2?

Example, typing "sta jav sub" would match "Stackoverflow javascript sublime like"

Kalinda answered 4/6, 2013 at 0:10 Comment(0)
K
13

select2 allows you to implement your own "matcher" functions (as seen on their docs), using that and some regexp you can do something like:

$("#element").select2({
    matcher: function(term, text, opt) {
        //We call to uppercase to do a case insensitive match
        //We replace every group of whitespace characters with a .+
        //matching any number of characters
        return text.toUpperCase().match(term.toUpperCase().replace(/\s+/g, '.+'));
    }
});

A matcher function is invoked against every select2 list element when filtering / searching the list, you could implement any kind of custom search using that.

Kalinda answered 4/6, 2013 at 0:12 Comment(0)
H
23

Here's an alternate matching function. http://jsfiddle.net/trevordixon/pXzj3/4/

function match(search, text) {
    search = search.toUpperCase();
    text = text.toUpperCase();

    var j = -1; // remembers position of last found character

    // consider each search character one at a time
    for (var i = 0; i < search.length; i++) {
        var l = search[i];
        if (l == ' ') continue;     // ignore spaces

        j = text.indexOf(l, j+1);     // search for character & update position
        if (j == -1) return false;  // if it's not found, exclude this item
    }
    return true;
}

This one's faster (according to this test in Chrome), which may start to matter if you're filtering a lot of items.

Hapsburg answered 4/6, 2013 at 1:15 Comment(6)
I like your implementation, but yours will not try to match together the characters on the search string, ej ABS would match Any Bachelor Sunrise, which might or might not be what you would expect :)Kalinda
Isn't that exactly what's expected from a fuzzy search?Incurve
@Incurve He means that A B S should match Any Bachelor Sunrise, but ABS should only match, say, ABS Building Co.Voronezh
@Voronezh I think ABS should match both, but one the second match should get a higher ranking than the firstPocahontas
I found that this never searches for the first character because j never equals 0 when doing the indexOf method. It would be better just to use j as is, then after the if condition add j++ to update the position instead of within the indexOf method.Copt
I based my solution on this. just updated it to work with newer select2.Gethsemane
K
13

select2 allows you to implement your own "matcher" functions (as seen on their docs), using that and some regexp you can do something like:

$("#element").select2({
    matcher: function(term, text, opt) {
        //We call to uppercase to do a case insensitive match
        //We replace every group of whitespace characters with a .+
        //matching any number of characters
        return text.toUpperCase().match(term.toUpperCase().replace(/\s+/g, '.+'));
    }
});

A matcher function is invoked against every select2 list element when filtering / searching the list, you could implement any kind of custom search using that.

Kalinda answered 4/6, 2013 at 0:12 Comment(0)
B
9

I wrote some that works very much long Sublime Text's fuzzy match. Achieving this requires a few things.

First, match all characters from a pattern in sequence. Second, score matches such that certain matched characters are worth more points than other.

I came up with a few factors to check for. "CamelCase" letters or letters following a separator (space or underscore) are worth a lot of points. Consecutive matches are worth more. Results found near the start are worth more.

A critically important trick is to find the best matching character. Which is not necessarily the first. Consider fuzzy_match("tk", "The Black Knight"). There are two Ks which could be matched. The second is worth more points because it follows a space.

JavaScript code is below. There is some nuance which is described in more detail in a blog post. There's also an interactive demo. And full source (includes demo, plus C++ implementation) on GitHub.

  • Blog Post
  • Interactive Demo
  • GitHub

    // Returns [bool, score, formattedStr]
    // bool: true if each character in pattern is found sequentially within str
    // score: integer; higher is better match. Value has no intrinsic meaning. Range varies with pattern. 
    //        Can only compare scores with same search pattern.
    // formattedStr: input str with matched characters marked in <b> tags. Delete if unwanted.
    
    function fuzzy_match(pattern, str) {
        // Score consts
        var adjacency_bonus = 5;                // bonus for adjacent matches
        var separator_bonus = 10;               // bonus if match occurs after a separator
        var camel_bonus = 10;                   // bonus if match is uppercase and prev is lower
        var leading_letter_penalty = -3;        // penalty applied for every letter in str before the first match
        var max_leading_letter_penalty = -9;    // maximum penalty for leading letters
        var unmatched_letter_penalty = -1;      // penalty for every letter that doesn't matter
    
        // Loop variables
        var score = 0;
        var patternIdx = 0;
        var patternLength = pattern.length;
        var strIdx = 0;
        var strLength = str.length;
        var prevMatched = false;
        var prevLower = false;
        var prevSeparator = true;       // true so if first letter match gets separator bonus
    
        // Use "best" matched letter if multiple string letters match the pattern
        var bestLetter = null;
        var bestLower = null;
        var bestLetterIdx = null;
        var bestLetterScore = 0;
    
        var matchedIndices = [];
    
        // Loop over strings
        while (strIdx != strLength) {
            var patternChar = patternIdx != patternLength ? pattern.charAt(patternIdx) : null;
            var strChar = str.charAt(strIdx);
    
            var patternLower = patternChar != null ? patternChar.toLowerCase() : null;
            var strLower = strChar.toLowerCase();
            var strUpper = strChar.toUpperCase();
    
            var nextMatch = patternChar && patternLower == strLower;
            var rematch = bestLetter && bestLower == strLower;
    
            var advanced = nextMatch && bestLetter;
            var patternRepeat = bestLetter && patternChar && bestLower == patternLower;
            if (advanced || patternRepeat) {
                score += bestLetterScore;
                matchedIndices.push(bestLetterIdx);
                bestLetter = null;
                bestLower = null;
                bestLetterIdx = null;
                bestLetterScore = 0;
            }
    
            if (nextMatch || rematch) {
                var newScore = 0;
    
                // Apply penalty for each letter before the first pattern match
                // Note: std::max because penalties are negative values. So max is smallest penalty.
                if (patternIdx == 0) {
                    var penalty = Math.max(strIdx * leading_letter_penalty, max_leading_letter_penalty);
                    score += penalty;
                }
    
                // Apply bonus for consecutive bonuses
                if (prevMatched)
                    newScore += adjacency_bonus;
    
                // Apply bonus for matches after a separator
                if (prevSeparator)
                    newScore += separator_bonus;
    
                // Apply bonus across camel case boundaries. Includes "clever" isLetter check.
                if (prevLower && strChar == strUpper && strLower != strUpper)
                    newScore += camel_bonus;
    
                // Update patter index IFF the next pattern letter was matched
                if (nextMatch)
                    ++patternIdx;
    
                // Update best letter in str which may be for a "next" letter or a "rematch"
                if (newScore >= bestLetterScore) {
    
                    // Apply penalty for now skipped letter
                    if (bestLetter != null)
                        score += unmatched_letter_penalty;
    
                    bestLetter = strChar;
                    bestLower = bestLetter.toLowerCase();
                    bestLetterIdx = strIdx;
                    bestLetterScore = newScore;
                }
    
                prevMatched = true;
            }
            else {
                // Append unmatch characters
                formattedStr += strChar;
    
                score += unmatched_letter_penalty;
                prevMatched = false;
            }
    
            // Includes "clever" isLetter check.
            prevLower = strChar == strLower && strLower != strUpper;
            prevSeparator = strChar == '_' || strChar == ' ';
    
            ++strIdx;
        }
    
        // Apply score for last match
        if (bestLetter) {
            score += bestLetterScore;
            matchedIndices.push(bestLetterIdx);
        }
    
        // Finish out formatted string after last pattern matched
        // Build formated string based on matched letters
        var formattedStr = "";
        var lastIdx = 0;
        for (var i = 0; i < matchedIndices.length; ++i) {
            var idx = matchedIndices[i];
            formattedStr += str.substr(lastIdx, idx - lastIdx) + "<b>" + str.charAt(idx) + "</b>";
            lastIdx = idx + 1;
        }
        formattedStr += str.substr(lastIdx, str.length - lastIdx);
    
        var matched = patternIdx == patternLength;
        return [matched, score, formattedStr];
    }
    
Berm answered 29/3, 2016 at 6:55 Comment(0)
W
3

albertein's answer doesn't match Trevor's version because the original function performs matching on character basis and not on word basis. Here's a simpler one matching on character basis:

$("#element").select2({
  matcher: function(term, text, opts) {
    var pattern = term.replace(/\s+/g, '').split('').join('.*');
    text.match(new RegExp(pattern, 'i'))
  }
})
Whiff answered 4/7, 2014 at 9:5 Comment(0)
C
1
var fuzzysearch = function (querystrings, values) {
    return !querystrings.some(function (q) {
        return !values.some(function (v) {
            return v.toLocaleLowerCase().indexOf(q) !== -1;
        });
    });
}

Example searching for title and author in book collection http://jsfiddle.net/runjep/r887etnh/2/

For a 9kb alternative which ranks the search result: http://kiro.me/projects/fuse.html

You may need a polyfill for the 'some' function https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/some

var books = [{
    id: 1,
    title: 'The Great Gatsby',
    author: 'F. Scott Fitzgerald'
}, {
    id: 2,
    title: 'The DaVinci Code',
    author: 'Dan Brown'
}, {
    id: 3,
    title: 'Angels & Demons',
    author: 'Dan Brown'
}];
search = function () {
    var queryarray = document.getElementById('inp').value.trim().toLowerCase().split(' ');
    var res = books.filter(function (b) {
        return fs(queryarray, [b.title, b.author]);
    });
    document.getElementById('res').innerHTML = res.map(function (b) {
        return b.title + ' <i> ' + b.author + '</i>';
    }).join('<br/> ');
}
fs = function (qs, vals) {
    return !qs.some(function (q) {
        return !vals.some(function (v) {
            return v.toLocaleLowerCase().indexOf(q) !== -1;
        });
    });
}
<input id="inp" />
<button id="but" onclick="search()">Search</button>
<div id="res"></div>
Cupcake answered 7/1, 2015 at 19:27 Comment(1)
This is vanilla javascript and not select 2 - sorryCupcake
N
0
function fuzzyMe(term, query) {
  var score = 0;
  var termLength = term.length;
  var queryLength = query.length;
  var highlighting = '';
  var ti = 0;
  // -1 would not work as this would break the calculations of bonus
  // points for subsequent character matches. Something like
  // Number.MIN_VALUE would be more appropriate, but unfortunately
  // Number.MIN_VALUE + 1 equals 1...
  var previousMatchingCharacter = -2;
  for (var qi = 0; qi < queryLength && ti < termLength; qi++) {
    var qc = query.charAt(qi);
    var lowerQc = qc.toLowerCase();

    for (; ti < termLength; ti++) {
      var tc = term.charAt(ti);

      if (lowerQc === tc.toLowerCase()) {
        score++;

        if ((previousMatchingCharacter + 1) === ti) {
          score += 2;
        }

        highlighting += "<em>" + tc + "</em>";
        previousMatchingCharacter = ti;
        ti++;
        break;
      } else {
        highlighting += tc;
      }
    }
  }

  highlighting += term.substring(ti, term.length);

  return {
    score: score,
    term: term,
    query: query,
    highlightedTerm: highlighting
  };
}

The above takes care of the fuzziness. Then you can just iterate over all your select 2 elements

$("#element").select2({
  matcher: function(term, text, opt) {
    return fuzzyMe(term, text).highlightedTerm;
  }
});

Credit for fuzzy code -: https://github.com/bripkens/fuzzy.js

Nomarch answered 9/10, 2015 at 5:9 Comment(0)
G
0

had difficulties with new select2, here what worked

 $("#foo").select2({
   matcher: matcher
 });

function matcher(params, data) {
  // return all opts if seachbox is empty
  if(!params.term) {
    return data;
  } else if(data) {
    var term = params.term.toUpperCase();
    var option = data.text.toUpperCase();
    var j = -1; // remembers position of last found character

    // consider each search character one at a time
    for (var i = 0; i < term.length; i++) {
      var l = term[i];
      if (l == ' ') continue;     // ignore spaces

      j = option.indexOf(l, j+1);     // search for character & update position
      if (j == -1) return false;  // if it's not found, exclude this item
    }
    return data; // return option
  }
}
Gethsemane answered 17/9, 2016 at 23:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.