Is it possible to do a Levenshtein distance in Excel without having to resort to Macros?
Asked Answered
R

4

10

Let me explain.

I have to do some fuzzy matching for a company, so ATM I use a levenshtein distance calculator, and then calculate the percentage of similarity between the two terms. If the terms are more than 80% similar, Fuzzymatch returns "TRUE".

My problem is that I'm on an internship, and leaving soon. The people who will continue doing this do not know how to use excel with macros, and want me to implement what I did as best I can.

So my question is : however inefficient the function may be, is there ANY way to make a standard function in Excel that will calculate what I did before, without resorting to macros ?

Thanks.

Rogue answered 5/7, 2012 at 13:10 Comment(0)
L
16

If you came about this googling something like levenshtein distance google sheets

I threw this together, with the code comment from milot-midia on this gist (https://gist.github.com/andrei-m/982927 - code under MIT license)

  • From Sheets in the header menu, Tools -> Script Editor
  • Name the project
    • The name of the function (not the project) will let you use the func
  • Paste the following code

function Levenshtein(a, b) {
  if(a.length == 0) return b.length; 
  if(b.length == 0) return a.length;

  // swap to save some memory O(min(a,b)) instead of O(a)
  if(a.length > b.length) {
    var tmp = a;
    a = b;
    b = tmp;
  }

  var row = [];
  // init the row
  for(var i = 0; i <= a.length; i++){
    row[i] = i;
  }

  // fill in the rest
  for(var i = 1; i <= b.length; i++){
    var prev = i;
    for(var j = 1; j <= a.length; j++){
      var val;
      if(b.charAt(i-1) == a.charAt(j-1)){
        val = row[j-1]; // match
      } else {
        val = Math.min(row[j-1] + 1, // substitution
                       prev + 1,     // insertion
                       row[j] + 1);  // deletion
      }
      row[j - 1] = prev;
      prev = val;
    }
    row[a.length] = prev;
  }

  return row[a.length];
}

You should be able to run it from a spreadsheet with

=Levenshtein(cell_1,cell_2)

Limerick answered 25/2, 2016 at 18:21 Comment(0)
W
2

While it can't be done in a single formula for any reasonably-sized strings, you can use formulas alone to compute the Levenshtein Distance between strings using a worksheet.

Here is an example that can handle strings up to 15 characters, it could be easily expanded for more:

https://docs.google.com/spreadsheet/ccc?key=0AkZy12yffb5YdFNybkNJaE5hTG9VYkNpdW5ZOWowSFE&usp=sharing

This isn't practical for anything other than ad-hoc comparisons, but it does do a decent job of showing how the algorithm works.

Welltodo answered 22/6, 2013 at 0:11 Comment(0)
I
0

looking at the previous answers to calculating Levenshtein distance, I think it would be impossible to create it as a formula.

Take a look at the code here

Inapprehensible answered 5/7, 2012 at 13:29 Comment(1)
I already have a code, thanks. Actually I have multiple, because I coded one, found the one you linkde, found the one on the MrExcel forums and extracted one fro mthe FuzzyVlookup. I'm just hoping it can be used as a formula... If not, well I'll have to find another way, i guess.Rogue
C
0

Actually, I think I just found a workaround. I was adding it in the wrong part of the code...

Adding this line

  } else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
    val = row[j-1]-0.33;  //transposition

so it now reads

  if(b.charAt(i-1) == a.charAt(j-1)){
    val = row[j-1]; // match
  } else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
    val = row[j-1]-0.33;  //transposition
  } else {
    val = Math.min(row[j-1] + 1, // substitution
                   prev + 1,     // insertion
                   row[j] + 1);  // deletion 
  } 

Seems to fix the problem. Now 'biulding' is 92% accurate and 'bilding' is 88%. (whereas with the original formula 'biulding' was only 75%... despite being closer to the correct spelling of building)

Cornelie answered 17/8, 2018 at 12:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.