Implementing a simple Trie for efficient Levenshtein Distance calculation - Java

Asked 1/2, 2011 at 23:1 Answered 2/7, 2016 at 6:58

Solved java algorithm performance trie levenshtein-distance

UPDATE 3

Done. Below is the code that finally passed all of my tests. Again, this is modeled after Murilo Vasconcelo's modified version of Steve Hanov's algorithm. Thanks to all that helped!

/**
 * Computes the minimum Levenshtein Distance between the given word (represented as an array of Characters) and the
 * words stored in theTrie. This algorithm is modeled after Steve Hanov's blog article "Fast and Easy Levenshtein
 * distance using a Trie" and Murilo Vasconcelo's revised version in C++.
 * 
 * http://stevehanov.ca/blog/index.php?id=114
 * http://murilo.wordpress.com/2011/02/01/fast-and-easy-levenshtein-distance-using-a-trie-in-c/
 * 
 * @param ArrayList<Character> word - the characters of an input word as an array representation
 * @return int - the minimum Levenshtein Distance
 */
private int computeMinimumLevenshteinDistance(ArrayList<Character> word) {

    theTrie.minLevDist = Integer.MAX_VALUE;

    int iWordLength = word.size();
    int[] currentRow = new int[iWordLength + 1];

    for (int i = 0; i <= iWordLength; i++) {
        currentRow[i] = i;
    }

    for (int i = 0; i < iWordLength; i++) {
        traverseTrie(theTrie.root, word.get(i), word, currentRow);
    }
    return theTrie.minLevDist;
}

/**
 * Recursive helper function. Traverses theTrie in search of the minimum Levenshtein Distance.
 * 
 * @param TrieNode node - the current TrieNode
 * @param char letter - the current character of the current word we're working with
 * @param ArrayList<Character> word - an array representation of the current word
 * @param int[] previousRow - a row in the Levenshtein Distance matrix
 */
private void traverseTrie(TrieNode node, char letter, ArrayList<Character> word, int[] previousRow) {

    int size = previousRow.length;
    int[] currentRow = new int[size];
    currentRow[0] = previousRow[0] + 1;

    int minimumElement = currentRow[0];
    int insertCost, deleteCost, replaceCost;

    for (int i = 1; i < size; i++) {

        insertCost = currentRow[i - 1] + 1;
        deleteCost = previousRow[i] + 1;

        if (word.get(i - 1) == letter) {
            replaceCost = previousRow[i - 1];
        } else {
            replaceCost = previousRow[i - 1] + 1;
        }

        currentRow[i] = minimum(insertCost, deleteCost, replaceCost);

        if (currentRow[i] < minimumElement) {
            minimumElement = currentRow[i];
        }
    }

    if (currentRow[size - 1] < theTrie.minLevDist && node.isWord) {
        theTrie.minLevDist = currentRow[size - 1];
    }

    if (minimumElement < theTrie.minLevDist) {

        for (Character c : node.children.keySet()) {
            traverseTrie(node.children.get(c), c, word, currentRow);
        }
    }
}

UPDATE 2

Finally, I've managed to get this to work for most of my test cases. My implementation is practically a direct translation from Murilo's C++ version of Steve Hanov's algorithm. So how should I refactor this algorithm and/or make optimizations? Below is the code...

public int search(String word) {

    theTrie.minLevDist = Integer.MAX_VALUE;

    int size = word.length();
    int[] currentRow = new int[size + 1];

    for (int i = 0; i <= size; i++) {
        currentRow[i] = i;
    }
    for (int i = 0; i < size; i++) {
        char c = word.charAt(i);
        if (theTrie.root.children.containsKey(c)) {
            searchRec(theTrie.root.children.get(c), c, word, currentRow);
        }
    }
    return theTrie.minLevDist;
}
private void searchRec(TrieNode node, char letter, String word, int[] previousRow) {

    int size = previousRow.length;
    int[] currentRow = new int[size];
    currentRow[0] = previousRow[0] + 1;

    int insertCost, deleteCost, replaceCost;

    for (int i = 1; i < size; i++) {

        insertCost = currentRow[i - 1] + 1;
        deleteCost = previousRow[i] + 1;

        if (word.charAt(i - 1) == letter) {
            replaceCost = previousRow[i - 1];
        } else {
            replaceCost = previousRow[i - 1] + 1;
        }
        currentRow[i] = minimum(insertCost, deleteCost, replaceCost);
    }

    if (currentRow[size - 1] < theTrie.minLevDist && node.isWord) {
        theTrie.minLevDist = currentRow[size - 1];
    }

    if (minElement(currentRow) < theTrie.minLevDist) {

        for (Character c : node.children.keySet()) {
            searchRec(node.children.get(c), c, word, currentRow);

        }
    }
}

Thank you everyone who contributed to this question. I tried getting the Levenshtein Automata to work, but I couldn't make it happen.

So I'm looking for suggestions on refactoring and/or optimizations regarding the above code. Please let me know if there's any confusion. As always, I can provide the rest of the source code as needed.

UPDATE 1

So I've implemented a simple Trie data structure and I've been trying to follow Steve Hanov's python tutorial to compute the Levenshtein Distance. Actually, I'm interested in computing the minimum Levenshtein Distance between a given word and the words in the Trie, thus I've been following Murilo Vasconcelos's version of Steve Hanov's algorithm. It's not working very well, but here's my Trie class:

public class Trie {

    public TrieNode root;
    public int minLevDist;

    public Trie() {
        this.root = new TrieNode(' ');
    }

    public void insert(String word) {

        int length = word.length();
        TrieNode current = this.root;

        if (length == 0) {
            current.isWord = true;
        }
        for (int index = 0; index < length; index++) {

            char letter = word.charAt(index);
            TrieNode child = current.getChild(letter);

            if (child != null) {
                current = child;
            } else {
                current.children.put(letter, new TrieNode(letter));
                current = current.getChild(letter);
            }
            if (index == length - 1) {
                current.isWord = true;
            }
        }
    }
}

... and the TrieNode class:

public class TrieNode {

    public final int ALPHABET = 26;

    public char letter;
    public boolean isWord;
    public Map<Character, TrieNode> children;

    public TrieNode(char letter) {
        this.isWord = false;
        this.letter = letter;
        children = new HashMap<Character, TrieNode>(ALPHABET);
    }

    public TrieNode getChild(char letter) {

        if (children != null) {
            if (children.containsKey(letter)) {
                return children.get(letter); 
            }
        }
        return null;
    }
}

Now, I've tried to implement the search as Murilo Vasconcelos has it, but something is off and I need some help debugging this. Please give suggestions on how to refactor this and/or point out where the bugs are. The very first thing I'd like to refactor is the "minCost" global variable, but that's the smallest of things. Anyway, here's the code...

public void search(String word) {

    int size = word.length();
    int[] currentRow = new int[size + 1];

    for (int i = 0; i <= size; i++) {
        currentRow[i] = i;
    }
    for (int i = 0; i < size; i++) {
        char c = word.charAt(i);
        if (theTrie.root.children.containsKey(c)) {
            searchRec(theTrie.root.children.get(c), c, word, currentRow);
        }
    }
}

private void searchRec(TrieNode node, char letter, String word, int[] previousRow) {

    int size = previousRow.length;
    int[] currentRow = new int[size];
    currentRow[0] = previousRow[0] + 1;

    int replace, insertCost, deleteCost;

    for (int i = 1; i < size; i++) {

        char c = word.charAt(i - 1);

        insertCost = currentRow[i - 1] + 1;
        deleteCost = previousRow[i] + 1;
        replace = (c == letter) ? previousRow[i - 1] : (previousRow[i - 1] + 1);

        currentRow[i] = minimum(insertCost, deleteCost, replace);
    }

    if (currentRow[size - 1] < minCost && !node.isWord) {
        minCost = currentRow[size - 1];
    }
    Integer minElement = minElement(currentRow);
    if (minElement < minCost) {

        for (Map.Entry<Character, TrieNode> entry : node.children.entrySet()) {
            searchRec(node, entry.getKey(), word, currentRow);
        }
    }
}

I apologize for the lack of comments. So what am I doing wrong?

INITIAL POST

I've been reading an article, Fast and Easy Levenshtein distance using a Trie, in hopes of figuring out an efficient way to compute the Levenshtein Distance between two Strings. My main goal with this is, given a large set of words, to be able to find the minimal Levenshtein Distance between an input word(s) and this set of words.

In my trivial implementation, I compute the Levenshtein Distance between an input word and the set of words, for each input word, and return the minimum. It works, but it is not efficient...

I've been looking for implementations of a Trie, in Java, and I've come across two seemingly good sources:

Koders.com version
code.google.com version (EDIT: This seems to have moved to github.com/rkapsi)

However, these implementations seem too complicated for what I'm trying to do. As I've been reading through them to understand how they work and how Trie data structures work in general, I've only become more confused.

So how would I implement a simple Trie data structure in Java? My intuition tells me that each TrieNode should store the String it represents and also references to letters of the alphabet, not necessarily all letters. Is my intuition correct?

Once that is implemented, the next task is to compute the Levenshtein Distance. I read through the Python code example in the article above, but I don't speak Python, and my Java implementation runs out of Heap memory once I hit the recursive searching. So how would I compute the Levenshtein Distance using the Trie data structure? I have a trivial implementation, modeled after this source code, but it doesn't use a Trie... it is inefficient.

It would be really nice to see some code in addition to your comments and suggestions. After all, this is a learning process for me... I've never implemented a Trie... so I have plenty to learn from this experience.

Thanks.

p.s. I can provide any source code if need be. Also, I've already read through and tried using a BK-Tree as suggested in Nick Johnson's blog, but its not as efficient as I think it can be... or maybe my implementation is wrong.

Inflect answered 1/2, 2011 at 23:1 Comment(9)

@Inflect - You mentioned Nick Johnson's blog so perhaps you may already have seen his Levenshtein Automata code. The Levenshtein Automata code is the most efficient I've ran across so far. You would just need to convert his Python version to Java. See this: blog.notdot.net/2010/07/… – Presumptive 2/2, 2011 at 0:5

@Inflect - Here's a gist of the Levenshtein Automata: gist.github.com/491973 – Presumptive 2/2, 2011 at 0:11

@Inflect The only way I can think that a Trie would help you is if you're essentially going to implement the same stuff as the Levenshtein Automata anyway. A trie is just a special case of a DFA that recognizes the words in it. – Pharmacopsychosis 2/2, 2011 at 2:54

if (currentRow[size - 1] < minCost && !node.isWord) { this line is wrong. You can only update minCost if there is a word which finish at that node, so the correct is if (currentRow[size - 1] < minCost && node.isWord) { – Onanism 2/2, 2011 at 20:28

@Murilo... that changes results in a StackOverflowError, I believe due to too much recursion. In your C++ version, you have if ((current_row[sz-1] < min_cost) && (tree->word != ""))... what exactly does the second part of that if mean? What does "" represent? – Inflect 2/2, 2011 at 20:40

tree->word == "" means no word finish at that node. So if the current cost is less than the min_cost and one or more words finish at that node, we must update the min_cost with the current cost. – Onanism 3/2, 2011 at 15:22

StackOverflowError may be because your words are very big. Do you know what is the maximum length of your words? Also, you can try to run my code with your data and see if the same error happens. – Onanism 3/2, 2011 at 15:26

@Murilo... the dictionary I use has ~180k words and the maximum length word in this dictionary is 15 characters. But the input might be longer, though not guaranteed. – Inflect 3/2, 2011 at 15:29

So the StackOverflowError isn't because of the recursion... Your maximum recursion depth is 15 which is small. – Onanism 4/2, 2011 at 0:57

I've implemented the algo described on "Fast and Easy Levenshtein distance using a Trie" article in C++ and it is really fast. If you want (understand C++ better than Python), I can past the code in somewhere.

Edit: I posted it on my blog.

Onanism answered 2/2, 2011 at 0:17 Comment(3)

Yeah, I know C++ much better than Python. Unless there is a bunch of tricky, low-level stuff... I think I'll be fine. You can paste here, or alternatively, you can upload the file on gist.github.com – Inflect 2/2, 2011 at 0:22

@Murilo... I've updated my post with an attempt to implement your algorithm in Java. Would you mind taking a look to see if there are any obvious bugs? I'm a little stuck. – Inflect 2/2, 2011 at 20:22

Ok. Also that approach of finding the minimum cost is from my blog wich is a bit different from Steve Hanov's approach. I'll be happy if you cite it :) – Onanism 2/2, 2011 at 20:32

From what I can tell you don't need to improve the efficiency of Levenshtein Distance, you need to store your strings in a structure that stops you needing to run distance computations so many times i.e by pruning the search space.

Since Levenshtein distance is a metric, you can use any of the metric spaces indices which take advantage of triangle inequality - you mentioned BK-Trees, but there are others eg. Vantage Point Trees, Fixed-Queries Trees, Bisector Trees, Spatial Approximation Trees. Here are their descriptions:

Burkhard-Keller Tree

Nodes are inserted into the tree as follows: For the root node pick an arbitary element from the space; add unique edge-labeled children such that the value of each edge is the distance from the pivot to that element; apply recursively, selecting the child as the pivot when an edge already exists.

Fixed-Queries Tree

As with BKTs except: Elements are stored at leaves; Each leaf has multiple elements; For each level of the tree the same pivot is used.

Bisector Tree

Each node contains two pivot elements with their covering radius (maximum distance between the centre element and any of its subtree elements); Filter into two sets those elements which are closest to the ﬁrst pivot and those closest to the second, and recursively build two subtrees from these sets.

Spatial Approximation Tree

Initially all elements are in a bag; Choose an arbitrary element to be the pivot; Build a collection of nearest neighbours within range of the pivot; Put each remaining element into the bag of the nearest element to it from collection just built; Recursively form a subtree from each element of this collection.

Vantage Point Tree

Choose a pivot from the set abitrarily; Calculate the median distance between this pivot and each element of the remaining set; Filter elements from the set into left and right recursive subtrees such that those with distances less than or equal to the median form the left and those greater form the right.

Glassful answered 2/2, 2011 at 1:14 Comment(4)

Very well put together answer... +1... thank you! You are totally correct regarding that I'm looking for a way to not have to compute the Levenshtein Distance as much. I'll look into these trees and try to realize their advantages/disadvantages. However, that would take me a while. In the mean time... do you have any suggestions, or other comments to make on the benefits of using one over the other? – Inflect 2/2, 2011 at 1:19

unfortunately, you are asking just a couple of months too early - My Honours project is exactly that, and I'm not finished yet! – Glassful 2/2, 2011 at 1:23

Any chance for a link to/PDF of your honours project, you mentioned in the comments? /cc @Inflect – Tektite 10/5, 2013 at 13:57

OK, if you really want it... @Tektite robertgmoss.co.uk/hons_project_report.pdf – Glassful 11/5, 2013 at 10:43

Edit: I posted it on my blog.

Onanism answered 2/2, 2011 at 0:17 Comment(3)

Ok. Also that approach of finding the minimum cost is from my blog wich is a bit different from Steve Hanov's approach. I'll be happy if you cite it :) – Onanism 2/2, 2011 at 20:32

Here is an example of Levenshtein Automata in Java (EDIT: moved to github).These will probably also be helpful:

http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/automaton/ http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/

EDIT: The above links seem to have moved to github:

https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/util/automaton https://github.com/apache/lucene-solr/tree/master/lucene/core/src/test/org/apache/lucene/util/automaton

It looks like the experimental Lucene code is based off of the dk.brics.automaton package.

Usage appears to be something similar to below:

LevenshteinAutomata builder = new LevenshteinAutomata(s);
Automaton automata = builder.toAutomaton(n);
boolean result1 = BasicOperations.run(automata, "foo");
boolean result2 = BasicOperations.run(automata, "bar");

Presumptive answered 2/2, 2011 at 0:19 Comment(8)

hahaha... yes! I was wondering what Lev1ParametricDescription and Lev2ParametricDescription were... and other classes – Inflect 2/2, 2011 at 0:26

Take a look at the javaodcs for the dk.brics.automaton package. I believe Lucene just incorporated this package so you may want to use it directly. – Presumptive 2/2, 2011 at 0:31

ahhh... nice! This all looks great. Thank you! It will take me some time to read through it all... it looks super complicated. But it should be interesting to learn about! – Inflect 2/2, 2011 at 0:45

@Taylor... I'm having a hard time including the Lucene source/.jar in Eclipse. It seems like the 3.0.3 version download does not include the Automaton stuff. Any suggestions? – Inflect 2/2, 2011 at 2:21

@Taylor... the dk.brics.automaton package doesn't include the Levenshtein Automata. The links you sent me to apache has different sources than the dk.brics.automaton sources – Inflect 2/2, 2011 at 2:32

+1 - the Lucene stuff was one of the resources I encountered when writing up my article. – Pharmacopsychosis 2/2, 2011 at 2:54

@Nick... Hey... I'm trying to import all the Lucene stuff into Eclipse but its not working. I downloaded Lucene source code, but its missing the Levenshtein Automata. I downloaded the dk.brics.automata but its not merging with the Lucene stuff. Any suggestions? – Inflect 2/2, 2011 at 2:56

@Nice... awesome blog articles regarding the BK-Tree/Levenshtein Automata btw :) – Inflect 2/2, 2011 at 2:56

In many ways, Steve Hanov's algorithm (presented in the first article linked in the question, Fast and Easy Levenshtein distance using a Trie), the ports of the algorithm made by Murilo and you (OP), and quite possibly every pertinent algorithm involving a Trie or similar structure, function much like a Levenshtein Automaton (which has been mentioned several times here) does:

Given:
       dict is a dictionary represented as a DFA (ex. trie or dawg)
       dictState is a state in dict
       dictStartState is the start state in dict
       dictAcceptState is a dictState arrived at after following the transitions defined by a word in dict
       editDistance is an edit distance
       laWord is a word
       la is a Levenshtein Automaton defined for laWord and editDistance
       laState is a state in la
       laStartState is the start state in la
       laAcceptState is a laState arrived at after following the transitions defined by a word that is within editDistance of laWord
       charSequence is a sequence of chars
       traversalDataStack is a stack of (dictState, laState, charSequence) tuples

Define dictState as dictStartState
Define laState as laStartState
Push (dictState, laState, "") on to traversalDataStack
While traversalDataStack is not empty
    Define currentTraversalDataTuple as the the product of a pop of traversalDataStack
    Define currentDictState as the dictState in currentTraversalDataTuple
    Define currentLAState as the laState in currentTraversalDataTuple
    Define currentCharSequence as the charSequence in currentTraversalDataTuple
    For each char in alphabet
        Check if currentDictState has outgoing transition labeled by char
        Check if currentLAState has outgoing transition labeled by char
        If both currentDictState and currentLAState have outgoing transitions labeled by char
            Define newDictState as the state arrived at after following the outgoing transition of dictState labeled by char
            Define newLAState as the state arrived at after following the outgoing transition of laState labeled by char
            Define newCharSequence as concatenation of currentCharSequence and char
            Push (newDictState, newLAState, newCharSequence) on to currentTraversalDataTuple
            If newDictState is a dictAcceptState, and if newLAState is a laAcceptState
                Add newCharSequence to resultSet
            endIf
        endIf
    endFor
endWhile

Steve Hanov's algorithm and its aforementioned derivatives obviously use a Levenshtein distance computation matrix in place of a formal Levenshtein Automaton. Pretty fast, but a formal Levenshtein Automaton can have its parametric states (abstract states which describe the concrete states of the automaton) generated and used for traversal, bypassing any edit-distance-related runtime computation whatsoever. So, it should be run even faster than the aforementioned algorithms.

If you (or anybody else) is interested in a formal Levenshtein Automaton solution, have a look at LevenshteinAutomaton. It implements the aforementioned parametric-state-based algorithm, as well as a pure concrete-state-traversal-based algorithm (outlined above) and dynamic-programming-based algorithms (for both edit distance and neighbor determination). It's maintained by yours truly :) .

Hauler answered 2/7, 2016 at 6:58 Comment(0)

My intuition tells me that each TrieNode should store the String it represents and also references to letters of the alphabet, not necessarily all letters. Is my intuition correct?

No, a trie doesn't represent a String, it represents a set of strings (and all their prefixes). A trie node maps an input character to another trie node. So it should hold something like an array of characters and a corresponding array of TrieNode references. (Maybe not that exact representation, depending on efficiency in your particular use of it.)

Cittern answered 2/2, 2011 at 1:15 Comment(0)

As I see it right, you want to loop over all branches of the trie. That's not that difficult using a recursive function. I'm using a trie as well in my k-nearest neighbor algorithm, using the same kind of function. I don't know Java, however but here's some pseudocode:

function walk (testitem trie)
   make an empty array results
   function compare (testitem children distance)
     if testitem = None
        place the distance and children into results
     else compare(testitem from second position, 
                  the sub-children of the first child in children,
                  if the first item of testitem is equal to that 
                  of the node of the first child of children 
                  add one to the distance (! non-destructive)
                  else just the distance)
        when there are any children left
             compare (testitem, the children without the first item,
                      distance)
    compare(testitem, children of root-node in trie, distance set to 0)
    return the results

Hope it helps.

Orsino answered 3/2, 2011 at 21:8 Comment(1)

Thanks for the response. I have a couple of questions... can you explain the parameters for each function? What do they represent? – Inflect 3/2, 2011 at 21:48

The function walk takes a testitem (for example a indexable string, or an array of characters) and a trie. A trie can be an object with two slots. One specifying the node of the trie, the other the children of that node. The children are tries as well. In python it would be something like:

class Trie(object):
    def __init__(self, node=None, children=[]):
        self.node = node
        self.children = children

Or in Lisp...

(defstruct trie (node nil) (children nil))

Now a trie looks something like this:

(trie #node None
      #children ((trie #node f
                       #children ((trie #node o
                                        #children ((trie #node o
                                                         #children None)))
                                  (trie #node u
                                        #children ((trie #node n
                                                         #children None)))))))

Now the internal function (which you also can write separately) takes the testitem, the children of the root node of the tree (of which the node value is None or whatever), and an initial distance set to 0.

Then we just recursively traverse both branches of the tree, starting left and then right.

Orsino answered 4/2, 2011 at 8:18 Comment(1)

I've already implemented the Trie... I was just confused about what "testitem" was and the initial distance. – Inflect 4/2, 2011 at 8:55

I'll just leave this here in case anyone is looking for yet another treatment of this problem:

http://code.google.com/p/oracleofwoodyallen/wiki/ApproximateStringMatching

Anaplastic answered 24/3, 2012 at 17:43 Comment(0)

I was looking at your latest update 3, the algorithm seem not work well for me.

Let s see you have below test cases:

    Trie dict = new Trie();
    dict.insert("arb");
    dict.insert("area");

    ArrayList<Character> word = new ArrayList<Character>();
    word.add('a');
    word.add('r');
    word.add('c');

In this case, the minimum edit distance between "arc" and the dict should be 1, which is the edit distance between "arc" and "arb", but you algorithms will return 2 instead.

I went through the below code piece:

        if (word.get(i - 1) == letter) {
            replaceCost = previousRow[i - 1];
        } else {
            replaceCost = previousRow[i - 1] + 1;
        }

At least for the first loop, the letter is one of the characters in the word, but instead, you should be compare the nodes in the trie, so there will be one line duplicate with the first character in the word, is that right? each DP matrix has the first line as a duplicate. I executed the exact same code you put on the solution.

Spiracle answered 13/11, 2014 at 5:15 Comment(0)

Well, here's how I did it a long time ago. I stored the dictionary as a trie, which is simply a finite-state-machine restricted to the form of a tree. You can enhance it by not making that restriction. For example, common suffixes can simply be a shared subtree. You could even have loops, to capture stuff like "nation", "national", "nationalize", "nationalization", ...

Keep the trie as absolutely simple as possible. Don't go stuffing strings in it.

Remember, you don't do this to find the distance between two given strings. You use it to find the strings in the dictionary that are closest to one given string. The time it takes depends on how much levenshtein distance you can tolerate. For distance zero, it is simply O(n) where n is the word length. For arbitrary distance, it is O(N) where N is the number of words in the dictionary.

Prophylaxis answered 2/2, 2011 at 17:11 Comment(4)

@Mike... thanks for the advice. I've updated my post with a Trie implementation in Java. I've also included a search method which I'm using to compute the minimum Levenshtein Distance between a given word and the words in the Trie. It doesn't work quite right... would you mind taking a look to see if you can catch any obvious bugs? I'm a bit stuck. – Inflect 2/2, 2011 at 20:25

@Hristo: I think you're doing it a little differently than I did. My approach did not use the matrix with rows. The basic form of the program is a depth-first walk function on the trie, which is then decorated by adding arguments. One argument is the remaining error budget. When descending, if the trie character doesn't match the key character, the budget is reduced by 1 on the subordinate call. Similarly for other kinds of mismatch. Then the routine prunes any calls where the budget is < 0. Then there's an outer loop that does the walk a number of times ... – Prophylaxis 2/2, 2011 at 22:58

@Hristo: ... first with a budget of 0, then with a budget of 1, etc. Whenever the walk hits a match, it appends the match to a result list. The outer loop can stop as soon as some matches appear in the list. Since the time taken for small budget B is exponential in B, it doesn't hurt to do the extra walks with smaller B. In this way, the first matches you get are the lowest-cost. – Prophylaxis 2/2, 2011 at 23:3

@Mike... thanks for the response. So let's say I'm trying to calculate the minimum Levenshtein distance between a word "tihs" and the Trie. Would your algorithm be able to return the value 1, for example, a Levenshtein Distance of 1 between "tihs" and "ties"? – Inflect 2/2, 2011 at 23:13

Correct me if I am wrong but I believe your update3 has an extra loop which is unnecesary and makes the program much slower:

for (int i = 0; i < iWordLength; i++) {
    traverseTrie(theTrie.root, word.get(i), word, currentRow);
}

You ought to call traverseTrie only once because within traverseTrie you are already looping over the whole word. The code should be only as follows:

traverseTrie(theTrie.root, ' ', word, currentRow);

Dougherty answered 6/6, 2015 at 3:5 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

UPDATE 3

UPDATE 2

UPDATE 1

INITIAL POST

Recommended topics

Hot tags