Trie tree match performance in word search
Asked Answered
A

1

17

I have debugging a few similar solutions, but wondering if we could improve Trie Tree to partial match prefix (in search method of class Trie, current search method only check if a full word is matched or not) to even improve performance, which could return from a wrong path earlier? I am not very confident for the idea, so seek for advice earlier.

I post one of the similar solutions. Thanks.


Given a 2D board and a list of words from the dictionary, find all words in the board.

Each word must be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once in a word.

For example, Given words = ["oath","pea","eat","rain"] and board =

[
  ['o','a','a','n'],
  ['e','t','a','e'],
  ['i','h','k','r'],
  ['i','f','l','v']
]

Return ["eat","oath"]

class TrieNode():
    def __init__(self):
        self.children = collections.defaultdict(TrieNode)
        self.isWord = False

class Trie():
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for w in word:
            node = node.children[w]
        node.isWord = True

    def search(self, word):
        node = self.root
        for w in word:
            node = node.children.get(w)
            if not node:
                return False
        return node.isWord

class Solution(object):
    def findWords(self, board, words):
        res = []
        trie = Trie()
        node = trie.root
        for w in words:
            trie.insert(w)
        for i in xrange(len(board)):
            for j in xrange(len(board[0])):
                self.dfs(board, node, i, j, "", res)
        return res

    def dfs(self, board, node, i, j, path, res):
        if node.isWord:
            res.append(path)
            node.isWord = False
        if i < 0 or i >= len(board) or j < 0 or j >= len(board[0]):
            return 
        tmp = board[i][j]
        node = node.children.get(tmp)
        if not node:
            return 
        board[i][j] = "#"
        self.dfs(board, node, i+1, j, path+tmp, res)
        self.dfs(board, node, i-1, j, path+tmp, res)
        self.dfs(board, node, i, j-1, path+tmp, res)
        self.dfs(board, node, i, j+1, path+tmp, res)
        board[i][j] = tmp
Arnaud answered 25/10, 2015 at 8:20 Comment(2)
If anyone have any good thoughts on improve performance and comment on my question, it will be great. Thanks. :)Arnaud
such a familiar question from Leetcode :)Rearmost
R
9

I don't see anything wrong from the Trie part in your code.

But I think the trie's original design already has early returning when detecting any mismatch.

Actually, I usually only use regular dict as a trie instead of defaultDict + TrieNode to avoid making the problem over-complicated. You just need to set a "#" key if a certain node is a valid word. And, during insertion, just do node[w] = {}.

If you do this, your code can be significantly simplified and early returning will be straightforward, as you will not have a "wrong" key in a node at all!

For example, a simple trie containing only 'ab' will look like: {'a': {'b': {'#': {}}}. So when you search for 'cd', as soon as you realized there is no key 'c' in the outermost dict, you can return false. This implementation is similar to yours, but I believe it's easier to understand.

Rearmost answered 28/10, 2015 at 0:53 Comment(6)
Thanks stanleyli, agree with Dictionary based solution, let us talk Trie tree here. For your comments, "So I don't think it's an "optimization" by early returning from a wrong path", why do you think no solution for early return False? If no prefix match, there should be a return to make decision no word in dictionary matches? :)Arnaud
Thanks stanleyli, my question is, for traditional Trie tree, it returns True (full match for a word, which is isWord == True in my sample), or return False which is not match. For partial match, how do you think we should organize return type and logics -- I think we need to return 3 types, full match, partial match or no match? Thanks.Arnaud
Sorry I am confused by what you mean by "partial match" in your comment. I thought what you mean is the example in my reply. Maybe you can give an example to better explain?Rearmost
Thanks stanleyli, I read my question and I feel it is a bit confusing. :) Just confirming I am correct for below understading, suppose we go through first column, o=>e, then since "oe" is not a prefix in the dictionary, the lines of "if not node: return False" will be triggered and we returned earlier to know o=>e can not make a valid path. Correct understanding?Arnaud
@LinMa Yes you are right. But 'oe' is not a good example as you have 'oath' in your words. It won't know until you are trying to search for 'e'. You will get a None in node = node.children.get(w) since it's a defaultDict. That is also why I suggest you to use dict since it's more clear to check it using if w not in node.Rearmost
Thanks stanleyli. Mark as answered. Sorry for the confusion question at the beginning.Arnaud

© 2022 - 2024 — McMap. All rights reserved.