Suffix tree VS Tries - in plain English , what's the difference?
Asked Answered
S

2

5

I've taken a look at this questions , but I still don't see the difference between a Suffix tree and a Trie .

Both have all the substrings of a given string , so where do they differ from one another ?

Soda answered 1/8, 2013 at 6:48 Comment(2)
This gives a fairly detailed explanation in plain English.Plummet
@devnull: You can say that :)Soda
H
18

Suffix tree - a large text is given. Query - search many times any words in the text.
Example: You are implementing your own cool text redactor with solitaire and kittens=) You are going to implement CTRL+F function. Probable implementation - index document (create suffix tree) and when user looks for some word - search it in a tree.

Trie - a large text is given. Query - search many times predefined words in the text.
Example: You are implementing your own cool facebook with poker and Justin Bieber's fans=) You don't want your users to post swear words. Probable implementation - create trie of swear words. When users types some text search forbiiden words and replace them with *.

In general, suffix tree= trie. Suffix tree is a trie of all suffixes of some word. When you want search something in a dictionary use trie. When you search something in a solid text use suffix tree.

Important note - building/rebuilding suffix tree for large text is complex operation. Once you changed your text you have to recreate suffix tree. Rebuilding trie is a trivial operation - just add new word in O(wordLength)

Conclusion
Suffix tree. You know nothing about future queries.Spend time once for creating suffix tree and you are ready to process requests. Known info is a text. Situations where requests are unknown but text is given and not going to change oftenly are candidates for using suffix tree. For example you can't use trie (aho-corasick algo) in CTRL+F implementation - because you just can't give dictionary as the input for aho-corasick's algo based on the trie.

Trie. You know nothing about text where you will perform searching. But you know future queries. Spend time for preprocessing/preparing data structure for your queries and you can perform search queries in any text. For example in replacing forbidden words task you don't know what text user will post but you know forbidden words. Creating suffix tree for each short new post would be too stupid=) UPD As @mightyWOZ noticed in comment, pure trie is not applicable but we can use Aho-Corasick algo which is extension over trie. So, statement is still true for tries - there exist approach (Aho-Corasick) which uses trie as a base, pre-processes queries and then can handle any text.

Hedveh answered 1/8, 2013 at 8:25 Comment(3)
This is wrong "Spend time for preprocessing/preparing data structure for your queries and you can perform search queries in any text." You preprocess the text in which you want to perform search. In the preprocessing step you build the trie by inserting the words to trie. After that you can search for any word. In simple words, tries: can only search for words(spaces or some other delimiter is required). Suffix tree: can search for any substring. Because it contains every suffix of the text.Presnell
Good remark. In fact, I meant Aho–Corasick algo (I mentioned why we can't use it for Ctrl+F) but yes, that should be added to the post.Hedveh
Maybe I didn't get, or meybe things are messed up out of order. You say "trie", and then talk about suffix tree, and before that you say "suffix tree", and talk about trie...Sammer
S
0

Let me try to explain the difference in plain English.

The trie (from 'retrieval') is a tree, just like a binary trees, but where the keys are spread in the tree itself, each node contains part of the key, and so common prefixes are shared for all descendant nodes. Functionally, it's still a tree, where you supply a key to get access to the node. Two notable differences from binary trees are: 1) this one can be more compact if you have a real lot of sortable keys (e.g. a dictionry of words); 2) typical use-case implies that you are interested more in finding if a key existis in it, then having some value assiciated with it; i.e. sometimes there are no values at all, only the keys;

Now the suffix tree.

Technically, it's the same thing, the difference is what you use as a key.

E.g., if you store a word "monkey" in a trie, then your key is this word - "monkey". But if you want to store the same in a suffix tree, then you first need to make a list of all possible suffixes for this word:

  • monkey
  • onkey
  • nkey
  • key
  • ey
  • y

And then you have 6 keys, which are stored in a... trie. The result being a suffix tree.

Possible usage: you want to find where the word "key" is in a text (text being the word "monkey"); you can find it instantly - just look down the tree, it's at offset 3 from the beginning of text.

Yeah, the suffix tree is also expected to have offsets stored as well.

At the topmost level, this is the difference.

(Now the question: how can you represent the long text in a suffix tree? Another answer says you can - but I am not sure about that).

Sammer answered 9/9 at 11:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.