Which node data structure to use for a trie
Asked Answered
M

2

12

I am using trie for the first time.I wanted to know which is the best data structure to use for a trie while deciding which is the next branch that one is supposed to traverse. I was looking among an array,a hashmap and a linked list.

Maelstrom answered 16/9, 2011 at 20:4 Comment(0)
E
12

Each of these options has their advantages and disadvantages.

If you store the child nodes in an array, then you can look up which child to visit extremely efficiently by just indexing into the array. However, the space usage per node will be high: O(|Σ|), where Σ is the set of letters that your words can be formed from, even if most of those children are null.

If you store the child nodes in a linked list, then the time required to find a child will be O(|Σ|), since you may need to scan across all of the nodes of the linked list to find the child you want. On the other hand, the space efficiency will be quite good, because you only store the children that you're using. You could also consider using a fixed-sized array here, which has even better space usage but leads to very expensive insertions and deletions.

If you store the child nodes in a hash table, then the (expected) time to find a child will be O(1) and the memory usage will only be proportional (roughly) to the number of children you have. Interestingly, because you know in advance what values you're going to be hashing, you could consider using a dynamic perfect hash table to ensure worst-case O(1) lookups, at the expense of some precomputation.

Another option would be to store the child nodes in a binary search tree. This gives rise to the ternary search tree data structure. This choice is somewhere between the linked list and hash table options - the space usage is low and you can perform predecessor and successor queries efficiently, but there's a slight increase in the cost of performing a lookup due to the search cost in the BST. If you have a static trie where insertions never occur, you can consider using weight-balanced trees as the BSTs at each point; this gives excellent runtime for searches (O(n + log k), where n is the length of the string to search for and k is the total number of words in the trie).

In short, the array lookups are fastest but its space usage in the worst case is the worst. A statically-sized array has the best memory usage but expensive insertions and deletions. The hash table has decently fast lookups and good memory usage (on average). Binary search trees are somewhere in the middle. I would suggest using the hash table here, though if you put a premium on space and don't care about lookup times the linked list might be better. Also, if your alphabet is small (say, you're making a binary trie), the array overhead won't be too bad and you may want to use that.

Hope this helps!

Efflorescent answered 16/9, 2011 at 20:10 Comment(4)
for binary tries you can actually do (much) better than 2-element arraysKalie
@harold- Good point. In a language like C or C++ there's no space difference between a two-element array and just holding two pointers, though in languages like Java or Python you're absolutely right.Efflorescent
well there is that, but you can do better than that too, with some trickery that lets you skip the leading zeroes of the key and jump straight to the node corresponding to that many leading zero'sKalie
@harold- Ah, yes. Plus, there's vEB trees and y-Fast tries that do even better. :-)Efflorescent
B
0

If you are trying to build trie just for alphabets, I would suggest to use array and then use particia tree (space optimized trie). http://en.wikipedia.org/wiki/Radix_tree

This will allow you to do fast lookup with array and doesn't waste too much of space if branching factor of certain node is low.

Bitter answered 17/9, 2011 at 10:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.