Is a Trie a K-ary tree?

Asked 10/1, 2014 at 4:7 Answered 10/1, 2014 at 11:35

Solved c++algorithm data-structures tree trie

If you look at the node definitions for a simple Trie and a simple K-ary tree, they look the same.

(using C++ notation)

template <size_t K>
trieNode
{
    trieNode *[K]
};

template <size_t K>
KaryNode
{
    KaryNode *[K]
};

At its simplest a K-ary tree has multiple children per node (2 for a binary tree)

And a Trie has "multiple children per node"

It seems that a K-ary tree makes it's choice of child based on comparison( < or > ) of Keys

While a Trie makes it's choice of child based on (unary) equality of sub-spans of the Key

Since neither data structure has made it into any standards, what would be best definition of each, and how would they be differentiated?

Pajamas answered 10/1, 2014 at 4:7 Comment(0)

From the point of view of the shape of the data structure, a trie is clearly an N-ary tree, in the same way that a balanced binary search tree is a binary tree, the difference being in how the data structure manages the data.

A binary search tree is a binary tree with additional constraint that the keys in the nodes are ordered, a balanced binary tree adds on top of that a constraint on the difference between the lengths of different branches.

Similarly, a trie is a N-ary tree with additional constrains that determine how the keys are managed.

Let's try a definition of what a trie is:

A trie is an efficient data structure used to implement a dictionary in which keys are sequences lexicographically. The implementation uses an N-ary tree where the branching factor is the range of valid values for each element in the key sequence^[1] and each node may or not hold a value, but always holds a subsequence of the key being stored ^[2]. For each node in the tree, the concatenation of the subsequences of keys stored in the nodes from the root to any given node represent the key for the value stored, if the node holds a value, and/or a common prefix for all nodes in this subtree.

This layout of data allows for linear lookups on the size of the keys, and sharing the prefix allows for compact representations for many natural languages (like Spanish, where different forms of each verb differ only on the last few suffix characters).

¹: That keys are sequences is an important premise, as the main advantage of the tries is that they split the key into different nodes along the path.

²: Depending on the implementation each node might maintain a single element (character) from the sequence or a combination.

Carousal answered 10/1, 2014 at 4:48 Comment(1)

I am trying to figure out what are the constraints on Trie that make it a Trie – Pajamas 10/1, 2014 at 4:50

A binary tree refers to the shape of the tree without saying anything about how the tree will be used. A binary search tree is a binary tree that is being used in a particular way.

Similarly, a k-ary tree = n-ary tree = multi-way tree refers to the shape of the tree. A trie is a multiway tree that is being used in a particular way.

(But, be careful, just like there are many variations on binary search trees, there are many different variations on tries.)

So, what makes a trie a trie?

A trie is usually used to represent a collection of sequences, such as strings. A particular key is stored, not in a single node like in a binary search tree, but rather split up across many levels of the tree. Here's a picture of a trie containing the strings "can", "car", "cat", and "do".

        .
       / \
     c/   \d
     /     \
    .       .
    |       |
   a|       |o
    |       |
    .       .
   /|\
 n/r| \t
 /  |  \
.   .   .

As you can see, it may easier to think of the characters as being associated with the edges instead of the nodes, but any particular implementation might represent it either way.

The many varieties of tries differ in things like how they handle cases where one key is a prefix of another (eg, "cat" and "catastrophe"), and how/whether to compress long common substrings.

United answered 10/1, 2014 at 11:35 Comment(0)

K-nary tree: each node has at most K children. Trie: the children of each node is not limited to a number (theoretically). In practice of course there's always a limit. For example for an asian word trie, the number of children of each node is limited to the size of asian characters, which is probably say 5000 or 10000.

Cordeelia answered 10/1, 2014 at 4:17 Comment(8)

Generally a given trie is limited to the number of characters in its "alphabet" a large number like 10000, is still a limit – Pajamas 10/1, 2014 at 4:23

Yeah, but you won't implement this type of tree with fixed number K=10000, which is a completely waste of space. You are asking for differentiation. 1. in theory, they are different, trie got not limit in number of children. 2. in implementation, many useful tries won't be implemented with a fixed number, a dynamic structure that supports quick insertion/deletion/searching shall be used. For most k-nary trees, since k is small, the implementation is less changeable. – Cordeelia 10/1, 2014 at 4:24

I'm curious about definitions, not implementations, you can for example use a vector or a linked list instead of an array, and not change what data type they are. – Pajamas 10/1, 2014 at 4:26

From definition, a k-ary tree is a rooted tree in which each node has no more than k children. In practice ppl use it as ordered search tree but it is actually not necessarily to be ordered in definition. For Trie, it is an ordered tree (no limitation of k on children). The formal definition is: A tree for storing strings in which there is one node for every common prefix. The strings are stored in extra leaf nodes. In fact, a trie with each node has less than k children is a k-nary tree from definition. – Cordeelia 10/1, 2014 at 4:43

So only strings, not any other datatype? – Pajamas 10/1, 2014 at 4:48

Whether the number of children is fixed or not is an implementation detail. You can have a trie in which the children nodes are a linked list, or a more complex data structure; but it it would still be a trie (similarly, a n-ary tree can have a fixed number of children nodes or a list or any other data structure) – Eighteenmo 10/1, 2014 at 4:51

@GlennTeitelbaum: No--other data types are allowed. Just for example, storing IP addresses in a trie is pretty common. – Parkerparkhurst 10/1, 2014 at 4:57

It was invented for strings. I think any key can be mapped to "string" (consists of a sequence of "characters") can use Trie. If you read through Taocp volume 3, chapter 6.2 & 6.3, you can find that Knuth differentiate ordered k-nary tree searching & trie searching by the name "searching by comparison of keys" and "digital searching". – Cordeelia 10/1, 2014 at 5:9

Thanks to user534498's comment about Knuth's "Taocp volume 3, chapter 6.2 & 6.3"

Knuth claims - Ch 6.3

A trie is essentially an M-ary tree, whose nodes are M-place vectors with components corresponding to digits or characters. each node on level l represent the set of all keys that begin with a certain sequence of l characters; the node specifies an M-way branch, depending on the (l +1)st character.

K-ary, M-ary and N-ary being synonyms, it seems the answer is yes.

Pajamas answered 10/1, 2014 at 6:35 Comment(1)

This made finding it worth it, Exercise 1 from that chapter: If a tree has leaves, what does a trie have? – Pajamas 10/1, 2014 at 7:14

Recommended topics

Hot tags