Sorting integers with a binary trie?

I haven't actually seen this algorithm ever used before. This is probably because of the huge memory overhead - every bit of the numbers gets blown up into a node containing two pointers and a bit of its own. On a 32-bit system, this is a 64x blowup in memory, and on 64-bit system this is a 128x blowup in memory.

However, this algorithm is extremely closely related to most-significant digit radix sort (also called binary quicksort), which is used frequently in practice. In fact, you can think of binary quicksort as a space-efficient implementation of this algorithm.

The connection is based on the recursive structure of the trie. If you think about the top node of the trie, it will look like this:

           *
          / \
         /   \
    All #s  All #s
     with    with
    MSB 0   MSB 1

If you were to use the standard pre-order traversal of the trie to output all the numbers in sorted order, the algorithm would first print out all numbers starting with a 0 in sorted order, then print out all numbers starting with a 1 in sorted order. (The root node is never printed, since all numbers have the same number of bits in them, and therefore all the numbers are actually stored down at the leaves).

So suppose that you wanted to simulate the effect of building the trie and doing a preorder traversal on it. You would know that this would start off by printing out all the numbers with MSB 0, then would print all the numbers with MSB 1. Accordingly, you could start off by splitting the numbers into two groups - one with all numbers having MSB 0, and one with all numbers having MSB 1. You would then recursively sort all the numbers with MSB 0 and print them out, then recursively sort all the numbers starting with MSB 1 and print them out. This recursive process would continue until eventually you had gone through all of the bits of the numbers, at which point you would just print out each number individually.

The above process is almost identically the binary quicksort algorithm, which works like this:

If there are no numbers left, do nothing.
If there is one number left, print it out.
Otherwise:
- Split the numbers into two groups based on their first bit.
- Recursively sort and print the numbers starting with 0.
- Recurisvely sort and print the numbers starting with 1.

There are some differences between these algorithms. Binary quicksort works by recursively splitting the list into smaller and smaller pieces until everything is sorted, while the trie-based algorithm builds the trie and then reconstructs the numbers. However, you can think of binary quicksort as an optimization of the algorithm that simultaneously builds up and walks the trie.

So in short: I doubt anyone would want to use the trie-based algorithm due to the memory overhead, but it does give a great starting point for deriving the MSD radix sort algorithm!

Hope this helps!

Recommended topics

Hot tags