Sorting integers with a binary trie?
Asked Answered
E

1

6

Since every integer can be represented as a series of bits of some length, it seems like you could sort a bunch of integers in the following way:

  1. Insert each integer into a binary trie (a trie where each node has two children, one labeled 0 and one labeled 1).
  2. Using the standard algorithm for listing all words in a trie in sorted order, list off all the integers in sorted order.

Is this sorting algorithm ever used in practice?

Eberta answered 15/1, 2013 at 5:49 Comment(0)
E
6

I haven't actually seen this algorithm ever used before. This is probably because of the huge memory overhead - every bit of the numbers gets blown up into a node containing two pointers and a bit of its own. On a 32-bit system, this is a 64x blowup in memory, and on 64-bit system this is a 128x blowup in memory.

However, this algorithm is extremely closely related to most-significant digit radix sort (also called binary quicksort), which is used frequently in practice. In fact, you can think of binary quicksort as a space-efficient implementation of this algorithm.

The connection is based on the recursive structure of the trie. If you think about the top node of the trie, it will look like this:

           *
          / \
         /   \
    All #s  All #s
     with    with
    MSB 0   MSB 1

If you were to use the standard pre-order traversal of the trie to output all the numbers in sorted order, the algorithm would first print out all numbers starting with a 0 in sorted order, then print out all numbers starting with a 1 in sorted order. (The root node is never printed, since all numbers have the same number of bits in them, and therefore all the numbers are actually stored down at the leaves).

So suppose that you wanted to simulate the effect of building the trie and doing a preorder traversal on it. You would know that this would start off by printing out all the numbers with MSB 0, then would print all the numbers with MSB 1. Accordingly, you could start off by splitting the numbers into two groups - one with all numbers having MSB 0, and one with all numbers having MSB 1. You would then recursively sort all the numbers with MSB 0 and print them out, then recursively sort all the numbers starting with MSB 1 and print them out. This recursive process would continue until eventually you had gone through all of the bits of the numbers, at which point you would just print out each number individually.

The above process is almost identically the binary quicksort algorithm, which works like this:

  • If there are no numbers left, do nothing.
  • If there is one number left, print it out.
  • Otherwise:
    • Split the numbers into two groups based on their first bit.
    • Recursively sort and print the numbers starting with 0.
    • Recurisvely sort and print the numbers starting with 1.

There are some differences between these algorithms. Binary quicksort works by recursively splitting the list into smaller and smaller pieces until everything is sorted, while the trie-based algorithm builds the trie and then reconstructs the numbers. However, you can think of binary quicksort as an optimization of the algorithm that simultaneously builds up and walks the trie.

So in short: I doubt anyone would want to use the trie-based algorithm due to the memory overhead, but it does give a great starting point for deriving the MSD radix sort algorithm!

Hope this helps!

Eberta answered 15/1, 2013 at 5:49 Comment(2)
There is another issue I believe with the trie comparing to in-place radix sort, the trie is a tree based DS, and as such does not follow the principle of locality very well, while in-place radix sort is basically passing sequentially on the subarrays - which is expected to provide a much better cache performance I believe. So in addition to the space factor - the hidden constant in the time complexity should also be better for in-place radix sort.Amadis
@amit- I hadn't even considered locality - that's an excellent point. MSD radix sort and quicksort both perform very well due to locality, whereas a "jump around a trie" algorithm is probably about as bad as you can get for locality.Eberta

© 2022 - 2024 — McMap. All rights reserved.