Let's start off by analyzing each algorithm independently and seeing if we can determine in what cases they will run in linear time.
First, let's look at counting sort. The algorithm works as follows:
- Find the largest integer to be sorted.
- Create an auxiliary array of counters with one slot for each possible value from 0 up to that largest integer.
- Iterate across the main array, updating the second array with the frequencies of each element.
- Construct the sorted array by filling in the output array with the appropriate number of copies of each element.
The first step can be done in time O(n) by iterating over the main array. Let's suppose that the largest value in the array is U. In that case, step two takes time O(U), since we have to zero out the array elements. The third step takes time O(n). The final step takes time O(n + U), since we visit each slot of the counts array exactly once (O(U)) and make a total of n writes to the output array (O(n)). This means that the total runtime is O(n + U). Note that this depends both on the number of elements to be sorted (larger arrays take longer to sort) and on the size of the largest value in the array (bigger numbers require proportionally more runtime).
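Here's a minimal Python sketch of the above, assuming the input is a list of non-negative integers (the function and variable names are just for illustration):

```python
def counting_sort(arr):
    # A sketch of counting sort for non-negative integers.
    # Runs in O(n + U) time, where n = len(arr) and U = max(arr).
    if not arr:
        return []
    largest = max(arr)                    # step 1: O(n)
    counts = [0] * (largest + 1)          # step 2: O(U) slots, zeroed out
    for x in arr:                         # step 3: O(n) frequency tally
        counts[x] += 1
    output = []                           # step 4: O(n + U) to rebuild the array
    for value, freq in enumerate(counts):
        output.extend([value] * freq)
    return output
```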
If we want this runtime to be O(n), we need to require that U = O(n). That way, we would have O(n + U) = O(n + n) = O(n). In other words, the value of the largest element in the array must grow no faster than the length of the array itself. For example, if you're guaranteed that all elements in the array are in the range 1 .. n, then counting sort will take time O(n) to complete. It doesn't matter how those elements are distributed within that range.
Radix sort essentially works by doing a counting sort over and over again, once for each digit of the numbers in the array to sort. The key idea behind radix sort is to keep the value of U from the previous algorithm as low as possible: by picking some fixed base, the value of U is capped at a constant. For example, if you use binary radix sort and go one bit at a time, each pass treats every element of the array as either a 0 or a 1, so each round of radix sort takes time O(n) to complete. The number of rounds required to sort the entire array is then the number of digits in the largest number in the array, which is Θ(log U). This means that the total runtime of the algorithm ends up being O(n log U). Note again that the runtime depends on the number of elements and on the size of the biggest element in the array.
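Here's a sketch of the one-bit-at-a-time (binary LSD) version in Python; again, the names are illustrative:

```python
def radix_sort(arr):
    # A sketch of binary LSD radix sort: one stable O(n) pass per bit,
    # for a total of Theta(log U) passes.
    if not arr:
        return []
    for bit in range(max(arr).bit_length()):
        # Stable partition on the current bit: zeros first, then ones.
        zeros = [x for x in arr if not (x >> bit) & 1]
        ones  = [x for x in arr if (x >> bit) & 1]
        arr = zeros + ones
    return arr
```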
The advantage this time is that log U grows much, much more slowly than U. For example, 2^300 is more than the estimated number of atoms in the observable universe, but lg(2^300) = 300, which is quite small.
Some people will claim that log U is a constant on any fixed computer. On a 32-bit computer, all integers are 32 bits (unless you're using an arbitrary-precision integer library, which we'll ignore for now), and on a 64-bit system all integers are 64 bits. In the first case, log U = 32, and in the second log U = 64. You could therefore claim that on a fixed system, radix sort always takes linear time, since the runtime is O(32n) or O(64n), both of which are O(n); the constant just varies from system to system. If you're more theoretically minded and want to be more precise, you could say that radix sort runs in linear time as long as log U = O(1), since then O(n log U) = O(n). This means that if you have any constant upper bound on the number of bits in the numbers, you're guaranteed that radix sort will run in linear time.
The bucket sort algorithm is similar to radix sort, except that it distributes elements into buckets using the most significant digits rather than the least significant ones. This means that the analysis is pretty much the same as before - as long as log U = O(1), the algorithm runs in linear time.
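As a rough sketch (one of many possible variants), here's a bucket sort in Python that distributes by the high-order part of each value and then sorts each bucket; the bucket count and the use of sorted() inside each bucket are illustrative choices, not part of the analysis above:

```python
def bucket_sort(arr, num_buckets=16):
    # A sketch of bucket sort for non-negative integers: distribute by the
    # most significant part of each value, sort each bucket, then concatenate.
    if not arr:
        return []
    largest = max(arr)
    width = largest // num_buckets + 1          # value range covered by each bucket
    buckets = [[] for _ in range(num_buckets)]
    for x in arr:
        buckets[x // width].append(x)           # bucket chosen by the high-order digits
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))           # sort each bucket individually
    return result
```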
Hope this helps!