What is the worst case complexity for bucket sort?
I just read the Wikipedia page about bucket sort. In this article they say that the worst-case complexity is O(n^2). But I thought the worst-case complexity was O(n + k), where k is the number of buckets. This is how I calculate this complexity:

  1. Adding an element to a bucket: using a linked list this is O(1)
  2. Going through the input and putting each element in the correct bucket = O(n)
  3. Merging the buckets = O(k)
  4. O(1) * O(n) + O(k) = O(n + k) (see the sketch below)
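
In code, the steps I have in mind look something like this minimal Python sketch (bucket_index is a placeholder for whatever bucket mapping is used; note that no per-bucket sort is performed):

def bucket_sort_no_inner_sort(items, k, bucket_index):
    # Steps 1 + 2: scatter every element into its bucket, O(1) per insert.
    buckets = [[] for _ in range(k)]
    for item in items:                            # n iterations -> O(n)
        buckets[bucket_index(item)].append(item)
    # Step 3: merge (concatenate) the k buckets -> O(n + k) overall.
    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result

# e.g. ten buckets for two-digit keys:
print(bucket_sort_no_inner_sort([42, 7, 19, 88], 10, lambda x: x // 10))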

Am I missing something?

Twentieth answered 20/3, 2012 at 17:42 Comment(0)
What if the algorithm decides that every element belongs in the same bucket? In that case, the linked list in that bucket must be traversed every time an element is added, to keep it in sorted order. That takes 1 step, then 2, then 3, ..., then n. The total time is thus the sum of the numbers from 1 through n, which is (n^2 + n)/2, i.e. O(n^2).

Of course, this is "worst case" (all the elements in one bucket) - the algorithm to calculate which bucket to place an element is generally designed to avoid this behavior.
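
To make the arithmetic concrete, here is a small Python sketch (an illustration, not any library's implementation) that counts the scan steps needed to keep one bucket's list in sorted order:

def sorted_insert_steps(items):
    # Insert each item into a single sorted list, counting the positions
    # scanned per insertion: 1 step, then 2, ..., then n in the worst case.
    bucket, steps = [], 0
    for item in items:
        pos = 0
        while pos < len(bucket) and bucket[pos] < item:
            pos += 1
            steps += 1
        steps += 1                     # the final comparison/placement
        bucket.insert(pos, item)
    return steps

# Strictly increasing input forces a full scan every time.
print(sorted_insert_steps(list(range(10))))   # prints 55 = 10 * 11 / 2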

Agog answered 20/3, 2012 at 17:48 Comment(8)
Not necessarily; you can add to the front of the list each time, giving constant O(1) insertion. However, either way, you'll eventually need to sort the individual bucket, which is where (I think) the worst-case O(n^2) performance comes from. – Jyoti
My answer is a bit of a simplification - there's a reason why you don't add to the front of the list, which I'll add in an edit. – Agog
This is my understanding, but I'm not 100% confident: the answer comes from the fact that bucket sort is an attempt to improve on the n log n lower bound for comparison-based sorts. If you add to the front of the list, you then need to sort within each bucket - which takes us back to the n log n bound of comparison-based sorting. So bucket sort wants to put the elements into the bucket in order. In the average case this is all well and good, but in its attempt to beat n log n, this worst case does appear. Can anyone confirm this to be true/false? – Agog
I am sorry, but I think this is wrong. @Jyoti gives the correct reason for it in his answer [IMO] - the recursive call [or a different sort] for each bucket: if a bucket is still the same size [or almost the same size] as the original array, you gained nothing. It is similar to the worst case of quicksort, where the pivot you selected is always the smallest element. – Marque
I am basing this statement on the fact that a linked list is assumed to have O(1) insertion [worst case], as can be seen on the Wikipedia page for linked lists. – Marque
But what if we create one bucket for each possible value? Then all elements in a bucket will be equal, and we won't need the second sort. – Twentieth
@Nlist, you are technically correct, but also realize that doing that is equivalent to doing insertion sort. You don't get any of the average-performance increases that are at the heart of the motivation behind bucket sort. – Jyoti
Still not quite agreeing with this answer. Inserting at the end of a linked list can easily be made O(1); failing that, a different data structure like a stack can be used and simply popped again, which is still linear time. Why does each bucket have to be sorted again if the smallest granularity of possible values is known ahead of time? You just make sure there are enough buckets. – Bromley
In order to merge the buckets, they first need to be sorted. Consider the pseudocode given in the Wikipedia article:

function bucketSort(array, n) is
  buckets ← new array of n empty lists
  for i = 0 to (length(array)-1) do
    insert array[i] into buckets[msbits(array[i], k)]
  for i = 0 to n - 1 do
    nextSort(buckets[i])
  return the concatenation of buckets[0], ..., buckets[n-1]

Here nextSort(buckets[i]) sorts each of the individual buckets (msbits(x, k) takes the k most significant bits of x, distributing the elements among the n buckets). Generally, a different sort is used to sort the buckets (e.g. insertion sort), as once the buckets get small enough, simple non-recursive sorts often give you better performance.

Now, consider the case where all n elements end up in the same bucket. If we use insertion sort to sort the individual buckets, this leads to the worst-case performance of O(n^2). So I think the worst-case bound must depend on the sort you choose for the individual buckets.
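
For concreteness, here is a runnable Python rendering of that pseudocode, with insertion sort playing the role of nextSort. The bucket index below scales by an assumed known max_value rather than using msbits; that bound is an added assumption, not part of the original pseudocode:

def insertion_sort(bucket):
    # O(m^2) worst case for a bucket holding m elements.
    for i in range(1, len(bucket)):
        key, j = bucket[i], i - 1
        while j >= 0 and bucket[j] > key:
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = key

def bucket_sort(array, n_buckets, max_value):
    # max_value is an assumed known upper bound on the non-negative
    # integer keys; scaling by it plays the role of msbits.
    buckets = [[] for _ in range(n_buckets)]
    for x in array:
        buckets[x * n_buckets // (max_value + 1)].append(x)
    for b in buckets:
        insertion_sort(b)   # nextSort: O(n^2) if one bucket gets everything
    return [x for b in buckets for x in b]

print(bucket_sort([29, 25, 3, 49, 9, 37, 21, 43], 4, 49))
# [3, 9, 21, 25, 29, 37, 43, 49]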

Jyoti answered 20/3, 2012 at 17:53 Comment(1)
But what if we sort each bucket with merge sort? In that case, even if all elements are added to the same bucket, it will still be O(n log n). What is your opinion? – Eglanteen
If you can guarantee that each bucket represents a single unique value (so all items in a bucket are equal), then the worst-case time complexity would be O(n + k), as you pointed out.

Bilbrey answered 20/3, 2012 at 17:56 Comment(0)
Bucket sort assumes that the input is drawn from a uniform distribution. This implies that only a few items fall into each bucket, which in turn leads to a nice average running time of O(n). Indeed, if the n elements are distributed so that O(1) of them fall into each bucket (insertion requires O(1) per item), then sorting a bucket using insertion sort requires, on average, O(1) as well (this is proved in almost all textbooks on algorithms). Since you must sort n buckets, the average complexity is O(n).

Now, assume that the input is not drawn from a uniform distribution. As already pointed out by @mfrankli, this may lead in the worst case to a situation in which all of the items fall, for example, into the first bucket. In this case, insertion sort will require O(n^2) in the worst case.

Note that you may use the following trick to maintain the same O(n) average complexity while providing O(n log n) complexity in the worst case: instead of using insertion sort, simply use an algorithm with O(n log n) worst-case complexity, such as merge sort or heap sort (but not quicksort, which achieves O(n log n) only on average).
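
As a Python sketch of that trick: the per-bucket call to sorted() (Timsort, which is O(m log m) in the worst case) stands in for merge sort or heap sort, and the uniform bucket widths are an assumption of this sketch:

def bucket_sort_nlogn(array, n_buckets):
    # Assumes a non-empty array of numeric keys. Average stays O(n) for
    # uniform input (O(1) items per bucket); even if all n items land in
    # one bucket, the total work is O(n log n).
    lo, hi = min(array), max(array)
    width = (hi - lo) / n_buckets or 1        # guard against hi == lo
    buckets = [[] for _ in range(n_buckets)]
    for x in array:
        i = min(int((x - lo) / width), n_buckets - 1)
        buckets[i].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))                 # O(m log m) even if m == n
    return out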

Dessertspoon answered 20/3, 2012 at 19:54 Comment(0)
G
1

This is an add-on answer to @perreal; I tried to post it as a comment but it's too long. @perreal correctly points out when bucket sort makes the most sense. The different answers are making different assumptions about what data is being sorted. For example, if the keys to be sorted are strings, then the range of possible keys is too large (larger than the bucket array), and we have to use only the first character of the string (or some other strategy) for the bucket positions. The individual buckets then have to be sorted, because they hold items with different keys, leading to O(n^2).

But if we are sorting data whose keys are integers in a known range, then the buckets are always already sorted, because the keys within a bucket are equal; this leads to a linear-time sort. Not only are the buckets sorted, but the sort is stable, because we can pull items out of the bucket array in the order they were added.

The thing I wanted to add is that if you are facing O(n^2) because of the nature of the keys to be sorted, bucket sort might not be the right approach. But when the range of possible keys is proportional to the size of the input, you can take advantage of the linear-time bucket sort by having each bucket hold only one value of a key, as in the sketch below.
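
A minimal Python sketch of that linear-time, stable special case (a pigeonhole-style variant, assuming integer keys in range(key_range)):

def pigeonhole_sort(items, key, key_range):
    # One bucket per possible key value: buckets never need an internal
    # sort, and reading them back in insertion order keeps the sort stable.
    buckets = [[] for _ in range(key_range)]   # O(k) buckets
    for item in items:                         # O(n) scatter
        buckets[key(item)].append(item)
    return [item for bucket in buckets for item in bucket]   # O(n + k)

# Equal keys keep their original relative order (stability):
pairs = [(2, 'a'), (0, 'b'), (2, 'c'), (1, 'd')]
print(pigeonhole_sort(pairs, key=lambda p: p[0], key_range=3))
# [(0, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]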

Gombosi answered 2/1, 2018 at 20:42 Comment(0)
