What is the worst case complexity for bucket sort?
I just read the Wikipedia page about bucket sort. In this article they say that the worst-case complexity is O(n^2). But I thought the worst-case complexity was O(n + k), where k is the number of buckets. This is how I calculate this complexity:

  1. Adding an element to a bucket: using a linked list this is O(1)
  2. Going through the input and putting each element in the correct bucket = O(n)
  3. Merging the buckets = O(k)
  4. O(1) * O(n) + O(k) = O(n + k) (see the sketch below)
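
In code, the steps I have in mind look something like this minimal Python sketch (bucket_index is a placeholder for whatever bucket mapping is used; note that no per-bucket sort is performed):

def bucket_sort_no_inner_sort(items, k, bucket_index):
    # Steps 1 + 2: scatter every element into its bucket, O(1) per insert.
    buckets = [[] for _ in range(k)]
    for item in items:                            # n iterations -> O(n)
        buckets[bucket_index(item)].append(item)
    # Step 3: merge (concatenate) the k buckets -> O(n + k) overall.
    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result

# e.g. ten buckets for two-digit keys:
print(bucket_sort_no_inner_sort([42, 7, 19, 88], 10, lambda x: x // 10))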

Am I missing something?

Twentieth answered 20/3, 2012 at 17:42 Comment(0)
What if the algorithm decides that every element belongs in the same bucket? In that case, the linked list in that bucket must be traversed every time an element is added, to keep it in sorted order. That takes 1 step, then 2, then 3, ..., then n. The total time is thus the sum of the numbers from 1 through n, which is (n^2 + n)/2, i.e. O(n^2).

Of course, this is "worst case" (all the elements in one bucket) - the algorithm to calculate which bucket to place an element is generally designed to avoid this behavior.
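
To make the arithmetic concrete, here is a small Python sketch (an illustration, not any library's implementation) that counts the scan steps needed to keep one bucket's list in sorted order:

def sorted_insert_steps(items):
    # Insert each item into a single sorted list, counting the positions
    # scanned per insertion: 1 step, then 2, ..., then n in the worst case.
    bucket, steps = [], 0
    for item in items:
        pos = 0
        while pos < len(bucket) and bucket[pos] < item:
            pos += 1
            steps += 1
        steps += 1                     # the final comparison/placement
        bucket.insert(pos, item)
    return steps

# Strictly increasing input forces a full scan every time.
print(sorted_insert_steps(list(range(10))))   # prints 55 = 10 * 11 / 2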

Agog answered 20/3, 2012 at 17:48 Comment(8)
Not necessarily; you can add to the front of the list each time, giving constant O(1) insertion. However, either way, you'll eventually need to sort the individual bucket, which is where (I think) the worst-case O(n^2) performance comes from. – Jyoti
My answer is a bit of a simplification - there's a reason why you don't add to the front of the list, which I'll add in an edit. – Agog
This is my understanding, but I'm not 100% confident: the answer comes from the fact that bucket sort is an attempt to improve on the n log n lower bound for comparison-based sorts. If you add to the front of the list, you then need to sort within each bucket - which takes us back to the n log n bound of comparison-based sorting. So bucket sort wants to put the elements into the bucket in order. In the average case this is all well and good, but in its attempt to beat n log n, this worst case does appear. Can anyone confirm this to be true/false? – Agog
I am sorry, but I think this is wrong. @Jyoti gives the correct reason for it in his answer [IMO] - the recursive call [or a different sort] for each bucket: if a bucket is still the same size [or almost the same size] as the original array, you gained nothing. It is similar to the worst case of quicksort, where the pivot you selected is always the smallest element. – Marque
I am basing this statement on the fact that a linked list is assumed to have O(1) insertion [worst case], as can be seen on the Wikipedia page for linked lists. – Marque
But what if we create one bucket for each possible value? Then all elements in a bucket will be equal, and we won't need the second sort. – Twentieth
@Nlist, you are technically correct, but also realize that doing that is equivalent to doing insertion sort. You don't get any of the average-performance increases that are at the heart of the motivation behind bucket sort. – Jyoti
Still not quite agreeing with this answer. Inserting at the end of a linked list can easily be made O(1); failing that, a different data structure like a stack can be used and simply popped again, which is still linear time. Why does each bucket have to be sorted again if the smallest granularity of possible values is known ahead of time? You just make sure there are enough buckets. – Bromley
In order to merge the buckets, they first need to be sorted. Consider the pseudocode given in the Wikipedia article:

function bucketSort(array, n) is
  buckets ← new array of n empty lists
  for i = 0 to (length(array)-1) do
    insert array[i] into buckets[msbits(array[i], k)]
  for i = 0 to n - 1 do
    nextSort(buckets[i])
  return the concatenation of buckets[0], ..., buckets[n-1]

Here nextSort(buckets[i]) sorts each of the individual buckets (msbits(x, k) takes the k most significant bits of x, distributing the elements among the n buckets). Generally, a different sort is used to sort the buckets (e.g. insertion sort), as once the buckets get small enough, simple non-recursive sorts often give you better performance.

Now, consider the case where all n elements end up in the same bucket. If we use insertion sort to sort the individual buckets, this leads to the worst-case performance of O(n^2). So I think the worst-case bound must depend on the sort you choose for the individual buckets.
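
For concreteness, here is a runnable Python rendering of that pseudocode, with insertion sort playing the role of nextSort. The bucket index below scales by an assumed known max_value rather than using msbits; that bound is an added assumption, not part of the original pseudocode:

def insertion_sort(bucket):
    # O(m^2) worst case for a bucket holding m elements.
    for i in range(1, len(bucket)):
        key, j = bucket[i], i - 1
        while j >= 0 and bucket[j] > key:
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = key

def bucket_sort(array, n_buckets, max_value):
    # max_value is an assumed known upper bound on the non-negative
    # integer keys; scaling by it plays the role of msbits.
    buckets = [[] for _ in range(n_buckets)]
    for x in array:
        buckets[x * n_buckets // (max_value + 1)].append(x)
    for b in buckets:
        insertion_sort(b)   # nextSort: O(n^2) if one bucket gets everything
    return [x for b in buckets for x in b]

print(bucket_sort([29, 25, 3, 49, 9, 37, 21, 43], 4, 49))
# [3, 9, 21, 25, 29, 37, 43, 49]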

Jyoti answered 20/3, 2012 at 17:53 Comment(1)
But what if we sort each bucket with merge sort? In that case, even if all elements are added to the same bucket, it will still be O(n log n). What is your opinion? – Eglanteen
If you can guarantee that each bucket represents a single unique value (so all items in a bucket are equal), then the worst-case time complexity would be O(n + k), as you pointed out.

Bilbrey answered 20/3, 2012 at 17:56 Comment(0)
Bucket sort assumes that the input is drawn from a uniform distribution. This implies that only a few items fall into each bucket, which in turn leads to a nice average running time of O(n). Indeed, if the n elements are distributed so that O(1) of them fall into each bucket (insertion requires O(1) per item), then sorting a bucket using insertion sort requires, on average, O(1) as well (this is proved in almost all textbooks on algorithms). Since you must sort n buckets, the average complexity is O(n).

Now, assume that the input is not drawn from a uniform distribution. As already pointed out by @mfrankli, this may lead in the worst case to a situation in which all of the items fall, for example, into the first bucket. In this case, insertion sort will require O(n^2) in the worst case.

Note that you may use the following trick to maintain the same O(n) average complexity while providing O(n log n) complexity in the worst case: instead of using insertion sort, simply use an algorithm with O(n log n) worst-case complexity, such as merge sort or heap sort (but not quicksort, which achieves O(n log n) only on average).
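
As a Python sketch of that trick: the per-bucket call to sorted() (Timsort, which is O(m log m) in the worst case) stands in for merge sort or heap sort, and the uniform bucket widths are an assumption of this sketch:

def bucket_sort_nlogn(array, n_buckets):
    # Assumes a non-empty array of numeric keys. Average stays O(n) for
    # uniform input (O(1) items per bucket); even if all n items land in
    # one bucket, the total work is O(n log n).
    lo, hi = min(array), max(array)
    width = (hi - lo) / n_buckets or 1        # guard against hi == lo
    buckets = [[] for _ in range(n_buckets)]
    for x in array:
        i = min(int((x - lo) / width), n_buckets - 1)
        buckets[i].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))                 # O(m log m) even if m == n
    return out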

Dessertspoon answered 20/3, 2012 at 19:54 Comment(0)
G
1

This is an add-on answer to @perreal; I tried to post it as a comment but it's too long. @perreal correctly points out when bucket sort makes the most sense. The different answers are making different assumptions about what data is being sorted. For example, if the keys to be sorted are strings, then the range of possible keys is too large (larger than the bucket array), and we have to use only the first character of the string (or some other strategy) for the bucket positions. The individual buckets then have to be sorted, because they hold items with different keys, leading to O(n^2).

But if we are sorting data whose keys are integers in a known range, then the buckets are always already sorted, because the keys within a bucket are equal; this leads to a linear-time sort. Not only are the buckets sorted, but the sort is stable, because we can pull items out of the bucket array in the order they were added.

The thing I wanted to add is that if you are facing O(n^2) because of the nature of the keys to be sorted, bucket sort might not be the right approach. But when the range of possible keys is proportional to the size of the input, you can take advantage of the linear-time bucket sort by having each bucket hold only one value of a key, as in the sketch below.
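
A minimal Python sketch of that linear-time, stable special case (a pigeonhole-style variant, assuming integer keys in range(key_range)):

def pigeonhole_sort(items, key, key_range):
    # One bucket per possible key value: buckets never need an internal
    # sort, and reading them back in insertion order keeps the sort stable.
    buckets = [[] for _ in range(key_range)]   # O(k) buckets
    for item in items:                         # O(n) scatter
        buckets[key(item)].append(item)
    return [item for bucket in buckets for item in bucket]   # O(n + k)

# Equal keys keep their original relative order (stability):
pairs = [(2, 'a'), (0, 'b'), (2, 'c'), (1, 'd')]
print(pigeonhole_sort(pairs, key=lambda p: p[0], key_range=3))
# [(0, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]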

Gombosi answered 2/1, 2018 at 20:42 Comment(0)
