Worst case of the Quicksort algorithm
I found many implementations of quick sort algorithm, but at the end I decided to stick to this one:

public static void quickSort(int array[], int start, int end) {
    if (end <= start || start >= end) {
        // nothing to sort
    } else {
        int pivot = array[start];
        int temp = 0;
        int i = start + 1;

        for (int j = start + 1; j <= end; j++) {
            if (pivot > array[j]) {
                temp = array[j];
                array[j] = array[i];
                array[i] = temp;
                i++;
            }
        }
        array[start] = array[i - 1];
        array[i - 1] = pivot;
        quickSort(array, start, i - 2);
        quickSort(array, i, end);
    }
}

There are several things I'm confused about.
Why do some people suggest taking the first element as the pivot, while others say to pick the middle element, and still others the last element? Wouldn't the behaviour be different in each case?
Let's say I'm trying to show why if the array is sorted quick sort will have O(n^2) as the worst case order of growth.
I have the following array:
{1, 2, 3, 4, 5, 6}.
If I pick the first element as my pivot, won't it just compare it to every other element, swap it with itself, and take O(n)? Then it proceeds to these two recursive calls, which I assumed were O(log n):

quickSort(array, start, i-2);
quickSort(array, i, end);

So in the end, even for an already-sorted list of integers, it would still be O(n log n)?

If I picked the last element as my pivot instead, wouldn't it be completely different? It would swap 6 and 1, and so perform entirely different operations than when the pivot was the first element of the array.
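For reference, the recursion can be made visible by instrumenting the code above. This is just an illustrative sketch: the class name and the extra depth parameter are hypothetical, added only to trace which subarray each recursive call works on.

```java
public class PartitionTrace {
    // Same partition scheme as above, with a hypothetical depth parameter
    // added so each recursive call prints the bounds it is sorting.
    static void quickSort(int[] array, int start, int end, int depth) {
        if (end <= start) return;
        System.out.println("depth " + depth + ": partitioning [" + start + ", " + end + "]");
        int pivot = array[start];
        int i = start + 1;
        for (int j = start + 1; j <= end; j++) {
            if (pivot > array[j]) {
                int temp = array[j];
                array[j] = array[i];
                array[i] = temp;
                i++;
            }
        }
        array[start] = array[i - 1];
        array[i - 1] = pivot;
        quickSort(array, start, i - 2, depth + 1); // empty when input is sorted
        quickSort(array, i, end, depth + 1);       // all remaining elements
    }

    public static void main(String[] args) {
        quickSort(new int[]{1, 2, 3, 4, 5, 6}, 0, 5, 0);
        // Prints [0, 5], [1, 5], [2, 5], [3, 5], [4, 5]: on sorted input the
        // recursion goes n-1 levels deep, and each level scans everything
        // to the right of the pivot.
    }
}
```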

I just don't understand why the worst case is O(n^2).

Any help will be greatly appreciated!

Kanazawa answered 7/11, 2016 at 23:24 Comment(5)
Last, first, middle... it's all the same, unless you have some knowledge about the data to sort. They are all just guesses and can lead to the worst case scenario. The big plus of quickSort is that it is in-place, however it's not stable.Rataplan
I think it will take the first as pivot, compare it to all others, then the second and so on. This gives n^2. But it's only the worst case, average is O(n logn). And for primitive datatypes it doesn't matter whether it's stable or not.Rataplan
@Rataplan That is the thing I don't understand: since this quick sort implementation has only one for loop, the value of "i" only increments when the list is split into two sub-arrays. Hence it will not compare each value with every other value.Kanazawa
yes, but if one group consists of one element only, then you don't progress that fast... 1 + 2 + 3 + ... + n - 1 = n * (n - 1) / 2, which evaluates to O(n^2).Rataplan
By the way, why end <= start || start >= end? When are these two expressions not equivalent?Faris
The whole point of Quicksort is to find a pivot that partitions the array into two approximately equal pieces. That's where you get the log(n) from.

Suppose there is an array of size n and at each iteration you can partition the array into equal parts. Then we have:

T(n) = 2 * T(n/2) + O(n)
     = 4 * T(n/4) + 2 * O(n)
     ...
     (log(n) steps)
     ...
     = 2^log(n) * T(1) + log(n) * O(n)
     = n * O(1) + O(n * log(n))
     = O(n * log(n))

Now, if we partition the array into sizes say 1 and n-1, we get:

T(n) = T(1) + T(n-1) + O(n) = T(n-1) + O(n)
     = T(n-2) + O(n-1) + O(n)
     = T(n-3) + O(n-2) + O(n-1) + O(n)
     ...
     ((n-1) steps)
     ...
     = T(1) + O(2) + O(3) + ... + O(n)
     = O(1 + 2 + 3 + ... + n)
     = O(n^2)

In the case you mention, neither of the following calls is individually O(log(n)). If the array is sorted, one is O(1) and the other is T(n-1), which is how you get the O(n^2) complexity.

quickSort(array, start, i-2); // should be constant time
quickSort(array, i, end); // should be T(n-1)

And as a commenter notes below, this is not exclusive to sorted arrays. In general, if you choose pivots in such a way that the array is partitioned very unevenly, you'll run into this worst-case complexity. For example, even if the array is not sorted, always choosing the maximum (or minimum, depending on your implementation) as the pivot causes the same problem.
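This is easy to verify empirically. The sketch below (class and field names are illustrative) counts comparisons in the same partition scheme as the question's code; on already-sorted input the count comes out to exactly n*(n-1)/2, i.e. quadratic growth.

```java
import java.util.stream.IntStream;

public class QuickSortCount {
    static long comparisons = 0;

    // The question's partition scheme, with a comparison counter added.
    static void quickSort(int[] array, int start, int end) {
        if (end <= start) return;
        int pivot = array[start];
        int i = start + 1;
        for (int j = start + 1; j <= end; j++) {
            comparisons++;
            if (pivot > array[j]) {
                int temp = array[j];
                array[j] = array[i];
                array[i] = temp;
                i++;
            }
        }
        array[start] = array[i - 1];
        array[i - 1] = pivot;
        quickSort(array, start, i - 2); // empty partition on sorted input
        quickSort(array, i, end);       // n-1 elements on sorted input
    }

    public static void main(String[] args) {
        for (int n : new int[]{100, 200, 400}) {
            comparisons = 0;
            int[] sorted = IntStream.range(0, n).toArray();
            quickSort(sorted, 0, n - 1);
            // On sorted input: (n-1) + (n-2) + ... + 1 = n*(n-1)/2 comparisons.
            System.out.println(n + " -> " + comparisons);
        }
    }
}
```

Doubling n quadruples the count, which is the O(n^2) signature.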

Roux answered 7/11, 2016 at 23:37 Comment(2)
And don't forget, no matter which pivot point you pick there will be a data arrangement that can give you the worst case.Cheeseparing
Oh, that makes so much sense! I was focusing on that for loop and was thinking that it would be determining my worst case, but I was not taking into much considerations those two lines. Thank you very much!Kanazawa
QuickSort starts by moving everything that's got a higher value than the pivot value to the end of the list, and everything that's got a lower value to the beginning of the list.

If the value at your pivot point is the lowest value in the list, then every value in the list will be moved to the end of the list. However, just determining where to move all of those values requires O(n) work. If you then pick the second-lowest value, and then the third-lowest value, etc., then you'll end up doing O(n) work n/2 times. O(n²/2) simplifies to O(n²).

Why some people suggest taking the first element as a pivot point, others tell to pick the middle element and some will tell that you should pick the last element as your pivot point, wouldn't it be different?

It's all a matter of trying to guess (without scanning the whole list) which element is most likely to be close to the median of your data set, thereby giving you as close to the best-case behavior as possible.

  • If your data is totally random, then it doesn't matter what you choose--you're equally likely to get a good pivot point, and your chances of consistently choosing the worst pivot point are very slim. Choosing the first or last value is the simplest option that works.
  • If your data is presorted (or mostly so), choosing the middle is probably going to get you one of the best values, whereas choosing the first or last element will consistently give you the worst pivot points.

In real life, the likelihood of dealing with data that's mostly presorted is high enough that it's probably worth the slightly higher complexity of code. The Wikipedia section on this topic may be worth reading.
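A rough way to check this: the sketch below (illustrative names, reusing the question's partition scheme) swaps the middle element to the front before partitioning, so the middle value becomes the pivot. On sorted input the comparison count then grows like n log n rather than n^2.

```java
import java.util.stream.IntStream;

public class MiddlePivot {
    static long comparisons = 0;

    // The question's partition scheme, but the middle element is swapped to
    // the front first, making it the pivot.
    static void quickSort(int[] array, int start, int end) {
        if (end <= start) return;
        int mid = start + (end - start) / 2;
        int t = array[start]; array[start] = array[mid]; array[mid] = t;
        int pivot = array[start];
        int i = start + 1;
        for (int j = start + 1; j <= end; j++) {
            comparisons++;
            if (pivot > array[j]) {
                int temp = array[j];
                array[j] = array[i];
                array[i] = temp;
                i++;
            }
        }
        array[start] = array[i - 1];
        array[i - 1] = pivot;
        quickSort(array, start, i - 2); // both halves are roughly equal
        quickSort(array, i, end);       // when the input is sorted
    }

    public static void main(String[] args) {
        for (int n : new int[]{1000, 2000, 4000}) {
            comparisons = 0;
            int[] sorted = IntStream.range(0, n).toArray();
            quickSort(sorted, 0, n - 1);
            // Roughly n*log2(n) comparisons, nowhere near n*(n-1)/2.
            System.out.println(n + " -> " + comparisons);
        }
    }
}
```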

Kisser answered 7/11, 2016 at 23:44 Comment(2)
Oh that makes sense, so if my array is sorted and I pick my pivot element to be in the middle, would it still be O(n^2)?Kanazawa
@Kanazawa - using middle value as pivot should be O(n log(n)) for both sorted and reverse sorted data.Nanny
Below is a quicksort that uses median-of-3 pivot selection and limits stack complexity to O(log(n)) by recursing only on the smaller part, then looping back for the larger part. Worst-case time complexity is still O(n^2), but that would require median-of-3 to repeatedly choose values near the minimum or maximum. Time complexity can be limited to O(n log(n)) by using median of medians, but the overhead makes the average case much slower (I wonder if it ends up slower than heap sort; with median of medians it's definitely slower than merge sort, though a standard merge sort needs a second array the same size or half the size of the original array).

http://en.wikipedia.org/wiki/Median_of_medians

Introsort solves the worst case time complexity by switching to heap sort based on the level of recursion.

http://en.wikipedia.org/wiki/Introsort

#include <algorithm>  // std::swap
#include <cstddef>    // size_t
#include <cstdint>    // uint32_t

void QuickSort(uint32_t a[], size_t lo, size_t hi) {
    while(lo < hi){
        size_t i = lo, j = (lo+hi)/2, k = hi;
        uint32_t p;
        if (a[k] < a[i])            // median of 3
            std::swap(a[k], a[i]);
        if (a[j] < a[i])
            std::swap(a[j], a[i]);
        if (a[k] < a[j])
            std::swap(a[k], a[j]);
        p = a[j];
        i--;                        // Hoare partition
        k++;
        while (1) {
            while (a[++i] < p);
            while (a[--k] > p);
            if (i >= k)
                break;
            std::swap(a[i], a[k]);
        }
        i = k++;
        // recurse on smaller part, loop on larger part
        if((i - lo) <= (hi - k)){
            QuickSort(a, lo, i);
            lo = k;
        } else {
            QuickSort(a, k, hi);
            hi = i;
        }
    }
}
Nanny answered 8/11, 2016 at 1:47 Comment(0)