Worst case for QuickSort - when can it occur?
Asked Answered
I

6

42

When analyzing Quicksort, everyone always refers to the "almost sorted" worst case. When can such a scenario occur with natural input?

The only example I came up with is re-indexing.

Intemperance answered 10/3, 2010 at 7:27 Comment(4)
When its already sorted.Fissure
that's not entirely right, see Jens' answerMann
@Shira, one example of "almost sorted" natural data is when modeling moving objects. Let's say you have several balls which are affected by gravity and other forces. You take a timestep and sort them along their x-axis positions. Now advance the timestep once more. This data is now "almost sorted": some balls may have swapped positions along the x-axis in that short time, but most of them are still sorted.Cosentino
Or indeed any situation where you have data that you occasionally modify and occasionally sort. This isn't rare at all, particularly in languages like Python and JS where the array type is extremely widely used and has a fast in-place sort operation, and there's no standard tree type. (Another way of saying this is that "re-indexing" is an extremely general operation that many programs do in one form or another.)Panoply
B
42

I think people are confusing Quicksort the partition-based sorting algorithm, and "qsort" the various library implementations.

I prefer to see Quicksort the algorithm as having a pluggable pivot selection algorithm, which is quite essential in analyzing its behavior.

If the first element is always chosen as the pivot, then an already sorted list is the worst-case. Often there's a high probability that the array is already/nearly sorted, so this implementation is rather poor.

Analogously, selecting the last element as the pivot is bad for the same reason.

Some implementations try to avoid this problem by choosing the middle element as the pivot. This would not perform as badly on already/nearly sorted arrays, but one could still construct an input that exploits this predictable pivot selection and makes it run in quadratic time.
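As a rough illustration of how much pivot choice matters (the function names and comparison accounting here are my own sketch, not from the answer): a quicksort with a pluggable pivot chooser, where each partitioning pass over a sublist of length k is counted as k comparisons.

```python
# Sketch: Quicksort with a pluggable pivot chooser, counting comparisons
# to expose the quadratic worst case on already-sorted input.
def quicksort(xs, choose_pivot, stats):
    if len(xs) <= 1:
        return list(xs)
    pivot = choose_pivot(xs)
    stats["cmp"] += len(xs)  # one partitioning pass over this sublist
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return (quicksort(less, choose_pivot, stats) + equal
            + quicksort(greater, choose_pivot, stats))

first = lambda xs: xs[0]              # degenerates on sorted input
middle = lambda xs: xs[len(xs) // 2]  # splits sorted input evenly

data = list(range(200))  # already sorted
s1, s2 = {"cmp": 0}, {"cmp": 0}
quicksort(data, first, s1)
quicksort(data, middle, s2)
print(s1["cmp"], s2["cmp"])  # quadratic vs. roughly n log n
```

Note this sketch uses a three-way partition, so it is not hurt by all-equal inputs; a textbook in-place two-way partition would behave worse there.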

Thus, you get randomized pivot selection algorithms, but even this doesn't guarantee O(N log N).

So other algorithms were developed that use some information from the sequence before picking a pivot. You can of course scan the whole sequence, find the median, and use that as the pivot. This guarantees O(N log N), but is of course slower in practice.

So some corners are cut, and people devised the median-of-3 algorithm. Of course, later even this was exploitable by the so-called median-of-3 "killer".
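For illustration, median-of-3 is typically computed from the first, middle, and last elements; a minimal sketch (my own, not from any particular library):

```python
# Median-of-3 pivot selection: take the median of the first, middle,
# and last elements. On sorted input this picks a near-perfect pivot.
def median_of_3(xs):
    a, b, c = xs[0], xs[len(xs) // 2], xs[-1]
    return sorted([a, b, c])[1]

print(median_of_3(list(range(100))))  # 50 -- an even split for sorted data
```

The median-of-3 "killer" mentioned above is an adversarial permutation crafted so that this rule nevertheless keeps producing poor pivots.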

So more attempts were made at coming up with more "intelligent" pivot selection algorithms that guarantee O(N log N) asymptotic behavior while remaining fast enough to be practical, with varying degrees of success.

So really, unless one specifies a particular implementation of Quicksort, the question of when the worst case scenario occurs is ill-defined. If you use the so-called median-of-medians pivot selection algorithm, there is no quadratic worst-case scenario.

Most library implementations, however, are likely to forfeit the O(N log N) guarantee for much faster sorting in the average case. Some really old implementations use the first element as the pivot, which is now well understood to be poor and is no longer widely practiced.

Bloodshed answered 10/3, 2010 at 8:45 Comment(2)
No pivot selection algorithm can guarantee O(N.log(N)) processing time. This is obviously wrong. Just think of the trivial case where all entries share the same value.Frankforter
@ErwanLegrand That trivial case is easy to circumvent for any pivot selection technique by a minor variation to the algorithm: simply partition into 3 sets: Less, Equal, Greater. I.e. QSort(List) { (Choose Pivot) Partition(List, Pivot, Less, Equal, Greater); return QSort(Less) + Equal + QSort(Greater); } Basically, there's no point re-sorting items equal to the pivot because you know exactly where they belong in the final output. It turns out that using this approach, if all entries share the same value, performance will be O(n).Discalced
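The three-way partition described in this comment can be sketched as runnable (if not in-place) code; the names are mine:

```python
# Three-way ("fat pivot") partition: elements equal to the pivot are
# emitted directly and never re-sorted, so an all-equal input finishes
# after a single partitioning pass.
def qsort3(xs):
    if len(xs) <= 1:
        return list(xs)
    pivot = xs[len(xs) // 2]
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return qsort3(less) + equal + qsort3(greater)

print(qsort3([7] * 10))  # one pass: less and greater are both empty
```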
B
34

I believe that the worst case for quicksort depends on the choice of the pivot element at every step. Quicksort has its worst performance, if the pivot is likely to be either the smallest, or the largest element in the list (e.g. the first or last element of an already sorted list).

If, e.g. you choose the middle element of the list, an already sorted list does not have the worst case runtime.

So, if you suspect your scenario is likely to be a bad case for quicksort, you can simply change your choice of pivot element to make quicksort perform better.

Note: I know that this does not give more examples of real-world occasions for quicksort worst cases. Such examples depend on the implementation you are working with.

Blisse answered 10/3, 2010 at 7:45 Comment(9)
or you use something like median-of-3 to get a relatively well-chosen pivotMann
or you use a random element. That only goes wrong with a very (very very) small probability, independent of possibly obscure inputs.Blisse
The "already sorted" meme's perhaps so prevalent because lots of people take the first element as the pivot, on the assumption that the list is unsorted. In the case of a sorted list, that's the worst element you could choose.Kaki
A list in reverse sorted order would be the worst case if you choose the first element as pivot. One has to choose the last element to make the already sorted case the worst case. Note: The questioner asks for the "almost sorted" case, which is the worst case only with high probability even if you choose the last element. It could be (with small probability) that the median is the last element, which means that almost sorted can as well be the best case.Mann
@swegi: The problem occurs when the current subarray is not sufficiently evenly divided for recursion. It does not matter which extreme (largest or smallest) pivot is chosen; as long as it is extreme, you get worst case behaviour.Impaste
@Svante: You're completely right, thanks for pointing that out.Mann
The worst case can also be caused by repeated elements, depending on the quicksort implementation. If the quicksort partitions into [ < p | >= p] at each step, then having all elements the same will lead to worst-case behavior no matter what pivot is chosen, because one partition, [ < p], will have zero elements each time. A quicksort that partitions into [ <= p | >= p] will also have this problem. There are modifications to these quicksorts that overcome this difficulty.Trochal
I think I didn't phrase my question well enough. I didn't mean to ask what sort of input is QS's worst case. I assumed that it is the "almost sorted" case, and the question is: "when do we get an almost sorted input in real-life situations?" Sorry for the confusion, and thanks for the answers :)Intemperance
@JustinPeel Correct, and the modification you're looking for is the partition [<p | =p | >p]. After all, it's obvious that all the elements equal to p belong in between the < and > partitions, so there's really no point in putting them back into either partition to go through further sort iterations. It's also interesting to note that this modification turns "all elements equal" into the best case by allowing the sort to be done with O(n) complexity - also no matter what pivot is chosen.Discalced
D
8

The actual question was: "When can such a scenario (almost sorted) occur with natural input?".

Although all the answers are dealing with "what causes worst case performance", none have covered "what causes data that meets the worst case performance scenario".

So, to answer the actual question

  • Programmer error: Basically you end up sorting a list twice. Typically this happens because a list is sorted in one place in the code, and later, in another piece of code, you know you need the list to be sorted, so you sort it again.

  • Using almost-chronological data: You have data that is generally received in chronological order, but occasionally some elements are out of position. (Consider a multi-threaded environment adding time-stamped elements to a list. Race conditions can cause elements to be added in a different order to which they were time-stamped.) In this situation, if you need sorted data, you must re-sort. Because the order of the data is not guaranteed.

  • Adding items to a list: If you have a sorted list and simply append some items (i.e. without using binary insertion), you end up needing to re-sort an almost-sorted list.

  • Data from an external source: If you receive data from an external source, there may be no guarantee that it's sorted. So you sort it yourself. However, if the external source is sorted, you will be re-sorting the data.

  • Natural ordering: This is similar to the chronological data. Basically, the natural order of the data you receive may be sorted. Consider an insurance company adding car registrations. If the authority assigning car registrations does so in a predictable order, newer cars are likely, but not guaranteed, to have higher registration numbers. Since you're not guaranteed it's sorted, you have to re-sort.

  • Interleaved data: If you receive data from multiple sorted sources with overlapping keys, you could get keys resembling the following: 1 3 2 5 4 7 6 9 8 11 10 13 12 15 14 17 16 19 18. Even though half the elements are out of sequence with their neighbours, the list is "almost sorted". Certainly a QuickSort that pivots on the first element would exhibit O(n^2) performance on it.
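The interleaved pattern above is easy to generate, and a first-element-pivot quicksort measurably degrades on it; a sketch (the helper names are mine):

```python
# Build 1 3 2 5 4 7 6 ... by swapping each adjacent pair after the first
# element, then sort it with a first-element-pivot quicksort, counting
# comparisons made during partitioning.
def interleaved(n):
    xs = list(range(1, n + 1))
    for i in range(1, n - 1, 2):
        xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

def qsort_first(xs, stats):
    if len(xs) <= 1:
        return list(xs)
    pivot = xs[0]
    stats["cmp"] += len(xs) - 1
    less = [x for x in xs[1:] if x < pivot]
    greater = [x for x in xs[1:] if x >= pivot]
    return qsort_first(less, stats) + [pivot] + qsort_first(greater, stats)

stats = {"cmp": 0}
assert qsort_first(interleaved(200), stats) == list(range(1, 201))
print(stats["cmp"])  # grows quadratically: each pivot peels off at most two elements
```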

Conclusion

So, given all the above scenarios, it's actually quite easy to land up sorting almost-sorted data. And this is exactly why QuickSort that pivots on the first element is actually best avoided. polygene has provided some interesting information on alternate pivoting considerations.

As a side note: one of the usually worst-performing sorting algorithms actually does quite well with "almost-sorted" data. On the interleaved data above, bubble sort requires only 9 swap operations; its performance would actually be O(n).
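That claim is easy to check; a small sketch (my own code) counting bubble-sort swaps on the interleaved sequence:

```python
# Bubble sort with a swap counter. On the interleaved sequence, the first
# pass fixes every swapped pair and a second pass merely confirms the
# list is sorted, so the total work is linear.
def bubble_sort(xs):
    xs = list(xs)
    swaps = 0
    while True:
        swapped = False
        for i in range(len(xs) - 1):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                swaps += 1
                swapped = True
        if not swapped:
            return xs, swaps

data = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
result, swaps = bubble_sort(data)
print(swaps)  # 9
```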

Discalced answered 11/7, 2014 at 17:38 Comment(0)
S
7

From Quicksort

for quicksort, "worst case" corresponds to already sorted

A list with all the items the same number is already sorted.

Sigmon answered 10/3, 2010 at 7:32 Comment(2)
too bad, but your source is not entirely right, see Jens's answerMann
+1, since if all the numbers are the same you'd get the worst case regardless of how you choose the pivotTankersley
B
3

Worst cases in quicksort:

  1. All elements of the array are the same
  2. The array is already sorted
  3. The array is already sorted in reverse order
Basham answered 28/5, 2013 at 8:31 Comment(0)
E
1

Quicksort's worst case depends on the chosen pivot element, so the problem occurs only when 1) the array is already sorted in the same order, 2) the array is already sorted in reverse order, or 3) all elements are the same (a special case of cases 1 and 2).

Explication answered 13/5, 2016 at 6:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.