Comparison between timsort and quicksort
Why is it that I mostly hear about Quicksort being the fastest overall sorting algorithm when, according to Wikipedia, Timsort seems to perform much better?

Timoteo answered 14/10, 2011 at 15:51 Comment(8)
With a little more thought and some references, this could be a good question.Milkwort
Because people choose to ignore that quicksort is O(n^2) worst case.Gunning
One possible answer would be: you're talking to the wrong people. But as another answer already implied, quicksort is much older, so it's used in far more libraries - and you know: never touch a running system. If the average running time (meaning: in the use cases of the people using it) is not much worse than that of a different algorithm (like timsort), people are too lazy (or have better things to do) than to replace something that does the same job in the same time. And in some environments (e.g. Python), timsort is already the default.Horizon
@Patrick87: The truth is much different. You are ignoring the O(n) best case. It's not about worst cases that basically never happen, it's about best cases that actually do. timsort does a good job when it encounters a sorted range.Froth
@rrenaud worst cases "basically" never happen, but they do "actually" happen, sometimes. It is an important consideration, especially when hitting a worst case O(n^2) means bad things happen.Longlimbed
@RobNeuhaus: I think the worst case of (a simple implementation of) Quicksort actually happens quite often. Just sort a list that is (almost) sorted already.Principate
@MartinThoma Sorted lists only produce the worst-case result for a really bad implementation of Quicksort. Random or median-of-three partition selection implementations avoid worst-case for sorted or nearly sorted lists. But they still don't achieve the O(n) behavior of Timsort. Timsort's stability is also a key property that people overlook. It is very useful in a lot of situations, and in particular for multi-key sorts.Ahola
I cannot use the standard Java sort in Codeforces programming competitions, because Java uses dual-pivot quicksort for integer and double arrays, so there exist arrays that require O(n^2) time to sort. Test data is often composed of such arrays, so the program takes too long and fails. So I have to switch to my own mergeSort instead. This cannot happen with the timsort algorithm.Blackamoor

TimSort is a highly optimized mergesort; it is stable and faster than the old mergesort.

Compared with quicksort, it has two advantages:

  1. It is unbelievably fast for nearly sorted data sequences (including reverse-sorted data);
  2. The worst case is still O(N log N).

To be honest, I don't think #1 is an advantage, but it did impress me.

Here are QuickSort's advantages:

  1. QuickSort is very, very simple; even for a highly tuned implementation, its pseudocode can be written down in 20 lines;
  2. QuickSort is fastest in most cases;
  3. Its memory consumption is O(log N).

Currently, the Java 7 SDK implements timsort and a new quicksort variant: Dual-Pivot QuickSort.

If you need a stable sort, try timsort; otherwise, start with quicksort.
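To back up point #1 about QuickSort's simplicity, here is a minimal sketch in Python that fits well under 20 lines. It is the textbook recursive form (not in-place, no pivot tuning), so it illustrates the shape of the algorithm rather than a production implementation:

```python
# Minimal quicksort sketch: recursive and not in-place.
# Illustrative only -- library sorts (e.g. Dual-Pivot QuickSort) are far more tuned.
def quicksort(items):
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]                    # middle element as pivot
    smaller = [x for x in items if x < pivot]
    equal   = [x for x in items if x == pivot]
    larger  = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([5, 3, 8, 1, 9, 2]))  # -> [1, 2, 3, 5, 8, 9]
```

Note this version uses O(n) extra memory per level; the O(log N) figure above refers to in-place implementations, where only the recursion stack costs space.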

Slier answered 25/10, 2013 at 10:25 Comment(7)
#1 can be a huge advantage. If you maintain a list of data that you must re-sort frequently (because items are inserted, appended, or modified), having an algorithm that allows you to very cheaply re-order that data is extremely useful. Whether it is useful depends on the situation, for sure, but it is huge in some cases and also feels obvious: nearly sorted lists shouldn't be hard to sort.Ahola
@JeremyWest: If you know that data is already sorted, you should use binary search to insert new values. Don't sort it again and again.Jovial
@EricDuminil Binary search is fast, but insertions in the middle of an array aren't. There are many applications where the simplest (and often most efficient) solution is to re-sort a mostly sorted list when you need it to be sorted, but to let it get unsorted otherwise. Or cases where you read in data that is mostly sorted, and then need to sort it. I'm not suggesting this is always the best solution, but that sometimes it is. And it is one reason why sorts that perform well on mostly sorted lists are preferable, particularly for standard libraries.Ahola
@JeremyWest: Thanks for the insightful answer.Jovial
@Jeremy: if it's mostly sorted, timsort is the algorithm of choice. But binary search + insertion is still O(N + logN), which should beat O(N logN).Wendel
@Groo Yes, if you are only inserting a single item into an already sorted list. That's not the only application I mentioned. Obviously, your mileage will vary depending on the problem you are solving.Ahola
If doing insertions for n items, binary search + insertion is O(N^2 + N logN) = O(N^2) while insert all and sort once is O(N + N logN) = O(N logN).Chemoreceptor
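A small Python sketch of the two strategies debated in these comments: per-item binary-search insertion (`bisect.insort`) versus appending everything and re-sorting once. Both end with the same sorted list; which is cheaper depends on how many items arrive between reads, as the comments above argue:

```python
import bisect

data = [1, 4, 7, 9]       # already sorted
new_items = [5, 2, 8]     # arrives unsorted

# Strategy A: binary-search insertion -- O(log n) to find the spot,
# but O(n) to shift elements, per inserted item.
a = data[:]
for x in new_items:
    bisect.insort(a, x)

# Strategy B: append everything, then sort once -- Timsort exploits the
# already-sorted prefix, so this is cheap when the list is mostly ordered.
b = sorted(data + new_items)

print(a == b)  # -> True
```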

More or less, it has to do with the fact that Timsort is a hybrid sorting algorithm. This means that while the two underlying sorts it uses (Mergesort and Insertion sort) are both worse than Quicksort for many kinds of data, Timsort only uses them when it is advantageous to do so.

On a slightly deeper level, as Patrick87 states, quicksort is a worst-case O(n^2) algorithm. Choosing a good pivot isn't hard, but guaranteeing an O(n log n) quicksort comes at the cost of generally slower sorting on average.

For more detail on Timsort, see this answer, and the linked blog post. It basically assumes that most data is already partially sorted, and constructs "runs" of sorted data that allow for efficient merges using mergesort.
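The "runs" mentioned above can be sketched as follows. This is a deliberately simplified version of Timsort's run detection (real Timsort also reverses strictly descending runs and extends short runs to a minimum length with insertion sort):

```python
def find_runs(items):
    """Split items into maximal non-decreasing runs (simplified Timsort step)."""
    runs, start = [], 0
    for i in range(1, len(items) + 1):
        # A run ends at the end of the list or where the ordering breaks.
        if i == len(items) or items[i] < items[i - 1]:
            runs.append(items[start:i])
            start = i
    return runs

print(find_runs([1, 2, 5, 3, 4, 0]))  # -> [[1, 2, 5], [3, 4], [0]]
```

On partially sorted input the runs are long and few, which is why the subsequent merging is so cheap; fully sorted input yields a single run and an O(n) sort.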

Longlimbed answered 14/10, 2011 at 16:13 Comment(0)

Generally speaking, quicksort is the best algorithm for primitive arrays. This is due to memory locality and caching.

JDK7 uses TimSort for Object arrays. An Object array only holds object references; the objects themselves are stored on the heap. To compare objects, we need to read them from the heap. This is like reading one object from one part of the heap, then randomly reading an object from another part: there will be a lot of cache misses. I guess that for this reason memory locality no longer matters much, and this may be why the JDK only uses TimSort for Object arrays instead of primitive arrays.

This is only my guess.

Zacatecas answered 17/11, 2014 at 2:1 Comment(1)
I assume that is one of the reasons why Timsort is Python's standard sorting algorithm, as everything is an object in Python.Antonioantonius

Tim Sort is great if you need an order-preserving sort, or if you are sorting a complex array (comparing heap-based objects) rather than a primitive array. As mentioned by others, quicksort benefits significantly from the locality of data and processor caching for primitive arrays.

The fact that the worst case of quicksort is O(n^2) was raised above. Fortunately, you can achieve O(n log n) worst-case time with quicksort. The worst case occurs when the pivot is the smallest or largest value, such as when the pivot is the first or last element of an already-sorted array.

We can achieve an O(n log n) worst-case quicksort by using the median as the pivot, since the median can be found in linear time, O(n). Because O(n) + O(n log n) = O(n log n), that becomes the worst-case time complexity.

In practice, however, most implementations find that a random pivot is sufficient so do not search for the median value.
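A sketch of the median-pivot idea, using Python's `statistics.median_low` as a stand-in for a true linear-time selection algorithm (a real O(n log n)-worst-case quicksort would use something like median-of-medians; `median_low` sorts internally, so this shows the recursion shape, not the complexity bound):

```python
from statistics import median_low

def quicksort_median(items):
    # Pivot on the (lower) median so each side holds at most half the elements.
    if len(items) <= 1:
        return items
    pivot = median_low(items)   # stand-in for an O(n) selection algorithm
    smaller = [x for x in items if x < pivot]
    equal   = [x for x in items if x == pivot]
    larger  = [x for x in items if x > pivot]
    return quicksort_median(smaller) + equal + quicksort_median(larger)

print(quicksort_median([7, 1, 7, 3, 9, 2]))  # -> [1, 2, 3, 7, 7, 9]
```

Because the pivot is the median, each recursive call receives at most half the elements, which is exactly what bounds the recursion depth to O(log n).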

Felice answered 27/9, 2019 at 4:7 Comment(0)

Here are benchmark numbers from my machine (i7-6700 CPU, 3.4GHz, Ubuntu 16.04, gcc 5.4.0, parameters: SIZE=100000 and RUNS=3):

$ ./demo 
Running tests
stdlib qsort time:                 12246.33 us per iteration
##quick sort time:                  5822.00 us per iteration
merge sort time:                    8244.33 us per iteration
...    
##tim sort time:                    7695.33 us per iteration
in-place merge sort time:           6788.00 us per iteration    
sqrt sort time:                     7289.33 us per iteration    
...
grail sort dyn buffer sort time:    7856.67 us per iteration

The benchmark comes from Swenson's sort project, in which he has implemented several sorting algorithms in C. Presumably, his implementations are good enough to be representative, but I haven't investigated them.

So you really can't tell. Benchmark numbers only stay relevant for at most two years, and then you have to repeat them. Possibly timsort beat qsort way back in 2011 when the question was asked, but times have changed. Or qsort was always the fastest, but timsort beat it on non-random data. Or Swenson's code isn't so good and a better programmer would turn the tide in timsort's favor. Or perhaps I didn't use the right CFLAGS when compiling the code. Or... You get the point.

Germ answered 24/9, 2017 at 20:45 Comment(4)
Timsort's performance is dependent on the kind of data being sorted: it is slowest on random data and fastest on sorted data. Strange, then, that this answer and the project readme say nothing about the nature of the data. So I looked at the code and noticed that the data is random. (Quicksort by contrast has consistent speed in most cases, except in specially-crafted adversarial cases and in cases where the quicksort algorithm is implemented badly - e.g. always taking the first or last element as the pivot is a big no-no).Skidway
I would add that the purpose of Timsort was never to beat Quicksort, but rather to be a fast stable sort (Quicksort is not stable) that minimizes comparisons (which are slow in Python). It should, however, beat Quicksort when the data is sorted, or almost sorted. (see also)Skidway
Do you know of any benchmarks in which Timsort is the fastest? I'm opposed to all claims of performance that does not come with benchmark numbers. :) Good point on the Swenson benchmark using randomized data -- I didn't check it very closely.Germ
@Qwertie: TimSort's handling of already (partially) sorted data was also important given that Python does not have an obvious way to merge sorted data, and the non-obvious way (heapq.merge) isn't all that efficient (large parts of it are implemented in Python, not C). So the common way to merge already sorted data, or add unsorted data to sorted data is to just do: sortedlist += newdata; sortedlist.sort() (or one-lined, sortedlist = sorted(sortedlist + newdata)). This would be really inefficient if TimSort didn't use the existing ordering.Sulphonamide

Timsort is a popular hybrid sorting algorithm designed in 2002 by Tim Peters. It is a combination of insertion sort and merge sort. It was developed to perform well on many kinds of real-world data sets. It is a fast, stable, and adaptive sorting technique with average and worst-case performance of O(n log n).

How Timsort works

  1. First, the input array is split into sub-arrays/blocks known as runs.
  2. A simple insertion sort is used to sort each run.
  3. Merge sort is used to merge the sorted runs into a single array.
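The three steps above can be sketched in Python. This is a toy version with a fixed run size and pairwise merging; real Timsort chooses run lengths adaptively, maintains a stack of runs, and merges with galloping:

```python
RUN = 4  # fixed run length for illustration; real Timsort picks ~32-64 adaptively

def insertion_sort(a):
    # Step 2: sort one run in place with insertion sort.
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

def merge(left, right):
    # Step 3: stable two-way merge ('<=' keeps equal elements in order).
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def toy_timsort(items):
    # Step 1: split into fixed-size blocks, sort each, then merge pairwise.
    runs = [insertion_sort(items[i:i + RUN]) for i in range(0, len(items), RUN)]
    while len(runs) > 1:
        runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []

print(toy_timsort([5, 2, 9, 1, 7, 3, 8, 6, 4]))  # -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
```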

Advantages of Timsort

  • It performs better on nearly ordered data.
  • It is well suited to real-world data.

Quicksort is a highly useful and efficient sorting algorithm that divides a large array of data into smaller ones, based on the divide-and-conquer principle. Tony Hoare designed this sorting algorithm in 1959, with average performance of O(n log n).

How Quicksort works

  1. Pick any element as the pivot.
  2. Partition the array around the pivot.
  3. Recursively apply quicksort to the left partition.
  4. Recursively apply quicksort to the right partition.

Advantages of Quicksort

  • It performs better on random data than Timsort.
  • It is useful when space is limited.
  • It is better suited for large data sets.
Cockleboat answered 31/5, 2022 at 11:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.