Why is it that I mostly hear about Quicksort being the fastest overall sorting algorithm when, according to Wikipedia, Timsort seems to perform much better?
TimSort is a highly optimized mergesort; it is stable and faster than a traditional mergesort.
Compared with quicksort, it has two advantages:
- It is unbelievably fast on nearly sorted data (including reverse-sorted data);
- Its worst case is still O(N log N).
To be honest, I don't think #1 is an advantage, but it did impress me.
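To make the first point concrete, here is a rough sketch that counts comparisons instead of measuring time. It assumes CPython, whose built-in list.sort() is Timsort-derived; the Counted wrapper and count_comparisons helper are made up for this illustration.

```python
import random


class Counted:
    """Wraps a value and counts every '<' comparison made during sorting."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Sort wrapped copies of `values` with list.sort(); return the comparison count."""
    Counted.comparisons = 0
    items = [Counted(v) for v in values]
    items.sort()
    return Counted.comparisons


n = 10_000
rng = random.Random(0)          # fixed seed so the run is repeatable
ascending = list(range(n))
descending = ascending[::-1]
shuffled = ascending[:]
rng.shuffle(shuffled)

for name, data in [("ascending", ascending),
                   ("descending", descending),
                   ("shuffled", shuffled)]:
    print(f"{name:>10}: {count_comparisons(data)} comparisons")
```

On ordered or reverse-ordered input the count should stay close to n, while shuffled input should need on the order of n log n comparisons.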
Here are QuickSort's advantages:
- QuickSort is very simple; even for a highly tuned implementation, we can write its pseudocode in about 20 lines;
- QuickSort is the fastest in most cases;
- Its memory consumption is O(log N).
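As an illustration of the first and third points, here is a minimal in-place sketch (not a tuned implementation). The names quicksort_inplace and partition are my own, and the middle-element pivot is just one simple choice. Recursing only into the smaller partition is what keeps the stack at O(log N).

```python
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around the middle element; returns the pivot's final index."""
    mid = (lo + hi) // 2
    a[mid], a[hi] = a[hi], a[mid]      # park the pivot at the end
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]          # move the pivot into its final slot
    return i


def quicksort_inplace(a, lo=0, hi=None):
    """Sort list `a` in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        # Recurse into the smaller side and loop on the larger one, so the
        # recursion depth stays O(log n) even when the pivots are bad.
        if p - lo < hi - p:
            quicksort_inplace(a, lo, p - 1)
            lo = p + 1
        else:
            quicksort_inplace(a, p + 1, hi)
            hi = p - 1
```

Calling quicksort_inplace(data) sorts the list without allocating a second array; only the recursion stack grows.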
Currently, the Java 7 SDK implements both Timsort and a new quicksort variant, Dual-Pivot Quicksort.
If you need a stable sort, try Timsort; otherwise start with quicksort.
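If "stable" is unfamiliar: a stable sort keeps elements with equal keys in their original relative order. A small sketch using Python's built-in sorted(), which is documented as stable (the records are made-up sample data):

```python
# Each record is (name, age); we sort by age only.
records = [("bob", 25), ("alice", 30), ("carol", 25), ("dave", 30)]

by_age = sorted(records, key=lambda r: r[1])
print(by_age)
# → [('bob', 25), ('carol', 25), ('alice', 30), ('dave', 30)]
```

Within each age, the names keep their input order (bob before carol, alice before dave). A typical in-place quicksort does not give that guarantee.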
More or less, it has to do with the fact that Timsort is a hybrid sorting algorithm. This means that while the two underlying sorts it uses (Mergesort and Insertion sort) are both worse than Quicksort for many kinds of data, Timsort only uses them when it is advantageous to do so.
On a slightly deeper level, as Patrick87 states, quicksort is a worst-case O(n^2) algorithm. Choosing a good pivot isn't hard, but guaranteeing an O(n log n) quicksort comes at the cost of generally slower sorting on average.
For more detail on Timsort, see this answer, and the linked blog post. It basically assumes that most data is already partially sorted, and constructs "runs" of sorted data that allow for efficient merges using mergesort.
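To give a feel for what a "run" is, here is a simplified sketch of run detection. The function name natural_runs is hypothetical. Real Timsort does more than this: as far as I know, it also extends short runs with insertion sort up to a minimum length, which is left out here.

```python
def natural_runs(data):
    """Split `data` into maximal runs; strictly descending runs are reversed.

    Returns a list of ascending lists that together contain every element.
    """
    runs = []
    i, n = 0, len(data)
    while i < n:
        j = i + 1
        if j < n and data[j] < data[i]:
            # Strictly descending run: extend it, then store it reversed.
            # Requiring *strictly* descending keeps equal elements in order.
            while j < n and data[j] < data[j - 1]:
                j += 1
            runs.append(data[i:j][::-1])
        else:
            # Non-descending run: extend while the next element is not smaller.
            while j < n and data[j] >= data[j - 1]:
                j += 1
            runs.append(data[i:j])
        i = j
    return runs


print(natural_runs([1, 2, 5, 9, 7, 4, 3, 6, 8]))
# → [[1, 2, 5, 9], [3, 4, 7], [6, 8]]
```

The more order the input already has, the fewer and longer the runs, and the less merging is left to do.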
Generally speaking, quicksort is the best algorithm for primitive arrays. This is due to memory locality and caching.
JDK 7 uses TimSort for Object arrays. An Object array holds only object references; the objects themselves are stored on the heap. To compare objects, we need to read them from the heap, which means reading one object from one part of the heap and then another object from some unrelated part. That causes a lot of cache misses, so I guess memory locality no longer matters much. This may be why the JDK uses TimSort only for Object arrays and not for primitive arrays.
This is only my guess.
Tim Sort is great if you need an order-preserving sort, or if you are sorting a complex array (comparing heap-based objects) rather than a primitive array. As mentioned by others, quicksort benefits significantly from the locality of data and processor caching for primitive arrays.
The fact that the worst case of quicksort is O(n^2) was raised. Fortunately, you can achieve O(n log n) worst-case time with quicksort. The worst case occurs when the pivot is the smallest or largest value in the partition, such as when the pivot is the first or last element of an already sorted array.
We can achieve an O(n log n) worst case by choosing the median value as the pivot. The median can be found in linear time, O(n), which is the same order as the partitioning step itself, so each level of recursion still costs O(n). Because the median splits the array in half, there are O(log n) levels, giving O(n log n) overall.
In practice, however, most implementations find that a random pivot is sufficient and so do not search for the median.
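Here is a rough sketch of how much the pivot choice matters on already sorted input. It is a plain Lomuto-partition quicksort that counts comparisons; the helper name quicksort_comparisons and the choose_pivot callback are my own, not from any library.

```python
import random


def quicksort_comparisons(data, choose_pivot):
    """Quicksort a copy of `data`; return the number of element comparisons made."""
    a = list(data)
    comparisons = 0
    stack = [(0, len(a) - 1)]            # explicit stack avoids Python's recursion limit
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        p = choose_pivot(lo, hi)
        a[p], a[hi] = a[hi], a[p]        # move the chosen pivot to the end
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            comparisons += 1
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]        # pivot into its final position
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    assert a == sorted(data)             # sanity check: the result really is sorted
    return comparisons


n = 1000
already_sorted = list(range(n))
rng = random.Random(42)                  # fixed seed so the run is repeatable

first = quicksort_comparisons(already_sorted, lambda lo, hi: lo)
rand = quicksort_comparisons(already_sorted, lambda lo, hi: rng.randint(lo, hi))
print("first-element pivot:", first)
print("random pivot:       ", rand)
```

With the first element as pivot, sorted input costs exactly n(n-1)/2 comparisons in this sketch, while the random pivot should land far lower, in the n log n range.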
Here are benchmark numbers from my machine (i7-6700 CPU, 3.4GHz, Ubuntu 16.04, gcc 5.4.0, parameters: SIZE=100000 and RUNS=3):
$ ./demo
Running tests
stdlib qsort time: 12246.33 us per iteration
##quick sort time: 5822.00 us per iteration
merge sort time: 8244.33 us per iteration
...
##tim sort time: 7695.33 us per iteration
in-place merge sort time: 6788.00 us per iteration
sqrt sort time: 7289.33 us per iteration
...
grail sort dyn buffer sort time: 7856.67 us per iteration
The benchmark comes from Swenson's sort project, in which he has implemented several sorting algorithms in C. Presumably, his implementations are good enough to be representative, but I haven't investigated them.
So you really can't tell. Benchmark numbers only stay relevant for at most two years and then you have to repeat them. Possibly, timsort beat qsort waaay back in 2011 when the question was asked, but the times have changed. Or qsort was always the fastest, but timsort beat it on non-random data. Or Swenson's code isn't so good and a better programmer would turn the tide in timsort's favor. Or perhaps I suck and didn't use the right CFLAGS when compiling the code. Or... You get the point.
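If you want to rerun something similar yourself, here is a small timing harness. It is only a sketch: it times Python's built-in sorted(), not the C implementations from Swenson's project, and the time_sort helper is made up for this example. The size and run count mirror the SIZE=100000 and RUNS=3 parameters above.

```python
import random
import timeit


def time_sort(data, runs=3):
    """Return the best wall-clock time in seconds for sorting a fresh copy of `data`."""
    # sorted() builds a new list each time, so `data` itself is never modified.
    return min(timeit.repeat(lambda: sorted(data), number=1, repeat=runs))


size = 100_000
rng = random.Random(0)
shuffled = [rng.random() for _ in range(size)]

nearly_sorted = sorted(shuffled)
for _ in range(size // 100):             # perturb roughly 1% of the positions
    i, j = rng.randrange(size), rng.randrange(size)
    nearly_sorted[i], nearly_sorted[j] = nearly_sorted[j], nearly_sorted[i]

results = {
    "shuffled": time_sort(shuffled),
    "nearly sorted": time_sort(nearly_sorted),
}
for name, seconds in results.items():
    print(f"{name:>13}: {seconds * 1e6:.0f} us per iteration")
```

The absolute numbers will differ from machine to machine, which is rather the point; the input distribution is worth varying as well.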
heapq.merge) isn't all that efficient (large parts of it are implemented in Python, not C). So the common way to merge already sorted data, or add unsorted data to sorted data, is to just do: sortedlist += newdata; sortedlist.sort() (or one-lined, sortedlist = sorted(sortedlist + newdata)). This would be really inefficient if TimSort didn't use the existing ordering. – Sulphonamide
Timsort is a popular hybrid sorting algorithm designed in 2002 by Tim Peters. It is a combination of insertion sort and merge sort. It was developed to perform well on various kinds of real-world data sets. It is a fast, stable and adaptive sorting technique with average and worst-case performance of O(n log n).
How Timsort works
- First of all, the input array is split into sub-arrays (blocks) known as runs.
- A simple Insertion Sort is used to sort each Run.
- Merge Sort is used to merge the sorted Runs into a single array.
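The three steps above can be sketched as follows. This is a deliberately simplified version, not the real algorithm: it cuts the input into fixed-size runs (the run_size parameter is an assumption of this sketch), whereas actual Timsort detects runs that are already ordered and uses a more careful merging strategy.

```python
def insertion_sort(a, lo, hi):
    """Sort the slice a[lo:hi] in place with insertion sort (stable)."""
    for i in range(lo + 1, hi):
        item = a[i]
        j = i - 1
        while j >= lo and a[j] > item:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = item


def merge(left, right):
    """Stable merge of two sorted lists into a new sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:           # take from the right only if strictly smaller
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def simple_timsort(data, run_size=32):
    """Return a sorted copy of `data` using fixed-size runs plus bottom-up merging."""
    a = list(data)
    # Steps 1 and 2: cut the array into runs and insertion-sort each one.
    for lo in range(0, len(a), run_size):
        insertion_sort(a, lo, min(lo + run_size, len(a)))
    # Step 3: merge neighbouring runs, doubling the run width on each pass.
    width = run_size
    while width < len(a):
        for lo in range(0, len(a), 2 * width):
            mid = min(lo + width, len(a))
            hi = min(lo + 2 * width, len(a))
            a[lo:hi] = merge(a[lo:mid], a[mid:hi])
        width *= 2
    return a
```

For example, simple_timsort([5, 2, 9, 1]) returns a new sorted list and leaves the input untouched.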
Advantages of Timsort
- It performs better on nearly ordered data.
- It is well-suited to dealing with real-world data.
Quicksort is a highly useful and efficient sorting algorithm that divides a large array of data into smaller ones, following the divide-and-conquer approach. Tony Hoare designed it in 1959; its average performance is O(n log n).
How Quicksort works
- Pick any element as the pivot.
- Divide the array into two partitions around the pivot: elements smaller than it and elements larger than it.
- Recursively apply quick sort to the left partition.
- Recursively apply quick sort to the right partition.
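The four steps above map almost line for line onto this short sketch. It builds new lists instead of partitioning in place, so it is easier to read but uses more memory than a typical implementation; the middle-element pivot is an arbitrary choice.

```python
def quicksort(items):
    """Return a new sorted list; mirrors the four steps above."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]               # step 1: pick a pivot (middle element here)
    left = [x for x in items if x < pivot]       # step 2: partition around the pivot
    middle = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    # Steps 3 and 4: recursively sort the left and right partitions.
    return quicksort(left) + middle + quicksort(right)


print(quicksort([5, 2, 9, 1, 5, 6]))
# → [1, 2, 5, 5, 6, 9]
```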
Advantages of Quicksort
- It performs better on random data as compared to Timsort.
- It is useful when memory is limited.
- It is better suited to large data sets.