Why is QuackSort 2x faster than Data.List's sort for random lists?
Asked Answered
Z

1

11

I was looking for the canonical implementation of MergeSort on Haskell to port to HOVM, and I found this StackOverflow answer. When porting the algorithm, I realized something looked silly: the algorithm has a "halve" function that does nothing but split a list in two, using half of the length, before recursing and merging. So I thought: why not make a better use of this pass, and use a pivot, to make each half respectively smaller and bigger than that pivot? That would increase the odds that recursive merge calls are applied to already-sorted lists, which might speed up the algorithm!

I've done this change, resulting in the following code:

import Data.List
import Data.Word

randomList :: Word32 -> Word32 -> [Word32]
randomList seed 0    = []
randomList seed size = seed : randomList (seed * 1664525 + 1013904223) (size - 1)

quacksort :: [Word32] -> [Word32]
quacksort []           = []
quacksort [x]          = [x]
quacksort (p : x : xs) = split p (p : x : xs) [] [] where

  -- Splits the list in two halves of elements smaller/bigger than a pivot
  split p []       as bs = merge (quacksort as) (quacksort bs)
  split p (x : xs) as bs = quack p (p < x) x xs as bs

  -- Helper function for `split`
  quack p False x xs as bs = split p xs (x : as) bs
  quack p True  x xs as bs = split p xs as (x : bs)

  -- Merges two lists as a sorted one
  merge []       ys       = ys
  merge xs       []       = xs
  merge (x : xs) (y : ys) = place (x < y) x xs y ys

  -- Helper function for `merge`
  place False x xs y ys = y : merge (x : xs) ys
  place True  x xs y ys = x : merge xs (y : ys)

main :: IO ()
main = do
  let l = randomList 0 2000000
  let b = quacksort l
  print $ sum b

I then benchmarked it and, to my surprise, it was, indeed, 2x faster than Haskell's official Data.List sort. So I wondered why this isn't used in practice, and, suddenly, I realized the obvious: mergesort does NOT perform better on already sorted lists. D'oh. So the whole assumption behind quacksort was failed. Not only that, it would perform terribly for reversely sorted lists, since it would fail to produce two halves of similar size (except if we could guess a really good pivot). So, quacksort is wack in all cases and should never be used in practice. But, then...

Why the hell does it perform 2x faster than Data.List's sort for random lists?

I can't think of a good reason this should be the case. Making each half smaller/bigger than a pivot doesn't change how many times the merge call must be called, so it shouldn't have any positive effect. But reverting it back to a conventional mergesort does make it 2x slower, so, for some reason, the ordered split helps.

Zedoary answered 25/1, 2022 at 23:11 Comment(17)
Data.List uses the idea of TimSort that makes the assumption that most lists are partly sorted.Chub
@WillemVanOnsem interesting, I didn't know that. Most lists sorted in practice are indeed probably partly sorted.Zedoary
Perhaps there's some specialization going on. Dictionary lookups can be a killer. Is quacksort still faster if you make it polymorphic and put it in a different module?Unmanned
@DanielWagner the thing is, if you split the list in two halves without using a pivot, it will be 2x slower than itself. So for some reason splitting smaller/bigger elements is making a difference, and that's the mystery.Zedoary
This is pretty interesting! I managed to double the speed of quacksort and I have some ideas about the performance reasons. I'll post an answer after I sufficiently explored this.Ism
quacksort is an obfuscated and mildly inefficient unstable quick sort. Your arguments to merge are always fully partitioned so that it's doing the same thing as (++) would, except with extra comparisons thrown in to slow it down. As with any naive-pivot quick sort, it has O(n²) edge cases - and they're going to be the same sorts of cases in which Data.List.sort is O(n). It might be faster on random data, but it would likely be a regression in many real-world applications. And the loss of stability makes it unsuitable for some cases involving sorting in multiple steps.Lamrert
@Lamrert you're correct. I see it now. So this was just a derpy quicksort all along. I guess I couldn't have picked a better name.Zedoary
you can find stable quicksort version e.g. at the bottom of this answer of mine. although it also does three-way partitioning into the smaller the bigger and the equal elements which you don't need here.Fallfish
Note that the original merge sort and the merge sort used in nearly all libraries is some variation of a bottom up merge sort, typically a hybrid of insertion sort to create small runs, then using merge sort on those runs, such as timsort . The top down merge sort you refer to in your question is mostly used for academic purposes, not for actual implementations .Unmuzzle
@Unmuzzle just to clarify, the last time I looked at Haskell library sort code (a year or so ago, maybe more) it was detecting the runs, not creating them.Fallfish
BTW I just noticed the Q says "mergesort does NOT perform better on already sorted lists". this is not true for the built-in sort, because it detects the ordered runs in the input in O(n) time, and when there's only one run detected it is just returned as the result, for the total O(n) time. this works for descending lists as well as ascending.Fallfish
@WillNess it is true for the naive mergesort, though? Isn't the built in sort actually a functional timsort?Zedoary
I don't believe so, no. Timsort stops the partitioning at some small number of items and does insertion sort for those chunks AFAIR. the library sort doesn't do that at all - last time I checked, which was some year+ go. it goes along the input list detecting and collecting the ordered runs. then it proceeds with the pairwise mergings starting from those detected chunks. the naive bottom-up mergesort just breaks the list into singletons instead, and then goes on merging them, in pairwise manner, until only one chunk is left. I'll include a link to a prototypical code shortly.Fallfish
here it is. so yes, the naive one doesn't care whether the input is sorted or not -- it will perform the same number of comparisons in either case, in O(n log n) time. ---- so yeah, I've checked the source just now and it still does that.Fallfish
reading WP page on timsort, it does look similar. except, in the library version there's no minimal run size requirement and hence no insertion sort (like that page describes). so the library sort is much simpler, what's described at that page is much more sophisticated. timsort "strives to perform balanced merges (a merge thus merges runs of similar sizes)." the built-in just does them pairwise, be what may.Fallfish
@WillNess - what about Java's implementation of timsort for an array of objects?Unmuzzle
@Unmuzzle don't know anything about it.Fallfish
F
2

Your split splits the list in two ordered halves, so merge consumes its first argument first and then just produces the second half in full. In other words it is equivalent to ++, doing redundant comparisons on the first half which always turn out to be True.

In the true mergesort the merge actually does twice the work on random data because the two parts are not ordered.

The split though spends some work on the partitioning whereas an online bottom-up mergesort would spend no work there at all. But the built-in sort tries to detect ordered runs in the input, and apparently that extra work is not negligible.

Fallfish answered 27/1, 2022 at 10:46 Comment(2)
Data.List.sort is not usual merge sort. First it finds the ordered runs in the input, resulting in a list of ordered sublists. Then it recursively merges them pairwise until there is only one remaining. That makes a lot of sense when sorting a linked list, but wouldn't make sense for sorting an array.Glabrous
@Glabrous I believe I didn't say anything different in my answer (and also, comments above). if you reacted to the "true" in my "true mergesort" phrase, I meant it in comparison with the OP's version, not in the "true Sieve" kind of way. and yes, the question (and answer) explicitly talk about lists, not arrays.Fallfish

© 2022 - 2024 — McMap. All rights reserved.