I was looking for the canonical implementation of MergeSort in Haskell to port to HOVM, and I found this StackOverflow answer. When porting the algorithm, I realized something looked silly: the algorithm has a "halve" function that does nothing but split the list in two, based on half of its length, before recursing and merging. So I thought: why not make better use of this pass and use a pivot, so that each half holds only elements respectively smaller or bigger than that pivot? That would increase the odds that recursive merge calls are applied to already-sorted lists, which might speed up the algorithm!
I made this change, resulting in the following code:
import Data.List
import Data.Word

randomList :: Word32 -> Word32 -> [Word32]
randomList seed 0    = []
randomList seed size = seed : randomList (seed * 1664525 + 1013904223) (size - 1)

quacksort :: [Word32] -> [Word32]
quacksort []           = []
quacksort [x]          = [x]
quacksort (p : x : xs) = split p (p : x : xs) [] [] where

  -- Splits the list in two halves of elements smaller/bigger than a pivot
  split p []       as bs = merge (quacksort as) (quacksort bs)
  split p (x : xs) as bs = quack p (p < x) x xs as bs

  -- Helper function for `split`
  quack p False x xs as bs = split p xs (x : as) bs
  quack p True  x xs as bs = split p xs as (x : bs)

  -- Merges two lists as a sorted one
  merge []       ys       = ys
  merge xs       []       = xs
  merge (x : xs) (y : ys) = place (x < y) x xs y ys

  -- Helper function for `merge`
  place False x xs y ys = y : merge (x : xs) ys
  place True  x xs y ys = x : merge xs (y : ys)

main :: IO ()
main = do
  let l = randomList 0 2000000
  let b = quacksort l
  print $ sum b
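For the benchmark mentioned below, a minimal timing harness might look like the following sketch. This is an assumption about the setup, not the author's actual benchmark; it assumes randomList and quacksort from above are in scope and replaces the original main.

import Data.List (sort)
import Data.Word (Word32)
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Times one sort by forcing its full result with `sum`
timeSum :: String -> [Word32] -> IO ()
timeSum label xs = do
  t0 <- getCPUTime
  print (sum xs)
  t1 <- getCPUTime
  printf "%s: %.3f s\n" label (fromIntegral (t1 - t0) * 1e-12 :: Double)

-- Hypothetical replacement for the original main
main :: IO ()
main = do
  let l = randomList 0 2000000
  print (sum l)  -- force the input first, so list generation isn't timed
  timeSum "quacksort"      (quacksort l)
  timeSum "Data.List.sort" (sort l)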
I then benchmarked it and, to my surprise, it was indeed 2x faster than Haskell's official Data.List sort. So I wondered why this isn't used in practice, and, suddenly, I realized the obvious: mergesort does NOT perform better on already-sorted lists. D'oh. So the whole assumption behind quacksort was flawed. Not only that, it would perform terribly on reverse-sorted lists, since it would fail to produce two halves of similar size (unless we could guess a really good pivot). So, quacksort is wack in all cases and should never be used in practice. But, then...
Why the hell does it perform 2x faster than Data.List's sort for random lists?
I can't think of a good reason this should be the case. Making each half smaller/bigger than a pivot doesn't change how many times merge must be called, so it shouldn't have any positive effect. But reverting to a conventional mergesort does make it 2x slower, so, for some reason, the ordered split helps.
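For comparison, the "conventional mergesort" referred to above might look like the sketch below. This is an assumption about the reverted version, not the author's exact code: the split pass distributes elements alternately into two halves instead of partitioning them around a pivot, and everything else stays the same.

import Data.Word (Word32)

mergesort :: [Word32] -> [Word32]
mergesort []  = []
mergesort [x] = [x]
mergesort xs  = halve xs [] [] where

  -- Distributes elements alternately, so the two halves have similar sizes
  halve []       as bs = merge (mergesort as) (mergesort bs)
  halve (x : xs) as bs = halve xs bs (x : as)

  -- Merges two sorted lists into one sorted list
  merge []       ys       = ys
  merge xs       []       = xs
  merge (x : xs) (y : ys)
    | x < y     = x : merge xs (y : ys)
    | otherwise = y : merge (x : xs) ys

Unlike quacksort, this version keeps the two recursive calls balanced no matter how the input is ordered, which is exactly the property the pivot-based split gives up.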
Data.List uses the idea of TimSort that makes the assumption that most lists are partly sorted. – Chub

quacksort still faster if you make it polymorphic and put it in a different module? – Unmanned

quacksort is an obfuscated and mildly inefficient unstable quick sort. Your arguments to merge are always fully partitioned, so it's doing the same thing as (++) would, except with extra comparisons thrown in to slow it down. As with any naive-pivot quick sort, it has O(n²) edge cases - and they're going to be the same sorts of cases in which Data.List.sort is O(n). It might be faster on random data, but it would likely be a regression in many real-world applications. And the loss of stability makes it unsuitable for some cases involving sorting in multiple steps. – Lamrert
sort code (a year or so ago, maybe more) it was detecting the runs, not creating them. – Fallfish

sort, because it detects the ordered runs in the input in O(n) time, and when there's only one run detected it is just returned as the result, for a total of O(n) time. This works for descending lists as well as ascending. – Fallfish
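Lamrert's point can be checked directly: every element quacksort puts into as is <= the pivot and every element in bs is > the pivot, so merging the two sorted halves never interleaves them and yields the same list as plain concatenation. A small self-contained illustration (not from the thread), using a standalone copy of the merge logic rather than the one nested inside quacksort:

import Data.List (partition, sort)
import Data.Word (Word32)

-- Standalone equivalent of the merge used inside quacksort
merge :: [Word32] -> [Word32] -> [Word32]
merge []       ys = ys
merge xs       [] = xs
merge (x : xs) (y : ys)
  | x < y     = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

-- On inputs fully partitioned around a pivot, merge degenerates to (++):
-- each comparison x < y holds, so the first list is emitted in full before
-- the second one is touched.
mergeIsAppend :: Word32 -> [Word32] -> Bool
mergeIsAppend p xs = merge (sort as) (sort bs) == sort as ++ sort bs
  where (as, bs) = partition (<= p) xs

In other words, quacksort behaves like a quicksort whose final concatenation pays one extra comparison per element.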
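And the run detection Fallfish describes can be sketched as follows (an illustration of the idea only, not GHC's actual Data.List.sort code): one O(n) pass splits the input into maximal ascending and descending runs, reversing the descending ones, so a fully sorted or reverse-sorted list collapses to a single run and the merging phase has nothing left to do.

-- Splits a list into maximal non-decreasing runs, reversing strictly
-- decreasing ones, in a single left-to-right pass.
runs :: Ord a => [a] -> [[a]]
runs []  = []
runs [x] = [[x]]
runs (x : y : rest)
  | x <= y    = ascending  y [x] rest
  | otherwise = descending y [x] rest
  where
    -- Extends a non-decreasing run; acc holds the run in reverse order
    ascending cur acc (z : zs)
      | cur <= z = ascending z (cur : acc) zs
    ascending cur acc zs = reverse (cur : acc) : runs zs

    -- Extends a strictly decreasing run; consing reverses it for free
    descending cur acc (z : zs)
      | cur > z  = descending z (cur : acc) zs
    descending cur acc zs = (cur : acc) : runs zs

For example, runs [1,2,3,2,1] gives [[1,2,3],[1,2]], while runs [1,2,3,4,5] and runs [5,4,3,2,1] each come back as a single run.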