Why is QuackSort 2x faster than Data.List's sort for random lists?

I was looking for the canonical implementation of MergeSort on Haskell to port to HOVM, and I found this StackOverflow answer. When porting the algorithm, I realized something looked silly: the algorithm has a "halve" function that does nothing but split a list in two, using half of the length, before recursing and merging. So I thought: why not make a better use of this pass, and use a pivot, to make each half respectively smaller and bigger than that pivot? That would increase the odds that recursive merge calls are applied to already-sorted lists, which might speed up the algorithm!

I've done this change, resulting in the following code:

import Data.List
import Data.Word

randomList :: Word32 -> Word32 -> [Word32]
randomList seed 0    = []
randomList seed size = seed : randomList (seed * 1664525 + 1013904223) (size - 1)

quacksort :: [Word32] -> [Word32]
quacksort []           = []
quacksort [x]          = [x]
quacksort (p : x : xs) = split p (p : x : xs) [] [] where

  -- Splits the list in two halves of elements smaller/bigger than a pivot
  split p []       as bs = merge (quacksort as) (quacksort bs)
  split p (x : xs) as bs = quack p (p < x) x xs as bs

  -- Helper function for `split`
  quack p False x xs as bs = split p xs (x : as) bs
  quack p True  x xs as bs = split p xs as (x : bs)

  -- Merges two lists as a sorted one
  merge []       ys       = ys
  merge xs       []       = xs
  merge (x : xs) (y : ys) = place (x < y) x xs y ys

  -- Helper function for `merge`
  place False x xs y ys = y : merge (x : xs) ys
  place True  x xs y ys = x : merge xs (y : ys)

main :: IO ()
main = do
  let l = randomList 0 2000000
  let b = quacksort l
  print $ sum b

I then benchmarked it and, to my surprise, it was, indeed, 2x faster than Haskell's official Data.List sort. So I wondered why this isn't used in practice, and, suddenly, I realized the obvious: mergesort does NOT perform better on already sorted lists. D'oh. So the whole assumption behind quacksort was failed. Not only that, it would perform terribly for reversely sorted lists, since it would fail to produce two halves of similar size (except if we could guess a really good pivot). So, quacksort is wack in all cases and should never be used in practice. But, then...

Why the hell does it perform 2x faster than Data.List's sort for random lists?

I can't think of a good reason this should be the case. Making each half smaller/bigger than a pivot doesn't change how many times the merge call must be called, so it shouldn't have any positive effect. But reverting it back to a conventional mergesort does make it 2x slower, so, for some reason, the ordered split helps.

Recommended topics

Hot tags