Haskell Merge Sort
Asked Answered
A

4

6

This is an implementation of Mergesort using higher order functions,guards,where and recursion.

However getting an error from compiler 6:26: parse error on input ‘=’

 mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
    mergeSort merge xs
        | length xs < 2 = xs
        | otherwise = merge (mergeSort merge first) (mergeSort merge second)
        where first = take half xs 
              second = drop half xs 
               half = (length xs) `div` 2

I can't see whats wrong? or rather I don't understand the compiler.

Anesthetic answered 6/5, 2016 at 23:46 Comment(1)
btw - if you are using length this is an O(n) operation - so you are producing unnecessary overhead, in general if you are using length indexing a lot with (single-linked) lists you are probably using the wrong data structure.Himmler
A
11

Halving a list is not an O(1) operation but O(n), so the given solutions introduce additional costs compared to the imperative version of merge sort. One way to avoid halving is to simply start merging directly by making singletons and then merging every two consecutive lists:

sort :: (Ord a) => [a] -> [a]
sort = mergeAll . map (:[]) 
  where
    mergeAll [] = []
    mergeAll [t] = t
    mergeAll xs  = mergeAll (mergePairs xs)

    mergePairs (x:y:xs) = merge x y:mergePairs xs
    mergePairs xs = xs

where merge is already given by others.

Actinouranium answered 19/10, 2018 at 16:6 Comment(1)
A small but nice optimization: instead of breaking the list up into singletons, break it up into sorted lists of two elements. That saves O(n) allocation without much extra code complexity.Kenwood
A
10

Another msort implementation in Haskell;

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys         = ys
merge xs []         = xs
merge (x:xs) (y:ys) | x < y     = x:merge xs (y:ys)
                    | otherwise = y:merge (x:xs) ys

halve :: [a] -> ([a],[a])
halve xs = (take lhx xs, drop lhx xs)
           where lhx = length xs `div` 2

msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort  xs = merge (msort left) (msort right)
            where (left,right) = halve xs
Ailyn answered 21/3, 2017 at 14:41 Comment(7)
Shouldn't merge be a tail recursive function for efficiency reasons? I haven't benchmarked it but when just running it in repl.it with a list of 30000 values, tail recursion is a bit faster: repl.it/@faebl/SortingZacarias
the msort is missing the equation "msort [] = []". Otherwise the thing will never stop if the array to sort is empty.Reddick
@Cuban coffee Thanks for the heads up... corrected.Ailyn
@FabianSchneider, no, it should not. Haskell lists are lazy in both their elements and (what matters here) their tails. So constructing a Haskell list the way the above code does is efficient: the list constructor is allocated first, and only later is the tail evaluated.Kenwood
@Kenwood thanks for the info; I thought about it and read a a bit more about laziness and the list constructor and it makes kind of sense. Also "benchmarking" it on repl.it isn't really a reliable thing...Zacarias
It's likely more efficient to halve a list using a two-pointer algorithm than to take the length and divide by two. And it's definitely prettier.Kenwood
Here's a two-pointer list halver.Kenwood
H
9

Haskell is an indentation sensitive programming language, you simply need to fix that (btw. if you are using tabs change that to using spaces).

mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
        | length xs < 2 = xs
        | otherwise = merge (mergeSort merge first) (mergeSort merge second)
        where first = take half xs 
              second = drop half xs 
              half = length xs `div` 2
Himmler answered 7/5, 2016 at 0:6 Comment(2)
that seems to have fixed that error but now when I run the function mergeSort, it says merge not in scope. Does that mean, the code is incomplete and I need to further define what merge actually does?Anesthetic
@Anesthetic this should be an accepted answer if it solved your problem; even if this is from three years ago...Zacarias
F
3

None of these solutions is as smart as Haskell's own solution, which runs on the idea that in the worst case scenario's these proposed algorithms is still run Theta (n log n) even if the list to be sorted is already trivially sorted.

Haskell's solution is to merge lists of strictly decreasing (and increasing values). The simplified code looks like:

mergesort :: Ord a => [a] -> [a] 
mergesort xs = unwrap (until single (pairWith merge) (runs xs)) 

runs :: Ord a => [a] -> [[a]]
runs = foldr op [] 
 where op x [] = [[x]]
       op x ((y:xs):xss) | x <= y    = (x:y:xs):xss
                         | otherwise = [x]:(y:xs):xss`

This will run Theta(n)

Haskell's version is smarter still because it will do an up run and a down run.

As usual I am in awe with the cleverness of Haskell!

Foofaraw answered 23/8, 2021 at 7:52 Comment(1)
The version in Data.List is indeed clever as a general purpose sort. But not every application will expect elements to be partially sorted already. In such applications, it will generally be faster not to try to detect runs of sorted elements. Additionally, there are quite a few applications where it's more important for running time to be predictable than for it to be fast. In these applications too, it's better not to take advantage of sorted runs.Kenwood

© 2022 - 2024 — McMap. All rights reserved.