Efficient heaps in purely functional languages
Asked Answered
E

9

40

As an exercise in Haskell, I'm trying to implement heapsort. The heap is usually implemented as an array in imperative languages, but this would be hugely inefficient in purely functional languages. So I've looked at binary heaps, but everything I found so far describes them from an imperative viewpoint and the algorithms presented are hard to translate to a functional setting. How to efficiently implement a heap in a purely functional language such as Haskell?

Edit: By efficient I mean it should still be in O(n*log n), but it doesn't have to beat a C program. Also, I'd like to use purely functional programming. What else would be the point of doing it in Haskell?

Eremite answered 31/5, 2009 at 19:52 Comment(0)
L
35

There are a number of Haskell heap implementations in an appendix to Okasaki's Purely Functional Data Structures (pdf). (The source code can be downloaded at the link. The book itself is well worth reading.) None of them are binary heaps, per se, but the "leftist" heap is very similar. It has O(log n) insertion, removal, and merge operations. There are also more complicated data structures like skew heaps, binomial heaps, and splay heaps which have better performance.

Luedtke answered 31/5, 2009 at 21:53 Comment(1)
The link seems brokenGnosticism
C
12

Jon Fairbairn posted a functional heapsort to the Haskell Cafe mailing list back in 1997:

http://www.mail-archive.com/[email protected]/msg01788.html

I reproduce it below, reformatted to fit this space. I've also slightly simplified the code of merge_heap.

I'm surprised treefold isn't in the standard prelude since it's so useful. Translated from the version I wrote in Ponder in October 1992 -- Jon Fairbairn

module Treefold where

-- treefold (*) z [a,b,c,d,e,f] = (((a*b)*(c*d))*(e*f))
treefold f zero [] = zero
treefold f zero [x] = x
treefold f zero (a:b:l) = treefold f zero (f a b : pairfold l)
    where 
        pairfold (x:y:rest) = f x y : pairfold rest
        pairfold l = l -- here l will have fewer than 2 elements


module Heapsort where
import Treefold

data Heap a = Nil | Node a [Heap a]
heapify x = Node x []

heapsort :: Ord a => [a] -> [a]    
heapsort = flatten_heap . merge_heaps . map heapify    
    where 
        merge_heaps :: Ord a => [Heap a] -> Heap a
        merge_heaps = treefold merge_heap Nil

        flatten_heap Nil = []
        flatten_heap (Node x heaps) = x:flatten_heap (merge_heaps heaps)

        merge_heap heap Nil = heap
        merge_heap node_a@(Node a heaps_a) node_b@(Node b heaps_b)
            | a < b = Node a (node_b: heaps_a)
            | otherwise = Node b (node_a: heaps_b)
Chappie answered 2/2, 2010 at 19:1 Comment(1)
I don't see how this is the regular heapsort, although it uses heaps. It's still nice, though.Callosity
B
11

You could also use the ST monad, which allows you to write imperative code but expose a purely functional interface safely.

Boogiewoogie answered 31/5, 2009 at 20:20 Comment(1)
Yes, the ST monad gives you the necessary "mutable state". It provides individual mutable variables (a bit like ref in F#) as well as mutable arrays, which are of particular interest for sorting in place.Knobloch
D
8

As an exercise in Haskell, I implemented an imperative heapsort with the ST Monad.

{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad (forM, forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.MArray (newListArray, readArray, writeArray)
import Data.Array.ST (STArray)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

heapSort :: forall a. Ord a => [a] -> [a]
heapSort list = runST $ do
  let n = length list
  heap <- newListArray (1, n) list :: ST s (STArray s Int a)
  heapSizeRef <- newSTRef n
  let
    heapifyDown pos = do
      val <- readArray heap pos
      heapSize <- readSTRef heapSizeRef
      let children = filter (<= heapSize) [pos*2, pos*2+1]      
      childrenVals <- forM children $ \i -> do
        childVal <- readArray heap i
        return (childVal, i)
      let (minChildVal, minChildIdx) = minimum childrenVals
      if null children || val < minChildVal
        then return ()
        else do
          writeArray heap pos minChildVal
          writeArray heap minChildIdx val
          heapifyDown minChildIdx
    lastParent = n `div` 2
  forM_ [lastParent,lastParent-1..1] heapifyDown
  forM [n,n-1..1] $ \i -> do
    top <- readArray heap 1
    val <- readArray heap i
    writeArray heap 1 val
    writeSTRef heapSizeRef (i-1)
    heapifyDown 1
    return top

btw I contest that if it's not purely functional then there is no point in doing so in Haskell. I think my toy implementation is much nicer than what one would achieve in C++ with templates, passing around stuff to the inner functions.

Dustindustman answered 5/6, 2009 at 23:40 Comment(0)
H
3

And here is a Fibonacci Heap in Haskell:

https://github.com/liuxinyu95/AlgoXY/blob/algoxy/datastruct/heap/other-heaps/src/FibonacciHeap.hs

Here are the pdf file for some other k-ary heaps based on Okasaki's work.

https://github.com/downloads/liuxinyu95/AlgoXY/kheap-en.pdf

Hallowell answered 17/8, 2012 at 8:56 Comment(0)
E
2

Just like in efficient Quicksort algorithms written in Haskell, you need to use monads (state transformers) to do stuff in-place.

Elegance answered 31/5, 2009 at 20:20 Comment(5)
-1: Nobody has ever managed to write an efficient quicksort in Haskell.Sherbet
This one is pretty efficient, gist.github.com/dmjio/3478bd19737e2011b750 @JonHarropBrelje
@JonHarrop, yes. It can run in ST (pure strict state thread w/o sacrificing referential integrity) or I/O (can use concurrency, therefore subject to race conditions like anything else). I could've made it faster if I added unsafe reads/writes.Brelje
@TheInternet: Ok. Parallelism should make it quite a bit faster but Haskell's safe approaches to parallelism will make it much slower.Sherbet
With linear types being added to the GHC, that may no longer be the case. It'll allow something similar to Rust's split_at_mut. Of course, this is what Clean had 20 years ago.Rolanda
N
2

Arrays in Haskell aren't as hugely inefficient as you might think, but typical practice in Haskell would probably be to implement this using ordinary data types, like this:

data Heap a = Empty | Heap a (Heap a) (Heap a)
fromList :: Ord a => [a] -> Heap a
toSortedList :: Ord a => Heap a -> [a]
heapSort = toSortedList . fromList

If I were solving this problem, I might start by stuffing the list elements into an array, making it easier to index them for heap creation.

import Data.Array
fromList xs = heapify 0 where
  size = length xs
  elems = listArray (0, size - 1) xs :: Array Int a
  heapify n = ...

If you're using a binary max heap, you might want to keep track of the size of the heap as you remove elements so you can find the bottom right element in O(log N) time. You could also take a look at other types of heaps that aren't typically implemented using arrays, like binomial heaps and fibonacci heaps.

A final note on array performance: in Haskell there's a tradeoff between using static arrays and using mutable arrays. With static arrays, you have to create new copies of the arrays when you change the elements. With mutable arrays, the garbage collector has a hard time keeping different generations of objects separated. Try implementing the heapsort using an STArray and see how you like it.

Nikolenikoletta answered 31/5, 2009 at 20:50 Comment(1)
Note that the O(n) cost of each array write was a bug fixed in GHC 6.12.Sherbet
B
2

I tried to port standard binary heap into functional settings. There is an article with described idea: A Functional Approach to Standard Binary Heaps. All the source code listings in the article are in Scala. But it might be ported very easy into any other functional language.

Bergstein answered 15/1, 2014 at 17:14 Comment(0)
G
1

Here is a page containing an ML version of HeapSort. It's quite detailed and should provide a good starting point.

http://flint.cs.yale.edu/cs428/coq/doc/Reference-Manual021.html

Grizelda answered 31/5, 2009 at 19:59 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.