Deforestation in a Hylomorphism
Asked Answered
A

1

7

Wikipedia writes about Hylomorphism:

In [...] functional programming a hylomorphism is a recursive function, corresponding to the composition of an anamorphism (which first builds a set of results; also known as 'unfolding') followed by a catamorphism (which then folds these results into a final return value). Fusion of these two recursive computations into a single recursive pattern then avoids building the intermediate data structure. This is an example of deforestation, a program optimization strategy.

(Bold markup by me)

Using the recursion-schemes library I wrote a very simple hylomorphism:

import Data.Functor.Foldable
main :: IO ()
main = putStrLn $ show $ hylosum 1000

hylosum :: Int -> Int
hylosum end = hylo alg coalg 1
  where 
    -- Create list of Int's from 1 to n
    coalg :: Int -> ListF Int Int
    coalg n 
       | n > end = Nil
       | otherwise = Cons n (n + 1)
    -- Sum up a list of Int's
    alg :: ListF Int Int -> Int
    alg Nil  =  0
    alg (Cons a x) = a + x

In the cabal file I instructed GHC to optimize the code:

name:                Hylo
version:             0.1.0.0
synopsis:            Hylomorphisms and Deforestation        
build-type:          Simple
cabal-version:       >=1.10

executable             Hylo
  main-is:             Main.hs
  ghc-options:         -O2
  build-depends:       base >=4.10 && <4.11 , recursion-schemes      
  default-language:    Haskell2010

Using stackage lts-10.0 (GHC 8.2.2) I compile with stack build and run with stack exec Hylo -- +RTS -s and I get:

500500
      84,016 bytes allocated in the heap
       3,408 bytes copied during GC
      44,504 bytes maximum residency (1 sample(s))
      25,128 bytes maximum slop
           2 MB total memory in use (0 MB lost due to fragmentation)

Now I change hylosum 1000 to hylosum 1000000 (1000 times more) and I get:

500000500000
  16,664,864 bytes allocated in the heap
      16,928 bytes copied during GC
  15,756,232 bytes maximum residency (4 sample(s))
      29,224 bytes maximum slop
          18 MB total memory in use (0 MB lost due to fragmentation)

So heap usage goes up from 84 KB to 16,664 KB. This is 200 times more than before. Therefore I think, GHC does not do the deforestation / fusion mentioned in wikipedia!

This is not really a surprise: The anamorphism creates the list items from left to right (from 1 to n) and the catamorphism consumes the items the opposite way from right to left (from n to 1) and it's difficult to see how the hylomorphism can work without creating the whole intermediate list.

Questions: Is GHC able to perform the deforestation? If yes, what do I have to change in my code or cabal file? If yes, how does it really work? if no, where is the issue: in wikipedia, GHC or in the library?

Abortionist answered 6/3, 2018 at 16:12 Comment(3)
Deforestation ~ fusion. See https://mcmap.net/q/363802/-what-is-fusion-in-haskell/…Centripetal
TL;DR this doesn't really happen all that automatically: sometimes GHC's inlining is enough, but usually you'll need some rewrite rules.Centripetal
@Alec: So every user of the hylo function must write these rewrite rules himself? How are these rewrite rules for my case?Abortionist
D
14

The data structure is actually fused away, but the resulting program is not tail recursive. The optimized code basically looks like this (with no Cons or Nil in sight):

h n | n > end = 0
    | otherwise = n + h (n+1)

To evaluate the result you must first evaluate h (n+1) recursively, and then add the result to n. During the recursive call, the value n must remain stored somewhere, hence we observe increased memory usage as end increases.

A tighter loop can be obtained by having the recursive call be in tail position, and carry a constant size accumulator. We want the code to optimize to this instead:

-- with BangPatterns
h n !acc | n > end = acc
         | otherwise = h (n+1) (n + acc)

In hylosum, the call to (+) happens in alg, and we replace that with a call to a continuation that will be built up by hylo.

alg :: ListF Int (Int -> Int) -> Int -> Int
alg Nil acc = acc
alg (Cons n go) !acc = go (n + acc)

With that I see a constant 51kB allocated in the heap.

Dipole answered 6/3, 2018 at 16:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.