Is it possible to make GHC optimize (deforest) generic functions such as catamorphisms?
Asked Answered
P

1

23

I really like the idea of working with catamorphisms/anamorphisms in a generic way, but it seems to me it has a significant performance drawback:

Suppose we want to work with a tree structure in the categorical way - to describe different folding using a generic catamorphism function:

newtype Fix f = Fix { unfix :: f (Fix f) }

data TreeT r = Leaf | Tree r r
instance Functor TreeT where
    fmap f Leaf         = Leaf
    fmap f (Tree l r)   = Tree (f l) (f r)

type Tree = Fix TreeT

catam :: (Functor f) => (f a -> a) -> (Fix f -> a)
catam f = f . fmap (catam f) . unfix

Now we can write functions like:

depth1 :: Tree -> Int
depth1 = catam g
  where
    g Leaf       = 0
    g (Tree l r) = max l r

Unfortunately, this approach has a significant drawback: During the computation, new instances of TreeT Int are created at every level in fmap just to be immediately consumed by g. Compared to the classical definition

depth2 :: Tree -> Int
depth2 (Fix Leaf) = 0
depth2 (Fix (Tree l r)) = max (depth1 l) (depth1 r)

our depth1 will be always slower making unnecessary strain on the GC. One solution would be to use hylomorphisms and combine creation and folding trees together. But often we don't want to do that, we may want a tree to be created on one place and then passed somewhere else to be folded later. Or, to be folder several times with different catamorphisms.

Is there a way to make GHC optimize depth1? Something like inlining catam g and then fusing/deforesting g . fmap ... inside?

Pasia answered 27/10, 2012 at 10:12 Comment(2)
I'm late to this party, but shouldn't there be a +1 somewhere in the Tree case of g (or depth2) for the function to calculate the depth of the tree? Otherwise, I can't see how depth1 or depth2 can return anything but zero.Brownlee
Also, I think depth1 should actually be depth2 in depth2's definition.Brownlee
P
17

I believe I found an answer. I remembered reading Why does GHC make fix so confounding? and that suggested me a solution.

The problem with the former definition of catam is that it is recursive, and so any attempt to INLINE it is ignored. Compiling the original version with -ddump-simpl -ddump-to-file and reading the core:

Main.depth1 = Main.catam_$scatam @ GHC.Types.Int Main.depth3

Main.depth3 =
  \ (ds_dyI :: Main.TreeT GHC.Types.Int) ->
    case ds_dyI of _ {
      Main.Leaf -> Main.depth4;
      Main.Tree l_aah r_aai -> GHC.Classes.$fOrdInt_$cmax l_aah r_aai
    }

Main.depth4 = GHC.Types.I# 0

Rec {
Main.catam_$scatam =
  \ (@ a_ajB)
    (eta_B1 :: Main.TreeT a_ajB -> a_ajB)
    (eta1_X2 :: Main.Fix Main.TreeT) ->
    eta_B1
      (case eta1_X2
            `cast` (Main.NTCo:Fix <Main.TreeT>
                    :: Main.Fix Main.TreeT ~# Main.TreeT (Main.Fix Main.TreeT))
       of _ {
         Main.Leaf -> Main.Leaf @ a_ajB;
         Main.Tree l_aan r_aao ->
           Main.Tree
             @ a_ajB
             (Main.catam_$scatam @ a_ajB eta_B1 l_aan)
             (Main.catam_$scatam @ a_ajB eta_B1 r_aao)
       })
end Rec }

is clearly worse (constructor creation/elimination in catam_$scatam, more function calls) compared to

Main.depth2 =
  \ (w_s1Rz :: Main.Tree) ->
    case Main.$wdepth2 w_s1Rz of ww_s1RC { __DEFAULT ->
    GHC.Types.I# ww_s1RC
    }

Rec {
Main.$wdepth2 [Occ=LoopBreaker] :: Main.Tree -> GHC.Prim.Int#
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType S]
Main.$wdepth2 =
  \ (w_s1Rz :: Main.Tree) ->
    case w_s1Rz
         `cast` (Main.NTCo:Fix <Main.TreeT>
                 :: Main.Fix Main.TreeT ~# Main.TreeT (Main.Fix Main.TreeT))
    of _ {
      Main.Leaf -> 0;
      Main.Tree l_aaj r_aak ->
        case Main.$wdepth2 l_aaj of ww_s1RC { __DEFAULT ->
        case Main.$wdepth2 r_aak of ww1_X1Sh { __DEFAULT ->
        case GHC.Prim.<=# ww_s1RC ww1_X1Sh of _ {
          GHC.Types.False -> ww_s1RC;
          GHC.Types.True -> ww1_X1Sh
        }
        }
        }
    }
end Rec }

But if we define catam as

{-# INLINE catam #-}
catam :: (Functor f) => (f a -> a) -> (Fix f -> a)
catam f = let u = f . fmap u . unfix
          in u

then it is no longer recursive, only u inside is. This way GHC inlines catam in the definition of depth1 and fuses fmap with depth1's g - just what we want:

Main.depth1 =
  \ (w_s1RJ :: Main.Tree) ->
    case Main.$wdepth1 w_s1RJ of ww_s1RM { __DEFAULT ->
    GHC.Types.I# ww_s1RM
    }

Rec {
Main.$wdepth1 [Occ=LoopBreaker] :: Main.Tree -> GHC.Prim.Int#
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType S]
Main.$wdepth1 =
  \ (w_s1RJ :: Main.Tree) ->
    case w_s1RJ
         `cast` (Main.NTCo:Fix <Main.TreeT>
                 :: Main.Fix Main.TreeT ~# Main.TreeT (Main.Fix Main.TreeT))
    of _ {
      Main.Leaf -> 0;
      Main.Tree l_aar r_aas ->
        case Main.$wdepth1 l_aar of ww_s1RM { __DEFAULT ->
        case Main.$wdepth1 r_aas of ww1_X1So { __DEFAULT ->
        case GHC.Prim.<=# ww_s1RM ww1_X1So of _ {
          GHC.Types.False -> ww_s1RM;
          GHC.Types.True -> ww1_X1So
        }
        }
        }
    }
end Rec }

which is now just the same as the dump of depth2.

Pasia answered 28/10, 2012 at 5:27 Comment(1)
It seems that any recursive function can be transformed into a non-recursive function by moving its body to a local binding as in catam above. This looks like a simple step which facilitates optimization. I wonder why GHC does not do it automatically.Mycobacterium

© 2022 - 2024 — McMap. All rights reserved.