Monoidal folds on fixed points
Asked Answered
W

1

5

Given an arbitrary datastructure with a fixed point, can we construct an monoidal algebra without manually specifying all cases?

Assume we are given the datatype Expr as below. Using the recursion-schemes library, we can derive a base functor ExprF, which automatically also has Functor, Foldable and Traversable instances.

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TemplateHaskell #-}

import Data.Semigroup (Sum(..))
import Data.Functor.Foldable
import Data.Functor.Foldable.TH

import Prelude hiding (fail)

data Expr = Var String
          | Const Int
          | Add [Expr]
          | Mult [Expr]
          deriving Show

$(makeBaseFunctor ''Expr)

expr :: Fix ExprF
expr = ana project $ Add [Const 1, Const 2, Mult [Const 3, Const 4], Var "hello"]

Now, let's say we want to count the number of leaves in expr. We can easily write an algebra for such a small datastructure:

alg (ConstF _) = 1
alg (VarF   _) = 1
alg (AddF  xs) = sum xs
alg (MulF  xs) = sum xs

Now, we can call cata alg expr, which returns 5, the correct result.

Let's assume Expr grows really big and complex and we don't want to manually write cases for all data constructors. How does cata know how to combine the results from all cases? I suspect that this is possible using Monoids, possibly in conjunction with the Const functor (not entirely sure about that last part though).

fail = getSum $ foldMap (const (Sum 1) . unfix) $ unfix expr

fail returns 4, whereas we actually have 5 leaves. I assume that the problem lies in the fixed point, because we are only able to peel of one layer of Fixing and therefore the Mult [..] is only counted as one leaf.

Is it possible to somehow generically fold the entire fixed point and collecting the results in a Monoid-like structure without manually specifying all instances? What I want is kind of foldMap but in a more generic way.

I have a feeling I am missing something really obvious.

Warila answered 7/6, 2017 at 0:44 Comment(3)
fail counts the children of the top node of expr. In general, it gets nowhere near the leaves. It's not the alg of a suitable cata.Toting
@Toting I had the same suspicions. Now all I'm looking for is an accumulating catamorphism which I don't have to write around 60 cases (one per constructor) for but rather use monoid properties on recursive cases. I feel like that should be possible but I have no idea how to go about this problem.Warila
What characterises a leaf is that the sum of the leaf counts of its children is zero, because there are no children. Now, given the (sum of the) leaf counts of the children, can you compute the leaf count of the parent?Toting
T
9

Here's the essence of a solution. I've switched on

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable, PatternSynonyms #-}

Let's just recap fixpoints and catamorphisms.

newtype Fix f = In {out :: f (Fix f)}

cata :: Functor f => (f t -> t) -> Fix f -> t
cata alg = alg . fmap (cata alg) . out

The algebra, alg :: f t -> t, takes a node where the children have already been replaced by a t value, then returns the t for the parent. The cata operator works by unpacking the parent node, processing all its children recursively, then applying alg to finish the job.

So, if we want to count leaves in such a structure, we can start like this:

leaves :: (Foldable f, Functor f) => Fix f -> Integer
leaves = cata sumOrOne where
  -- sumOrOne :: f Integer -> Integer

The algebra, sumOrOne can see the number of leaves in each child of the parent node. We can use cata because f is a Functor. And because f is Foldable, we can compute the total number of leaves in the children.

  sumOrOne fl = case sum fl of
    ...

There are then two possibilities: if the parent has no children, its leaf sum will be 0, which we can detect, but that means the parent is itself a leaf, so 1 should be returned. Otherwise, the leaf sum will be nonzero, in which case the parent is not a leaf, so its leaf sum is indeed the total leaf sum of its children. That gives us

leaves :: (Foldable f, Functor f) => Fix f -> Integer
leaves = cata sumOrOne where
  sumOrOne fl{- number of leaves in each child-} = case sum fl of
    0 -> 1  -- no leaves in my children means I am a leaf
    l -> l  -- otherwise, pass on the total

A quick example, based on Hutton's Razor (the expression language with integers and addition, which is often the simplest thing that illustrates the point). The expressions are generated from Hutton's functor.

data HF h = Val Int | h :+: h deriving (Functor, Foldable, Traversable)

I introduce some pattern synonyms to recover the look and feel of a bespoke type.

pattern V x    = In (Val x)
pattern s :+ t = In (s :+: t)

I cook up a quick example expression, with some leaves that are three levels deep.

example :: Fix HF
example = (V 1 :+ V 2) :+ ((V 3 :+ V 4) :+ V 5)

Sure enough

Ok, modules loaded: Leaves.
*Leaves> leaves example
5

An alternative approach is to be functorial and foldable in substructures of interest, in this case, stuff at leaves. (We get exactly the free monads.)

data Tree f x = Leaf x | Node (f (Tree f x)) deriving (Functor, Foldable)

Once you've made the leaf/node separation part of your basic construction, you can visit the leaves directly with foldMap. Throwing in a bit of Control.Newtype, we get

ala' Sum foldMap (const 1)  :: Foldable f => f x -> Integer

which is below the Fairbairn Threshold (i.e., short enough not to need a name and all the clearer for not having one).

The trouble, of course, is that data structures are often functorial in "substructures of interest" in multiple interesting but conflicting ways. Haskell isn't always the best at letting us access "found functoriality": we somehow have to predict the functoriality we need when we parametrise a data type at declaration time. But there's still time to change all that...

Toting answered 7/6, 2017 at 9:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.