How to reduce code duplication when dealing with recursive sum types
Asked Answered
U

2

50

I am currently working on a simple interpreter for a programming language and I have a data type like this:

data Expr
  = Variable String
  | Number Int
  | Add [Expr]
  | Sub Expr Expr

And I have many functions that do simple things like:

-- Substitute a value for a variable
substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue = go
  where
    go (Variable x)
      | x == name = Number newValue
    go (Add xs) =
      Add $ map go xs
    go (Sub x y) =
      Sub (go x) (go y)
    go other = other

-- Replace subtraction with a constant with addition by a negative number
replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd = go
  where
    go (Sub x (Number y)) =
      Add [go x, Number (-y)]
    go (Add xs) =
      Add $ map go xs
    go (Sub x y) =
      Sub (go x) (go y)
    go other = other

But in each of these functions, I have to repeat the part that calls the code recursively with just a small change to one part of the function. Is there any existing way to do this more generically? I would rather not have to copy and paste this part:

    go (Add xs) =
      Add $ map go xs
    go (Sub x y) =
      Sub (go x) (go y)
    go other = other

And just change a single case each time because it seems inefficient to duplicate code like this.

The only solution I could come up with is to have a function that calls a function first on the whole data structure and then recursively on the result like this:

recurseAfter :: (Expr -> Expr) -> Expr -> Expr
recurseAfter f x =
  case f x of
    Add xs ->
      Add $ map (recurseAfter f) xs
    Sub x y ->
      Sub (recurseAfter f x) (recurseAfter f y)
    other -> other

substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue =
  recurseAfter $ \case
    Variable x
      | x == name -> Number newValue
    other -> other

replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd =
  recurseAfter $ \case
    Sub x (Number y) ->
      Add [x, Number (-y)]
    other -> other

But I feel like there should probably be a simpler way to do this already. Am I missing something?

Untwine answered 17/10, 2019 at 18:57 Comment(9)
Make a "lifted" version of the code. Where you use parameters (functions) that decide what to do. Then you can make specific function by passing functions to the lifted version.Anthropomorphous
I think your language could be simplified. Define Add :: Expr -> Expr -> Expr instead of Add :: [Expr] -> Expr, and get rid of Sub altogether.Sporades
I'm just using this definition as a simplified version; while that would work in this case, I need to be able to contain lists of expressions for other parts of the language as wellUntwine
Such as? Most, if not all, chained operators can be reduced to nested binary operators.Sporades
I need to pattern match on a list of expressions: pastebin.com/WDZaMEFq (also instead of variables by name I'm using de Bruijn indices)Untwine
I think your recurseAfter is ana in disguise. You might want to look at anamorphisms and recursion-schemes. That being said, I think your final solution is as short as it can be. Switching to the official recursion-schemes anamorphisms won't save much.Utilitarianism
that's recursion-schemes, FYI.Labonte
Hi @Scott, can you share git link to your repository, I need to share an example of code clone or duplication in my research along with the reference!Eisk
I ended up going with a different approach in the end. I never finished the project, but the code is on GitHub. Here's a link to the relevant typeclass: github.com/scott2000/boat/blob/…Untwine
U
41

Congratulations, you just rediscovered anamorphisms!

Here's your code, rephrased so that it works with the recursion-schemes package. Alas, it's not shorter, since we need some boilerplate to make the machinery work. (There might be some automagic way to avoid the boilerplate, e.g. using generics. I simply do not know.)

Below, your recurseAfter is replaced with the standard ana.

We first define your recursive type, as well as the functor it is the fixed point of.

{-# LANGUAGE DeriveFunctor, TypeFamilies, LambdaCase #-}
{-# OPTIONS -Wall #-}
module AnaExpr where

import Data.Functor.Foldable

data Expr
  = Variable String
  | Number Int
  | Add [Expr]
  | Sub Expr Expr
  deriving (Show)

data ExprF a
  = VariableF String
  | NumberF Int
  | AddF [a]
  | SubF a a
  deriving (Functor)

Then we connect the two with a few instances so that we can unfold Expr into the isomorphic ExprF Expr, and fold it back.

type instance Base Expr = ExprF
instance Recursive Expr where
   project (Variable s) = VariableF s
   project (Number i) = NumberF i
   project (Add es) = AddF es
   project (Sub e1 e2) = SubF e1 e2
instance Corecursive Expr where
   embed (VariableF s) = Variable s
   embed (NumberF i) = Number i
   embed (AddF es) = Add es
   embed (SubF e1 e2) = Sub e1 e2

Finally, we adapt your original code, and add a couple of tests.

substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue = ana $ \case
    Variable x | x == name -> NumberF newValue
    other                  -> project other

testSub :: Expr
testSub = substituteName "x" 42 (Add [Add [Variable "x"], Number 0])

replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd = ana $ \case
    Sub x (Number y) -> AddF [x, Number (-y)]
    other            -> project other

testReplace :: Expr
testReplace = replaceSubWithAdd 
   (Add [Sub (Add [Variable "x", Sub (Variable "y") (Number 34)]) (Number 10), Number 4])

An alternative could be to define ExprF a only, and then derive type Expr = Fix ExprF. This saves some of the boilerplate above (e.g. the two instances), at the cost of having to use Fix (VariableF ...) instead of Variable ..., as well as the analogous for the other constructors.

One could further alleviate that using pattern synonyms (at the cost of a little more boilerplate, though).


Update: I finally found the automagic tool, using template Haskell. This makes the whole code reasonably short. Note that the ExprF functor and the two instances above still exist under the hood, and we still have to use them. We only save the hassle of having to define them manually, but that alone saves a lot of effort.

{-# LANGUAGE DeriveFunctor, DeriveTraversable, TypeFamilies, LambdaCase, TemplateHaskell #-}
{-# OPTIONS -Wall #-}
module AnaExpr where

import Data.Functor.Foldable
import Data.Functor.Foldable.TH

data Expr
  = Variable String
  | Number Int
  | Add [Expr]
  | Sub Expr Expr
  deriving (Show)

makeBaseFunctor ''Expr

substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue = ana $ \case
    Variable x | x == name -> NumberF newValue
    other                  -> project other

testSub :: Expr
testSub = substituteName "x" 42 (Add [Add [Variable "x"], Number 0])

replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd = ana $ \case
    Sub x (Number y) -> AddF [x, Number (-y)]
    other            -> project other

testReplace :: Expr
testReplace = replaceSubWithAdd 
   (Add [Sub (Add [Variable "x", Sub (Variable "y") (Number 34)]) (Number 10), Number 4])
Utilitarianism answered 17/10, 2019 at 20:42 Comment(2)
Do you really have to define Expr explicitly, rather than something like type Expr = Fix ExprF?Sporades
@Sporades I briefly mentioned that as an alternative. It's a bit inconvenient to have to use double constructors for everything: Fix + the real constructor. Using the last approach with TH automation is nicer, IMO.Utilitarianism
A
20

As an alternative approach, this is also a typical use case for the uniplate package. It can use Data.Data generics rather than Template Haskell to generate the boilerplate, so if you derive Data instances for your Expr:

import Data.Data

data Expr
  = Variable String
  | Number Int
  | Add [Expr]
  | Sub Expr Expr
  deriving (Show, Data)

then the transform function from Data.Generics.Uniplate.Data applies a function recursively to each nested Expr:

import Data.Generics.Uniplate.Data

substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue = transform f
  where f (Variable x) | x == name = Number newValue
        f other = other

replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd = transform f
  where f (Sub x (Number y)) = Add [x, Number (-y)]
        f other = other

Note that in replaceSubWithAdd in particular, the function f is written to perform a non-recursive substitution; transform makes it recursive in x :: Expr, so it's doing the same magic to the helper function as ana does in @chi's answer:

> substituteName "x" 42 (Add [Add [Variable "x"], Number 0])
Add [Add [Number 42],Number 0]
> replaceSubWithAdd (Add [Sub (Add [Variable "x", 
                     Sub (Variable "y") (Number 34)]) (Number 10), Number 4])
Add [Add [Add [Variable "x",Add [Variable "y",Number (-34)]],Number (-10)],Number 4]
> 

This is no shorter than @chi's Template Haskell solution. One potential advantage is that uniplate provides some additional functions that may be helpful. For example, if you use descend in place of transform, it transforms only the immediate children which can give you control over where the recursion happens, or you can use rewrite to re-transform the result of transformations until you reach a fixed point. One potential disadvantage is that "anamorphism" sounds way cooler than "uniplate".

Full program:

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data                     -- in base
import Data.Generics.Uniplate.Data   -- package uniplate

data Expr
  = Variable String
  | Number Int
  | Add [Expr]
  | Sub Expr Expr
  deriving (Show, Data)

substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue = transform f
  where f (Variable x) | x == name = Number newValue
        f other = other

replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd = transform f
  where f (Sub x (Number y)) = Add [x, Number (-y)]
        f other = other

replaceSubWithAdd1 :: Expr -> Expr
replaceSubWithAdd1 = descend f
  where f (Sub x (Number y)) = Add [x, Number (-y)]
        f other = other

main = do
  print $ substituteName "x" 42 (Add [Add [Variable "x"], Number 0])
  print $ replaceSubWithAdd e
  print $ replaceSubWithAdd1 e
  where e = Add [Sub (Add [Variable "x", Sub (Variable "y") (Number 34)])
                     (Number 10), Number 4]
Accentuate answered 17/10, 2019 at 22:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.