Boilerplate-free annotation of ASTs in Haskell?
Asked Answered
E

4

28

I've been fiddling around with the Elm compiler, which is written in Haskell.

I'd like to start implementing some optimizations for it, and part of this involves traversing the AST and adding "annotation" to certain nodes, such as tail-calls, etc.

I know I can use SYB or uniplate to do the traversal, but I'm wondering if there's a boilerplate-free way to deal with the types.

So, suppose we have a bunch of algebraic types for our AST:

data Expr = PlusExpr Expr Expr ...

data Def = TypeAlias String [String] Type ...

If I were writing the boilerplate, I'd make new types like this:

data AnnotatedExpr = PlusExpr Expr Expr [Annotation] ...

data AnnotatedDef = TypeAlias String [String] Type [Annotation] ...

This is a lot of boilderplate to write, and it seems like good practice to avoid this.

I could write something like this:

Data AnnotationTree =  Leaf  [Annotation]
     | Internal [AnnotationTree] [Annotation]

Then I'd just have an annotation tree running parallel to the AST. But there is no guarantee that these trees will have the same structure, so we lose type safety.

So I'm wondering, is there an elegant/recommended solution to avoid boilerplate but still annotate a tree in a type-safe way? To replace each node with an equivalent one, plus a list of annotations which will be used in compilation later?

Enforce answered 26/11, 2014 at 19:49 Comment(0)
A
26

If you leave the recursion in your data type open you end up suffering an extra constructor everywhere, but can layer in annotations freely without changing most of your skeletal tree.

data Hutton x    -- non-recursive functor type
  = Int Int | Plus x x
  deriving Functor

newtype Tie f = Tie (f (Tie f))

data Annotate f a = Annotate { annotation :: a, layer :: (f (Annotate f a)) }

type Unannotated = Tie      Hutton
type Annotated a = Annotate Hutton a

This style is much easier when you can write most of your computations as Hutton-algebras since they will compose better.

interp :: Hutton Int -> Int
interp (Int i)    = i
interp (Plus a b) = a + b

runUnannotated :: Functor f => (f x -> x) -> Tie f -> x
runUnannotated phi (Tie f) = phi (fmap (runUnannotated phi) f)    

runAnnotated :: Functor f => (f x -> x) -> Annotate f a -> x
runAnnotated phi (Annotate _ f) = phi (fmap (runAnnotated phi) f)

What's also nice is that if you don't mind letting some amount of binding live in the Haskell level (such as in a middling-deep eDSL) then the Free Hutton monad is great for building ASTs and the Cofree Hutton comonad is essentially what Annotated is.

Here's a way to build annotations from the bottom up.

annotate :: Functor f => (f b -> b) -> Tie f -> Annotate f b
annotate phi = runUnannotated $ \x -> Annotate (phi (fmap annotation x)) x

memoize :: Unannotated -> Annotated Int
memoize = annotate interp

such that with the appropriate Show and Num instances

λ> memoize (2 + (2 + 2))
Annotate 6 (Plus (Annotate 2 (Int 2)) (Annotate 4 (Plus (Annotate 2 (Int 2)) (Annotate 2 (Int 2)))))

And here's how you can strip them

strip :: Annotated a -> Unannotated
strip = runAnnotated Tie

See here for a description of how you might achieve this kind of AST work with mutually recursive ADTs, insured by Gallais' comment below.

Azevedo answered 26/11, 2014 at 21:3 Comment(10)
How does this approach generalize to situations where instead of a single Expr type, we have a bunch of mutually-inductive types, e.g. suppose Expr has a case construct which contains Pats, but some of those can be view patterns which contain an Expr?Brash
You can scale it a tiny bit further with something like data Weave f g = Weave (g (Weave g f) (Weave f g)), gist.github.com/tel/29eb767c7cb331104537. Generally, I think you'd need to start investigating the work in Initial Algebra Semantics is Enough!, but I don't yet understand that.Azevedo
Unfortunately, that approach doesn't seem to work when annotating. The bialgebra f a a -> a is too restrictive to build Weaves from Weave recursors.Azevedo
@Cactus, the traditional way to encode mutually defined indexed families in a type theory with only indexed families is to add an index tagging which family you are defining a constructor for. See e.g. this gist defining trees and forests in one go.Unstrung
Oh, both beautiful and, in retrospect, pretty direct and obvious!Azevedo
@Unstrung What's the recursion principle for that Tie? I added a comment in the gist about it.Azevedo
See gist.github.com/tel/99e666308d270a3d1d8c for a slight generalization of Gallais' technique which includes a recursion principle.Azevedo
I've updated the gist with my take on it. Slightly nicer version in Agda.Unstrung
Btw, I've also commented on your entry @J.Abrahamson. Not sure whether gist.github sends notifications or not so I figured I'd ping you here. :)Unstrung
Matt Pickering has shown how to do this without losing the original constructors.Rosabelle
G
6

This question is very similar to a past one talking about the particular annotation of source location. The solution I find most elegant is to re-define Expr and Def to provide a constructor that contains an annotation:

data Expr = PlusExpr Expr Expr
          | AnnotatedExpr Annotation Expr
          ...
Gloomy answered 26/11, 2014 at 20:30 Comment(0)
O
4

You can also use attribute grammars for annotations. If you need many different annotations, the grammars approach will scale better. There are few AG libraries and preprocessors on Hackage, one is uuagc which is used to build UHC/EHC Haskell compiler.

Osprey answered 27/11, 2014 at 21:25 Comment(0)
B
2

It is also possible to write a Template Haskell macros which converts a datatype into an annotated one.

Blasting answered 27/11, 2014 at 14:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.