Proper tagged AST

Asked 3/7, 2014 at 17:3 Answered 5/7, 2014 at 15:17

haskell monads abstract-syntax-tree gadt name-binding

I’ve been trying to build tagged AST for a while now. Let’s introduce the issue:

data E a
  = V a
  | LitInt Int
  | LitBool Bool
  | FooIntBool (E a) (E a) -- er…
    deriving (Eq,Show)

The issue with that piece of code, to me, resides in FooIntBool. The idea is that it takes an int expression and a bool expression, and glues them together. But E a could be anything. This would be valid given the above definition of E:

FooIntBool (LitInt 3) (LitInt 0)

You can see the issue. Then, what do we want? A tagged expression. This is not possible given the current definition of E, so let’s introduce some GADTs:

data E :: * -> * -> * where
  V          :: a -> E l a
  LitInt     :: Int -> E Int a
  LitBool    :: Bool -> E Bool a
  FooIntBool :: E Int a -> E Bool a -> E (Int,Bool) a

This is quite nice! I can now right that kind of code:

FooIntBool (V "x") (LitBool False)

The issue with that is when I want to make it a monad, or an applicative. It’s just impossible. Consider the monad implementation:

instance Monad (E l) where
  return = V

That was obvious and straight-forward. Let’s see the bind implementation:

  V x >>= f = f x -- obvious as well
  LitInt a >>= _ = LitInt a -- obvious yeah
  LitBool a >>= _ = LitBool a -- …
  FooIntBool a b >>= f = FooIntBool (a >>= ?) (b >>= ?) -- AH?

We can’t write a >>= f and b >>= f since f :: a -> E l b. I haven’t found a solution to that issue yet, and I’m really curious how to deal with that and still being able to use de Bruijn indices (through bound).

Elaterite answered 3/7, 2014 at 17:3 Comment(11)

It is not clear what your AST means, so it is also not clear what the Monad instance should be. I'd write an evaluator first, to get the semantics right, and then try to write the Monad instance. – Equine 3/7, 2014 at 17:11

It constructs a new value of a new type by gluing an Int and a Bool. – Elaterite 3/7, 2014 at 17:34

What I don't understand is the meaning of the second parameter of your GADT: for V, it seems to be the variable name type, but this type is lost in the definition of FooIntBool in the specific case of your example. – Kassandra 3/7, 2014 at 17:43

Indeed, sorry. Fixed. @didierc, it’s the variable name, yes. I tried to use something like FooIntBool :: Int -> Bool -> E (Int,Bool) a, but of course it doesn’t work if I want to fetch the Int or Bool from a V. – Elaterite 3/7, 2014 at 18:0

Well, yes there was this problem too. I am still confused by the meaning of each parameter of the gadt. – Kassandra 3/7, 2014 at 18:13

Yeah, it’s now a clear I have a design issue. – Elaterite 3/7, 2014 at 18:21

Using higher order tagged bound is a bit tough. I know Edward has done it a few times, but it might be worth using simpler methods than bound until you nail down some other design criteria. – Othaothe 3/7, 2014 at 18:27

@phaazon, it's not clear at all, you might want to implement a simple language which only acount for Int, Bool, and pairs of these, and it would be perfectly fine for some applications. Sorry if I sounded harsh, I merely wanted to know what is your intent. Besides, @JAbrahamson is right when he speaks about simpler methods: do it step by step. – Kassandra 3/7, 2014 at 22:14

Why do you suddenly grow an extra type parameter when moving from ADTs to GADTs? The a in E x a seems to indicate that all V subterms are labeled with a value of type a, which strikes me as a strange invariant to care about. I would expect just one type argument, and for the a in E a to indicate that you can evaluate the expression to produce a value of type a. – Learnt 3/7, 2014 at 22:51

@DanielWagner, thanks! I was wondering what it meant, and your comment made me realise that it allows E to expose the type of index used for variables. Quite interesting! – Kassandra 4/7, 2014 at 6:11

@DanielWagner, yeah, a is a name. – Elaterite 4/7, 2014 at 14:8

I think your typed AST is unlikely to work the way you want. The fact that variables are untyped is going to hurt. Try to imagine what it would be like to write an interpreter with an environment; you'd have to look up variables in the environment and than either cast the results into the correct types or fail with an error. So I'm going to suggest a slightly different AST with typed variables, and an as yet unexplained reordering of the type parameters.

data E v a where
  V          :: v a -> E v a
  LitInt     :: Int  -> E v Int
  LitBool    :: Bool -> E v Bool
  FooIntBool :: E v Int -> E v Bool -> E v (Int, Bool)

Now, as far as I know it is not possible to define a law-abiding Monad instance for this. Note that the kind of E is (* -> *) -> * -> *; it may be more intuitive for our purposes to think of it as (* -> *) -> (* -> *). This is superficially compatible with what Monad expects, * -> *, at least if you partially apply E to some v, but then the types get weird. I believe you are already aware of this, which is why you had put your variable type parameter last; the intended effect of this is that (>>=) will represent substitution. However, if we did that with this new type I've proposed, it's not compatible with Monad at all.

There is a different kind of monad that can work, though. We can generalize its kind from * -> * to (k -> *) -> (k -> *) (where k in this case is just *). Note again that I've used parentheses to emphasize that, just like most instances of Monad, E is to be treated as a unary type constructor. We'll be working with natural transformations instead of just any old Haskell function:

type a ~> b = forall x. a x -> b x

(By the way, the kind of (~>) is (k -> *) -> (k -> *) -> *.)

To construct our new HMonad type class, we can just copy Monad and replace (->) with (~>). There is one complication, which is that we have to flip the argument ordering of (>>=), to make the types work out:

class HMonad m where
  hreturn ::  a ~> m a
  hbind   :: (a ~> m b) -> (m a ~> m b)

I'll just show you the HMonad instance for E and then attempt to explain it:

instance HMonad E where
  hreturn = V
  hbind f e = case e of
    V v            -> f v
    LitInt x       -> LitInt x
    LitBool x      -> LitBool x
    FooIntBool a b -> FooIntBool (hbind f a) (hbind f b)

Actually, this looks exactly the way a Monad instance would for an untyped version of the AST. Note that, as expected, hreturn just injects a variable, and hbind performs a kind of type-safe substitution by seeking out variables and applying the function to them. This works due to the higher rank types.

You can't quite do this with the bound package, since it uses Monad instead of this fancier HMonad. It is possible (and has even been done on multiple occasions) to write a version of bound that works for typed ASTs like this, but it's not clear whether it's actually worth it.

Armillda answered 5/7, 2014 at 15:17 Comment(0)

If you really want, it is possible to write a well typed Monad instance. I haven't checked if it follows the monad laws.

instance Monad (E l) where 
  return = V 

  V x >>= f = f x 
  LitInt a >>= _ = LitInt a 
  LitBool a >>= _ = LitBool a
  FooIntBool a b >>= f = FooIntBool (a >>= q.f) (b >>= r.f) where 

    q :: E (Int, Bool) t -> E Int t
    q (V x) = V x 
    q (FooIntBool x _) = x

    r :: E (Int, Bool) t -> E Bool t
    r (V x) = V x 
    r (FooIntBool _ x) = x

Monasticism answered 3/7, 2014 at 20:57 Comment(2)

The Applicative is determined by the monad, though. – Othaothe 3/7, 2014 at 21:33

@J.Abrahamson I completely forgot about that. – Monasticism 4/7, 2014 at 4:17

Recommended topics

Hot tags