SystemT Compiler and dealing with Infinite Types in Haskell
Asked Answered
A

1

3

I'm following this blog post: http://semantic-domain.blogspot.com/2012/12/total-functional-programming-in-partial.html

It shows a small OCaml compiler program for System T (a simple total functional language).

The entire pipeline takes OCaml syntax (via Camlp4 metaprogramming) transforms them to OCaml AST that is translated to SystemT Lambda Calculus (see: module Term) and then finally SystemT Combinator Calculus (see: module Goedel). The final step is also wrapped with OCaml metaprogramming Ast.expr type.

I'm attempting to translate it to Haskell without the use of Template Haskell.

For the SystemT Combinator form, I've written it as

{-# LANGUAGE GADTs #-}

data TNat = Zero | Succ TNat

data THom a b where
  Id :: THom a a
  Unit :: THom a ()
  ZeroH :: THom () TNat
  SuccH :: THom TNat TNat
  Compose :: THom a b -> THom b c -> THom a c
  Pair :: THom a b -> THom a c -> THom a (b, c)
  Fst :: THom (a, b) a
  Snd :: THom (a, b) b
  Curry :: THom (a, b) c -> THom a (b -> c)
  Eval :: THom ((a -> b), a) b -- (A = B) * A -> B
  Iter :: THom a b -> THom (a, b) b -> THom (a, TNat) b

Note that Compose is forward composition, which differs from (.).

During the translation of SystemT Lambda Calculus to SystemT Combinator Calculus, the Elaborate.synth function tries to convert SystemT Lambda calculus variables into series of composed projection expressions (related to De Brujin Indices). This is because combinator calculus doesn't have variables/variable names. This is done with the Elaborate.lookup which uses the Quote.find function.

The problem is that with my encoding of the combinator calculus as the GADT THom a b. Translating the Quote.find function:

  let rec find x  = function
    | [] -> raise Not_found
    | (x', t) :: ctx' when x = x' -> <:expr< Goedel.snd >> 
    | (x', t) :: ctx' -> <:expr< Goedel.compose Goedel.fst $find x ctx'$ >>

Into Haskell:

find :: TVar -> Context -> _
find tvar [] = error "Not Found"
find tvar ((tvar', ttype):ctx)
  | tvar == tvar' = Snd
  | otherwise     = Compose Fst (find tvar ctx)

Results in an infinite type error.

• Occurs check: cannot construct the infinite type: a ~ (a, c)
  Expected type: THom (a, c) c
    Actual type: THom ((a, c), c) c

The problem stems from the fact that using Compose and Fst and Snd from the THom a b GADT can result in infinite variations of the type signature.

The article doesn't have this problem because it appears to use the Ast.expr OCaml thing to wrap the underlying expressions.

I'm not sure how to resolve in Haskell. Should I be using an existentially quantified type like

data TExpr = forall a b. TExpr (THom a b)

Or some sort of type-level Fix to adapt the infinite type problem. However I'm unsure how this changes the find or lookup functions.

Annmarieannnora answered 17/11, 2018 at 8:54 Comment(13)
Actually, the type not match. Consider: tvar == tvar' = Snd indicates that find has type find :: TVar -> Context -> THom (a, b) b, hence (find tvar ctx) has same type. so the type of arguments of Compose Fst (find tvar ctx) as compose (Thom (a, b) a) (Thom (a, b) b), Note that in the definition of compose need: a == (a, b), but it is impossible.Toro
Yes that's precisely the problem. And infinite type error.Annmarieannnora
We need to be able to write a type for find :: TVar -> Context -> ???. The easy but inelegant solution is to avoid GADTs for THom (and introduce partiality everywhere around). Another solution is to try going fully dependent, introducing more indices: something like find :: TVar v -> Context c -> THom (F v c) with some type family F. In Agda/Coq I'm confident we could write this, given enough time. In Haskell, we probably need more "proof" parameters, singleton types, assuming it's doable at all (I'm not entirely confident).Cruciate
But what is the intention of find function?Toro
@Toro Roughly it compiles the access to a variable into a sequence of pair projections. A context is a nested pair similar to a list like (((),x),y) and to access x you want to use something like fst;snd AKA "discard the last variable, take the next". In general, in longer contexts you generate fst;fst;....;fst;snd to discard many variables and take the one you want at the end. The whole point of doing this is so that we expose more information in the type, unlike e.g. using a plain list. This approach however requires a lot of dependent functions.Cruciate
To translate system T lambda variables to projection expressions in system T combinator calculus.Annmarieannnora
According to a note in the linked post, you should disallow infinitely recursive THoms. You should make all the recursive occurrences strict. This neither worsens nor improves the problem you're having, just makes the translation closer. (At least, I think adding strictness solves the problem. The author says that laziness causes partiality, but doesn't offer a solution).Velamen
@Cruciate Partiality is unavoidable for what the original article does, since it’s parsing and typechecking potentially ill-typed expressions. The dependently-typed variant will probably be possible (with increasing effort), in Agda/Idris, in Haskell (with lots of pain), in Coq with Equations (a plugin for GADT-like functionality), or in plain Coq (where it’s suicide, since pure Coq does not support dependent pattern matching conveniently).Ligialignaloes
@Ligialignaloes Are you saying that dependent pattern matching is more convenient in Agda than it is in (plain) Coq? If so, why? (Related to axiom K?) I'm not a real expert, but in my own experience I understood the Coq matching better, so now I'm curious.Cruciate
@Ligialignaloes I don't understand what you're saying. I was just commenting on the fact that the author of the linked post noted that things like let bad = Compose bad SuccH in bad should be disallowed. I think that liberal application of ! annotations should eliminate things like that, such that if fun :: THom a b, then either fun = _|_ or fun is a real, complete, total, 100% functional, no partiality of any kind, anywhere representation of a function from a to b.Velamen
@Velamen Sorry, you already wrote "This neither worsens nor improves the problem you're having", I had missed that. Deleted my earlier comment.Ligialignaloes
@Cruciate Yes. Some define "dependent pattern matching" to mean the analogue of what GHC does for GADTs (tho I expect GADTs came later), so that plain Coq does not have that, and that's why Equations was written. I'm not saying it's trivial to use, but much more convenient once you learn it. One historical reason was axiom K, but after the advent of homotopy type theory, people have done research into supporting dependent pattern matching without necessarily relying on axiom K (with restrictions), or by proving restrictions of that axiom for specific datatypes where it does hold.Ligialignaloes
@Cruciate One annoyance with Agda dependent pattern matching is that its behavior is entirely predictable according to relatively simple rules, but next to nobody explained them (when I learned Agda), so some failures seem puzzling.Ligialignaloes
L
2

This answer will have to be a bit high-level, because there are three entirely different families of possible designs to fix that problem. What you’re doing seems closer to approach three, but the approaches are ordered by increasing complexity.

The approach in the original post

Translating the original post requires Template Haskell and partiality; find would return a Q.Exp representing some Hom a b, avoiding this problem just like the original post. Just like in the original post, a type error in the original code would be caught when typechecking the output of all the Template Haskell functions. So, type errors are still caught before runtime, but you will still need to write tests to find cases where your macros output ill-typed expressions. One can give stronger guarantees, at the cost of a significant increase in complexity.

Dependent typing/GADTs in input and output

If you want to diverge from the post, one alternative is to use “dependent typing” throughout and make the input dependently-typed. I use the term loosely to include both actually dependently-typed languages, actual Dependent Haskell (when it lands), and ways to fake dependent typing in Haskell today (via GADTs, singletons, and what not). However, you lose the ability to write your own typechecker and choose which type system to use; typically, you end up embedding a simply-typed lambda calculus, and can simulate polymorphism via polymorphic Haskell functions that can generate terms at a given type. This is easier in dependently-typed languages, but possible at all in Haskell.

But honestly, in this road it’s easier to use higher-order abstract syntax and Haskell functions, with something like:

data Exp a where
  Abs :: (Exp a -> Exp b) -> Exp (a -> b)
  App :: Exp (a -> b) -> Exp a -> Exp b
  Var :: String -> Exp a —- only use for free variables
exampleId :: Exp (a -> a)
exampleId = Abs (\x -> x)

If you can use this approach (details omitted here), you get high assurance from GADTs with limited complexity. However, this approach is too inflexible for many scenarios, for instance because the typing contexts are only visible to the Haskell compiler and not in your types or terms.

From untyped to typed terms

A third option is go dependently-typed and to still make your program turn weakly-typed input to strongly typed output. In this case, your typechecker overall transforms terms of some type Expr into terms of a GADT TExp gamma t, Hom a b, or such. Since you don’t know statically what gamma and t (or a and b) are, you’ll indeed need some sort of existential.

But a plain existential is too weak: to build bigger well-typed expression, you’ll need to check that the produced types allow it. For instance, to build a TExpr containing a Compose expression out of two smaller TExpr, you'll need to check (at runtime) that their types match. And with a plain existential, you can't. So you’ll need to have a representation of types also at the value level.

What's more existentials are annoying to deal with (as usual), since you can’t ever expose the hidden type variables in your return type, or project those out (unlike dependent records/sigma-types). I am honestly not entirely sure this could eventually be made to work. Here is a possible partial sketch in Haskell, up to one case of find.

data Type t where
  VNat :: Type Nat
  VString :: Type String
  VArrow :: Type a -> Type b -> Type (a -> b)
  VPair :: Type a -> Type b -> Type (a, b) 
  VUnit :: Type ()
data SomeType = forall t. SomeType (Type t)
data SomeHom = forall a b. SomeHom (Type a) (Type b) (THom a b)

type Context = [(TVar, SomeType)] 
getType :: Context -> SomeType
getType [] = SomeType VUnit 
getType ((_, SomeType ttyp) :: gamma) =  
   case getType gamma of
       SomeType ctxT -> SomeType (VPair ttyp
find :: TVar -> Context -> SomeHom 
find tvar ((tvar’, ttyp) :: gamma)
   | tvar == tvar’ =
       case (ttyp, getType gamma) of
         (SomeType t, SomeType ctxT) ->
          SomeHom (VPair t ctxT) t Fst
Ligialignaloes answered 17/11, 2018 at 23:39 Comment(6)
The reason why I'm not using Template Haskell is because I don't want an embedded DSL, I'm intending to use this for an external language. In this situation, do you recommend your third solution? Surely Haskell can be used to write this simple compiler.Annmarieannnora
So you recommend the third solution, but it is not simple?Annmarieannnora
Ah; then I’d usually recommend avoiding GADTs altogether, as most compilers for external languages do (GHC included), especially if you actually want to use the resulting code (as opposed to playing with it). The third one might work but it is not the most immediate: some operations on well-typed terms (such as substitution) might end up being much harder to implement; you might be lucky.Ligialignaloes
Please disregard the comment I now deleted — I had posted it by mistake while editing.Ligialignaloes
Ok if not GADTs, what do you suggest as an alternative encoding of System T combinator calculus?Annmarieannnora
@Annmarieannnora Whoops wasn't notified — I'd use plain algebraic data types. Basically, just remove the type parameters from Expr/THom etc. A few more functions will become partial, and you'll need more tests. You can still store (object-language) types inside the AST, so that you can run a "retypechecker" that checks (at runtime) your AST can be typed in System T. That's the most common choice in compilers/interpreters.Ligialignaloes

© 2022 - 2024 — McMap. All rights reserved.