Why does a wildcard match work when enumerating all cases doesn't?

Asked 2/4, 2020 at 14:43 Answered 4/4, 2020 at 1:9

Solved haskell pattern-matching type-inference gadt

Consider this code:

{-# LANGUAGE GADTs #-}

data P t where
 PA :: P Int
 PB :: P Double
 PC :: P Char

isA PA = True
isA _ = False

It compiles and works fine. Now consider this code:

{-# LANGUAGE GADTs #-}

data P t where
 PA :: P Int
 PB :: P Double
 PC :: P Char

isA PA = True
isA PB = False
isA PC = False

It fails to compile:

Main.hs:8:10: error:
    • Couldn't match expected type ‘p’ with actual type ‘Bool’
        ‘p’ is untouchable
          inside the constraints: t ~ Int
          bound by a pattern with constructor: PA :: P Int,
                   in an equation for ‘isA’
          at Main.hs:8:5-6
      ‘p’ is a rigid type variable bound by
        the inferred type of isA :: P t -> p
        at Main.hs:(8,1)-(10,14)
      Possible fix: add a type signature for ‘isA’
    • In the expression: True
      In an equation for ‘isA’: isA PA = True
    • Relevant bindings include isA :: P t -> p (bound at Main.hs:8:1)
  |
8 | isA PA = True
  |          ^^^^

Main.hs:9:10: error:
    • Couldn't match expected type ‘p’ with actual type ‘Bool’
        ‘p’ is untouchable
          inside the constraints: t ~ Double
          bound by a pattern with constructor: PB :: P Double,
                   in an equation for ‘isA’
          at Main.hs:9:5-6
      ‘p’ is a rigid type variable bound by
        the inferred type of isA :: P t -> p
        at Main.hs:(8,1)-(10,14)
      Possible fix: add a type signature for ‘isA’
    • In the expression: False
      In an equation for ‘isA’: isA PB = False
    • Relevant bindings include isA :: P t -> p (bound at Main.hs:8:1)
  |
9 | isA PB = False
  |          ^^^^^

Main.hs:10:10: error:
    • Couldn't match expected type ‘p’ with actual type ‘Bool’
        ‘p’ is untouchable
          inside the constraints: t ~ Char
          bound by a pattern with constructor: PC :: P Char,
                   in an equation for ‘isA’
          at Main.hs:10:5-6
      ‘p’ is a rigid type variable bound by
        the inferred type of isA :: P t -> p
        at Main.hs:(8,1)-(10,14)
      Possible fix: add a type signature for ‘isA’
    • In the expression: False
      In an equation for ‘isA’: isA PC = False
    • Relevant bindings include isA :: P t -> p (bound at Main.hs:8:1)
   |
10 | isA PC = False
   |          ^^^^^

Why? What's going on here?

Edit: Adding the type signature isA :: P t -> Bool makes it work, so my question now becomes: why doesn't type inference work in the second case, since it does in the first case?

Tendon answered 2/4, 2020 at 14:43 Comment(7)

Because a P Int is not the same as a P Bool, hence your isA has no "input type" that can satisfy all cases. – Watermelon 2/4, 2020 at 14:48

@WillemVanOnsem But there is: isA :: P t -> Bool (and in fact, writing that out makes it work; see edit). And where did P Bool even come from? – Tendon 2/4, 2020 at 14:57

I noticed that it compiles with only a warning about overlapping patterns if you append the wildcard case (isA _ = False) to the exhaustive patterns, but it throws the error again if you replace it with isA _ = undefined, although this could be less surprising to someone who is more familiar with GHC. – Garald 2/4, 2020 at 16:16

This Q&A has a comment pertaining to this exact situation, if it can help anyone. – Garald 2/4, 2020 at 16:41

I think the question could be reworded slightly to emphasize that the issue is the inference of P t -> p without the wildcard, rather than the expected P t -> Bool. – Appling 2/4, 2020 at 16:42

@Garald Great example. Supopse we had PA :: P Bool instead. Then, isA PA = True ; isA _ = False can only be isA :: P t -> Bool. OTOH, isA PA = True ; isA _ = undefined could be isA :: P t -> Bool as well as isA :: P t -> t. Which one should be inferred? Neither is more general than the other. – Sutlej 2/4, 2020 at 16:44

This really shouldn't come as a suprise since type infernce is undecidable for GADTs – Mercuric 4/4, 2020 at 5:26

In typing a case construct (whether an explicit case statement or an implicit pattern-based function definition) in the absence of GADTs, the individual alternatives:

pattern -> body

can be unified by typing all the patterns and unifying those with the type of the scrutinee, then typing all the bodies and unifying those with the type of the case expression as a whole. So, in a simple example like:

data U = UA | UB | UC
isA1 u = case u of
  UA -> True
  UB -> False
  x  -> False

we can initially type the patterns UA :: U, UB :: U, x :: a, unify them using the type equality a ~ U to infer the type of the scrutinee u :: U, and similarly unify True :: Bool and both of the False :: Bool to the type of the overall case expression Bool, unifying that with the type of isA to get isA :: U -> Bool.

Note that the process of unification can specialize the types. Here, the type of the pattern x :: a was general, but by the end of the unification process, it had been specialized to x :: U. This can happen with the bodies, too. For example:

len mstr = case mstr of
  Nothing -> 0
  Just str -> length str

Here, 0 :: Num a => a is polymorphic, but because length returns an Int, by the end of the process, the bodies (and so the entire case expression) have been unified to the type Int.

In general, through unification, the common, unified type of all the bodies (and so the type of the overall case expression) will be the "most general" / "least restrictive" type of which the types of the bodies are all generalizations. In some cases, this type might be the type of one of the bodies, but in general, all the bodies can be more more general than the "most general" unified type, but no body can be more restrictive.

Things change in the presence of GADTs. When type-checking case constructs with GADTs, the patterns in an alternative can introduce a "type refinement", a set of additional bindings of type variables to be used in type-checking the bodies of the alternative. (This is what makes GADTs useful in the first place.)

Because different alternatives' bodies are typed under different refinements, naive unification isn't possible. For example, consider the tiny typed DSL and its interpreter:

data Term a where
  Lit :: Int -> Term Int
  IsZ :: Term Int -> Term Bool
  If :: Term Bool -> Term a -> Term a -> Term a

eval :: Term a -> a
eval t = case t of
  Lit n -> n
  IsZ t -> eval t == 0
  If b t e -> if eval b then eval t else eval e

If we were to naively unify the bodies n :: Int, eval t == 0 :: Bool, and if eval b then eval t else eval e :: a, the program wouldn't type check (most obviously, because Int and Bool don't unify!).

In general, because type refinements allow the calculated types of the alternatives' bodies to be more specific than the final type, there's no obvious "most general" / "least restrictive" type to which all bodies can be unified, as there was for the case expression without GADTs.

Instead, we generally need to make available a "target" type for the overall case expression (e.g., for eval, the return type a in the type signature), and then determine if under each refinement introduced by a constructor (e.g., IsZ introducing the refinement a ~ Bool), the body eval t == 0 :: Bool has as its type the associated refinement of a.

If no target type is explicitly provided, then the best we can do -- in general -- is use a fresh type variable p as the target and try to check each refined type against that.

This means that, given the following definition without a type signature for isA2:

data P t where
  PA :: P Int
  PB :: P Double
  PC :: P Char

isA2 = \p -> case p of
  PA -> True
  PB -> False
  PC -> False

what GHC tries to do is type isA2 :: P t -> p. For the alternative:

PA -> True

it types PA :: P t giving the refinement t ~ Int, and under this refinement, it tries to type True :: p. Unfortunately, p is not Bool under any refinement involving the unrelated type variable a, and we get an error. Similar errors are generated for the other alternatives, too.

Actually, there's one more thing we can do. If there are alternatives that don't introduce a type refinement, then the calculated types of their bodies are NOT more specific than the final type. So, if we unify the body types for "unrefined" alternatives, the resulting type provides a legitimate unification target for the refined alternatives.

This means that, for the example:

isA3 = \p -> case p of
  PA -> True
  x  -> False

the second alternative:

x -> False

is typed by matching the pattern x :: P t which introduces no type refinement. The unrefined type of the body is Bool, and this type is an appropriate target for unification of the other alternatives.

Specifically, the first alternative:

PA -> True

matches with a type refinement a ~ Int. Under this refinement, the actual type of the body True :: Bool matches the "refinement" of the target type Bool (which is "refined" to Bool), and the alternative is determined to have valid type.

So, the intuition is that, without a wildcard alternative, the inferred type for the case expression is an arbitrary type variable p, which is too general to be unified with the type refining alternatives. However, when you add a wildcard case alternative _ -> False, it introduces a more restrictive body type Bool into the unification process that, having been deduced without any type refinement by the pattern _, can inform the unification algorithm by providing a more restrictive type Bool to which the other, type refined alternatives, can be unified.

Above, I've made it sound like there's some two-phase approach, where the "non-refining" alternatives are examined first to determine a target type, and then the refining alternatives are checked against it.

In fact, what happens is that the refinement process introduces fresh variables into the unification process that, even when they're unified, don't affect the larger type context. So, all the alternatives are unified at once, but unification of the unrefining alternatives affects the larger type context while unification of the refined alternatives affects a bunch of fresh variables, giving the same end result as if the unrefined and refined alternatives were processed separately.

Jude answered 4/4, 2020 at 1:9 Comment(0)

Disclaimer: I write this as an answer because It doesn't fit in a comment. But I might be wrong

This behaviour is the expected when pattern match on GADTs. Up to GHC's User manual:

type refinement is only carried out based on user-supplied type annotations. So if no type signature is supplied for eval, no type refinement happens, and lots of obscure error messages will occur

Also from de User Manual:

When pattern-matching against data constructors drawn from a GADT, for example in a case expression, the following rules apply:

The type of the scrutinee must be rigid.
The type of the entire case expression must be rigid.
The type of any free variable mentioned in any of the case alternatives must be rigid.

Note: a type variable is rigid iff it is specified by the user.

Up to this, when pattern matching against a GADT you must provided the type signature (the reason is that type inference is difficult on GADTs). Hence, apparently the first definition of isA should fail to compile, but in the paper which type inference for GADTs is explained (section 6.4):

We remarked in Section 4.3 that in PCON-R it would be unsound to use any unifier other than a most-general one. But must the refinement be a unifier at all? For example, even though the case expression could do refinement, no refinement is necessary to typecheck this function:

f :: Term a -> Int
f (Lit i) = i
f _       = 0

The above example is exactly your case!. In the paper this is called a pre-unifier and there is a very technical explanation on how this works but As far as I can understand, when writing:

data P t where
 PA :: P Int
 PB :: P Double
 PC :: P Char

isA PA = True
isA PB = False
isA PC = False

the compiler starts by deducing isA :: P t -> p and refuse to continue, because type variables aren't rigid (i.e. aren't user-specify)

whereas when writing:

data P t where
 PA :: P Int
 PB :: P Double
 PC :: P Char

isA PA = True
isA _  = False

the compiler can deduce that any type inference will be less general than deducing Bool as a returning type, hence It can safely deduce isA :: P t -> Bool.

Probably this seems as obscure to you as to me, but for sure the two cases you ask for are actually documentated, so probably this is the desired behaviour for GHC developers and not a weird bug.

Heroism answered 2/4, 2020 at 19:2 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags