QuickCheck: How to use exhaustiveness checker to prevent forgotten constructors of a sum type

Asked 27/8, 2014 at 19:36 Answered 26/10, 2019 at 15:52

I have a Haskell data type like

data Mytype
  = C1
  | C2 Char
  | C3 Int String

If I case on a Mytype and forget to handle one of the cases, GHC gives me a warning (exhaustiveness check).

I now want to write a QuickCheck Arbitrary instance to generate MyTypes like:

instance Arbitrary Mytype where
  arbitrary = do
    n <- choose (1, 3 :: Int)
    case n of
      1 -> C1
      2 -> C2 <$> arbitrary
      3 -> C3 <$> arbitrary <*> someCustomGen

The problem with this is that I can add a new alternative to Mytype and forget to update the Arbitrary instance, thus having my tests not test that alternative.

I would like to find a way of using GHC's exhaustiveness checker to remind me of forgotten cases in my Arbitrary instance.

The best I've come up with is

arbitrary = do
  x <- elements [C1, C2 undefined, C3 undefined undefined]
  case x of
    C1     -> C1
    C2 _   -> C2 <$> arbitrary
    C3 _ _ -> C3 <$> arbitrary <*> someCustomGen

But it doesn't really feel elegant.

I intuitively feel that there's no 100% clean solution to this, but would appreciate anything that reduces the chance of forgetting such cases - especially in a big project where code and tests are separated.

Degauss answered 27/8, 2014 at 19:36 Comment(5)

Just a note: One can write C2{} instead of C2 _ and so on which at least makes the syntax a bit nicer. – Degauss 28/8, 2014 at 19:59

Note that the undefined thing will fail if the constructor is strict. – Hollenbeck 4/9, 2014 at 8:23

Is there some reason you don't want to just derive the Arbitrary instance automatically with TH? – Considering 4/9, 2014 at 18:52

@DanBurton Yes, I wanted to hint at that with my use of someCustomGen. If I have some invariant not captured by the type system, or otherwise want to influence with what probability alternatives are generated (to make it more likely to generate non-trivial cases), I have to customise parts of the Arbitrary instance (e.g. with QuickCheck's frequency function). – Degauss 4/9, 2014 at 21:26

Great question! I have opened github.com/nick8325/quickcheck/issues/267 to ask what the recommended solution is today. – Grossman 31/8, 2019 at 15:16

I implemented a solution with TemplateHaskell, you can find a prototype at https://gist.github.com/nh2/d982e2ca4280a03364a8. With this you can write:

instance Arbitrary Mytype where
  arbitrary = oneof $(exhaustivenessCheck ''Mytype [|
      [ pure C1
      , C2 <$> arbitrary
      , C3 <$> arbitrary <*> arbitrary
      ]
    |])

It works like this: You give it a type name (like ''Mytype) and an expression (in my case a list of arbitrary style Gens). It gets the list of all constructors for that type name and checks whether the expression contains all of these constructors at least once. If you just added a constructor but forgot to add it to the Arbitrary instance, this function will warn you at compile time.

This is how it's implemented with TH:

exhaustivenessCheck :: Name -> Q Exp -> Q Exp
exhaustivenessCheck tyName qList = do
  tyInfo <- reify tyName
  let conNames = case tyInfo of
        TyConI (DataD _cxt _name _tyVarBndrs cons _derives) -> map conNameOf cons
        _ -> fail "exhaustivenessCheck: Can only handle simple data declarations"

  list <- qList
  case list of
    input@(ListE l) -> do
      -- We could be more specific by searching for `ConE`s in `l`
      let cons = toListOf tinplate l :: [Name]
      case filter (`notElem` cons) conNames of
        [] -> return input
        missings -> fail $ "exhaustivenessCheck: missing case: " ++ show missings
    _ -> fail "exhaustivenessCheck: argument must be a list"

I'm using GHC.Generics to easily traverse the syntax tree of the Exp: With toListOf tinplate exp :: [Name] (from lens) I can easily find all Names in the whole exp.

I was surprised that the types from Language.Haskell.TH do not have Generic instances, and neither (with current GHC 7.8) do Integer or Word8 - Generic instances for these are required because they appear in Exp. So I added them as orphan instances (for most things, StandaloneDeriving does it but for primitive types like Integer I had to copy-paste instances as Int has them).

The solution is not perfect because it doesn't use the exhaustiveness checker like case does, but as we agree, that's not possible while staying DRY, and this TH solution is DRY.

One possible improvement/alternative would be to write a TH function that does this check for all Arbitrary instances in a whole module at once instead of calling exhaustivenessCheck inside each Arbitrary instance.

Degauss answered 28/8, 2014 at 21:36 Comment(0)

You want to ensure that your code behaves in a particular way; the simplest way to check the behaviour of code is to test it.

In this case, the desired behaviour is that each constructor gets reasonable coverage in tests. We can check that with a simple test:

allCons xs = length xs > 100 ==> length constructors == 3
             where constructors = nubBy eqCons xs
                   eqCons  C1       C1      = True
                   eqCons  C1       _       = False
                   eqCons (C2 _)   (C2 _)   = True
                   eqCons (C2 _)    _       = False
                   eqCons (C3 _ _) (C3 _ _) = True
                   eqCons (C3 _ _)  _       = False

This is pretty naive, but it's a good first shot. Its advantages:

eqCons will trigger an exhaustiveness warning if new constructors are added, which is what you want
It checks that your instance is handling all constructors, which is what you want
It also checks that all constructors are actually generated with some useful probability (in this case at least 1%)
It also checks that your instance is usable, eg. doesn't hang

Its disadvantages:

Requires a large amount of test data, in order to filter out those with length > 100
eqCons is quite verbose, since a catch-all eqCons _ _ = False would bypass the exhaustiveness check
Uses magic numbers 100 and 3
Not very generic

There are ways to improve this, eg. we can compute the constructors using the Data.Data module:

allCons xs = sufficient ==> length constructors == consCount
             where sufficient   = length xs > 100 * consCount
                   constructors = length . nub . map toConstr $ xs
                   consCount    = dataTypeConstrs (head xs)

This loses the compile-time exhaustiveness check, but it's redundant as long as we test regularly and our code has become more generic.

If we really want the exhaustiveness check, there are a few places where we could shoe-horn it back in:

allCons xs = sufficient ==> length constructors == consCount
             where sufficient   = length xs > 100 * consCount
                   constructors = length . nub . map toConstr $ xs
                   consCount    = length . dataTypeConstrs $ case head xs of
                                                                  x@(C1)     -> x
                                                                  x@(C2 _)   -> x
                                                                  x@(C3 _ _) -> x

Notice that we use consCount to eliminate the magic 3 completely. The magic 100 (which determined the minimum required frequency of a constructor) now scales with consCount, but that just requires even more test data!

We can solve that quite easily using a newtype:

consCount = length (dataTypeConstrs C1)

newtype MyTypeList = MTL [MyType] deriving (Eq,Show)

instance Arbitrary MyTypeList where
  arbitrary = MTL <$> vectorOf (100 * consCount) arbitrary
  shrink (MTL xs) = MTL (shrink <$> xs)

allCons (MTL xs) = length constructors == consCount
                   where constructors = length . nub . map toConstr $ xs

We can put a simple exhaustiveness check in there somewhere if we like, eg.

instance Arbitrary MyTypeList where
  arbitrary = do x <- arbitrary
                 MTL <$> vectorOf (100 * consCount) getT
              where getT = do x <- arbitrary
                              return $ case x of
                                            C1     -> x
                                            C2 _   -> x
                                            C3 _ _ -> x
  shrink (MTL xs) = MTL (shrink <$> xs)

Afterwards answered 4/9, 2014 at 11:10 Comment(0)

Here I exploit an unused variable _x. This is not really more elegant than your solution, though.

instance Arbitrary Mytype where
  arbitrary = do
    let _x = case _x of C1 -> _x ; C2 _ -> _x ; C3 _ _ -> _x
    n <- choose (1, 3 :: Int)
    case n of
      1 -> C1
      2 -> C2 <$> arbitrary
      3 -> C3 <$> arbitrary <*> someCustomGen

Of course, one has to keep the last case coherent with the dummy definition of _x, so it is not completely DRY.

Alternatively, one might exploit Template Haskell to build a compile-time assert checking that the constructors in Data.Data.dataTypeOf are the expected ones. This assert has to be kept coherent with the Arbitrary instance, so this is not completely DRY either.

If you do not need custom generators, I believe Data.Data can be exploited to generate Arbitrary instances via Template Haskell (I think I saw some code doing exactly that, but I can't remember where). In this way, there's no chance the instance can miss a constructor.

Vowelize answered 27/8, 2014 at 22:20 Comment(7)

Another possibility is to use GHC.Generics to derive the arbitrary instance. GHC.Generics is very well suited in cases where you can say what to do for Sums (Constructors of a data type) and Products (Fields of a data constructor), which this one should be an instance of. – Maintopmast 28/8, 2014 at 19:37

@Maintopmast Unfortunately GHC.Generics only help if you want to use the default arbitrary in all fields. In that case, yes, they are great for this purpose, but in my case it is important that I can customise the instance (I tried hinting at that by including someCustomGen). – Degauss 28/8, 2014 at 19:56

The self-recursive way you propose is certainly nicer than my undefineds. – Degauss 28/8, 2014 at 20:1

If I add a C4 constructor how is the above code going to catch the missing case? – Radiant 29/8, 2014 at 17:10

@DanielVelkov It will emit a warning "Non exhaustive patterns", like the code posted by nh2 in his question. – Vowelize 29/8, 2014 at 17:21

@Vowelize and what if I update the case _x part but not the second part of the function which generates the random values? – Radiant 29/8, 2014 at 17:22

@DanielVelkov This is indeed a shortcoming. That's what I meant when I wrote that the case must be kept coherent with _x, so this is not completely DRY, nor much more elegant than the original nh2's code. It does not force the instance to be correct, but it helps the programmer in remembering to fix the code for new constructors. – Vowelize 29/8, 2014 at 17:29

Here is a solution using the generic-random library:

{-# language DeriveGeneric #-}
{-# language TypeOperators #-}

import Generic.Random
import GHC.Generics
import Test.QuickCheck

data Mytype
  = C1
  | C2 Char
  | C3 Int String
  deriving Generic

instance Arbitrary Mytype where
  arbitrary = genericArbitraryG customGens uniform
    where
      customGens :: Gen String :+ ()
      customGens = someCustomGen :+ ()

someCustomGen :: Gen String
someCustomGen = undefined

genericArbitraryG takes care of generating each constructor of MyType. In this case we use uniform to get a uniform distribution of constructors. With customGens we define that each String field in Mytype is generated with someCustomGen.

See Generic.Random.Tutorial for more examples.

Grossman answered 26/10, 2019 at 15:52 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags