QuickCheck: Arbitrary instances of nested data structures that generate balanced specimens
Asked Answered
C

3

29

tl;dr: how do you write instances of Arbitrary that don't explode if your data type allows for way too much nesting? And how would you guarantee these instances produce truly random specimens of your data structure?

I want to generate random tree structures, then test certain properties of these structures after I've mangled them with my library code. (NB: I'm writing an implementation of a subtyping algorithm, i.e. given a hierarchy of types, is type A a subtype of type B. This can be made arbitrarily complex, by including multiple-inheritance and post-initialization updates to the hierarchy. The classical method that supports neither of these is Schubert Numbering, and the latest result known to me is Alavi et al. 2008.)

Let's take the example of rose-trees, following Data.Tree:

data Tree a = Node a (Forest a)
type Forest a = [Tree a]

A very simple (and don't-try-this-at-home) instance of Arbitray would be:

instance (Arbitrary a) => Arbitrary (Tree a) where
    arbitrary = Node <$> arbitrary <$> arbitrary

Since a already has an Arbitrary instance as per the type constraint, and the Forest will have one, because [] is an instance, too, this seems straight-forward. It won't (typically) terminate for very obvious reasons: since the lists it generates are arbitrarily long, the structures become too large, and there's a good chance they won't fit into memory. Even a more conservative approach:

arbitrary = Node <$> arbitrary <*> oneof [arbitrary,return []]

won't work, again, for the same reason. One could tweak the size parameter, to keep the length of the lists down, but even that won't guarantee termination, since it's still multiple consecutive dice-rolls, and it can turn out quite badly (and I want the odd node with 100 children.)

Which means I need to limit the size of the entire tree. That is not so straight-forward. unordered-containers has it easy: just use fromList. This is not so easy here: How do you turn a list into a tree, randomly, and without incurring bias one way or the other (i.e. not favoring left-branches, or trees that are very left-leaning.)

Some sort of breadth-first construction (the functions provided by Data.Tree are all pre-order) from lists would be awesome, and I think I could write one, but it would turn out to be non-trivial. Since I'm using trees now, but will use even more complex stuff later on, I thought I might try to find a more general and less complex solution. Is there one, or will I have to resort to writing my own non-trivial Arbitrary generator? In the latter case, I might actually just resort to unit-tests, since this seems too much work.

Crapulent answered 11/4, 2013 at 21:40 Comment(2)
The Quickcheck manual explains this and has helpful examples (though they may be for QuickCheck v1); see the sections titled "The size of test data" and "Generating recursive data types".Rowena
This is great, thanks. I didn't read through the entire manual, but of course, I should have. I guess it was really hard (for me) to google for this. I think this should solve my problem.Crapulent
F
22

Use sized:

instance Arbitrary a => Arbitrary (Tree a) where
arbitrary = sized arbTree

arbTree :: Arbitrary a => Int -> Gen (Tree a)
arbTree 0 = do
    a <- arbitrary
    return $ Node a []
arbTree n = do
    (Positive m) <- arbitrary
    let n' = n `div` (m + 1)
    f <- replicateM m (arbTree n')
    a <- arbitrary
    return $ Node a f

(Adapted from the QuickCheck presentation).

P.S. Perhaps this will generate overly balanced trees...

Foilsman answered 11/4, 2013 at 22:20 Comment(3)
Great answer, thanks. That's basically in line with what @sacundim linked to in the manual. I hope this could be useful for others then, since the answer seems to be relatively straightforward.Crapulent
Yes, I tried to adapt this from binary trees though (in part because I realized this can help me on a project of mine too ;) )Foilsman
A comment on the "too balanced" problem: instead of using mapM in this way, one could generate a random partition of an arbitrary length list, and take that as the basis for generating children. Oleg's random list shuffles might provide help.Crapulent
L
7

You might want to use the library presented in the paper "Feat: Functional Enumeration of Algebraic Types" at the Haskell Symposium 2012. It is on Hackage as testing-feat, and a video of the talk introducing it is available here: http://www.youtube.com/watch?v=HbX7pxYXsHg

Lathing answered 2/5, 2013 at 10:51 Comment(0)
C
2

As Janis mentioned, you can use the package testing-feat, which creates enumerations of arbitrary algebraic data types. This is the easiest way to create unbiased uniformly distributed generators for all trees of up to a given size.

Here is how you would use it for rose trees:

import Test.Feat (Enumerable(..), uniform, consts, funcurry)
import Test.Feat.Class (Constructor)
import Data.Tree (Tree(..))
import qualified Test.QuickCheck as QC

-- We make an enumerable instance by listing all constructors
-- for the type. In this case, we have one binary constructor:
-- Node :: a -> [Tree a] -> Tree a
instance Enumerable a => Enumerable (Tree a) where
    enumerate = consts [binary Node]
      where
        binary :: (a -> b -> c) -> Constructor c
        binary = unary . funcurry

-- Now we use the Enumerable instance to create an Arbitrary
-- instance with the help of the function:
-- uniform :: Enumerable a => Int -> QC.Gen a
instance Enumerable a => QC.Arbitrary (Tree a) where
    QC.arbitrary = QC.sized uniform
    -- QC.shrink = <some implementation>

The Enumerable instance can also be generated automatically with TemplateHaskell:

deriveEnumerable ''Tree
Carmelo answered 10/4, 2018 at 13:11 Comment(2)
This example snippet does not appear to correspond to testing-feat 1.1.0.0 of May 26, 2018. It appears that your example came only ~1½ month before breaking API changes. Perhaps this is only a matter of fixing imports, as e.g. consts is now only in Test.Feat.Class, and so on. But the deprecation for Test.Feat.Class asks one to use Control.Enumerable instead.Zhdanov
The import Constructor does not appear anywhere in 1.1.0.0, but was a type alias, type Constructor = Enumerate in 0.4.0.3. There's an examples directory, but it lacks an example as simple as this one.Zhdanov

© 2022 - 2024 — McMap. All rights reserved.