How to guarantee FsCheck reproducibility

We want to use FsCheck as part of our unit testing in continuous integration. As such, deterministic and reproducible behaviour is very important to us.

FsCheck, being a random testing framework, can generate test cases that sometimes break. The key point is that we do not only use properties that necessarily have to hold for every input, such as List.rev >> List.rev === id. Rather, we do some numerics, and some generated test cases can break the test simply because they are badly conditioned.

The question is: how can we guarantee that once the test succeeds, it will always succeed?

So far I see the following options:

  • hard-code the seed, e.g. 0. This would be the easiest solution (a sketch follows below this list).
  • make very specific custom generators which avoid bad examples. Certainly possible, but could turn out pretty hard, especially if there are many objects to generate.
  • live with the fact that in some cases the build might be red due to pathological cases, and simply re-run.
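
To be concrete, hard-coding the seed might look roughly like this (a sketch assuming FsCheck's Config record has a Replay field, as in recent versions; the property and the seed values are placeholders):

open FsCheck

// Placeholder property; stands in for the real numeric property under test.
let revTwiceIsId (xs: int list) = List.rev (List.rev xs) = xs

// Replaying a fixed seed makes every run generate exactly the same cases,
// but it also means no new inputs are ever explored.
let fixedConfig =
    { Config.Quick with Replay = Some (Random.StdGen (1145655947, 296144285)) }

Check.One (fixedConfig, revTwiceIsId)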

What is the idiomatic way of using FsCheck in such a setting?

Fina answered 3/3, 2014 at 11:14 Comment(0)

(I started writing a comment but it got so long I guess it deserved its own answer).

It's very common to test properties with FsCheck that don't hold for every input. For example, FsCheck will trivially refute your List.rev example if you run it for list<float>, because the default float generator produces values like nan, and a list containing nan is not equal to itself.

Numerical stability is a tricky problem in itself - there isn't any non-determinism in FsCheck to blame here (FsCheck is totally deterministic, it's just an input generator...). The "non-determinism" you're referring to may be things like bugs in floating point operations on certain processors and so on. But even in that case, wouldn't you like to know about them? And if your algorithm is numerically unstable for a class of inputs, wouldn't you like to know about it? If you don't, it seems to me like you're setting yourself up for some real non-determinism... in production.

The idiomatic way to write properties that don't hold for all inputs of a given type in FsCheck is to write a generator & shrinker. You can use ==> as a step up to that, but it doesn't scale up well to complex preconditions. You say this could turn out pretty hard - that's true in the sense that I guarantee you'll learn something about your code. A good thing!
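
To sketch what that might look like with the Gen/Arb/Prop API (the notion of "well-conditioned" below is a made-up placeholder for whatever your numerics actually require):

open FsCheck

// Generator for "well-conditioned" floats: bounded magnitude, no nan/infinity.
let genWellConditioned : Gen<float> =
    Gen.choose (-1000000, 1000000)
    |> Gen.map (fun i -> float i / 1000.0)

// Shrinker: reuse the default float shrinker, but keep shrunk values in range.
let shrinkWellConditioned (x: float) : seq<float> =
    Arb.Default.Float().Shrinker(x)
    |> Seq.filter (fun y -> abs y <= 1000.0)

let wellConditionedFloat = Arb.fromGenShrink (genWellConditioned, shrinkWellConditioned)

// Use the custom Arbitrary explicitly in a property.
let ``abs is non-negative for well-conditioned floats`` =
    Prop.forAll wellConditionedFloat (fun x -> abs x >= 0.0)

Check.Quick ``abs is non-negative for well-conditioned floats``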

Fixing the seed is a bad idea, except for reproducing a previously discovered bug. I mean, in practice what would you do: keep re-running the test until it passes, then fix the seed and declare "job done"?

Courtesy answered 4/3, 2014 at 8:21 Comment(2)
No, it's not "job done". Typically, we use more aggressive test cases than is actually necessary. And yes, getting to know more about subtle bugs, or getting to know when it breaks, is definitely a good thing. I was actually interested in what the proper way of dealing with this is. But yeah, I suppose generator + shrinker are the way to go. If for no other reason, then to know precisely what works and what breaks.Fina
I apologize, on re-reading I sound like a troll. I was trying to be provocative by stating an obviously bad practice, not suggesting you would actually do that...Courtesy

"some test cases can cause the test to break because of being badly conditioned."

That sounds like you need a Conditional Property:

let isOk x =
    match x with
    | 42 -> false
    | _ -> true

let MyProperty (x:int) = isOk x ==> // check x here...

(assuming that you don't like the number 42.)
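
Filling in the elided body so the snippet compiles and runs (the check after ==> is just a placeholder for the real assertion; lazy defers its evaluation until the precondition has passed):

open FsCheck

let isOk x =
    match x with
    | 42 -> false
    | _ -> true

// The body after ==> is only evaluated for inputs that satisfy isOk;
// x <> 42 is a placeholder for the real check.
let MyProperty (x: int) = isOk x ==> lazy (x <> 42)

Check.Quick MyProperty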

Tombac answered 3/3, 2014 at 11:55 Comment(2)
This is not a good idea if the bulk of your generated values is rejected by the precondition - that makes test case generation very inefficient. Use a custom generator instead. Also, for complex tests, make sure the precondition does not really belong in the test itself.Courtesy
This approach would work in my particular case. It might have failed one in 20 times when running the 100 tests. But I know not to use it to filter out cases that are likely to be generated.Fina