I observed the following unexpected behaviour when using ScalaCheck's Gen.pick, which (to me) suggests that its picking is not quite random, even though its documentation says it is:
/** A generator that picks a given number of elements from a list, randomly */
I ran the three small programs below, in order (over a span of 2 days, at different times, in case that matters), after setting
implicit override val generatorDrivenConfig = PropertyCheckConfig(
  maxSize = 1000,
  minSize = 1000,
  minSuccessful = 1000)
to get a decent sample size.
Program 1
val set = Set(1,2,3,4,5,6,7,8,9,10,
  11,12,13,14,15,16,17,18,19,20,
  21,22,23,24,25,26,27,28,29,30,
  31,32,33,34,35,36,37,38,39,40,
  41,42,43,44,45,46,47,48,49,50)
// Thanks to @Jubobs for the solution
// See: https://mcmap.net/q/1777762/-how-can-i-generate-a-list-of-n-unique-elements-picked-from-a-set
val g = Gen.pick(3, set).map { _.toList }
forAll (g) { s => println(s) }
Across the 3000 numbers generated in each of 2 separate runs, I got a surprisingly similar and clearly non-uniform distribution (percentages are rounded and only the top 5 are listed, here and in all listings below):
- Number: frequency in run #1, frequency in run #2
- 15: 33%, 33%
- 47: 22%, 22%
- 4: 15%, 16%
- 19: 10%, 10%
- 30: 6%, 6%
(Disclaimer: I couldn't find a way to create a table here other than this.)
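Percentages like the ones above can be tallied programmatically instead of being counted from the printed output. The following is only a minimal sketch under the assumption that the picked lists are collected into a sequence rather than printed; `PickTally` and `topFrequencies` are hypothetical names of my own and not part of ScalaCheck, and the input in `main` is toy data, not real Gen.pick output.

```scala
object PickTally {
  /** Relative frequency of each picked number, most frequent first
    * (ties broken by the smaller number), limited to `top` entries.
    * Hypothetical helper, not part of ScalaCheck.
    */
  def topFrequencies(samples: Seq[Seq[Int]], top: Int): Seq[(Int, Double)] = {
    val all = samples.flatten
    if (all.isEmpty) Seq.empty
    else {
      // Count how often each number occurs across all picks.
      val counts = all.groupBy(identity).map { case (n, occ) => n -> occ.size }
      counts.toSeq
        .sortBy { case (n, c) => (-c, n) }
        .take(top)
        .map { case (n, c) => n -> c.toDouble / all.size }
    }
  }

  def main(args: Array[String]): Unit = {
    // Toy input for illustration only: two "picks" of two numbers each.
    val samples = Seq(Seq(1, 2), Seq(1, 3))
    topFrequencies(samples, 5).foreach { case (n, f) =>
      println(f"$n%d: ${f * 100}%.0f%%")
    }
  }
}
```

With the programs in this question, the idea would be to append each generated `s` to a buffer inside the `forAll` body and pass the buffer's contents to `topFrequencies` afterwards.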
Program 2
val list: List[Int] = List.range(1, 50)
val g = Gen.pick(3, list)
forAll (g) { s => println(s) }
When using a List, the numbers seem to get "stuck" at the end of the range (3x1000 numbers in each of the two runs):
- 49: 33%, 33%
- 48: 22%, 22%
- 47: 14%, 14%
- 46: 10%, 10%
- 45: 6%, 6%
Interestingly, the frequencies are pretty much the same as in the case of Program 1.
Remark: I repeated the runs for lists up to 10 times and saw the very same distribution, with differences of +/- 1%. I just didn't want to list all the numbers here in this strange "table" format.
Program 3
Just to spice things up a bit, I ran a third little snippet, creating the Set (Program 1) from a List (Program 2):
val set: Set[Int] = List.range(1, 50).toSet
val g = Gen.pick(3, set).map { _.toList }
forAll (g) { s => println(s) }
Now the numbers are the same as for Program 2 (List wins!), although the frequencies (again, for 3*1000 numbers in each of 2 runs) are slightly different towards the end:
- 49: 33%, 33%
- 48: 23%, 22%
- 47: 16%, 15%
- 46: 9%, 10%
- 45: 7%, 6%
Question
Even though the sample size is not enough (as it never is) to prove or disprove true randomness, I can't help but question Gen.pick's claimed randomness, at least when used out of the box (I might need to set some seed for it to work "more" randomly), since the numbers got "stuck" and the frequencies are almost the same across runs.
Upon looking at Gen.pick's source code, at line #672 a certain seed0 is used:
def pick[T](n: Int, l: Iterable[T]): Gen[Seq[T]] = {
  if (n > l.size || n < 0) throw new IllegalArgumentException(s"invalid choice: $n")
  else if (n == 0) Gen.const(Nil)
  else gen { (p, seed0) =>
    // ...
which I can't find defined anywhere else (in Gen.scala source code, or in scala.util.Random documentation), but I have a hunch it might have something to do with the observed behaviour.
Is this expected behaviour of Gen.pick? If so, how can I get "more" random picking?