There are several statistical tests suites available. I wrote, copied, and otherwise gathered together 120 PRNGs and tested each with a variety of tests suites given 4 hours per PRNG per test suite:
- PractRand (standard, 1 terabyte) found bias in 78 PRNGs
- TestU01 (BigCrush) found bias in 50 PRNGs
- RaBiGeTe (extended, 512 megabit, x1) found bias in 40 PRNGs
- Dieharder (custom command line options) found bias in 25 PRNGs
- Dieharder (-a command line option) found bias in 13 PRNGs
- NIST STS (default, 64 megabit x128) found bias in 11 PRNGs
How many of those were in PRNGs that the other test suites all missed?
- PractRand (standard, 1 terabyte) found 22 unique biases, in a wide variety of categories.
- RaBiGeTe (extended, 512 megabit, x1) found 5 unique biases, all in LCGs and combination LCGs.
- TestU01 BigCrush found 2 unique biases, both in small chaotic PRNGs.
No other test suite found any unique biases.
In short, only PractRand, TestU01, and possibly RaBiGeTe are worth using.
Full disclosure: I wrote PractRand, so either the set of PRNGs or any other non-qualitative measure could be biased in its favor.
Miscellaneous advantages:
- PractRand and TestU01 tend to be the easiest to interpret the output of in my opinion.
- PractRand and Dieharder tend to be the easiest to automate testing for via command line interface I think.
- PractRand and RaBiGeTe were the only ones to support multithreaded testing.
Miscellaneous disadvantages:
- PractRand required more bits of input to test than other test suites - could be a problem if your RNG is very slow or otherwise limited on amount of data produced.
- RaBiGeTe and NIST STS both have interface issues.
- Dieharder and NIST STS both have false-positive issues.
- NIST STS had the worst interface in my opinion.
- I could not get Dieharder to compile on Windows. I managed to get TestU01 to compile on windows but it took some work.
- Recent versions of RaBiGeTe are closed-source and windows-only.
The set of PRNGs tested:
The PRNG set includes 1 large GFSR, 1 large LFSR, 4 xorshift type PRNGs, 2 xorwow type PRNGs, 3 other not-quite-LFSR PRNGs. It includes 10 simple power-of-2 modulus LCGs (which discard low bits to reach acceptable quality levels), 10 power-of-2 modulus not-quite-LCGs, and 9 combination generators based primarily around LCGs and not-quite-LCGs. It includes 19 reduced strength versions of CSPRNGs, plus one full strength CSPRNG. Of those, 14 were based upon indirection / dynamic s-boxes (e.g. RC4, ISAAC), four were ChaCha/Salsa parameterizations, and the remaining 2 were Trivium variants. It includes 11 PRNGs broadly classifiable as LFib-type or similar, not counting the LFSRs/GFSRs. The rest (about 35) were small state chaotic PRNGs, of which 10 used multiplication and the others were limited to arithmetic and bitwise logic.
Edit: There is also the test set in gjrand, which is very obscure and a little odd, but actually does extremely well.
Also, all of the PRNGs tested are included as non-recommended PRNGs in PractRand.