C++ random_shuffle() how does it work?
Asked Answered
F

1

5

I have a Deck vector with 52 Card, and I want to shuffle it.

vector<Card^> cards;

So I used this:

random_shuffle(cards.begin(), cards.end());

The problem was that it gave me the same result every time, so I used srand to randomize it:

srand(unsigned(time(NULL))); 
random_shuffle(cards.begin(),cards.end());

This was still not truly random. When I started dealing cards, it was the same as in the last run. For example: "1. deal: A,6,3,2,K; 2. deal: Q,8,4,J,2", and when I restarted the program I got exactly the same order of deals.

Then I used srand() and random_shuffle with its 3rd parameter:

 int myrandom (int i) {
    return std::rand()%i; 
 }
 srand(unsigned(time(NULL))); 
 random_shuffle(cards.begin(),cards.end(), myrandom);

Now it's working and always gives me different results on re-runs, but I don't know why it works this way. How do these functions work, what did I do here?

Footway answered 12/10, 2014 at 0:23 Comment(12)
Read what CPPReference says on the gen parameter. It appears your compiler's idea of an "unspecified source" is a simple constant.Parrett
@Parrett You say cppreference.com, yet link to cplusplus.com.Paramount
Do you only see the issue of identical runs when you run your program quickly? That is, if you waited 5 seconds between runs, do you get a different output?Petrolic
Yuck, you are right; I prefer cppreference as well. Must'of copied the wrong link. (Fortunately its description is about the same.)Parrett
@Parrett Yeah, usually the descriptions are fine, it's the examples that sometimes have bugs.Paramount
That's C++/CLI, not C++. FTFY.Wraith
@sharth I tried it now slowly and still the same result. The last code what I copied is fine and always different results, just I don't know how. I have read about random_shuffle and the advise was that use srand() OR random_shuffle with 3rd parameter but not both.Footway
@Parrett and Namfuak: Ok thanks, I gonna read this :)Footway
Problem with srand of course is that it it only takes an unsigned int so you are likely limited to approximately 2^32 possible seed values. I say approximately because a seed of 1 is special (and many implementations also make a seed of 0 special). With that being said. Consider that there are 52! ways to shuffle a deck of 52 cards (80658175170943878571660636856403766975289505440883277824000000000000) . But there are far fewer seed combinations with a 32 bit int. I don't really recommend straight srand/32 bit seed for shuffling. You'd need 226 bit seed to cover all combinations.Press
VC++ and C++/CLI (From at least VS2012+) do support Mersenne twister (per the the C++11 standard). You might consider this alternative using Mersenne (if you want a PRNG): cplusplus.com/reference/random/mt19937 . Microsoft also covers this with an example in their MSDN documentationPress
@MichaelPetch what do you recommend? Use a "manual" shuffle algorithm?Footway
@MichaelPetch Why not write that up as an answer?Kike
P
8

This answer required some investigation, looking at the C++ Standard Library headers in VC++ and looking at the C++ standard itself. I knew what the standard said, but I was curious about VC++ (including C++CLI) did their implementation.

First what does the standard say about std::random_shuffle . We can find that here. In particular it says:

Reorders the elements in the given range [first, last) such that each possible permutation of those elements has equal probability of appearance.

1) The random number generator is implementation-defined, but the function std::rand is often used.

The bolded part is key. The standard says that the RNG can be implementation specific (so results across different compilers will vary). The standard suggests that std::rand is often used. But this isn't a requirement. So if an implementation doesn't use std::rand then it follows that it likely won't use std::srand for a starting seed. An interesting footnote is that the std::random_shuffle functions are deprecated as of C++14. However std::shuffle remains. My guess is that since std::shuffle requires you to provide a function object you are explicitly defining the behavior you want when generating random numbers, and that is an advantage over the older std::random_shuffle.

I took my VS2013 and looked at the C++ standard library headers and discovered that <algorithm> uses template class that uses a completely different pseudo-rng (PRNG) than std::rand with an index (seed) set to zero. Although this may vary in detail between different versions of VC++ (including C++/CLI) I think it is probable that most versions of VC++/CLI do something similar. This would explain why each time you run your application you get the same shuffled decks.

The option I would opt for if I am looking for a Pseudo RNG and I'm not doing cryptography is to use something well established like Mersenne Twister:

Advantages The commonly-used version of Mersenne Twister, MT19937, which produces a sequence of 32-bit integers, has the following desirable properties:

It has a very long period of 2^19937 − 1. While a long period is not a guarantee of quality in a random number generator, short periods (such as the 2^32 common in many older software packages) can be problematic.

It is k-distributed to 32-bit accuracy for every 1 ≤ k ≤ 623 (see definition below).

It passes numerous tests for statistical randomness, including the Diehard tests.

Luckily for us C++11 Standard Library (which I believe should work on VS2010 and later C++/CLI) includes a Mersenne Twister function object that can be used with std::shuffle Please see this C++ documentation for more details. The C++ Standard Library reference provided earlier actually contains code that does this:

std::random_device rd;
std::mt19937 g(rd());

std::shuffle(v.begin(), v.end(), g);

The thing to note is that std::random_device produces non-deterministic (non repeatable) unsigned integers. We need non-deterministic data if we want to seed our Mersenne Twister (std::mt19937) PRNG with. This is similar in concept to seeding rand with srand(time(NULL)) (The latter not being an overly good source of randomness).

This looks all well and good but has one disadvantage when dealing with card shuffling. An unsigned integer on the Windows platform is 4 bytes (32 bits) and can store 2^32 values. This means there are only 4,294,967,296 possible starting points (seeds) therefore only that many ways to shuffle the deck. The problem is that there are 52! (52 factorial) ways to shuffle a standard 52 card deck. That happens to be 80658175170943878571660636856403766975289505440883277824000000000000 ways, which is far bigger than the number of unique ways we can get from setting a 32-bit seed.

Thankfully, Mersenne Twister can accept seeds between 0 and 2^19937-1. 52! is a big number but all combinations can be represented with a seed of 226 bits (or ~29 bytes). The Standard Library allow std::mt19937 to accept a seed up to 2^19937-1 (~624 bytes of data) if we so choose. But since we need only 226 bits the following code would allow us to create 29 bytes of non-deterministic data to be used as a suitable seed for std::mt19937:

// rd is an array to hold 29 bytes of seed data which covers the 226 bits we need */
std::array<unsigned char, 29> seed_data; 
std::random_device rd;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(rd));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));

// Set the seed  for Mersenne *using the 29 byte sequence*
std::mt19937 g(seq);  

Then all you need to do is call shuffle with code like:

std::shuffle(cards.begin(),cards.end(), g);

On Windows VC++/CLI you will get a warning that you'll want to suppress with the code above. So at the top of the file (before other includes) you can add this:

#define _SCL_SECURE_NO_WARNINGS 1
Press answered 12/10, 2014 at 21:34 Comment(12)
I think that using an array to hold 29 random bytes will make no difference. The std::mt19937 is a generator of 32-bit numbers, so the input to shuffle will again be limited to 32-bit numbers. No?Grandstand
@Grandstand . Consider what would happen if we set a seed 4 bytes long (32 bits). 2^32 is 4294967296 unique seeds, which means that there are only 4294967296 unique ways to shuffle the deck. Since there are 80658175170943878571660636856403766975289505440883277824000000000000 unique ways to shuffle a deck, 32-bit seed would only allow us to use a small fraction of the potential hands. A 29 byte random seed allows the possibility of every combination of shuffled deck appearing.Press
I understand that part and it is a nice detail indeed. But notice that shuffle knows nothing about how you seeded the std::mt19937 g. And since std::mt19937 returns 32 bits, then shuffle is "seeded" with 32bits. So the problem remains. (I am not aware of the inner workings of shuffle)Grandstand
Correct it doesn't. Each of the potential seeds creates their own unique random number sequence. It is 32-bits of that sequence we pull out at a time. Let us say we had only one seed available, then there would be only one way to shuffle the deck, since that same seed will always generate the same numbers. You want enough unique seeds so that every potential shuffled deck can be represented. Your players would probably not like the fact that they got dealt the same hand every game .Press
Same holds true for only giving your players and opportunity to see only ~4 billion hands out of the huge number that can actually be dealt.Press
Correct! So we agree that using an array of 29 random bytes will not change the fact that shuffle only gets a 32 bits input for randomnessGrandstand
You don't understand. Each seed generates a unique sequence of bytes (which we read 4 bytes at a time)Press
Your seed is not the random data that will be used, the seed sets the starting point for how that random data will be generatedPress
Yes, I know that. But the huge seed of 29 random bytes, only seeds the mt19937. The shuffle uses mt19937, i.e. 32bits. Unless I am missing something regarding how shuffle uses the 3rd argument.Grandstand
No, mt19937 starts at a point set by the seed, and from there shuffle will request as many bytes of random data needed from mt19937. Internally it can request as many bytes as it needs to complete the shuffle task. And it will request a lot more than one 32-bit value to do so. The seed just says we are giving the shuffle algorithm a chance to see every potential shuffled deck possible.Press
Let us continue this discussion in chat.Grandstand
@joelk : Thanks, it has been some time since I posted this, and I had to wonder how I ended up with that erroneous code. Appears I pasted the snippet out of the wrong test project. You are quite correct I had always intended for it to be std::mt19937 g(seq); not std::mt19937 g(rd());. ThanksPress

© 2022 - 2024 — McMap. All rights reserved.