Efficiently generating random bytes of data in C++11/14
Asked Answered
N

5

28

My requirement is to generate random bytes of data (not random numbers) aka uniformly distributed bits.

As such I was wondering what are the correct/efficient ways of doing this using C++11/14 random facilities. I've had a look around at the examples, but they all seem to focus on number generation (ints, floats etc)

Current solution I'm using is the following:

#include <vector>
#include <random>

int main()
{
   std::random_device rd;
   std::uniform_int_distribution<int> dist(0,255);
   std::vector<char> data(1000);
   for (char& d : data)
   {
      d = static_cast<char>(dist(rd) & 0xFF);
   }
   return 0;
}
Naldo answered 14/8, 2014 at 1:10 Comment(10)
Whats the difference between a random int, and four random btyes?Zeigler
Nothing per say, so is the best way to generate 4 or 8 bytes at a time - and copy into the buffer 4/8 bytes per generation?Naldo
That's what I would do. Then there's no need for a distribution (which adds processing overhead).Ascariasis
...and use the largest integral type available on the target platform.Trug
@Ascariasis CaptainOblivious: I would like to use random_device directly, but the size of the return type seem to be implementation defined.Naldo
@soandos: Who's to say that int is four bytes? The standard certainly doesn't.Girasol
@RobertAllanHenniganLeahy, True, but not so material here. The point is random int = multiple random bytesZeigler
@Gerdiner: The size is implementation defined, but just use sizeof() to know how many bytes it is...Ascariasis
@RobertAllanHenniganLeahy int_least32_t.Semiconductor
Also see Why Aren't std::uniform_int_distribution<uint8_t> and std::uniform_int_distribution<int8_t> Allowed?Malpighiaceous
N
18

Distributions take random bits and turn them into numbers. If you actually want random bits then you want to use an engine:

In particular, those requirements specify the algorithmic interface for types and objects that produce sequences of bits in which each possible bit value is uniformly likely.3

A single call to a URNG object is allowed to produce and deliver many (typically 32 or more) bits, returning these bits as a single packaged value of an unsigned integer type.4 N3847

random_device happens to be specified such that accessing uniformly distributed bits is easy:

std::random_device engine;
unsigned x = engine(); // sizeof(unsigned) * CHAR_BIT random bits

Note that other engines may not make it quite as easy to get uniformly random bits as random_device, due to returning fewer bits than their result_type can hold or even by effectively returning fractional bits.

If your concern is that unsigned's size is implementation defined and so random_device returns an implementation defined number of bits, you can write an adapter that either collects enough bits before giving them to you, or one that will give you just enough bits and cache the rest for your next request. (You can also do this to handle other engines which exhibit the previously mentioned issues.)

Notwithstanding answered 14/8, 2014 at 1:42 Comment(4)
The paper you quote is wrong (or at least misleading) on this point: URNGs are not required to fill their output with random bits, so this answer is not correct. To take an extreme example, if G::min() == 0 and G::max == 2, none of the output bits will have an equal probability of being 1 or 0. Real-life URNGs can and do take advantage of this license, so it is not safe to assume that you can get random bits from a URNG for exactly the reasons that are detailed in the paper you cite.Melodimelodia
@GeoffRomer I suppose it's fair to say it's misleading; you can't just concatonate the integers returned from engines and treat the result like a bitstream. Engines with non-power of two ranges can be viewed simply returning fractional bits and in such cases recovering a random bit stream takes a bit more work. It's probably better to simply use an engine that does have a power of two range, as engines which don't, don't provide any particular advantage in return for imposing that cost.Notwithstanding
@GeoffRomer It's also true that my answer is taking advantage of the fact that random_device is specified by C++ to fill its entire result value and not return any fractional bits, but I don't feel that there's anything unreasonable about such an implementation.Notwithstanding
Fair enough, but your answer gives the strong impression that this will work for any URNG, not just random_device.Melodimelodia
D
47

What you're looking for is the std::independent_bits_engine adaptor:

#include <vector>
#include <random>
#include <climits>
#include <algorithm>
#include <functional>

using random_bytes_engine = std::independent_bits_engine<
    std::default_random_engine, CHAR_BIT, unsigned char>;

int main()
{
    random_bytes_engine rbe;
    std::vector<unsigned char> data(1000);
    std::generate(begin(data), end(data), std::ref(rbe));
}

Note that the accepted answer is not strictly correct in a general case – random engines produce unsigned values belonging to a range [min(), max()], which doesn't necessarily cover all possible values of the result type (for instance, std::minstd_rand0::min() == 1) and thus you may get random bytes that are not uniformly distributed if using an engine directly. However, for std::random_device the range is [std::numeric_limits<result_type>::min(), std::numeric_limits<result_type>::max()], so this particular engine would also work well without the adaptor.

Dubious answered 12/2, 2015 at 23:48 Comment(4)
Note: as @robert-allan-hennigan-leahy observed in another answer, random_bytes_engine doesn't technically support unsigned char, but this is probably a defect in the standard.Melodimelodia
What's a good way to seed std::independent_bits_engine<> from /dev/random?Gradation
@AndreasYankopolus - that's a good question which could standalone on SO.Prairie
Note that this is very inefficient. Independent bits engine will generate a new random for each byte, but the random it generates probably fills 32 or 64 bits, so this is 4 or 8x the required underlying rng engine calls. You'd basically have to reimplement independent_bits_engine in another form to do better, which is hard.Shuttle
N
18

Distributions take random bits and turn them into numbers. If you actually want random bits then you want to use an engine:

In particular, those requirements specify the algorithmic interface for types and objects that produce sequences of bits in which each possible bit value is uniformly likely.3

A single call to a URNG object is allowed to produce and deliver many (typically 32 or more) bits, returning these bits as a single packaged value of an unsigned integer type.4 N3847

random_device happens to be specified such that accessing uniformly distributed bits is easy:

std::random_device engine;
unsigned x = engine(); // sizeof(unsigned) * CHAR_BIT random bits

Note that other engines may not make it quite as easy to get uniformly random bits as random_device, due to returning fewer bits than their result_type can hold or even by effectively returning fractional bits.

If your concern is that unsigned's size is implementation defined and so random_device returns an implementation defined number of bits, you can write an adapter that either collects enough bits before giving them to you, or one that will give you just enough bits and cache the rest for your next request. (You can also do this to handle other engines which exhibit the previously mentioned issues.)

Notwithstanding answered 14/8, 2014 at 1:42 Comment(4)
The paper you quote is wrong (or at least misleading) on this point: URNGs are not required to fill their output with random bits, so this answer is not correct. To take an extreme example, if G::min() == 0 and G::max == 2, none of the output bits will have an equal probability of being 1 or 0. Real-life URNGs can and do take advantage of this license, so it is not safe to assume that you can get random bits from a URNG for exactly the reasons that are detailed in the paper you cite.Melodimelodia
@GeoffRomer I suppose it's fair to say it's misleading; you can't just concatonate the integers returned from engines and treat the result like a bitstream. Engines with non-power of two ranges can be viewed simply returning fractional bits and in such cases recovering a random bit stream takes a bit more work. It's probably better to simply use an engine that does have a power of two range, as engines which don't, don't provide any particular advantage in return for imposing that cost.Notwithstanding
@GeoffRomer It's also true that my answer is taking advantage of the fact that random_device is specified by C++ to fill its entire result value and not return any fractional bits, but I don't feel that there's anything unreasonable about such an implementation.Notwithstanding
Fair enough, but your answer gives the strong impression that this will work for any URNG, not just random_device.Melodimelodia
G
6

To answer your question: You can't.

The standard does not allow std::uniform_int_distribution to be templated on char, signed char, or unsigned char. Some believe that this is a defect in the standard, but it is the case.

You can simply template std::uniform_int_distribution on unsigned short, and set its min/max range to std::numeric_limits<unsigned char>::min() and std::numeric_limits<unsigned char>::max(), and then simply assign the result to an unsigned char.

From the standard:

Throughout this subclause 26.5, the effect of instantiating a template:

[...]

e) that has a template type parameter named IntType is undefined unless the corresponding template argument is cv-unqualified and is one of short, int, long, long long, unsigned short, unsigned int, unsigned long, or unsigned long long.

§26.5.1.1 [rand.req.genl]

Moreover:

You should use std::mt19937 to actually generate your random bytes. std::random_device is liable to be slow, and likely produces entropy with statistical properties (i.e. suitability for use in cryptography) that you don't need.

That said, you will need to seed your std::mt19937. You can do this with a std::random_device and a std::seed_seq.

Note that if you don't use a std::seed_seq to seed your std::mt19937, your std::mt19937 will be left with many, many zeroes in its internal state, and it will therefore take it quite a while to "warm up".

For more information on "warm up", see here.

Girasol answered 14/8, 2014 at 1:16 Comment(4)
I personally would have pointed out the undefined behavior and given Bryan time to fix it before downvoting, it was not an horrible answer, but that is just me.Malpighiaceous
I'd like to see this defect fixed. Was it already reported/discussed somewhere?Am
This defect has been reported as LWG 2326.Melodimelodia
... the new link to the closed defect. It was decided that it was a feature request.Prairie
S
0

I think the key to efficiency is to get the maximum number of values from every random number that is generated. Here's a small tweak to your original code that does that.

#include <cstdint>
#include <vector>
#include <random>

int main()
{
   std::random_device rd;
   std::uniform_int_distribution<uint32_t> dist(0,0xFFFFFFFFu);
   std::vector<char> data(1000);
   int offset = 0;
   uint32_t bits = 0;
   for (char& d : data)
   {
      if (offset == 0)
         bits = dist(rd);
      d = static_cast<char>(bits & 0xFF);
      bits >>= 8;
      if (++offset >= 4)
         offset = 0;
   }
   return 0;
}

If getting 4 bytes from every 32 bits doesn't work, try 3 bytes from 24 bits.

Sickle answered 12/8, 2020 at 22:47 Comment(0)
N
-1

Here's how I did it:

// Call srand(<some_number>) first if you want reproducible results
char oneRandomChar() {
    int x = rand();
    return (char) (x & 255);
}

As other folks have mentioned, you're drawing too sharp a distinction between a number and a byte. Of course it's true that you can interpret an int as a register and vice versa, but in hardware there is no distinction. This is one of those situations where you should focus on the benefits of being close to the metal. There's absolutely a time and place to build more specifics into the typing system (Haskell) but this is not one of them.

Numeral answered 18/7, 2021 at 17:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.