Is there any legitimate use for Intel's RDRAND?

Asked 6/11, 2014 at 3:39 Answered 22/4, 2023 at 9:28

Solved random cryptography stream-cipher rdrand

Today I thought: well, even if there is great suspicion on RDRAND implementation of NIST SP 800-90A, it is still a hardware implementation of pseudo-random number generator (PRNG) that must be good enough for non-sensitive applications. So I thought of using it on my game instead of Mersenne Twister.

So, to see if there was any performance gain on using the instruction, I compared the time of the two following codes:

// test.cpp
#include <cstdio>

int main()
{
    unsigned int rnd = 0;
    for(int i = 0; i < 10000000; ++i) {
        __builtin_ia32_rdrand32_step(&rnd);
    }
    printf("%x\n", rnd);
}

and

//test2.cpp
#include <cstdio>
#include <random>

int main()
{
    unsigned int rnd = 0;
    __builtin_ia32_rdrand32_step(&rnd);
    std::mt19937 gen(rnd);
    for(int i = 0; i < 10000000; ++i) {
        rnd ^= gen();
    }
    printf("%x\n", rnd);
}

and by running the two I get:

$ time ./test
d230449a

real    0m0.361s
user    0m0.358s
sys     0m0.002s

$ time ./test2 
bfc4e472

real    0m0.051s
user    0m0.050s
sys     0m0.002s

So, Mersenne Twister is much faster than RDRAND on my CPU. Well, I was disappointed, ruled out from my game. But RDRAND is a cryptographically secure PRNG (CSPRNG), so it does much behind the scenes... more fair would be compare it to other CSPRNG. So I took my Rabbit implementation (plain translation of the RFC to C, no fancy tricks for performance), and wrote the following test:

// test3.cpp
#include <cstdio>

extern "C"
{
#include "rabbit.h"
}

int main()
{
    rabbit_state s;
    unsigned long long buf[2];
    __builtin_ia32_rdrand64_step(&buf[0]);
    __builtin_ia32_rdrand64_step(&buf[1]);
    rabbit_init_key(&s, (uint8_t*)&buf[0]);

    for(int i = 0; i < 10000000; ++i) {
        rabbit_extract(&s, (uint8_t*)&buf[0]);
    }
    printf("%llx\n", buf[0]);
}

And for my surprise, generating twice as much pseudo-random data as the first two of them, I got a better time than RDRAND:

$ time ./test3 
8ef9772277b70aba

real    0m0.344s
user    0m0.341s
sys     0m0.002s

All three were compiled with optimization enabled.

So, we have a widespread paranoia that RDRAND was made to embed NSA backdoors into everybody's software cryptography. Also we have at least one software CSPRNG faster than RDRAND, and the most widely used decent PRNG, Mersenne Twister, is much faster than RDRAND. Finally, we have open-source auditable software entropy pools, like /dev/random and /dev/urandom, that are not hidden behind twofold scrambler layers of AES, like RDRAND.

So, the question: should people be using RDRAND? Is there any legitimate use for it? Or should we stop using it altogether?

Exuberate answered 6/11, 2014 at 3:39 Comment(6)

Well one reason to use it might be that it is more convenient for non performance critical applications to use it rather than roll your own or import another dependency? – Sluggard 6/11, 2014 at 3:42

I thought of that, but those snippets relies both on compiler specific intrinsics and on a hardware feature. In a real scenario, you would want to use a library anyways, both to support any hardware and to encapsulate the non-portable assembly/intrinsics code. – Exuberate 6/11, 2014 at 3:46

@Ivella, Fair enough - then my answer would probably be refined to "A legitimate use is to use it while doing quick and dirty prototyping cos its easy" (which admittedly is a pretty narrow use-case!) – Sluggard 6/11, 2014 at 3:50

>Or should we stop using it altogether? Just a thought - If we are worried about the legitimate implementation of an instruction, I'm afraid we should consider other instructions too not just 'rdrand'. – Cobalt 8/2, 2017 at 9:14

@ShivendraMishra Most instructions are easy to verify if they are working or not (and everything breaks if they are not). Can you be sure rdrand is working properly as well as you can be sure add, mul or ever AES-NI is working properly? – Exuberate 16/2, 2017 at 21:46

Beware that __builtin_ia32_rdrand32_step(&rnd); doesn't use the result, and may benchmark artificially faster than if rnd was volatile, or if you were accumulating the result with result ^= rnd; – Nomothetic 1/12, 2017 at 3:11

As pointed out in the other answer, RDRAND is seeded with true randomness. In particular, it frequently reseeds its internal CSPRNG with 128 bits of hardware-generated randomness, guaranteeing a reseed at least once every 511 * 128 bits. See section 4.2.5 of this doc:

https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide

So in your examples, you used a single 128-bit seed to generate 10 million random draws from rabbit_extract. In the RDRAND version, you had the equivalent of 2.5 million 128-bit draws, meaning that the CSPRING was reseeded at least 2,500,000/511 = 4,892 times.

So instead of 128 bits of entropy going into your rabbit example, there were at least 4,892*128 = 626,176 bits of entropy going into the RDRAND example.

That's much, much more entropy than you're going to get in 0.361 seconds without hardware support. That could matter if you're doing stuff where lots of real randomness is important. One example is Shamir secret sharing of large quantities of data -- not sure if there are others.

So in conclusion -- it's not for speed, it's for high security. The question of whether it's backdoored is troubling, of course, but you can always XOR it with other sources, and at the very least it's not hurting you.

Boomkin answered 6/12, 2014 at 0:22 Comment(2)

Nope. Sect.4.2.5 is useless. You might as well read the section that says "Trust us, we're the good guys and wouldn't kid you. Honest." All Intel documentation must be discounted and any analysis performed independently and starting from scratch. Your calculations are based on insecure assumptions that might be fine for general games programming, but can't be relied on for cryptography /high security. – Messeigneurs 16/6, 2017 at 23:39

Sure, as I mentioned, it could be backdoored. But so could any other source of entropy you rely on -- cryptographic entropy is just a random event your adversary hasn't figured out how to spy on yet. That's why I suggested XORing your various sources of supposed entropy together. – Boomkin 18/6, 2017 at 19:17

RDRAND is not just a PRNG. It is a whitened TRNG that is FIPS compliant. The difference is that you can rely on RDRAND to contain quite a lot of actual entropy directly retrieved from the CPU. So the main use of RDRAND is to supply entropy to OS/libraries/applications.

The only other good way for applications to retrieve entropy is usually using an OS supplied entropy source such as /dev/random or /dev/urandom (which usually draws the entropy from /dev/random). However, that OS also requires to find the entropy somewhere. Usually tiny differences in disk and network access times are used for this (+ other semi-random input). These devices are not always present, and are not designed as sources of entropy; they are often not very good sources, nor are they very fast. So on systems that support it, RDRAND is often used as an entropy source for the cryptographically secure random number generator of the operating system.

With regards to speed, especially for games, it is completely valid to use a (non-secure) PRNG. If you want to have a reasonable random seed then seeding it with the result of RDRAND may be a good idea, although seeding it from the OS supplied RNG may be a more portable and even a more secure option (in case you don't fully trust Intel or the US).

Note that currently RDRAND is implemented using (AES) CTR_DRBG instead of a (less well analysed) stream cipher that was created for speed such as Rabbit, so it should come as no surprise that Rabbit is faster. Even more importantly, it also has to retrieve the entropy from the entropy source within the CPU before it can run.

Stuffed answered 6/11, 2014 at 14:30 Comment(2)

Great answer. Question: What is "whitening" of a rng? – Capello 22/5, 2019 at 2:7

A hardware entropy source generally doesn't deliver well distributed "true" entropy in the sense that every bit has one bit of entropy. For instance, you can have an entropy source that has a slight preference for 1 over 0. You can whiten these bits by e.g. getting two bits X and Y, and then perform X ^ ~ Y A DRBG is a rather strong way of performing whitening like that. – Stuffed 22/5, 2019 at 3:14

Is there any legitimate use for Intel's RDRAND?

Yes.

Consider a Monte-Carlo simulation. It has no cryptographic needs, so it does not matter if its backdoored by the NSA.

Or should we stop using it altogether?

We can't answer that. That's a confluence use cases, requirements and personal preferences.

... Also we have at least one software CSPRNG faster than RDRAND, and the most widely used decent PRNG..."

Mersenne Twister may be faster for a word at at time after initialization and without Twists because its returning a word from the state array. But I doubt its as fast as RDRAND for a continuous stream. I know RDRAND can achieve theoretical limits based on bus width in a continuous stream.

According to David Johnston of Intel (who designed the circuit), that's something like 800+ MB/s. See DJ's answer at What is the latency and throughput of the RDRAND instruction on Ivy Bridge?.

So, we have a widespread paranoia that RDRAND was made to embed NSA backdoors into everybody's software cryptography.

Paranoid folks have at least two choices. First, they can forgo using RDRAND and RDSEED. Second, they can use the output of RDRAND and RDSEED to seed another generator, and then use the output of the second generator. I believe the Linux kernel takes the second approach.

Kirkendall answered 12/10, 2015 at 20:54 Comment(7)

In a Monte-Carlo simulation I would definitely use the 7x faster Mersenne Twistter. – Exuberate 1/3, 2017 at 19:51

@Ivella - Citation, please. RDRAND does not underflow, so it can produce and sustain a machine word every clock for each core. A software implementation like Mersenne Twistter can't do the same. Every time Twist is called, the generator loses hundreds of cycles building a new state array. – Kirkendall 1/3, 2017 at 23:35

Well, original research. In the question I explained how I benchmarked both methods, and it shows that __builtin_ia32_rdrand32_step fails to deliver one 32 bit random word per clock tick by far. Feel free to run the two samples by yourself. – Exuberate 2/3, 2017 at 19:31

@Ivella - Your comments prompted me to start benchmarking RNGs in some common crypto libraries. See, for example, Crypto++ Issue 386: Add Random Number Generator benchmarks. I'm seeing wildly varying results. I ping'd David Johnston about RDRAND and RDSEED throughput (he designed the circuits at Intel). He suggested Colin King StressNG to test RDRAND and RDSEED. – Kirkendall 7/3, 2017 at 10:26

@Ivella: your benchmark runs rdrand 10M times before printing out one word. If I understand the implementation correctly, you should only need to run it once per word- running it multiple times doesn't make its output more random. This means it's 10M times faster than your benchmark reports, does it not? – Temperament 20/4, 2017 at 20:28

@Exuberate and jww: rdrand has a throughput of 1 per ~110 clock cycles on IvB, according to agner.org/optimize. Or on Skylake, one per ~460 clocks. So yes, you get a result every time you run it, but no, it's nowhere near one per core clock. As the designer of the HW and author of librdrand explains, each execution of the instruction takes a round-trip off the core to the shared DRNG unit, because there's one per package, not per core. – Nomothetic 1/12, 2017 at 3:2

But if you test wrong (e.g. overwriting the register value without using it), the CPU doesn't even wait for the result to arrive before executing another rdrand. (Again, see the answer I linked last question). Oh, you already linked that answer?? 800MB/s is far below 64 bits per clock. 64 bits per core clock is 24 GB/s for a single 3GHz core. – Nomothetic 1/12, 2017 at 3:5

There is an astrophysics research paper here (http://iopscience.iop.org/article/10.3847/1538-4357/aa7ede/meta;jsessionid=A9DA9DDB925E6522D058F3CEEC7D0B21.ip-10-40-2-120), (non-paywalled) version here (https://arxiv.org/abs/1707.02212) that gives a legitimate use for RdRand.

It examines the effects of RdRand on a Monte Carlo simulator, as an earlier post advised. But the author didn't find any statistical difference in the results that use or do not use RdRand. From the performance standpoint, it looks like the Mersenne Twister is much faster. I think Sections 2.2.1 and 5 have all of the details.

Vector answered 13/11, 2017 at 22:18 Comment(1)

The links are poor, better links: Intel® DRNG, Intel® DRNG Software Implementation Guide and Wikipedia RdRand. – Finned 13/11, 2017 at 22:33

True Randomness vs Pseudorandom.

rdrand is a hardware-source of true entropy, and at that a quite powerfull one (also it is not Intel-exclusive - AMD offers it the same - why do you single out Intel then? well, i'm concentrating on the implementation of Intel as they offer a bit more information and have the vastly higher throughput compared to even Zen2 [ryzen 3xxx])

Forget about the clockcyle-throughput - that is a misleading metric here as it is not really related. The throughput is limited by the roundtrip-delay for a single thread and by the actual hardware implementation running at 800 Mhz, has a fixed 8 cycles and can deliver a 64bit random value per transaction. Looking just at the clock-cycles taken by the thread running the code will very depending on the current clockrate - idling at 800 MHz it would seem like at was 6 times faster than running at 4.8 Ghz. But the delay between the cores and the DRNG is in the order of inter-Core-latency, and you have that twice, so about 80ns - with results in roughly 12MHz@8byte => ~100 MByte/s/Thread and maximum is 800 MB/s.

one other benefit of this: It is Not using up your CPU-Resources like most PRNG would do, so while other generators can have quite a bit higher throughput (1-2 orders of magnitude) they also require more resources. In situations were you need a lot of random numbers and are in a tight computationally heavy loop rdrand might actually give the better performance, but that of course is heavily dependent on the exact scenario and hardware.

If you just need a lot of random numbers for some simulation or need quasi-random numbers for games then rdrand is likely not the best choice. If you need true secure random number? Use it - and combine it with other sources of entropy if you are worried about some backdoors.

Andras answered 12/9, 2019 at 12:9 Comment(0)

On the matter of NSA back doors. The backdoors I dodged while designing the DRNG logic are:

The Dual-EC-DRNG. It was obviously stupid even before the Snowden leaks. I used the CTR-DRBG which has independently proven security properties and was the most efficient and amenable to security from timing attacks and side channel attacks.
The SP800-90A DFs. They are probable back doors. Look at the key construction and the ability to pull more entropy out than you can put in.
The FIPS140-2 CRNGT. An entropy lowering back door. ISO removed it from ISO/IEC 19790-2012 Which is the ISO version of FIPS140.

So that's three back doors dodged for the price of one. Yet only one of them is present in the public consciousness as a back door. You're welcome.

Stovall answered 22/4, 2023 at 9:28 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags