Quality of PostgreSQL's random() function?
Asked Answered
O

2

9

Let's say I'm creating a table foo with a column bar that should be a very large random integer.

CREATE TABLE foo (
    bar bigint DEFAULT round(((9223372036854775807::bigint)::double precision * random())) NOT NULL,
    baz text
);

Is this the best way to do this? Can anyone speak to the quality of PostgreSQL's random() function? Is the multiplication here masking the entropy?

Note that I do have good hardware entropy feeding into /dev/random.

Oily answered 22/3, 2012 at 3:37 Comment(0)
P
15

Postgresql random is based on their own portable implementation of POSIX erand48. It's a linear congruential PRNG in a 48 bit domain.

If you need something stronger look to the pg_crypto module's gen_random_bytes function which is used to produce cryptographically strong entropy.

Pyriphlegethon answered 22/3, 2012 at 5:8 Comment(4)
This is the beauty of open source!Keelia
Any more info about the multiplication portion of this question? Is the multiplication here masking the entropy?Oily
Your number looks like a max signed long (263-1), which is a larger domain than pg's random produces. An IEEE 754 double has only 53 bits of mantissa precision which is also smaller than your 63 bit domain. So short-answer, I think exploding the random to 63 bits is going to get sparse coverage in the low order bits and you'll get better coverage if you limit the multiply to the same space as the core PRNG (248).Pyriphlegethon
Note that I have just posted this example using the gen_random_bytes() function: dba.stackexchange.com/a/15884/6776Oily
F
1

Starting with version 15.0 (released Oct 2022), PostgreSQL uses xoroshiro128** as its PRNG algorithm, for producing the values for random() among other Postgres functions. The xoroshiro algo family was introduced in 2018 and, as of writing, is still quite state-of-the-art.

Regarding the multiplication question, and in reference to @dbenhur's comment, the new algorithm and its 64-bit output will produce better results for your approach, but you still have the double precision bottleneck, so you still won't maximize your 263−1 domain.

If you can relax the requirement of an integer column, especially if you don't need it to be the primary key, consider the simpler and more elegant alternative of using a double precision-type column to save random() values directly.

Fenestella answered 31/8, 2023 at 7:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.