Create a random permutation of 1..N in constant space
Asked Answered
G

7

27

I am looking to enumerate a random permutation of the numbers 1..N in fixed space. This means that I cannot store all numbers in a list. The reason for that is that N can be very large, more than available memory. I still want to be able to walk through such a permutation of numbers one at a time, visiting each number exactly once.

I know this can be done for certain N: Many random number generators cycle through their whole state space randomly, but entirely. A good random number generator with state size of 32 bit will emit a permutation of the numbers 0..(2^32)-1. Every number exactly once.

I want to get to pick N to be any number at all and not be constrained to powers of 2 for example. Is there an algorithm for this?

Georgie answered 7/4, 2012 at 13:3 Comment(7)
How random must it be? For example, could the enumeration start from the same state, and generate the same sequence like a 'normal' random number generator, or must it be different each time?Genista
I've just discovered this article: en.wikipedia.org/wiki/Pseudorandom_permutation So this process of using a function that maps keys to permutations is called ‘pseudorandom permutation’, and the question is how to select/implement/use an algorithm that implements such a function. The article also mentions the link between ideal block ciphers and ideal pseudorandom permutation.Endarch
Slightly (very) dated but I have a solution for you that doesn't involve "tossing stuff away just be because we didn't get what we want" iif N is prime. I'm very hesitant to post it (as I'm still working on a CSRNG based on the concept) but will do so as an answer but if you're still interested (and the conditions described above match) I'd be willing.Alcot
@Alcot that would be a great solution since there are so many primes that we'd have throw away very little for any N.Georgie
@Georgie Let me clarify... it only works if N is prime. Not if you're looking to brute force something between two primes. In other words, it's not a solution to help you find permutations of any set, only unique sets that are prime in count. (Hence, if-and-only-if.)Alcot
@Alcot well, we can throw away generated numbers that exceed our maximum. The waste will be small, though, since we should be able to find a prime that is fairly close to the maximum. In my comment I confused the two numbers by calling both N.Georgie
@Georgie Alright. But understand if you're doing this for a secure purpose such as encryption or what-not, you're introducing timing attacks. Answer follows shortly.Alcot
R
11

The easiest way is probably to just create a full-range PRNG for a larger range than you care about, and when it generates a number larger than you want, just throw it away and get the next one.

Another possibility that's pretty much a variation of the same would be to use a linear feedback shift register (LFSR) to generate the numbers in the first place. This has a couple of advantages: first of all, an LFSR is probably a bit faster than most PRNGs. Second, it is (I believe) a bit easier to engineer an LFSR that produces numbers close to the range you want, and still be sure it cycles through the numbers in its range in (pseudo)random order, without any repetitions.

Without spending a lot of time on the details, the math behind LFSRs has been studied quite thoroughly. Producing one that runs through all the numbers in its range without repetition simply requires choosing a set of "taps" that correspond to an irreducible polynomial. If you don't want to search for that yourself, it's pretty easy to find tables of known ones for almost any reasonable size (e.g., doing a quick look, the wikipedia article lists them for size up to 19 bits).

If memory serves, there's at least one irreducible polynomial of ever possible bit size. That translates to the fact that in the worst case you can create a generator that has roughly twice the range you need, so on average you're throwing away (roughly) every other number you generate. Given the speed an LFSR, I'd guess you can do that and still maintain quite acceptable speed.

Rodman answered 7/4, 2012 at 13:49 Comment(2)
According to en.wikipedia.org/wiki/Maximum_length_sequence , LFSRs always map 0 onto 0 and the maximum possible length of the sequence is 2<sup>n</sup> - 1 where n is the number of bits of the register. Therefore, the number of possible permutations is limited to (2<sup>n</sup> - 1)! at most. Maximum randomness is achieved when a random key is input to a function that uniformly maps keys to the (2<sup>n</sup>)! possible permutations.Endarch
It's important to realise that when using an lfsr with hard-coded taps, the order is fixed and the seed only chooses where in that fixed order to start and end.Elizbeth
S
11

One way to do it would be

  1. Find a prime p larger than N, preferably not much larger.
  2. Find a primitive root of unity g modulo p, that is, a number 1 < g < p such that g^k ≡ 1 (mod p) if and only if k is a multiple of p-1.
  3. Go through g^k (mod p) for k = 1, 2, ..., ignoring the values that are larger than N.

For every prime p, there are φ(p-1) primitive roots of unity, so it works. However, it may take a while to find one. Finding a suitable prime is much easier in general.

For finding a primitive root, I know nothing substantially better than trial and error, but one can increase the probability of a fast find by choosing the prime p appropriately.

Since the number of primitive roots is φ(p-1), if one randomly chooses r in the range from 1 to p-1, the expected number of tries until one finds a primitive root is (p-1)/φ(p-1), hence one should choose p so that φ(p-1) is relatively large, that means that p-1 must have few distinct prime divisors (and preferably only large ones, except for the factor 2).

Instead of randomly choosing, one can also try in sequence whether 2, 3, 5, 6, 7, 10, ... is a primitive root, of course skipping perfect powers (or not, they are in general quickly eliminated), that should not affect the number of tries needed greatly.

So it boils down to checking whether a number x is a primitive root modulo p. If p-1 = q^a * r^b * s^c * ... with distinct primes q, r, s, ..., x is a primitive root if and only if

x^((p-1)/q) % p != 1
x^((p-1)/r) % p != 1
x^((p-1)/s) % p != 1
...

thus one needs a decent modular exponentiation (exponentiation by repeated squaring lends itself well for that, reducing by the modulus on each step). And a good method to find the prime factor decomposition of p-1. Note, however, that even naive trial division would be only O(√p), while the generation of the permutation is Θ(p), so it's not paramount that the factorisation is optimal.

Siberson answered 7/4, 2012 at 13:42 Comment(8)
That sounds right. Any idea how I can easily get a suitable prime root? My math is not up to the task, by far.Georgie
Ah, that is the tricky part. I know no really good way that's guaranteed to be fast, but I can offer something to increase the probability of finding one fast. I'll edit, give me a few minutes (more than a few, probably).Siberson
For prime p there are exactly phi(p-1) different primitive roots, so just trying to pick a random one will work, by mathworld.wolfram.com/PrimitiveRoot.htmlSturdivant
Also to increase randomness you can select the starting power k0 by random, i.e. dont go through g^k (mod p), but through g^(k+k0).Sturdivant
Ah, yes, picking a random starting point is a good idea, @kilotaras.Siberson
Regarding edit: trying in sequence will decrease randomness. You want to have one of ~n! possible permutations. Choosing only starting point randomly will leave us with only n choices.Sturdivant
From en.wikipedia.org/wiki/Primitive_root_modulo_n , “Curiously, permutations created in this way (and their circular shifts) have been shown to be Costas arrays.”. This means that the permutations that are produced are only randomly chosen from a subset of all possible permutations. This reduces randomness, which may or may not be important for the application. I guess that algorithms that have an equal distribution from keys to permutations are perfect candidates for a block cipher. Conversely, this leads me to believe that only by using a block cipher can randomness be maximised.Endarch
+1 This is very similar to this S.O. answer generate seemingly random unique numeric ID in SQL Server. I have used this method to answer Generate different random time in the given interval and Generating a random, non-repeating sequence of all integers in .NET :-)Antonomasia
R
11

The easiest way is probably to just create a full-range PRNG for a larger range than you care about, and when it generates a number larger than you want, just throw it away and get the next one.

Another possibility that's pretty much a variation of the same would be to use a linear feedback shift register (LFSR) to generate the numbers in the first place. This has a couple of advantages: first of all, an LFSR is probably a bit faster than most PRNGs. Second, it is (I believe) a bit easier to engineer an LFSR that produces numbers close to the range you want, and still be sure it cycles through the numbers in its range in (pseudo)random order, without any repetitions.

Without spending a lot of time on the details, the math behind LFSRs has been studied quite thoroughly. Producing one that runs through all the numbers in its range without repetition simply requires choosing a set of "taps" that correspond to an irreducible polynomial. If you don't want to search for that yourself, it's pretty easy to find tables of known ones for almost any reasonable size (e.g., doing a quick look, the wikipedia article lists them for size up to 19 bits).

If memory serves, there's at least one irreducible polynomial of ever possible bit size. That translates to the fact that in the worst case you can create a generator that has roughly twice the range you need, so on average you're throwing away (roughly) every other number you generate. Given the speed an LFSR, I'd guess you can do that and still maintain quite acceptable speed.

Rodman answered 7/4, 2012 at 13:49 Comment(2)
According to en.wikipedia.org/wiki/Maximum_length_sequence , LFSRs always map 0 onto 0 and the maximum possible length of the sequence is 2<sup>n</sup> - 1 where n is the number of bits of the register. Therefore, the number of possible permutations is limited to (2<sup>n</sup> - 1)! at most. Maximum randomness is achieved when a random key is input to a function that uniformly maps keys to the (2<sup>n</sup>)! possible permutations.Endarch
It's important to realise that when using an lfsr with hard-coded taps, the order is fixed and the seed only chooses where in that fixed order to start and end.Elizbeth
C
4

Another way to do this is with a block cipher; see this blog post for details.

The blog posts links to the paper Ciphers with Arbitrary Finite Domains which contains a bunch of solutions.

Coniine answered 11/4, 2012 at 5:32 Comment(7)
This will work. For generating a 27-bit random number I'd have to find some exotic cipher with that state size I guess. Or throw away 31/32 of the generated ciphertexts.Georgie
@Georgie The article I linked to describes how to do this. You can take an existing cipher and reduce its block size.Coniine
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes.Christabella
I'm not upvoting this due to lack of effort, but the linked article (and the PDF that that article references) is actually very useful and should probably be the accepted answer. To address the concern of hyperlink-rot mentioned by @ccjmne, here is an archived copy of the page: web.archive.org/web/20130517173425/http://blog.notdot.net/2007/…Endarch
@JamesHaigh If it helps, I wrote the article too.Coniine
Ah, yes it does help. Your article is excellent. I have now upvoted. Yes, it is good not to have to repeat oneself – if only the Web was layered atop of a peer-to-peer network such that data redundancy is consistent and useful links never brake.Endarch
I implemented this by creating my own block cipher that is parameterized by the number of bits to use. The cipher is a sequence of rounds each containing an addition with the key, a rotation and an xor with the key. These operations can be made to observe any number of bits easily. Clearly this "cipher" is not secure but it's seemingly random enough. It could be strengthened significantly by using round keys made from a high-strength cipher or seeded PRNG.Georgie
A
3

Consider the prime 3. To fully express all possible outputs, think of it this way...

bias + step mod prime

The bias is just an offset bias. step is an accumulator (if it's 1 for example, it would just be 0, 1, 2 in sequence, while 2 would result in 0, 2, 4) and prime is the prime number we want to generate the permutations against.

For example. A simple sequence of 0, 1, 2 would be...

0 + 0 mod 3 = 0
0 + 1 mod 3 = 1
0 + 2 mod 3 = 2

Modifying a couple of those variables for a second, we'll take bias of 1 and step of 2 (just for illustration)...

1 + 2 mod 3 = 0
1 + 4 mod 3 = 2
1 + 6 mod 3 = 1

You'll note that we produced an entirely different sequence. No number within the set repeats itself and all numbers are represented (it's bijective). Each unique combination of offset and bias will result in one of prime! possible permutations of the set. In the case of a prime of 3 you'll see that there are 6 different possible permuations:

0,1,2
0,2,1
1,0,2
1,2,0
2,0,1
2,1,0

If you do the math on the variables above you'll not that it results in the same information requirements...

1/3! = 1/6 = 1.66..

... vs...

1/3 (bias) * 1/2 (step) => 1/6 = 1.66..

Restrictions are simple, bias must be within 0..P-1 and step must be within 1..P-1 (I have been functionally just been using 0..P-2 and adding 1 on arithmetic in my own work). Other than that, it works with all prime numbers no matter how large and will permutate all possible unique sets of them without the need for memory beyond a couple of integers (each technically requiring slightly less bits than the prime itself).

Note carefully that this generator is not meant to be used to generate sets that are not prime in number. It's entirely possible to do so, but not recommended for security sensitive purposes as it would introduce a timing attack.

That said, if you would like to use this method to generate a set sequence that is not a prime, you have two choices.

First (and the simplest/cheapest), pick the prime number just larger than the set size you're looking for and have your generator simply discard anything that doesn't belong. Once more, danger, this is a very bad idea if this is a security sensitive application.

Second (by far the most complicated and costly), you can recognize that all numbers are composed of prime numbers and create multiple generators that then produce a product for each element in the set. In other words, an n of 6 would involve all possible prime generators that could match 6 (in this case, 2 and 3), multiplied in sequence. This is both expensive (although mathematically more elegant) as well as also introducing a timing attack so it's even less recommended.

Lastly, if you need a generator for bias and or step... why don't you use another of the same family :). Suddenly you're extremely close to creating true simple-random-samples (which is not easy usually).

Alcot answered 26/3, 2017 at 3:28 Comment(3)
I like this method since it is so simple to implement. But there is no hard security in this, right? Just clarifying, the question does not require security.Georgie
The input information = permutation information doesn't seem to generalize to permutations of more than 3 elements; for example for n=5, 1/5 (bias) * 1/4 (step) = 1/20 != 1/5! = 1/120. I don't think you can specify a permutation on n >3 elements using two numbers no greater than n.Lungan
1/6 does not equal 1.66.. is this a typo?Crankle
E
2

The fundamental weakness of LCGs (x=(x*m+c)%b style generators) is useful here.

If the generator is properly formed then x%f is also a repeating sequence of all values lower than f (provided f if a factor of b).

Since bis usually a power of 2 this means that you can take a 32-bit generator and reduce it to an n-bit generator by masking off the top bits and it will have the same full-range property.

This means that you can reduce the number of discard values to be fewer than N by choosing an appropriate mask.

Unfortunately LCG Is a poor generator for exactly the same reason as given above.

Also, this has exactly the same weakness as I noted in a comment on @JerryCoffin's answer. It will always produce the same sequence and the only thing the seed controls is where to start in that sequence.

Elizbeth answered 15/9, 2017 at 17:23 Comment(0)
M
0

Here's some SageMath code that should generate a random permutation the way Daniel Fischer suggested:

def random_safe_prime(lbound):
    while True:
        q = random_prime(lbound, lbound=lbound // 2)
        p = 2 * q + 1
        if is_prime(p):
            return p, q


def random_permutation(n):
    p, q = random_safe_prime(n + 2)

    while True:
        r = randint(2, p - 1)
        if pow(r, 2, p) != 1 and pow(r, q, p) != 1:
            i = 1
            while True:
                x = pow(r, i, p)
                if x == 1:
                    return

                if 0 <= x - 2 < n:
                    yield x - 2

                i += 1
Makings answered 9/3, 2018 at 13:18 Comment(0)
S
0

Take some number k < N, that is coprime with N. Then the sequence 0, k, 2k, 3k, ..., (N-1)k (mod N) goes through each number from 0 to N-1 exactly once.

A good pick for k is something close to N/phi (where phi is the golden ratio). This will make each term be "maximally far away" from all the previous terms, which can look pretty random to the untrained eye.

This idea is known as "Fibonacci hashing".

Samsara answered 20/5, 2023 at 1:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.