Given a permutation's lexicographic number, is it possible to get any item in it in O(1)
Asked Answered
L

5

22

I want to know whether the task explained below is even theoretically possible, and if so how I could do it.

You are given a space of N elements (i.e. all numbers between 0 and N-1.) Let's look at the space of all permutations on that space, and call it S. The ith member of S, which can be marked S[i], is the permutation with the lexicographic number i.

For example, if N is 3, then S is this list of permutations:

S[0]: 0, 1, 2
S[1]: 0, 2, 1
S[2]: 1, 0, 2
S[3]: 1, 2, 0
S[4]: 2, 0, 1
S[5]: 2, 1, 0

(Of course, when looking at a big N, this space becomes very large, N! to be exact.)

Now, I already know how to get the permutation by its index number i, and I already know how to do the reverse (get the lexicographic number of a given permutation.) But I want something better.

Some permutations can be huge by themselves. For example, if you're looking at N=10^20. (The size of S would be (10^20)! which I believe is the biggest number I ever mentioned in a Stack Overflow question :)

If you're looking at just a random permutation on that space, it would be so big that you wouldn't be able to store the whole thing on your harddrive, let alone calculate each one of the items by lexicographic number. What I want is to be able to do item access on that permutation, and also get the index of each item. That is, given N and i to specify a permutation, have one function that takes an index number and find the number that resides in that index, and another function that takes a number and finds in which index it resides. I want to do that in O(1), so I don't need to store or iterate over each member in the permutation.

Crazy, you say? Impossible? That may be. But consider this: A block cipher, like AES, is essentially a permutation, and it almost accomplishes the tasks I outlined above. AES has a block size of 16 bytes, meaning that N is 256^16 which is around 10^38. (The size of S, not that it matters, is a staggering (256^16)!, or around 10^85070591730234615865843651857942052838, which beats my recent record for "biggest number mentioned on Stack Overflow" :)

Each AES encryption key specifies a single permutation on N=256^16. That permutation couldn't be stored whole on your computer, because it has more members than there are atoms in the solar system. But, it allows you item access. By encrypting data using AES, you're looking at the data block by block, and for each block (member of range(N)) you output the encrypted block, which the member of range(N) that is in the index number of the original block in the permutation. And when you're decrypting, you're doing the reverse (Finding the index number of a block.) I believe this is done in O(1), I'm not sure but in any case it's very fast.

The problem with using AES or any other block cipher is that it limits you to very specific N, and it probably only captures a tiny fraction of the possible permutations, while I want to be able to use any N I like, and do item access on any permutation S[i] that I like.

Is it possible to get O(1) item access on a permutation, given size N and permutation number i? If so, how?

(If I'm lucky enough to get code answers here, I'd appreciate if they'll be in Python.)

UPDATE:

Some people pointed out the sad fact that the permutation number itself would be so huge, that just reading the number would make the task non-feasible. Then, I'd like to revise my question: Given access to the factoradic representation of a permutation's lexicographic number, is it possible to get any item in the permutation in O(as small as possible)?

Lentic answered 6/5, 2014 at 18:25 Comment(20)
O(1) seems crazily ambitious indeed.Fine
I'd settle for "fast". I don't know the complexity of AES, but it's definitely fast enough.Lentic
I'm not sure I understand why you think a block cipher is a permutation. A cyphertext block doesn't need to have the same number of bits as the plaintext block that produced it, the cipher can be any N->N mapping.Polypoid
@Polypoid A block cipher could, in theory, output larger ciphertext blocks than it accepts plaintext blocks. But most common block ciphers, such as AES which is the example OP used, doesn't do that. Their input and output block size are the same, hence they trivially are a permutation. Moreover, differing input and output sizes are so obscure that for example Wikipedia doesn't even consider it.Cinquecento
You need lg n! = Theta(n log n) bits to represent all possible permutations on n elements. AES is possible only because it can't do all of them, just enough that it "looks random" in a precise technical sense.Hardpressed
Regarding the question: Note that for large N, even specifying N and i takes significant space (around log N, which for 10**20 is 67; this is larger than virtually any wordsize in the real world so being pedantic about this is somewhat justified).Cinquecento
@DavidEisenstat It's not quite that straightforward. We don't want to store all possible permutations, we don't even want one, we only want one element of one permutation. This might have better lower bounds: As an analogy, the ith k-bit number takes O(k) space, even though all k-bit numbers take O(2^k * k) bits.Cinquecento
@delnan You're still boned because the Kolmogorov complexity of a random permutation is, with high probability, within a constant number of bits of the bound that I quoted. The capability of representing an arbitrary permutation has to go.Hardpressed
@DavidEisenstat Quite possibly, I'm currently trying to formulate a convincing argument along those lines. But the reasoning in your first comment is not enough.Cinquecento
AES-256 can only use 2^256 different permutations, which is far less than the theoretically possible number of permutations. Or am I missing something?Fine
@Polypoid I understand your confusion; I didn't mean AES is a permutation in that it mixes up the bits with state 1 in each block. I mean, let's look at each block as a number between 0 and 256^16, and permute that to a different number in that range.Lentic
Also @delnan's comment still stands: Specifying i alone will take Theta(n log n) spaceFine
@NiklasB. You're right, my earlier claim of log N bits was false (I counted the number of bits in the number of bits, for whatever reason). N log N is about 1021 for N = 1020. This is rather impractical when one can't even store one permutation of length N.Cinquecento
@NiklasB. Fine, so let's say we're talking about sizes like N=10^10, where specifying i would take around 40 gigabytes, but still you'd like to get item access to each item in the permutation without going over the ~5,000,000,000 items that preceded it.Lentic
@RamRachum I disagree, enumerating the 10^10 items before it is asymptotically optimal because it is O(N), which is even less than the input size. You can just build up the permutation in memory, it will just take you another 40gigsFine
The size of the index is O(N log N), which means that it is asymptotically the same (within a constant factor) as the size of N<sup>N</sup>; in other words, you could use as an index the valid permutations amongst the total set of N-digit base-N numbers. Or to put it another way, the permutation itself, written out, is only a constant factor larger than the index. Perhaps that puts this question into perspective.Chain
I would be surprised if there were an algorithm with a good worst case for locally unranking permutations. You might be interested in this paper on succinct representations of permutations, which gives an alternative representation that is efficient in time and space.Hardpressed
@RamRachum check: mathblog.dk/…Pinson
@KhaledAKhunaifer I believe that this algorithm suffers from the same problem I pointed out to phil_20686 in my comments to his answer. It requires going over every item in the permutation before you get to the one you want.Lentic
@RamRachum If you check the link, in the comments to that post, it seem some has proposed some unproved ideas on itPinson
A
5

The secret to doing this is to "count in base factorial".

In the same way that 134 = 1*10^2+3*10 + 4, 134 = 5! + 2 * 3! + 2! => 10210 in factorial notation (include 1!, exclude 0!). If you want to represent N!, you will then need N^2 base ten digits. (For each factorial digit N, the maximum number it can hold is N). Up to a bit of confusion about what you call 0, this factorial representation is exactly the lexicographic number of a permutation.

You can use this insight to solve Euler Problem 24 by hand. So I will do that here, and you will see how to solve your problem. We want the millionth permutation of 0-9. In factorial representation we take 1000000 => 26625122. Now to convert that to the permutation, I take my digits 0,1,2,3,4,5,6,7,8,9, and The first number is 2, which is the third (it could be 0), so I select 2 as the first digit, then I have a new list 0,1,3,4,5,6,7,8,9 and I take the seventh number which is 8 etc, and I get 2783915604.

However, this assumes that you start your lexicographic ordering at 0, if you actually start it at one, you have to subtract 1 from it, which gives 2783915460. Which is indeed the millionth permutation of the numbers 0-9.

You can obviously reverse this procedure, and hence convert backwards and forwards easily between the lexiographic number and the permutation that it represents.

I am not entirely clear what it is that you want to do here, but understanding the above procedure should help. For example, its clear that the lexiographic number represents an ordering which could be used as the key in a hashtable. And you can order numbers by comparing digits left to right so once you have inserted a number you never have to work outs it factorial.

Audie answered 12/5, 2014 at 10:56 Comment(5)
That's the algorithm I currently use, but when you want to get the value in the middle of the permutation you have to go through all the values up to it, which is what I'd like to avoid.Lentic
I don't understand what you mean? Why would i have to go through all the values? I have just shown that you can generate the millionth permutation directly from its lexiographic ordering. Or given a lexiographic number i can directly generate its permutation. If you have a class that represents the base factorial representation of the ordering, you also never need more than N^2 base ten digits to store the number.Audie
If you are using the factoridic representation then simply use these as keys and do a bucket sort (bounded hash table) for O(1) retrieval?Audie
I didn't mean you need to go through all permutations to get the permutation you want, I meant that inside one permutation, if you want item i, you need to go through all the items between 0 and i-1. (Because you need to know which digits will be taken.)Lentic
Well, you only need to go through the digits in the factorial expansion, e.g. if I want to know where the 4th element (3) appears for my permutation 26625122, i go through it, and every time I get to a number less than 1 I subtract 1. e.g. the first element is 2 (which is the third number - could be zero), so I now look for the third element, ignore the two sixes as 7>3, and then the the 2 (3) is the same, so 3 appears in fourth position. For a permutaion of N elements will have N digits, so you can retrieve in linear time in N given the lexiographic number.Audie
B
4

Your question is a bit moot, because your input size for an arbitrary permutation index has size log(N!) (assuming you want to represent all possible permutations) which is Theta(N log N), so if N is really large then just reading the input of the permutation index would take too long, certainly much longer than O(1). It may be possible to store the permutation index in such a way that if you already had it stored, then you could access elements in O(1) time. But probably any such method would be equivalent to just storing the permutation in contiguous memory (which also has Theta(N log N) size), and if you store the permutation directly in memory then the question becomes trivial assuming you can do O(1) memory access. (However you still need to account for the size of the bit encoding of the element, which is O(log N)).

In the spirit of your encryption analogy, perhaps you should specify a small SUBSET of permutations according to some property, and ask if O(1) or O(log N) element access is possible for that small subset.

Bedfellow answered 6/5, 2014 at 19:23 Comment(0)
B
2

Edit:

I misunderstood the question, but it was not in waste. My algorithms let me understand: the factoradic representation of a permutation's lexicographic number is almost the same as the permutation itself. In fact the first digit of the factoradic representation is the same as the first element of the corresponding permutation (assuming your space consists of numbers from 0 to N-1). Knowing this there is not really a point in storing the index rather than the permutation itself . To see how to convert the lexicographic number into a permutation, read below. See also this wikipedia link about Lehmer code.

Original post:

In the S space there are N elements that can fill the first slot, meaning that there are (N-1)! elements that start with 0. So i/(N-1)! is the first element (lets call it 'a'). The subset of S that starts with 0 consists of (N-1)! elements. These are the possible permutations of the set N{a}. Now you can get the second element: its the i(%((N-1)!)/(N-2)!). Repeat the process and you got the permutation.

Reverse is just as simple. Start with i=0. Get the 2nd last element of the permutation. Make a set of the last two elements, and find the element's position in it (its either the 0th element or the 1st), lets call this position j. Then i+=j*2!. Repeat the process (you can start with the last element too, but it will always be the 0th element of the possibilities).

Java-ish pesudo code:

find_by_index(List N, int i){
    String str = "";
    for(int l = N.length-1; i >= 0; i--){
        int pos = i/fact(l);
        str += N.get(pos);
        N.remove(pos);
        i %= fact(l);
    }
    return str;
}

find_index(String str){
    OrderedList N;
    int i = 0;
    for(int l = str.length-1; l >= 0; l--){
        String item = str.charAt(l);
        int pos = N.add(item);
        i += pos*fact(str.length-l)
    }
    return i;
}

find_by_index should run in O(n) assuming that N is pre ordered, while find_index is O(n*log(n)) (where n is the size of the N space)

Byword answered 10/5, 2014 at 23:1 Comment(0)
S
0

After some research in Wikipedia, I desgined this algorithm:

def getPick(fact_num_list):
    """fact_num_list should be a list with the factorial number representation, 
    getPick will return a tuple"""
    result = [] #Desired pick
    #This will hold all the numbers pickable; not actually a set, but a list
    #instead
    inputset = range(len(fact_num_list)) 
    for fnl in fact_num_list:
        result.append(inputset[fnl])
        del inputset[fnl] #Make sure we can't pick the number again
    return tuple(result)

Obviously, this won't reach O(1) due the factor we need to "pick" every number. Due we do a for loop and thus, assuming all operations are O(1), getPick will run in O(n).

If we need to convert from base 10 to factorial base, this is an aux function:

import math

def base10_baseFactorial(number):
    """Converts a base10 number into a factorial base number. Output is a list
    for better handle of units over 36! (after using all 0-9 and A-Z)"""
    loop = 1
    #Make sure n! <= number
    while math.factorial(loop) <= number:
        loop += 1
    result = []
    if not math.factorial(loop) == number:
        loop -= 1 #Prevent dividing over a smaller number than denominator
    while loop > 0:
        denominator = math.factorial(loop)
        number, rem = divmod(number, denominator)
        result.append(rem)
        loop -= 1
    result.append(0) #Don't forget to divide to 0! as well!
    return result

Again, this will run in O(n) due to the whiles.

Summing all, the best time we can find is O(n).

PS: I'm not a native English speaker, so spelling and phrasing errors may appear. Apologies in advance, and let me know if you can't get around something.

Suttles answered 14/5, 2014 at 19:14 Comment(0)
H
0

All correct algorithms for accessing the kth item of a permutation stored in factoradic form must read the first k digits. This is because, regardless of the values of the other digits among the first k, it makes a difference whether an unread digit is a 0 or takes on its maximum value. That this is the case can be seen by tracing the canonical correct decoding program in two parallel executions.

For example, if we want to decode the third digit of the permutation 1?0, then for 100, that digit is 0, and for 110, that digit is 2.

Hardpressed answered 15/5, 2014 at 20:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.