One-to-one integer mapping function

We are using MySQL and developing an application where we'd like the ID sequence not to be publicly visible. The IDs are hardly top secret, and there is no significant issue if someone were indeed able to decode them.

So, a hash is of course the obvious solution, and we are currently using MD5: 32-bit integers go in, and we trim the MD5 to 64 bits and then store that. However, we have no idea how likely collisions are when you trim like this (especially since all numbers come from autoincrement or the current time). We currently check for collisions, but since we may be inserting 100,000 rows at once, the performance is terrible (we can't bulk insert).
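A rough estimate of the collision risk, assuming the truncated MD5 values behave like uniformly random 64-bit numbers (an idealization, not something MD5 guarantees): for n distinct inputs the birthday approximation gives

```latex
P(\text{at least one collision})
  \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{64}}\right)
  \approx \frac{n^{2}}{2^{65}} \qquad (n \ll 2^{32})
```

Under that assumption, n = 10^8 rows gives a probability of about 2.7 × 10^-4, and even using all 2^32 possible inputs gives roughly 1 - e^(-1/2) ≈ 0.39. Collisions are unlikely but not excluded, which is why a true one-to-one mapping is the cleaner fix.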

But in the end, we really don't need the security offered by the hashes, and they consume unnecessary space and also require an additional index. So, is there any simple and good-enough function/algorithm out there that guarantees a one-to-one mapping for any number, without obvious visual patterns for sequential numbers?

EDIT: I'm using PHP, which does not do wrapping 32-bit integer arithmetic by default (results that overflow are converted to floats), but after looking around I found that it can be cheaply replicated with bitwise operators. Code for 32-bit integer multiplication can be found here: http://pastebin.com/np28xhQF

Gap answered 2/9, 2011 at 12:48 Comment(4)
There are infinitely many functions that guarantee a 1:1 mapping. - Pita
@Wooble, well then you should be able to answer the question quite easily I suppose ;-) - Kezer
The tricky part is that, now that you have asked the question here on SO, whatever answers we provide need to be resistant also modulo the answers we give here ;-) - Kezer
@Pita We're talking about 32-bit integers, so there are only (2^32)! 1:1 mappings. - Pleistocene
P
9

You could simply XOR with 0xDEADBEEF, if that's good enough.

Alternatively, multiply by an odd number mod 2^32. For the inverse mapping, just multiply by the multiplicative inverse.

Example: n = 2345678901; multiplicative inverse (mod 2^32): 2313902621.

For the mapping, just multiply by 2345678901 (mod 2^32):

1 --> 2345678901
2 --> 396390506

For the inverse mapping, multiply by 2313902621.
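A minimal Java sketch of the multiplication approach, using the two constants from the example. The class and method names are made up for illustration, and `inverseOf` (a Newton iteration) is just one way to obtain the inverse of an odd multiplier; the answer does not say how it was computed. Java's `int` multiplication wraps on overflow, which is exactly reduction mod 2^32.

```java
final class MultiplicativeMap {
    // Constants from the example above. They exceed Integer.MAX_VALUE, so they
    // are written as longs; the cast keeps the low 32 bits.
    static final int MULTIPLIER = (int) 2345678901L;
    static final int INVERSE = (int) 2313902621L;

    /** Scrambles an ID: id * MULTIPLIER mod 2^32. */
    static int encode(int id) {
        return id * MULTIPLIER; // int overflow wraps, i.e. reduces mod 2^32
    }

    /** Recovers the ID: code * INVERSE mod 2^32. */
    static int decode(int code) {
        return code * INVERSE;
    }

    /** Multiplicative inverse of an odd number mod 2^32 (Newton iteration). */
    static int inverseOf(int odd) {
        int x = odd; // odd * odd == 1 (mod 8), so the low 3 bits are already right
        for (int i = 0; i < 4; i++) {
            x *= 2 - odd * x; // each step doubles the number of correct low bits
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toUnsignedString(encode(1))); // 2345678901
        System.out.println(Integer.toUnsignedString(encode(2))); // 396390506
        System.out.println(decode(encode(123456)));              // 123456
        System.out.println(Integer.toUnsignedString(inverseOf(MULTIPLIER))); // 2313902621
    }
}
```

The values are printed with `Integer.toUnsignedString` because Java has no unsigned `int`; the bit patterns are the same as in the unsigned example.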

Pleistocene answered 2/9, 2011 at 12:55 Comment(15)
Don't both of those approaches give an obvious pattern just by looking at f(0), f(1), f(2)? - Kezer
@Kezer True, but I understand security is not a concern here. Only the OP can decide whether that's "good enough". - Pleistocene
@aioobe: Yes, but as the OP said, it doesn't have to be unbeatable. - Albina
And the second approach doesn't give an obvious pattern if the number is big enough. - Pleistocene
XOR does give rather predictable patterns I believe, although it can help to "randomize" I guess. Anyway, regarding multiplication by an odd number: that should work given a really big number I guess, but that would entail working with really big numbers too (and make going backwards a bit more costly?). Although, I'm thinking, a simple solution could perhaps be to just split the value up into 4x 1 byte and then have 4 separate arrays that scramble the values separately. - Gap
@Gap Just pick an odd 32-bit integer and calculate its inverse once. At runtime, you only have to do a 32-bit multiplication. - Pleistocene
@Pleistocene Ah of course, I totally forgot about that, although I'm not exactly sure how one calculates the inverse, unless I misunderstand you (so that one can also recover the original integer). - Gap
@Pleistocene Ah, thank you very much, and it would probably do perfectly, although I had totally forgotten (and thus sadly not mentioned) that PHP does not support integer math; it always reverts to floats, it seems. So while your method absolutely works, the accuracy breaks it for PHP. - Gap
@Nicholas Wilson I'm assuming that even math queries are sent to the server, so while you are right in the general case, doing 100,000+ basic math queries doesn't sound quite so nice (even if one could bunch them all together). - Gap
You could. When you insert, set the hash column to MOD(ID * num, 4294967296). It'll do the calculation for you at insert time. - Rocketry
Ah, very true Nicholas, although it would all be a bit more cumbersome to work with. Anyway! I worked around the problem in PHP and implemented integer multiplication with bitwise operators, and now have a reasonably fast implementation that works well enough for my needs (source code is in the main post). I also do a simple XOR for good measure. - Gap
AFAIK this won't necessarily give a bijection because 32 isn't prime. - Hulett
@Alexei I don't get your point. Sure, 32 isn't prime, so what? Any odd number has a multiplicative inverse mod 2^32, which yields an inverse mapping. - Pleistocene
@Henrik, sorry, I missed the "odd" part :) - Hulett
What's so special about 0xDEADBEEF? - Nullification
M
5

If you want to ensure a 1:1 mapping then use an encryption (i.e. a permutation), not a hash. Encryption has to be 1:1 because it can be decrypted.

If you want 32 bit numbers then use Hasty Pudding Cypher or just write a simple four round Feistel cypher.

Here's one I prepared earlier:

/**
 * IntegerPerm is a reversible keyed permutation of the integers.
 * This class is not cryptographically secure as the F function
 * is too simple and there are not enough rounds.
 *
 * @author Martin Ross
 */
public final class IntegerPerm {
    //////////////////
    // Private Data //
    //////////////////

    /** Non-zero default key, from www.random.org */
    private final static int DEFAULT_KEY = 0x6CFB18E2;

    private final static int LOW_16_MASK = 0xFFFF;
    private final static int HALF_SHIFT = 16;
    private final static int NUM_ROUNDS = 4;

    /** Permutation key */
    private int mKey;

    /** Round key schedule */
    private int[] mRoundKeys = new int[NUM_ROUNDS];

    //////////////////
    // Constructors //
    //////////////////

    public IntegerPerm() { this(DEFAULT_KEY); }

    public IntegerPerm(int key) { setKey(key); }

    ////////////////////
    // Public Methods //
    ////////////////////

    /** Sets a new value for the key and key schedule. */
    public void setKey(int newKey) {
        assert (NUM_ROUNDS == 4) : "NUM_ROUNDS is not 4";
        mKey = newKey;

        mRoundKeys[0] = mKey & LOW_16_MASK;
        mRoundKeys[1] = ~(mKey & LOW_16_MASK);
        mRoundKeys[2] = mKey >>> HALF_SHIFT;
        mRoundKeys[3] = ~(mKey >>> HALF_SHIFT);
    } // end setKey()

    /** Returns the current value of the key. */
    public int getKey() { return mKey; }

    /**
     * Calculates the enciphered (i.e. permuted) value of the given integer
     * under the current key.
     *
     * @param plain the integer to encipher.
     *
     * @return the enciphered (permuted) value.
     */
    public int encipher(int plain) {
        // 1 Split into two halves.
        int rhs = plain & LOW_16_MASK;
        int lhs = plain >>> HALF_SHIFT;

        // 2 Do NUM_ROUNDS simple Feistel rounds.
        for (int i = 0; i < NUM_ROUNDS; ++i) {
            if (i > 0) {
                // Swap lhs <-> rhs
                final int temp = lhs;
                lhs = rhs;
                rhs = temp;
            } // end if
            // Apply Feistel round function F().
            rhs ^= F(lhs, i);
        } // end for

        // 3 Recombine the two halves and return.
        return (lhs << HALF_SHIFT) + (rhs & LOW_16_MASK);
    } // end encipher()

    /**
     * Calculates the deciphered (i.e. inverse permuted) value of the given
     * integer under the current key.
     *
     * @param cypher the integer to decipher.
     *
     * @return the deciphered (inverse permuted) value.
     */
    public int decipher(int cypher) {
        // 1 Split into two halves.
        int rhs = cypher & LOW_16_MASK;
        int lhs = cypher >>> HALF_SHIFT;

        // 2 Do NUM_ROUNDS simple Feistel rounds, with the round keys in reverse order.
        for (int i = 0; i < NUM_ROUNDS; ++i) {
            if (i > 0) {
                // Swap lhs <-> rhs
                final int temp = lhs;
                lhs = rhs;
                rhs = temp;
            } // end if
            // Apply Feistel round function F().
            rhs ^= F(lhs, NUM_ROUNDS - 1 - i);
        } // end for

        // 3 Recombine the two halves and return.
        return (lhs << HALF_SHIFT) + (rhs & LOW_16_MASK);
    } // end decipher()

    /////////////////////
    // Private Methods //
    /////////////////////

    // The F function for the Feistel rounds.
    private int F(int num, int round) {
        // XOR with round key.
        num ^= mRoundKeys[round];
        // Square, then XOR the high and low parts.
        num *= num;
        return (num >>> HALF_SHIFT) ^ (num & LOW_16_MASK);
    } // end F()

} // end class IntegerPerm
Moncada answered 2/9, 2011 at 15:8 Comment(1)
Very true; however, Henrik's solution is enough for my needs and also reasonably fast in PHP, so that's what I'll go with. But indeed, encryption would have been the best solution. - Gap
P
2

Do what Henrik said in his second suggestion. But since these values seem to be used by people (else you wouldn't want to randomize them), take one additional step. Multiply the sequential number by a large prime and reduce mod N, where N is a power of 2. But choose N to be a few bits smaller than what you can store, because the next step adds up to 4 bits. Next, multiply the result by 11 and use that. So we have:

Hash = ((count * large_prime) % 536870912) * 11

The multiplication by 11 protects against most data entry errors: if any single digit is typed wrong, the result will not be a multiple of 11, and if two adjacent digits are transposed, the result will not be a multiple of 11 either. So as a preliminary check of any value entered, you check whether it's divisible by 11 before even looking in the database.
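The scheme can be sketched in Java as follows. This is an illustration under assumptions, not the answerer's code: the class name, the prime 1000000007 and the way the inverse is computed are all hypothetical choices. `long` is used because 11 × (2^29 - 1) does not fit in 32 bits.

```java
final class CheckedId {
    static final long MODULUS = 1L << 29;            // N = 536870912, as in the formula
    static final long LARGE_PRIME = 1_000_000_007L;  // hypothetical choice of prime
    static final long CHECK = 11;
    // Inverse of LARGE_PRIME mod N, needed only to map a code back to its count.
    static final long INVERSE = inverseModPow2(LARGE_PRIME) & (MODULUS - 1);

    /** Inverse of an odd number mod 2^64 by Newton iteration. */
    static long inverseModPow2(long odd) {
        long x = odd; // correct to 3 bits, since odd * odd == 1 (mod 8)
        for (int i = 0; i < 5; i++) {
            x *= 2 - odd * x; // doubles the number of correct low bits
        }
        return x;
    }

    /** Hash = ((count * large_prime) % N) * 11, for 0 <= count < N. */
    static long encode(long count) {
        return ((count * LARGE_PRIME) & (MODULUS - 1)) * CHECK;
    }

    /** Cheap sanity check before touching the database. */
    static boolean passesCheck(long code) {
        return code >= 0 && code % CHECK == 0;
    }

    /** Recovers the count from a code that passed the check. */
    static long decode(long code) {
        return ((code / CHECK) * INVERSE) & (MODULUS - 1);
    }

    public static void main(String[] args) {
        long code = encode(12345);
        System.out.println(passesCheck(code));       // true
        System.out.println(passesCheck(code + 300)); // false: off by 3 in the hundreds digit
        System.out.println(decode(code));            // 12345
    }
}
```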

Pronunciation answered 2/9, 2011 at 14:5 Comment(1)
This is all backend stuff, so there is no possibility of data entry errors. Regardless, PHP does not support integer math, so while the method is great, it sadly isn't usable in my case. - Gap
M
1

You can use the mod operation with big prime numbers:

(your number * big prime number 1) mod big prime number 2

Prime number 1 should be bigger than prime number 2. Prime number 2 should be close to 2^32 but less than it. Then the mapping will be hard to guess.

Prime 1 and prime 2 should be constants.
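A hedged Java sketch of this mapping, with one addition the answer does not claim: because prime 2 is prime, prime 1 has a multiplicative inverse modulo it, so the mapping can be undone for inputs in 0..prime 2 - 1. The class name and both constants are illustrative assumptions (2^32 - 5 for prime 2, and the first probable prime above 6,000,000,000 for prime 1). `BigInteger` avoids overflow in the intermediate product.

```java
import java.math.BigInteger;

final class PrimeModMap {
    // Prime 2: close to 2^32 but below it (2^32 - 5).
    static final BigInteger P2 = BigInteger.valueOf(4294967291L);
    // Prime 1: larger than prime 2; an arbitrary illustrative choice.
    static final BigInteger P1 = BigInteger.valueOf(6_000_000_000L).nextProbablePrime();
    // Inverse of prime 1 modulo prime 2, computed once.
    static final BigInteger P1_INVERSE = P1.modInverse(P2);

    /** (n * prime 1) mod prime 2, for 0 <= n < prime 2. */
    static long encode(long n) {
        return BigInteger.valueOf(n).multiply(P1).mod(P2).longValue();
    }

    /** Undoes encode() by multiplying with the modular inverse. */
    static long decode(long code) {
        return BigInteger.valueOf(code).multiply(P1_INVERSE).mod(P2).longValue();
    }

    public static void main(String[] args) {
        long code = encode(42);
        System.out.println(code + " -> " + decode(code)); // second value is 42
    }
}
```

Only prime 1 mod prime 2 affects the result, so making prime 1 larger than prime 2 does not by itself make the pattern harder to spot. Inputs at or above prime 2 (the top five 32-bit values here) are not covered by the mapping.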

Mckibben answered 2/9, 2011 at 13:36 Comment(4)
Is there a way to recover the original integer? - Gap
You will never get the same remainder for two inputs in 1..big prime number 2: otherwise (N2 - N1) * prime 1 would be divisible by prime 2, so N2 - N1 would be divisible by it as well, which is impossible for two distinct numbers in that range. - Mckibben
You can keep both in your table. - Mckibben
I don't know a function to restore the original value, and I'm not sure that it's possible. - Mckibben
C
1

For our application, we use a bit shuffle to generate the ID. It is very easy to reverse it back to the original ID.

func (m Meeting) MeetingCode() uint {
    hashed := (m.ID + 10000000) & 0x00FFFFFF
    chunks := [24]uint{}
    for i := 0; i < 24; i++ {
        chunks[i] = hashed >> i & 0x1
    }
    shuffle := [24]uint{14, 1, 15, 21, 0, 6, 5, 10, 4, 3, 20, 22, 2, 23, 8, 13, 19, 9, 18, 12, 7, 11, 16, 17}
    result := uint(0)
    for i := 0; i < 24; i++ {
        result = result | (chunks[shuffle[i]] << i)
    }
    return result
}
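The reverse direction is not shown in the answer. A possible Java sketch of both directions, reusing the same offset, mask and shuffle table, looks like this (the class and method names are assumptions; IDs are taken to lie in 0..2^24 - 1, since only 24 bits survive the mask):

```java
final class BitShuffle {
    static final int MASK = 0x00FFFFFF;   // keep 24 bits
    static final int OFFSET = 10_000_000; // same offset as in the Go code
    // Output bit i is taken from input bit SHUFFLE[i]; a permutation of 0..23.
    static final int[] SHUFFLE = {14, 1, 15, 21, 0, 6, 5, 10, 4, 3, 20, 22,
                                  2, 23, 8, 13, 19, 9, 18, 12, 7, 11, 16, 17};

    /** Same computation as MeetingCode(): offset, mask, then permute the bits. */
    static int encode(int id) {
        int hashed = (id + OFFSET) & MASK;
        int result = 0;
        for (int i = 0; i < SHUFFLE.length; i++) {
            result |= ((hashed >>> SHUFFLE[i]) & 1) << i;
        }
        return result;
    }

    /** Puts every bit back where it came from, then removes the offset. */
    static int decode(int code) {
        int hashed = 0;
        for (int i = 0; i < SHUFFLE.length; i++) {
            hashed |= ((code >>> i) & 1) << SHUFFLE[i];
        }
        return (hashed - OFFSET) & MASK;
    }

    public static void main(String[] args) {
        int code = encode(1234);
        System.out.println(code + " -> " + decode(code)); // second value is 1234
    }
}
```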
Coastland answered 16/6, 2020 at 9:31 Comment(0)
R
0

There is an exceedingly simple solution that no one has posted. Even though an answer has been selected, I highly advise anyone visiting this question to consider the nature of binary representations and the application of modular arithmetic.

Given a finite range of integers, all the values can be permuted in any order through a simple addition over their index, kept within the range of the index by a modulo operation. You could even leverage simple integer overflow, so that using the modulo operator is not even necessary.

Essentially, you'd have a static variable in memory, and a function that, when called, increments the static variable by some constant, enforces the boundaries, and then returns the value. This output could be an index into a collection of desired outputs, or the desired output itself.

The constant of the increment that defines the mapping may be several times the size in memory of the value being returned, but given any mapping there exists some finite constant that will achieve the mapping through trivial modular arithmetic.
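A small Java sketch of the counter described above. The class name is hypothetical, and the sketch adds one requirement the answer leaves implicit: the increment must be coprime to the size of the range, otherwise some values repeat before others have appeared. It assumes a modulus small enough that `state + step` does not overflow a `long`.

```java
final class AdditiveStepper {
    private final long modulus;
    private final long step;
    private long state = 0;

    AdditiveStepper(long modulus, long step) {
        if (modulus <= 0 || step <= 0 || gcd(modulus, step) != 1) {
            throw new IllegalArgumentException("step must be positive and coprime to modulus");
        }
        this.modulus = modulus;
        this.step = step % modulus;
    }

    /** Advances the counter by the constant and wraps it into 0..modulus-1. */
    long next() {
        state = (state + step) % modulus;
        return state;
    }

    private static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    public static void main(String[] args) {
        AdditiveStepper s = new AdditiveStepper(10, 7);
        for (int i = 0; i < 10; i++) {
            System.out.print(s.next() + " "); // 7 4 1 8 5 2 9 6 3 0
        }
    }
}
```

The k-th output equals k × step mod modulus, so this is the multiplication approach from the earlier answer computed incrementally.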

Returnable answered 19/4, 2022 at 17:36 Comment(3)
Please note, however, that a given mapping will have infinitely many constants that would achieve it, and there will be a lot of overlap, which is why primes are typically favored. - Returnable
You could use multiplication instead of addition, and a desired constant is trivial to attain, but in practice its application takes more time and requires more memory. Addition is also preferable since multiplication wouldn't actually produce every potential mapping and is harder to reverse; thus, being abelian, addition might be cryptographically superior. - Returnable
Addition as in modular arithmetic, that is. - Returnable
