Reversible hash function?
Asked Answered
C

5

59

I need a reversible hash function (obviously the input will be much smaller in size than the output) that maps the input to the output in a random-looking way. Basically, I want a way to transform a number like "123" to a larger number like "9874362483910978", but not in a way that will preserve comparisons, so it must not be always true that, if x1 > x2, f(x1) > f(x2) (but neither must it be always false).

The use case for this is that I need to find a way to transform small numbers into larger, random-looking ones. They don't actually need to be random (in fact, they need to be deterministic, so the same input always maps to the same output), but they do need to look random (at least when base64encoded into strings, so shifting by Z bits won't work as similar numbers will have similar MSBs).

Also, easy (fast) calculation and reversal is a plus, but not required.

I don't know if I'm being clear, or if such an algorithm exists, but I'd appreciate any and all help!

Cumulation answered 25/11, 2010 at 3:18 Comment(3)
If you want reversible then you could just use a symmetric encryption algorithm; for example AES.Buller
I could, but that's not deterministic. I guess I could always use the same seed for everything, but that seems overkill...Cumulation
That's also good, but that looks too complicated as well. I might just use the first few operations of some encryption algorithm, or just xor the last few bits of the number to the first ones...Cumulation
A
56

None of the answers provided seemed particularly useful, given the question. I had the same problem, needing a simple, reversible hash for not-security purposes, and decided to go with bit relocation. It's simple, it's fast, and it doesn't require knowing anything about boolean maths or crypo algorithms or anything else that requires actual thinking.

The simplest would probably be to just move half the bits left, and the other half right:

def hash(n):
  return ((0x0000FFFF & n)<<16) + ((0xFFFF0000 & n)>>16)

This is reversible, in that hash(hash(n)) = n, and has non-sequential pairs {n,m}, n < m, where hash(m) < hash(n).

And to get a much less sequential looking implementation, you might also want to consider an interlace reordering from [msb,z,...,a,lsb] to [msb,lsb,z,a,...] or [lsb,msb,a,z,...] or any other relocation you feel gives an appropriately non-sequential sequence for the numbers you deal with, or even add a XOR on top for peak desequential'ing.

(The above function is safe for numbers that fit in 32 bits, larger numbers are guaranteed to cause collisions and would need some more bit mask coverage to prevent problems. That said, 32 bits is usually enough for any non-security uid).

Also have a look at the multiplicative inverse answer given by Andy Hayden, below.

Apportionment answered 22/10, 2012 at 19:57 Comment(7)
I don't really remember what I wanted this for, now (:P), but your answer seems to exactly fit the bill, so I'll mark it as the correct one, thank you.Cumulation
Fantastic solution, actually. It has all the properties I was looking for. Now if I can only remember what I wanted it for...Cumulation
This method can be generalized to an arbitrary permutation of the bits in the input number.Diantha
I mentioned that as "To get a less sequential looking implementation, you might also want to consider [...] or any other relocation you feel gives an appropriately non-sequential sequence for the numbers you deal with."Mall
Would this work for a stream of n's? That's what I need, maybe another question for that? Note that all n's represent values << 32bit, cannot have collisions and max 64 n's. And I have a similar need, possibly on the order of 64bit.Fondea
Probably? this is a reversible int cypher, so if you have a stream of ints, just cypher them in sequence, and recypher them in sequence.Mall
What about a ("nondeterministic") reversible hash? A conventional hash from a big set to a smaller set, i.e., with collisions. The reversion would return all possible inputs for that hash. This would require brute force for most hashes, but maybe there are hashing algorithms specifically intended for that.Jase
D
36

Another simple solution is to use multiplicative inverses (see Eri Clippert's blog):

we showed how you can take any two coprime positive integers x and m and compute a third positive integer y with the property that (x * y) % m == 1, and therefore that (x * z * y) % m == z % m for any positive integer z. That is, there always exists a “multiplicative inverse”, that “undoes” the results of multiplying by x modulo m.

We take a large number e.g. 4000000000 and a large co-prime number e.g. 387420489:

def rhash(n):
    return n * 387420489 % 4000000000

>>> rhash(12)
649045868

We first calculate the multiplicative inverse with modinv which turns out to be 3513180409:

>>> 3513180409 * 387420489 % 4000000000
1

Now, we can define the inverse:

def un_rhash(h):
    return h * 3513180409 % 4000000000

>>> un_rhash(649045868)  # un_rhash(rhash(12))
12

Note: This answer is fast to compute and works for numbers up to 4000000000, if you need to handle larger numbers choose a sufficiently large number (and another co-prime).


You may want to do this with hexidecimal (to pack the int):

def rhash(n):
    return "%08x" % (n * 387420489 % 4000000000)

>>> rhash(12)
'26afa76c'

def un_rhash(h):
    return int(h, 16) * 3513180409 % 4000000000

>>> un_rhash('26afa76c')  # un_rhash(rhash(12))
12

If you choose a relatively large co-prime then this will seem random, be non-sequential and also be quick to calculate.

Dungeon answered 7/7, 2016 at 7:59 Comment(1)
This is a nice solution that avoids the problem of quantised sequences that you get with bit relocation.Mall
K
19

What you are asking for is encryption. A block cipher in its basic mode of operation, ECB, reversibly maps a input block onto an output block of the same size. The input and output blocks can be interpreted as numbers.

For example, AES is a 128 bit block cipher, so it maps an input 128 bit number onto an output 128 bit number. If 128 bits is good enough for your purposes, then you can simply pad your input number out to 128 bits, transform that single block with AES, then format the output as a 128 bit number.

If 128 bits is too large, you could use a 64 bit block cipher, like 3DES, IDEA or Blowfish.

ECB mode is considered weak, but its weakness is the constraint that you have postulated as a requirement (namely, that the mapping be "deterministic"). This is a weakness, because once an attacker has observed that 123 maps to 9874362483910978, from then on whenever she sees the latter number, she knows the plaintext was 123. An attacker can perform frequency analysis and/or build up a dictionary of known plaintext/ciphertext pairs.

Kelila answered 25/11, 2010 at 6:18 Comment(3)
Sure, but this isn't very simple to implement. I'm not looking for anything resembling security, the number it stores is public anyway. I just want something that's easy to write/reverse. In the end, I just settled for base64 encoding, after not being able to find anything else simple enough...Cumulation
@Stavros Korokithakis: There are several libraries for python, such as pycrypto, that reduce it down to a single function call.Kelila
The attack described in the last paragraph is a rainbow table.Capriole
I
4

Basically, you are looking for 2 way encryption, and one that probably uses a salt.

You have a number of choices:

  1. TripleDES
  2. AES

Here is an example:" Simple insecure two-way "obfuscation" for C#

What language are you looking at? If .NET then look at the encryption namespace for some ideas.

Ileneileo answered 25/11, 2010 at 3:33 Comment(4)
I'm using Python, but the thing is that this is just done for display purposes (I don't want the numbers to look small and sequential), so it's a bit too overkill to encrypt everything each time... There has to be a simpler way.Cumulation
Well just perform some maths on them at each end. E.g. x * 123456789 * (123456789 / 987654321) ^ 22Ileneileo
e.g. 23 --> MTQwMTUzMTYyMzAwOTUyNTA1ODY3Ny4wMTgwMTM5OTgy --> 23Ileneileo
That's interesting, but the MSBs of similar numbers are, again, the same. The best bet seems to be to take the number, reverse its bits and put them in the first positions of the word. However, that approach has overflow problems.Cumulation
P
4

Why not just XOR with a nice long number?

Easy. Fast. Reversible.

Or, if this doesn't need to be terribly secure, you could convert from base 10 to some smaller base (like base 8 or base 4, depending on how long you want the numbers to be).

Pantia answered 25/11, 2010 at 3:35 Comment(4)
Because the MSBs are always the same then.Cumulation
Converting to smaller bases also makes the MSBs remain the same :/Cumulation
The problem with XOR is that once an attacker knows a single plaintext/cipertext pair, they can decrypt every other ciphertext.Kelila
super late comment, but: as stated before, this is purely for display purposes, this is not for security purposes and so there are no "attackers". If someone figures it out, they figured out nothing.Mall

© 2022 - 2024 — McMap. All rights reserved.