Is it possible to get the original value of a number, after several multiplications **with overflow**?
Asked Answered
F

6

12

Summary: Is there a way to do that? Here's what I mean: suppose I have an unsigned int number. Then I multiply it several times(and there's overflow, which is expected). Then is it possible to "revert" the original value back?


In details:

It's all about Rabin-Karp rolling hash. What I need to do is: I have the hash of a long string - for example: "abcd". Then I have the hash for a shorter substring - for example "cd". How to calculate the "ab" hash with O(1), using the two given hashes?

What I have now as an algorithm:

  • substract the "cd" hash from "abcd" hash (remove the last elements from the polynomial)
  • devide the "abcd" hash by p ^ len( "cd" ), where p is the base (prime number).

So this is:

a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0 - abcd

c * p ^ 1 + d * p ^ 0 - cd

ab gets:

( 
  ( a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0 ) -
  ( c * p ^ 1 + d * p ^ 0 ) 
)
/ ( p ^ 2 )
= a * p ^ 1 + b * p ^ 0

And this works, if I don't have an overflow (if p is small number). But if it's not - it's not working.

Is there any trick or something?

P.S. The c++ tag is because of the number's overflow, as it is specific (and different from python, scheme or sth)

Farrington answered 7/5, 2011 at 11:32 Comment(3)
For p = 2 this is impossible. For all other primes p, it is possible...Iow
@Sven Marnach - well, how? I cannot subtract the last letter, then divide by the base (p), and again subtract the before-last letter and again divide by p, etc., as I don't know the strings, but only their hash..Farrington
@Sven Marnach - also, what is possible ? To "revert" the number or to calculate the hash? If the second, I guess I'm gonna need to accept cnicutar's answer and ask new question about hashing ?Farrington
F
3

Extended Euclidean algorithm is a good solution for this, but it's too complicated and hard to implement. There's a better one.


And there's another way to do this (thanks to e friend of mine (: )

There's a nice article in wikipedia - modular multiplicative inverse using Euler's theorem in the case, when m and a are coprime:

Euler's theorem for coprime number and modulo

where φ(m) is Euler's totient function.

In my case, the m (modulo) is the size of the hash type - 2^32, 2^64, etc. (64bit in my case).
Well, this means, that we should only find the value of φ(m). But think about that - m == 2 ^ 64 so, that gives us the guarantee that m will be coprime with all odd numbers and will NOT be coprime any even number. So, what we need to do is to get the number of all values and divide them by 2.

Also, we know that m will be unsigned, as otherwise we will have some issues. Than this gives us the chance to do this:

hash_t x = -1;
x /= 2;
hash_t a_reverse = fast_pow( a, x );

Well, about 64bit numbers, x is really big number ( 19 digits: 9 223 372 036 854 775 807), but fast_pow is really fast and we could cache the reverse number, in case that we need for more than one query.

fast_pow is a well-known algorithm:

hash_t fast_pow( hash_t source, hash_t pow )
{
    if( 0 == pow )
    {
        return 1;
    }

    if( 0 != pow % 2 )
    {
        return source * fast_pow( source, pow - 1 );
    }
    else
    {
        return fast_pow( source * source, pow / 2  );    
    }

}

Addition: for example:

    hash_t base = 2305843009213693951;  // 9th mersenne prime
    hash_t x = 1234567890987654321;

    x *= fast_pow( base, 123456789 );   // x * ( base ^ 123456789 )

    hash_t y = -1;
    y /= 2;
    hash_t base_reverse = fast_pow( base, y );

    x *= fast_pow( base_reverse, 123456789 );   // x * ( base_reverse ^ 123456789 )
    assert( x == 1234567890987654321 ) ;

works perfect and very fast.

Farrington answered 8/5, 2011 at 12:26 Comment(0)
L
6

Don't know about the overflow part, but there is a way of getting back the original value.

The Chinese Remainder Theorem help a great deal. Let's call h = abcd - cd. G is the value, h, without overflows, G = h + k*2^32, assuming the overflow simply does %2^32. And thus ab = G / p^2.

G = h (mod 2^32)
G = 0 (mod p^2)

If p^2 and 2^32 are coprime. This page on Chinese Remainder Theorem, gives us

G = h * b * p^2 (mod 2^32 * p^2)

Where b is modular multiplicative inverse of p^2 modulo 2^32, b * p^2 = 1 (mod 2^32). After you calculate G, simply divide by p^2 to find ab.

I hope I didn't make any mistakes...

Labiodental answered 7/5, 2011 at 13:11 Comment(0)
F
3

Extended Euclidean algorithm is a good solution for this, but it's too complicated and hard to implement. There's a better one.


And there's another way to do this (thanks to e friend of mine (: )

There's a nice article in wikipedia - modular multiplicative inverse using Euler's theorem in the case, when m and a are coprime:

Euler's theorem for coprime number and modulo

where φ(m) is Euler's totient function.

In my case, the m (modulo) is the size of the hash type - 2^32, 2^64, etc. (64bit in my case).
Well, this means, that we should only find the value of φ(m). But think about that - m == 2 ^ 64 so, that gives us the guarantee that m will be coprime with all odd numbers and will NOT be coprime any even number. So, what we need to do is to get the number of all values and divide them by 2.

Also, we know that m will be unsigned, as otherwise we will have some issues. Than this gives us the chance to do this:

hash_t x = -1;
x /= 2;
hash_t a_reverse = fast_pow( a, x );

Well, about 64bit numbers, x is really big number ( 19 digits: 9 223 372 036 854 775 807), but fast_pow is really fast and we could cache the reverse number, in case that we need for more than one query.

fast_pow is a well-known algorithm:

hash_t fast_pow( hash_t source, hash_t pow )
{
    if( 0 == pow )
    {
        return 1;
    }

    if( 0 != pow % 2 )
    {
        return source * fast_pow( source, pow - 1 );
    }
    else
    {
        return fast_pow( source * source, pow / 2  );    
    }

}

Addition: for example:

    hash_t base = 2305843009213693951;  // 9th mersenne prime
    hash_t x = 1234567890987654321;

    x *= fast_pow( base, 123456789 );   // x * ( base ^ 123456789 )

    hash_t y = -1;
    y /= 2;
    hash_t base_reverse = fast_pow( base, y );

    x *= fast_pow( base_reverse, 123456789 );   // x * ( base_reverse ^ 123456789 )
    assert( x == 1234567890987654321 ) ;

works perfect and very fast.

Farrington answered 8/5, 2011 at 12:26 Comment(0)
M
2

You have a * b = c mod 2^32 (or mod something else depending on how you are doing your hash). If you could find d such that b * d = 1 mod 2^32 (or mod whatever) then you could compute a * b * d = a and retrieve a. If gcd(b, mod 2^32) = 1 then you can use the http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm to find x and y such that b * x + 2^32 * y = 1, or b * x = 1 - y * 2^32, or b * x = 1 mod 2^32, so x is the number you want to multiply by.

Maplemaples answered 8/5, 2011 at 4:23 Comment(0)
G
1

You should use unsigned integers to get defined overflow behaviour (modulo 2^N). Signed integer overflow is undefined.

Also, instead of dividing you should multiply by the multiplicative inverse of p modulo the appropriate value. For example, if p=3 and your hash values are 8 bits, multiply by 171 because 171*3=513=2*256+1. The multiplicative inverse exists if p and the modulo value are relatively prime.

Gaikwar answered 7/5, 2011 at 13:35 Comment(1)
It is unsigned int, I just edited the question. Hm, that sounds reasonable, but as I know, finding such inverse element is far from trivial problem and will slow down the things.. +1 for the idea. I'll think about that.Farrington
A
1

Just a partial side-answer here: i believe it is not strictly necessary to use unsigned integers. You can use one's complement.

But note, that this will have a separate representation for -0 and +0, and that you'll probably have to handcode arithmetic operations along the way.

Some of the processor instructions are agnostic of the integer representation but not all.

Advanced answered 7/5, 2011 at 14:55 Comment(0)
B
0

So overflow is actually just your compiler being nice to you; the C/++ standard actually suggests that overflowing is undefined behaviour. So once you've overflown, there's actually nothing you can do because your program ceases to be deterministic.

You might need to rethink the algorithm, or tack on modulo operations / subtractions to fix your algorithm.

Baguio answered 7/5, 2011 at 12:37 Comment(7)
Unsigned values do not overflow. Your answer only applies to signed arithmetic.Incomprehension
It isn't defined by the standard, but it is defined by the architecture. On x86, the add instruction has deterministic behavior known as overflow.Chammy
@R: Unsigned values can overflow. uint8_t x = 255; x += 1; The value x will be 0 -- overflow.Chammy
@Travis: no, this is wrapping, not overflow. and for unsigned types this is well defined behavior and the standard describes exactly how this is done, namely with arithmetic module the corresponding power of 2.Jamille
Yeah, when I posted my answer the clarification regarding unsigned-ness hadn't yet been made. Also, an architecture definition means nothing; it won't work in the general case because not all architecture acts the same!Baguio
Sorry, wrapping. Since unsigned integer wrapping is well-defined and signed integers are implemented as 2's compliment unsigned int, wouldn't signed integer wrapping still be well-defined (perhaps not directly, but as a result of the behavior being a combination of two well-defined ones)?Chammy
@Ben Stott: unsigned integer wrapping is defined in the C standard, it's not architecture specific.Chiquita

© 2022 - 2024 — McMap. All rights reserved.