Better ways to implement a modulo operation (algorithm question)
Asked Answered
P

5

15

I've been trying to implement a modular exponentiator recently. I'm writing the code in VHDL, but I'm looking for advice of a more algorithmic nature. The main component of the modular exponentiator is a modular multiplier which I also have to implement myself. I haven't had any problems with the multiplication algorithm- it's just adding and shifting and I've done a good job of figuring out what all of my variables mean so that I can multiply in a pretty reasonable amount of time.

The problem that I'm having is with implementing the modulus operation in the multiplier. I know that performing repeated subtractions will work, but it will also be slow. I found out that I could shift the modulus to effectively subtract large multiples of the modulus but I think there might still be better ways to do this. The algorithm that I'm using works something like this (weird pseudocode follows):

result,modulus : integer (n bits) (previously defined)
shiftcount : integer (initialized to zero)
while( (modulus<result) and  (modulus(n-1) != 1) ){
     modulus = modulus << 1
     shiftcount++
}
for(i=shiftcount;i>=0;i--){
     if(modulus<result){result = result-modulus}
     if(i!=0){modulus = modulus >> 1}
}

So...is this a good algorithm, or at least a good place to start? Wikipedia doesn't really discuss algorithms for implementing the modulo operation, and whenever I try to search elsewhere I find really interesting but incredibly complicated (and often unrelated) research papers and publications. If there's an obvious way to implement this that I'm not seeing, I'd really appreciate some feedback.

Pavier answered 5/5, 2010 at 13:33 Comment(4)
is this significantly slower than multiplication? it doesn't seem like it should be; you've got the same basic components.Fusil
BTW, I'm also frustrated with how Wikipedia articles are increasingly written by mathematicians. Just because something can be easily expressed using advanced concepts and notation does not mean that's the best way to explain it ;-) It's similar to discussions on Stackoverflow vs those on Mathoverflow.Neary
@Jason S of course the above algorithm is a lot slower than multiplication. Multiplication (with Wallace trees) is O(log n) while the above is O(n).Bethune
@phkahler, to be fair, modulo operation exists at least since Euler's time, so it is more a mathematical topic than a computers topic. But, still, notation is frequently overused and underexplained in mathematics. It is a known problem in maths papers...Archduchess
G
14

I'm not sure what you're calculating there to be honest. You talk about modulo operation, but usually a modulo operation is between two numbers a and b, and its result is the remainder of dividing a by b. Where is the a and b in your pseudocode...?

Anyway, maybe this'll help: a mod b = a - floor(a / b) * b.

I don't know if this is faster or not, it depends on whether or not you can do division and multiplication faster than a lot of subtractions.

Another way to speed up the subtraction approach is to use binary search. If you want a mod b, you need to subtract b from a until a is smaller than b. So basically you need to find k such that:

a - k*b < b, k is min

One way to find this k is a linear search:

k = 0;
while ( a - k*b >= b )
    ++k;

return a - k*b;

But you can also binary search it (only ran a few tests but it worked on all of them):

k = 0;
left = 0, right = a
while ( left < right )
{
    m = (left + right) / 2;
    if ( a - m*b >= b )
       left = m + 1;
    else
       right = m;
}

return a - left*b;

I'm guessing the binary search solution will be the fastest when dealing with big numbers.

If you want to calculate a mod b and only a is a big number (you can store b on a primitive data type), you can do it even faster:

for each digit p of a do
    mod = (mod * 10 + p) % b
return mod

This works because we can write a as a_n*10^n + a_(n-1)*10^(n-1) + ... + a_1*10^0 = (((a_n * 10 + a_(n-1)) * 10 + a_(n-2)) * 10 + ...

I think the binary search is what you're looking for though.

Gran answered 5/5, 2010 at 14:13 Comment(5)
OP is basically doing the division algorithm (by repeated subtraction, which is how you do division at a low-level). Binary search won't speed it up when there's a multiplication step (which takes just as long as division when you do it at a low level).Fusil
@Jason S - I'm not really sure what the OP is doing, but it looks to me like his while loop could be replaced with a binary search.Gran
This is in very low level gate logic. Shifting is easy, fast, and simple. Binary searches are not.Fusil
@IVlad, if you're not sure what the OP is trying to accomplish, why do you try to answer him at all? There's no necessity for answering. Irrelevant answers just clutter the thread, making finding the relevant answers much more hard.Jerrome
@Jerrome because as the first post is written, there is no way to know exactly what the OP is after, because his question is very vague. In fact, there's not even a real question, just if there's a better algorithm or if his is good enough. There was also no attempt from him to clarify judging by his lack of reply to all of the answers he received. My answer might help people who stumble upon this question. Judging by the upvotes I received, I'm assuming it did help some people, so I don't think this is irrelevant. Had the OP clarified that this doesn't help him, I would have removed it.Gran
N
6

If you're using shift-and-add for the multiplication (which is by no means the fastest way) you can do the modulo operation after each addition step. If the sum is greater than the modulus you then subtract the modulus. If you can predict the overflow, you can do the addition and subtraction at the same time. Doing the modulo at each step will also reduce the overall size of your multiplier (same length as input rather than double).

The shifting of the modulus you're doing is getting you most of the way towards a full division algorithm (modulo is just taking the remainder).

EDIT Here is my implementation in Python:

def mod_mul(a,b,m):
    result = 0
    a = a % m
    b = b % m
    while (b>0):
        if (b&1)!=0:
            result += a
            if result >= m: result -= m
        a = a << 1
        if a>=m: a-= m
        b = b>>1
    return result

This is just modular multiplication (result = a*b mod m). The modulo operations at the top are not needed, but serve as a reminder that the algorithm assumes a and b are less than m.

Of course for modular exponentiation you'll have an outer loop that does this entire operation at each step doing either squaring or multiplication. But I think you knew that.

Neary answered 5/5, 2010 at 14:56 Comment(1)
this has an extra benefit: if each number, before you shift it left by one bit, is less than the modulus, then the number shifted left by one bit (which is twice the number) cannot be more than two times the modulus, which means you will only ever need to subtract the modulus once in these steps.Dichotomize
B
6

There are many ways to do it in O(log n) time for n bits; you can do it with multiplication and you don't have to iterate 1 bit at a time. For example,

a mod b = a - floor((a * r)/2^n) * b

where

r = 2^n / b

is precomputed because typically you're using the same b many times. If not, use the standard superconverging polynomial iteration method for reciprocal (iterate 2x - bx^2 in fixed point).

Choose n according to the range you need the result (for many algorithms like modulo exponentiation it doesn't have to be 0..b).

(Many decades ago I thought I saw a trick to avoid 2 multiplications in a row... Update: I think it's Montgomery Multiplication (see REDC algorithm). I take it back, REDC does the same work as the simpler algorithm above. Not sure why REDC was ever invented... Maybe slightly lower latency due to using the low-order result into the chained multiplication, instead of the higher-order result?)

Of course if you have a lot of memory, you can just precompute all the 2^n mod b partial sums for n = log2(b)..log2(a). Many software implementations do this.

Bethune answered 20/10, 2019 at 6:4 Comment(2)
Your first example is, apparently, called "Barrett reduction." Also... you're 9+ years late to the party.Fifield
@Fifield Much later than that... a lot of this was figured out in the 1970s. But that's OK: The question itself was almost as late.Bethune
F
0

For modulo itself, I'm not sure. For modulo as part of the larger modular exponential operation, did you look up Montgomery multiplication as mentioned in the wikipedia page on modular exponentiation? It's been a while since I've looked into this type of algorithm, but from what I recall, it's commonly used in fast modular exponentiation.

edit: for what it's worth, your modulo algorithm seems ok at first glance. You're basically doing division which is a repeated subtraction algorithm.

Fusil answered 5/5, 2010 at 13:56 Comment(0)
R
0

That test (modulus(n-1) != 1) //a bit test?

-seems redundant combined with (modulus<result).

Designing for hardware implementation i would be conscious of the smaller/greater than tests implying more logic (subtraction) than bitwise operations and branching on zero.

If we can do bitwise tests easily, this could be quick:

m=msb_of(modulus)

while( result>0 ) 
{
  r=msb_of(result) //countdown from prev msb onto result
  shift=r-m        //countdown from r onto modulus or 
                   //unroll the small subtraction 

  takeoff=(modulus<<(shift))  //or integrate this into count of shift

  result=result-takeoff;  //necessary subtraction

  if(shift!=0 && result<0)
  { result=result+(takeoff>>1); }

  } //endwhile

if(result==0) { return result }
else          { return result+takeoff }

(code untested may contain gotchas)

result is repetively decremented by modulus shifted to match at most significant bits.

After each subtraction: result has a ~50/50 chance of loosing more than 1 msb. It also has ~50/50 chance of going negative, addition of half what was subtracted will always put it into positive again. > it should be put back in positive if shift was not=0

The working loop exits when result is underrun and 'shift' was 0.

Romeu answered 8/5, 2010 at 20:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.