How to calculate (a times b) divided by c using only 32-bit integer types, even if a times b would not fit in such a type

Asked 10/11, 2010 at 12:0 Answered 28/2, 2021 at 8:58

Solved c integer multiplication integer-overflow integer-division

Consider the following as a reference implementation:

/* calculates (a * b) / c */
uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c)
{
    uint64_t x = a;
    x = x * b;
    x = x / c;
    return x;
}

I am interested in an implementation (in C or pseudocode) that does not require a 64-bit integer type.

I started sketching an implementation along these lines:

/* calculates (a * b) / c */
uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t d1, d2, d1d2;
    d1 = (1 << 10);
    d2 = (1 << 10);
    d1d2 = (1 << 20); /* d1 * d2 */
    return ((a / d1) * (b /d2)) / (c / d1d2);
}

But the difficulty is to pick values for d1 and d2 that manage to avoid the overflow ((a / d1) * (b / d2) <= UINT32_MAX) and minimize the error of the whole calculation.

Any thoughts?

Deviled answered 10/11, 2010 at 12:0 Comment(4)

Implementing a 64 bit multiplication and division from the 32 bit one is one obvious way. It – Relic 10/11, 2010 at 12:17

What do you want to happen when the result does not fit in 32 bits? (UINT_MAX * UINT_MAX / 1 for example) – Horizon 10/11, 2010 at 12:19

@pmg: same as the reference implementation, surely - return the mathematical result modulo (UINT_MAX+1). That's what "reference implementation" means in my dictionary ;-) – Radman 10/11, 2010 at 12:25

related: Most accurate way to do a combined multiply-and-divide operation in 64-bit? – Ununa 21/3, 2015 at 18:44

I have adapted the algorithm posted by Paul for unsigned ints (by omitting the parts that are dealing with signs). The algorithm is basically Ancient Egyptian multiplication of a with the fraction floor(b/c) + (b%c)/c (with the slash denoting real division here).

uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t q = 0;              // the quotient
    uint32_t r = 0;              // the remainder
    uint32_t qn = b / c;
    uint32_t rn = b % c;
    while(a)
    {
        if (a & 1)
        {
            q += qn;
            r += rn;
            if (r >= c)
            {
                q++;
                r -= c;
            }
        }
        a  >>= 1;
        qn <<= 1;
        rn <<= 1;
        if (rn >= c)
        {
            qn++; 
            rn -= c;
        }
    }
    return q;
}

This algorithm will yield the exact answer as long as it fits in 32 bits. You can optionally also return the remainder r.

Cryohydrate answered 10/11, 2010 at 13:29 Comment(0)

The simplest way would be to convert the intermediate result to 64 bits, but, depending on the value of c, you could use another approach:

(a/c)*b + (a%c)*(b/c) + ((a%c)*(b%c))/c

The only problem is that the last term could still overflow for large values of c. Still thinking about it...
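As a minimal sketch of the formula above, assuming c is at most 65536 so that (a%c)*(b%c) stays below 2^32: the function name muldiv_small_c is hypothetical, and the restriction on c is an assumption added here, not part of the original answer. The middle term never wraps because (a%c)*(b/c) is smaller than b, and the first term wraps modulo 2^32 the same way the 64-bit reference does when the result does not fit.

```c
#include <stdint.h>

/* Sketch only: requires 0 < c <= 65536, so that (a % c) * (b % c) < 2^32.
   For larger c the last product can wrap and the result is wrong. */
uint32_t muldiv_small_c(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t qa = a / c, ra = a % c;   /* a = qa * c + ra */
    uint32_t qb = b / c, rb = b % c;   /* b = qb * c + rb */

    /* a*b/c = qa*b + ra*qb + (ra*rb)/c, with integer division in the last term */
    return qa * b + ra * qb + (ra * rb) / c;
}
```

For example, muldiv_small_c(123, 45, 100) evaluates 1*45 + 23*0 + (23*45)/100 = 45 + 10 = 55.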

Breadbasket answered 10/11, 2010 at 12:13 Comment(8)

Hm. I don't see how that would work... 123 * 45 / 100 = 55, but ((123 % 100) * (45 % 100)) % 100 = 35. – Viccora 10/11, 2010 at 12:19

The problem there of course is that (a % c) * b can still overflow. As you say it depends on the value of c. If a and b are both quite large, but c is even larger, you're basically done for. – Radman 10/11, 2010 at 12:20

@Guffa: But what ruslik suggests is (123/100) * 45 + ((123 % 100) * 45)/ 100 is 45 + (23 * 45) / 100, which is 55, which is correct. – Radman 10/11, 2010 at 12:22

@Steve Jessop I've changed the last term so that it won't overflow. – Breadbasket 10/11, 2010 at 12:24

@ruslik: (a%c)*(b%c) still overflows if 2^16 < a < b < c. Well, I said "overflow", I should have said "wraps around", since the C standard says that unsigned arithmetic doesn't "overflow" by definition. Point is, it loses information that's needed for the result. – Radman 10/11, 2010 at 12:28

@Steve Jessop: That's not what the answer looked like when I commented it. – Viccora 10/11, 2010 at 12:42

Correct, this matches every combination of a, b, c for a UINT8 implementation (compared to (UINT8)(UINT16*UINT16/UINT16)), but with more operators than Guffa's solution. – Appetizer 6/3, 2021 at 15:48

Sorry, no, my tests were flawed. If we cannot use 64-bit variables for intermediate results, this formula fails. See my comment under Guffa's proposal. – Appetizer 7/3, 2021 at 8:32

You can first divide a by c and also get the remainder of the division, and multiply the remainder by b before dividing it by c. That way you only lose data in the last division, and you get the same result as with the 64-bit division.

You can rewrite the formula like this (where \ is integer division):

a * b / c =
(a / c) * b =
(a \ c + (a % c) / c) * b =
(a \ c) * b + ((a % c) * b) / c

By making sure that a >= b, you can use larger values before they overflow:

uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c) {
  uint32_t hi = a > b ? a : b;
  uint32_t lo = a > b ? b : a;
  return (hi / c) * lo + (hi % c) * lo / c;
}
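The expression above is exact only while (hi % c) * lo fits in 32 bits. As a hedged illustration, the hypothetical helper below (not part of the original answer) reports whether that product would wrap for a given input, using only 32-bit arithmetic and assuming c is nonzero.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if (hi % c) * lo would exceed UINT32_MAX,
   i.e. if the formula (hi / c) * lo + (hi % c) * lo / c loses information.
   Assumes c != 0. */
int muldiv_term_wraps(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t hi = a > b ? a : b;
    uint32_t lo = a > b ? b : a;
    uint32_t rem = hi % c;

    /* rem * lo > UINT32_MAX  <=>  rem > UINT32_MAX / lo  (for lo != 0) */
    return lo != 0 && rem > UINT32_MAX / lo;
}
```

With a = b = 70000 and c = 100000 the helper returns 1, because 70000 * 70000 is about 4.9e9 and does not fit in 32 bits; with a = 123, b = 45, c = 100 it returns 0.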

Another approach would be to loop addition and subtraction instead of multiplying and dividing, but that is of course a lot more work:

uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c) {
  uint32_t hi = a > b ? a : b;
  uint32_t lo = a > b ? b : a;
  uint32_t sum = 0;
  uint32_t cnt = 0;
  for (uint32_t i = 0; i < hi; i++) {
    sum += lo;
    while (sum >= c) {
      sum -= c;
      cnt++;
    }
  }
  return cnt;
}

Viccora answered 10/11, 2010 at 12:37 Comment(4)

The problem with your first approach is that (a % c) * b can wrap around. The second approach is interesting but I think that sum could wrap around. It is a pain working with large values! – Deviled 11/11, 2010 at 21:29

No, the first approach is right, it never wraps around (or it does so just as (UINT32)(UINT64*UINT64/UINT64) does, when the result doesn't fit in UINT32). Implemented for UINT8, this calculation is always equal to (UINT8)(UINT16*UINT16/UINT16), for every combination of a, b and c. But sorting a and b (the hi and lo variables) is strictly useless. – Appetizer 6/3, 2021 at 15:11

However, the second approach is wrong. For example, in UINT8, muldiv8(14, 178, 253) returns 3 instead of 9. And it is very inefficient, since it can loop billions of times for UINT32. – Appetizer 6/3, 2021 at 15:25

Sorry, no, the first approach fails too. With the inline formula, the overflows were hidden by implicit integer promotion to the native register width of my machine. But if I store every intermediate result in a temporary variable (to truncate it), the test fails. So this was an interesting but invalid solution. – Appetizer 7/3, 2021 at 8:27

Searching on www.google.com/codesearch turns up a number of implementations, including this wonderfully obvious one. I particularly like the extensive comments and well-chosen variable names.

INT32 muldiv(INT32 a, INT32 b, INT32 c)
{ INT32 q=0, r=0, qn, rn;
  int qneg=0, rneg=0;
  if (c==0) c=1;
  if (a<0) { qneg=!qneg; rneg=!rneg; a = -a; }
  if (b<0) { qneg=!qneg; rneg=!rneg; b = -b; }
  if (c<0) { qneg=!qneg;             c = -c; }

  qn = b / c;
  rn = b % c;

  while(a)
  { if (a&1) { q += qn;
               r += rn;
               if(r>=c) { q++; r -= c; }
             }
    a  >>= 1;
    qn <<= 1;
    rn <<= 1;
    if (rn>=c) {qn++; rn -= c; }
  }
  result2 = rneg ? -r : r;
  return qneg ? -q : q;
}

http://www.google.com/codesearch/p?hl=en#HTrPUplLEaU/users/mr/MCPL/mcpl.tgz|gIE-sNMlwIs/MCPL/mintcode/sysc/mintsys.c&q=muldiv%20lang:c

Iconoduly answered 10/11, 2010 at 12:53 Comment(2)

The algorithm gets even a bit simpler for unsigned integers :) – Cryohydrate 10/11, 2010 at 13:3

I actually think this algorithm is the best answer so far. It's the only one that really works in all cases it possibly can work in (that is, when the result fits in 32 bits). – Cryohydrate 10/11, 2010 at 13:10

I implemented Sven's code as UINT16 to test it intensively:

#include <cstdint>
#include <iostream>

uint16_t muldiv16(uint16_t a, uint16_t b, uint16_t c);

int main(int argc, char *argv[]){
    uint32_t a;
    uint32_t b;
    uint32_t c;
    uint16_t r1, r2;

// ~167 days, estimated on i7 6700k, single thread.
// Split the 'a' range to run several instances of this code on a multi-core processor
// ~1s, with an UINT8 implementation
    for(a=0; a<=UINT16_MAX; a++){
        for(b=0; b<=UINT16_MAX; b++){
            for(c=1; c<=UINT16_MAX; c++){
                // a*b is computed in 32 bits, so it cannot wrap for 16-bit inputs
                r1 = uint16_t( a*b/c );
                r2 = muldiv16(uint16_t(a), uint16_t(b), uint16_t(c));
                if( r1 != r2 ){
                    std::cout << "Err: " << a << " * " << b << " / " << c << ", result: " << r2 << ", expected: " << r1 << std::endl;
                    return -1;
                }
            }
        }
        std::cout << a << std::endl;
    }
    std::cout << "Done." << std::endl;
    return 0;
}

Unfortunately, it seems that it is limited to UINT31 for 'b' (0-2147483647).

Here is my correction, which seems to work (the UINT16 test has not run to completion, but it has run for a long time; the UINT8 test completed).

uint32_t muldiv32(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t q = 0;              // the quotient
    uint32_t r = 0;              // the remainder
    uint32_t qn = b / c;
    uint32_t rn = b % c;
    uint32_t r_carry;
    uint32_t rn_carry;
    while(a)
    {
        if (a & 1)
        {
            q += qn;
            r_carry = (r > UINT32_MAX-rn);
            r += rn;
            if (r >= c || r_carry)
            {
                q++;
                r -= c;
            }
        }
        a  >>= 1;
        qn <<= 1;
        rn_carry = rn & 0x80000000UL;
        rn <<= 1;
        if (rn >= c || rn_carry)
        {
            qn++;
            rn -= c;
        }
    }
    return q;
}

Edit: an improved version that returns the remainder, handles rounding, reports overflow and, of course, handles the full UINT32 range for a, b and c:

typedef enum{
    ROUND_DOWNWARD=0,
    ROUND_TONEAREST,
    ROUND_UPWARD
}ROUND;

//remainder is always positive for ROUND_DOWNWARD ( a * b = c * q + remainder )
//remainder is always negative for ROUND_UPWARD ( a * b = c * q - remainder )
//remainder is signed for ROUND_TONEAREST ( a * b = c * q + int32_t(remainder) )
uint32_t muldiv32(uint32_t a, uint32_t b, uint32_t c, uint32_t *remainder, ROUND round, uint8_t *ovf)
{
    uint32_t q = 0;              // the quotient
    uint32_t r = 0;              // the remainder
    uint32_t qn = b / c;
    uint32_t rn = b % c;
    uint32_t r_carry;
    uint32_t rn_carry;
    uint8_t o = 0;
    uint8_t rup;
    while(a)
    {
        if (a & 1)
        {
            o |= (q > UINT32_MAX-qn);
            q += qn;
            r_carry = (r > UINT32_MAX-rn);
            r += rn;
            if (r >= c || r_carry)
            {
                o |= (q == UINT32_MAX);
                q++;
                r -= c;
            }
        }
        a  >>= 1;
        qn <<= 1;
        rn_carry = rn & 0x80000000;
        rn <<= 1;
        if (rn >= c || rn_carry)
        {
            qn++;
            rn -= c;
        }
    }
    rup = (round == ROUND_UPWARD && r);
    rup |= (round == ROUND_TONEAREST && ((r<<1) >= c || r & 0x80000000));
    if(rup)
    {   //round
        o |= (q == UINT32_MAX);
        q++;
        r = (round == ROUND_UPWARD) ? c-r : r-c;
    }
    if(remainder)
        *remainder = r;
    if(ovf)
        *ovf = o;
    return q;
}

There may be another approach, perhaps even more efficient: 8-bit, 16-bit and 32-bit MCUs are able to perform 64-bit calculations (long long int). Does anyone know how the compilers emulate them?

Edit 2:

Here are some interesting timings, on an 8-bit MCU:

UINT8 x UINT8 / UINT8: 3.5µs

UINT16 x UINT16 / UINT16: 22.5µs, muldiv8: 29.9 to 45.3µs

UINT32 x UINT32 / UINT32: 84µs, muldiv16: 120 to 189µs

FLOAT32 * FLOAT32 / FLOAT32: 40.2 to 135.5µs, muldiv32: 1.193 to 1.764ms

And on a 32-bit MCU:

Type - optimized code - without optimization

UINT32: 521ns - 604ns

UINT64: 2958ns - 3313ns

FLOAT32: 2563ns - 2688ns

muldiv32: 6791ns - 25375ns

So the compilers are cleverer than this C algorithm. And it is always better to work with float variables (even without an FPU) than with integers bigger than the native registers (even though float32 has worse precision than uint32, starting at 16777217).

Edit 3: OK, so my N-bit MCUs use a native N-bit by N-bit MUL instruction that produces a 2N-bit result, stored in two N-bit registers.

Here, you can find a C implementation (prefer EasyasPi's solution).

But they don't have a native 2N-bit by N-bit DIV instruction. Instead, they use the __udivdi3 function from gcc, with loops and 2N-bit variables (here, UINT64). So this cannot be a solution to the original question.
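For illustration only, the same two steps (a widening multiply, then a looped division) can be sketched with 32-bit variables alone. This is not what gcc's __udivdi3 does and it is not taken from the linked implementation; the names mul32_wide and muldiv_wide are hypothetical. The product is built from 16-bit halves, and the division is a plain shift-and-subtract loop that returns the quotient modulo 2^32, like the reference implementation in the question. It assumes c is nonzero.

```c
#include <stdint.h>

/* 32 x 32 -> 64-bit product, returned as two 32-bit halves,
   built from four 16 x 16 partial products. */
static void mul32_wide(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t p0 = a_lo * b_lo;   /* contributes to bits  0..31 */
    uint32_t p1 = a_lo * b_hi;   /* contributes to bits 16..47 */
    uint32_t p2 = a_hi * b_lo;   /* contributes to bits 16..47 */
    uint32_t p3 = a_hi * b_hi;   /* contributes to bits 32..63 */

    /* Sum of three values below 2^16 each: cannot wrap. */
    uint32_t mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);

    *lo = (mid << 16) | (p0 & 0xFFFFu);
    *hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
}

/* (a * b) / c modulo 2^32, using only 32-bit variables. Requires c != 0. */
uint32_t muldiv_wide(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t hi, lo;
    mul32_wide(a, b, &hi, &lo);

    /* The part of the quotient coming from hi / c lies above bit 31 and is
       dropped; only its remainder carries into the low 32 quotient bits. */
    uint32_t rem = hi % c;
    uint32_t q = 0;

    for (int i = 31; i >= 0; i--) {
        uint32_t carry = rem >> 31;            /* bit about to be shifted out */
        rem = (rem << 1) | ((lo >> i) & 1u);   /* bring down the next bit */
        q <<= 1;
        if (carry || rem >= c) {
            rem -= c;   /* unsigned wrap gives the right value when carry is set */
            q |= 1u;
        }
    }
    return q;
}
```

The loop always runs 32 iterations, so its cost does not depend on the operands, unlike the while(a) loop above.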

Appetizer answered 28/2, 2021 at 8:58 Comment(0)

If b and c are both constants, you can calculate the result very simply using Egyptian fractions.

For example, y = a * 4 / 99 can be written as

y = a / 25 + a / 2475

You can express any fraction as a sum of Egyptian fractions, as explained in answers to Egyptian Fractions in C.

Having b and c fixed in advance might seem like a bit of a restriction, but this method is a lot simpler than the general case answered by others.
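A minimal sketch of the example above, with a hypothetical function name. Because each term is truncated separately, the sum of the two integer divisions is either equal to the exact floor(a * 4 / 99) or one less than it, so this is an approximation rather than a drop-in replacement for the reference implementation.

```c
#include <stdint.h>

/* Hypothetical helper: approximates a * 4 / 99 using 4/99 = 1/25 + 1/2475.
   No intermediate value exceeds a, so nothing can wrap.
   The result is the exact quotient or one less, due to per-term truncation. */
uint32_t mul_4_div_99(uint32_t a)
{
    return a / 25u + a / 2475u;
}
```

For a = 2475 this gives 99 + 1 = 100, which is exact; for a = 99 it gives 3 + 0 = 3, while the exact result is 4.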

Coronet answered 18/11, 2015 at 11:35 Comment(0)


I suppose there are reasons you can't do

x = a/c;
x = x*b;

are there? And maybe add

y = b/c;
y = y*a;

if ( x != y )
    return ERROR_VALUE;

Note that, since you're using integer division, a*b/c and a/c*b might lead to different values if c is bigger than a or b. Also, if both a and b are smaller than c it won't work.

Syrupy answered 10/11, 2010 at 12:34 Comment(3)

It doesn't work if both a and b are lower than c. For example 20*30/100 = 6 while (20/100)*30 = 0 and (30/100)*20 = 0. – Viccora 10/11, 2010 at 12:46

I said it in the post already. Perhaps not explicitly, so I corrected it. – Syrupy 10/11, 2010 at 13:0

This fails in general, even if c is smaller than a and b (e.g. 163*206/57 = 589, while 163/57*206 = 412). When you divide an integer, you lose its fractional part, which you cannot recover with the following multiplication. – Appetizer 6/3, 2021 at 15:38
