Since this is tagged Visual C++ I'll give a solution that abuses MSVC-specific intrinsics.
This example is fairly complicated. It's a highly simplified version of the same algorithm that is used by GMP and java.math.BigInteger
for large division.
Although I have a simpler algorithm in mind, it's probably about 30x slower.
This solution has the following constraints/behavior:
- It requires x64. It will not compile on x86.
- The quotient is not zero.
- The quotient saturates if it overflows 64-bits.
Note that this is for the unsigned integer case. It's trivial to build a wrapper around this to make it work for signed cases as well. This example should also produce correctly truncated results.
This code is not fully tested. However, it has passed all the tests cases that I've thrown at it.
(Even cases that I've intentionally constructed to try to break the algorithm.)
#include <intrin.h>
uint64_t muldiv2(uint64_t a, uint64_t b, uint64_t c){
// Normalize divisor
unsigned long shift;
_BitScanReverse64(&shift,c);
shift = 63 - shift;
c <<= shift;
// Multiply
a = _umul128(a,b,&b);
if (((b << shift) >> shift) != b){
cout << "Overflow" << endl;
return 0xffffffffffffffff;
}
b = __shiftleft128(a,b,shift);
a <<= shift;
uint32_t div;
uint32_t q0,q1;
uint64_t t0,t1;
// 1st Reduction
div = (uint32_t)(c >> 32);
t0 = b / div;
if (t0 > 0xffffffff)
t0 = 0xffffffff;
q1 = (uint32_t)t0;
while (1){
t0 = _umul128(c,(uint64_t)q1 << 32,&t1);
if (t1 < b || (t1 == b && t0 <= a))
break;
q1--;
// cout << "correction 0" << endl;
}
b -= t1;
if (t0 > a) b--;
a -= t0;
if (b > 0xffffffff){
cout << "Overflow" << endl;
return 0xffffffffffffffff;
}
// 2nd reduction
t0 = ((b << 32) | (a >> 32)) / div;
if (t0 > 0xffffffff)
t0 = 0xffffffff;
q0 = (uint32_t)t0;
while (1){
t0 = _umul128(c,q0,&t1);
if (t1 < b || (t1 == b && t0 <= a))
break;
q0--;
// cout << "correction 1" << endl;
}
// // (a - t0) gives the modulus.
// a -= t0;
return ((uint64_t)q1 << 32) | q0;
}
Note that if you don't need a perfectly truncated result, you can remove the last loop completely. If you do this, the answer will be no more than 2 larger than the correct quotient.
Test Cases:
cout << muldiv2(4984198405165151231,6132198419878046132,9156498145135109843) << endl;
cout << muldiv2(11540173641653250113, 10150593219136339683, 13592284235543989460) << endl;
cout << muldiv2(449033535071450778, 3155170653582908051, 4945421831474875872) << endl;
cout << muldiv2(303601908757, 829267376026, 659820219978) << endl;
cout << muldiv2(449033535071450778, 829267376026, 659820219978) << endl;
cout << muldiv2(1234568, 829267376026, 1) << endl;
cout << muldiv2(6991754535226557229, 7798003721120799096, 4923601287520449332) << endl;
cout << muldiv2(9223372036854775808, 2147483648, 18446744073709551615) << endl;
cout << muldiv2(9223372032559808512, 9223372036854775807, 9223372036854775807) << endl;
cout << muldiv2(9223372032559808512, 9223372036854775807, 12) << endl;
cout << muldiv2(18446744073709551615, 18446744073709551615, 9223372036854775808) << endl;
Output:
3337967539561099935
8618095846487663363
286482625873293138
381569328444
564348969767547451
1023786965885666768
11073546515850664288
1073741824
9223372032559808512
Overflow
18446744073709551615
Overflow
18446744073709551615
long long
doesn't work. – Archaeanlong long
is only 64 bits! I thought they would have made it 128 bits! I just looked it up. Pity. Well then your next option might be wonderful inline assembly! – Feverfew__int128
. – Waterlesserror C4235: nonstandard extension used : '__int128' keyword not supported on this architecture
– Communicant__int128
- not sure if it's still up to date, though: connect.microsoft.com/VisualStudio/feedback/details/529171/… – Craggyi
in my UN. I'm using VS2010 SP1. – Communicantmod 2^64
? I think my answer meets all your criteria (with perfect integer truncation) except for the overflow case. As of right now, I can't find a clean way to deal with the overflow case. (And actually the 64-bitdiv
andidiv
instructions will throw an exception if the quotient is >64-bits.) So there may not be a clean way to do this even with access to inline assembly. – Communicant_umul128()
, which would have to be done using the long-multiplication suggestions here. I suppose a work-around would be to combine it with the implementation you linked to. – CommunicantMulDiv64(0x1203785013274012, 0x1cef95b58432904f, 0x53)
on both implementations, and neither gave back Wolfram Alpha's answer of0x05DC4FD990C5C443
... and Microsoft's Power Calculator gave back0x647AC843C8E79298B21B29FB808EF.347
for the same thing, but without the mod... which should result in0x298B21B29FB808EF
after truncation. I feel like I'm missing something. Ideas? – Bouncy32607500402250344521899981391005935
EDIT2: That answer is the IEEE double-precision representation of the result, not an integer. – Communicantfirst * second / third
... probably wasn't correct. Still kinda weird that Alpha and Power Calculator give different answers, though... EDIT: Yeah the IEEE representation kinda baffled me, not sure why they gave me back a floating-point number in its hexadecimal representation. I'll use PowerCalc then. – Bouncyfirst * second / third
is correct.a * b / c
in my answer. – Communicant