If using an external library is an option then have a look at this question. You can use TTMath which is a very simple header for big precision math. On 32-bit architectures ttmath:UInt<4>
will create a 128-bit int
type with four 32-bit limbs. Some other alternatives are (u)int128_t
in Boost.Multiprecision or calccrypto/uint128_t
If you must write it your own then there are already a lot of solutions on SO and I'll summarize them here
For addition and subtraction, it's very easy and straightforward, simply add/subtract the words (which big int libraries often called limbs) from the lowest significant to higher significant, with carry of course.
typedef struct INT128 {
uint64_t H, L;
} my_uint128_t;
inline my_uint128_t add(my_uint128_t a, my_uint128_t b)
{
my_uint128_t c;
c.L = a.L + b.L;
c.H = a.H + b.H + (c.L < a.L); // c = a + b
return c;
}
The assembly output can be checked with Compiler Explorer
The compilers can already generate efficient code for double-word operations, but many aren't smart enough to use "add with carry" when compiling multi-word operations from high level languages as you can see in the question efficient 128-bit addition using carry flag. Therefore using 2 long long
s like above will make it not only more readable but also easier for the compiler to emit a little more efficient code.
If that still doesn't suit your performance requirement, you must use intrinsic or write it in assembly. To add the 128-bit value stored in bignum
to the 128-bit value in {eax, ebx, ecx, edx} you can use the following code
add edx, [bignum]
adc ecx, [bignum+4]
adc ebx, [bignum+8]
adc eax, [bignum+12]
The equivalent intrinsic will be like this for Clang
unsigned *x, *y, *z, carryin=0, carryout;
z[0] = __builtin_addc(x[0], y[0], carryin, &carryout);
carryin = carryout;
z[1] = __builtin_addc(x[1], y[1], carryin, &carryout);
carryin = carryout;
z[2] = __builtin_addc(x[2], y[2], carryin, &carryout);
carryin = carryout;
z[3] = __builtin_addc(x[3], y[3], carryin, &carryout);
You need to change the intrinsic to the one supported by your compiler, for example __builtin_uadd_overflow
in gcc, or _addcarry_u32
for MSVC and ICC
For more information read these
For bit shifts you can find the C solution in the question Bitwise shift operation on a 128-bit number. This is a simple left shift but you can unroll the recursive call for more performance
void shiftl128 (
unsigned int& a,
unsigned int& b,
unsigned int& c,
unsigned int& d,
size_t k)
{
assert (k <= 128);
if (k >= 32) // shifting a 32-bit integer by more than 31 bits is "undefined"
{
a=b;
b=c;
c=d;
d=0;
shiftl128(a,b,c,d,k-32);
}
else
{
a = (a << k) | (b >> (32-k));
b = (b << k) | (c >> (32-k));
c = (c << k) | (d >> (32-k));
d = (d << k);
}
}
The assembly for less-than-32-bit shifts can be found in the question 128-bit shifts using assembly language?
shld edx, ecx, cl
shld ecx, ebx, cl
shld ebx, eax, cl
shl eax, cl
Right shifts can be implemented similarly, or just copy from the above linked question
Multiplication and divisions are a lot more complex and you can reference the solution in the question Efficient Multiply/Divide of two 128-bit Integers on x86 (no 64-bit):
class int128_t
{
uint32_t dw3, dw2, dw1, dw0;
// Various constrctors, operators, etc...
int128_t& operator*=(const int128_t& rhs) __attribute__((always_inline))
{
int128_t Urhs(rhs);
uint32_t lhs_xor_mask = (int32_t(dw3) >> 31);
uint32_t rhs_xor_mask = (int32_t(Urhs.dw3) >> 31);
uint32_t result_xor_mask = (lhs_xor_mask ^ rhs_xor_mask);
dw0 ^= lhs_xor_mask;
dw1 ^= lhs_xor_mask;
dw2 ^= lhs_xor_mask;
dw3 ^= lhs_xor_mask;
Urhs.dw0 ^= rhs_xor_mask;
Urhs.dw1 ^= rhs_xor_mask;
Urhs.dw2 ^= rhs_xor_mask;
Urhs.dw3 ^= rhs_xor_mask;
*this += (lhs_xor_mask & 1);
Urhs += (rhs_xor_mask & 1);
struct mul128_t
{
int128_t dqw1, dqw0;
mul128_t(const int128_t& dqw1, const int128_t& dqw0): dqw1(dqw1), dqw0(dqw0){}
};
mul128_t data(Urhs,*this);
asm volatile(
"push %%ebp \n\
movl %%eax, %%ebp \n\
movl $0x00, %%ebx \n\
movl $0x00, %%ecx \n\
movl $0x00, %%esi \n\
movl $0x00, %%edi \n\
movl 28(%%ebp), %%eax #Calc: (dw0*dw0) \n\
mull 12(%%ebp) \n\
addl %%eax, %%ebx \n\
adcl %%edx, %%ecx \n\
adcl $0x00, %%esi \n\
adcl $0x00, %%edi \n\
movl 24(%%ebp), %%eax #Calc: (dw1*dw0) \n\
mull 12(%%ebp) \n\
addl %%eax, %%ecx \n\
adcl %%edx, %%esi \n\
adcl $0x00, %%edi \n\
movl 20(%%ebp), %%eax #Calc: (dw2*dw0) \n\
mull 12(%%ebp) \n\
addl %%eax, %%esi \n\
adcl %%edx, %%edi \n\
movl 16(%%ebp), %%eax #Calc: (dw3*dw0) \n\
mull 12(%%ebp) \n\
addl %%eax, %%edi \n\
movl 28(%%ebp), %%eax #Calc: (dw0*dw1) \n\
mull 8(%%ebp) \n\
addl %%eax, %%ecx \n\
adcl %%edx, %%esi \n\
adcl $0x00, %%edi \n\
movl 24(%%ebp), %%eax #Calc: (dw1*dw1) \n\
mull 8(%%ebp) \n\
addl %%eax, %%esi \n\
adcl %%edx, %%edi \n\
movl 20(%%ebp), %%eax #Calc: (dw2*dw1) \n\
mull 8(%%ebp) \n\
addl %%eax, %%edi \n\
movl 28(%%ebp), %%eax #Calc: (dw0*dw2) \n\
mull 4(%%ebp) \n\
addl %%eax, %%esi \n\
adcl %%edx, %%edi \n\
movl 24(%%ebp), %%eax #Calc: (dw1*dw2) \n\
mull 4(%%ebp) \n\
addl %%eax, %%edi \n\
movl 28(%%ebp), %%eax #Calc: (dw0*dw3) \n\
mull (%%ebp) \n\
addl %%eax, %%edi \n\
pop %%ebp \n"
:"=b"(this->dw0),"=c"(this->dw1),"=S"(this->dw2),"=D"(this->dw3)
:"a"(&data):"%ebp");
dw0 ^= result_xor_mask;
dw1 ^= result_xor_mask;
dw2 ^= result_xor_mask;
dw3 ^= result_xor_mask;
return (*this += (result_xor_mask & 1));
}
};
You can also find a lot of related questions with the 128bit tag