Here I have some examples of processing Big Integers.
Addition
Principle is pretty simple. You need to use CF
(carry-flag) for any bigger overflow, with adc
(add with carry) propagating that carry between chunks. Let's think about two 128-bit number addition.
num1_lo: dq 1<<63
num1_hi: dq 1<<63
num2_lo: dq 1<<63
num2_hi: dq 1<<62
;Result of addition should be 0xC0000000 0x000000001 0x00000000 0x00000000
mov eax, dword [num1_lo]
mov ebx, dword [num1_lo+4]
mov ecx, dword [num1_hi]
mov edx, dword [num1_hi+4]
add eax, dword [num2_lo]
adc ebx, dword [num2_lo+4]
adc ecx, dword [num2_hi]
adc edx, dword [num2_hi+4]
; 128-bit integer sum in EDX:ECX:EBX:EAX
jc .overflow ; detect wrapping if you want
You don't need all of it in registers at once; you could store a 32-bit chunk before loading the next, because mov
doesn't affect FLAGS. (Looping is trickier, although dec
/jnz
is usable on modern CPUs which don't have partial-flag stalls for ADC reading CF after dec
writes other FLAGS. See Problems with ADC/SBB and INC/DEC in tight loops on some CPUs)
Subtraction
Very similar to addition, although you CF
is now called borrow.
mov eax, dword [num1_lo]
mov ebx, dword [num1_lo+4]
mov ecx, dword [num1_hi]
mov edx, dword [num1_hi+4]
sub eax, dword [num2_lo]
sbb ebx, dword [num2_lo+4]
sbb ecx, dword [num2_hi]
sbb edx, dword [num2_hi+4]
jb .overflow ;or jc
Multiplication
Is much more difficult. You need to multiply each part of first number with each part of second number and add the results. You don't have to multiply only two highest parts that will surely overflow. Pseudocode:
long long int /*128-bit*/ result = 0;
long long int n1 = ;
long long int n2 = ;
#define PART_WIDTH 32 //to be able to manipulate with numbers in 32-bit registers
int i_1 = 0; /*iteration index*/
for(each n-bit wide part of first number : n1_part) {
int i_2 = 0;
for(each n-bit wide part of second number : n2_part) {
result += (n1_part << (i_1*PART_WIDTH))*(n2_part << (i_2*PART_WIDTH));
i_2++;
}
i++;
}
Division
is even more complicated. User Brendan on OsDev.org forum posted example pseudocode for division of n-bit integers. I'm pasting it here because principle is the same.
result = 0;
count = 0;
remainder = numerator;
while(highest_bit_of_divisor_not_set) {
divisor = divisor << 1;
count++;
}
while(remainder != 0) {
if(remainder >= divisor) {
remainder = remainder - divisor;
result = result | (1 << count);
}
if(count == 0) {
break;
}
divisor = divisor >> 1;
count--;
}
Dividing a wide number by a 1-chunk (32 or 64-bit number) can use a sequence of div
instructions, using the remainder of the high element as the high half of the dividend for the next lower chunk. See Why should EDX be 0 before using the DIV instruction? for an example of when div
is useful with non-zero EDX.
But this doesn't generalize to N-chunk / N-chunk division, hence the above manual shift / subtract algorithm.
2^32
or2^64
. – Solve