Is there a reasonably portable way in C/C++ to multiply two 64-bit integers for a 128-bit result and get the top 64-bits of the result, rather than the bottom 64-bits? I need this for distributing a hash function over an arbitrary sized table.
This answer shows how to get the (exact) top 64-bits from a 64x64 bit multiply on a system that doesn't support 128-bit integers. The answer by @amdn will give better performance on systems that do support 128-bit integers.
The diagram below shows one method for computing a 128-bit product from two 64-bit numbers. Each black rectangle represents a 64-bit number. The 64-bit inputs to the method, X
and Y
, are divided into 32-bits chunks labeled a
, b
, c
, and d
. Then four 32x32 bit multiplications are performed, giving four 64-bit products labeled a*c
, b*c
, a*d
, and b*d
. The four products must be shifted and added to compute the final answer.
Note that the lower 32-bits of the 128-bit product are solely determined by the lower 32-bits of partial product b*d
. The next 32-bits are determined by the lower 32-bits of the following
mid34 = ((b*c) & 0xffffffff) + ((a*d) & 0xffffffff) + ((b*d) >> 32);
Note that mid34
is the sum of three 32-bit numbers and therefore is in fact a 34-bit sum. The upper two bits of mid34
act as a carry into the top 64-bits of the 64x64 bit multiply.
Which brings us to the demo code. The top64
function computes the upper 64-bits of a 64x64 multiply. It's a little verbose to allow for the calculation of the lower 64-bits to be shown in a comment. The main
function takes advantage of 128-bit integers to verify the results with a simple test case. Further testing is left as an exercise for the reader.
#include <stdio.h>
#include <stdint.h>
typedef unsigned __int128 uint128_t;
uint64_t top64( uint64_t x, uint64_t y )
{
uint64_t a = x >> 32, b = x & 0xffffffff;
uint64_t c = y >> 32, d = y & 0xffffffff;
uint64_t ac = a * c;
uint64_t bc = b * c;
uint64_t ad = a * d;
uint64_t bd = b * d;
uint64_t mid34 = (bd >> 32) + (bc & 0xffffffff) + (ad & 0xffffffff);
uint64_t upper64 = ac + (bc >> 32) + (ad >> 32) + (mid34 >> 32);
// uint64_t lower64 = (mid34 << 32) | (bd & 0xffffffff);
return upper64;
}
int main( void )
{
uint64_t x = 0x0000000100000003;
uint64_t y = 0x55555555ffffffff;
uint128_t m = x, n = y;
uint128_t p = m * n;
uint64_t top = p >> 64;
printf( "%016llx %016llx\n", top, top64( x, y ) );
}
A little algebra never hurt:
#include <stdint.h>
uint64_t top64(uint64_t x, uint64_t y) {
uint64_t a = x >> 32, b = x & 0xFFFFFFFF;
uint64_t c = y >> 32, d = y & 0xFFFFFFFF;
return a * c + ((b * d >> 32) + (a * d) + (b * c)) >> 32 +
((((a * d) & 0xFFFFFFFF) + ((b *c) & 0xFFFFFFFF) + ((b * d) >> 32)) >> 32);
}
ad
and bc
terms are doing in my equation. –
Portia x = y = 2**64-1
. Your method gives 18446744073709551613
, whereas the correct result is 18446744073709551614
. –
Shakespeare (((a*d)&0xffffffff) + ((b*c)&0xffffffff) + ((b*d)>>32)) >> 32
. Maybe there is a faster way of computing the carry than this. –
Spoonerism bd
can't be it: 0xFFFFFFFF * 0xFFFFFFFF = 0xfffffffe00000001L. Think about it: 2 ** 32 - 1 squared can't possibly be larger than 2 ** 32 squared. –
Portia 1 + 1 + 0xfffffffe = 0x100000000
; you seem to have addressed this in your latest edit. –
Shakespeare x=0x0000000100000003
y=0x55555555ffffffff
–
Dicephalous a*c + ((a*d + b*c + (b*d >> 32)) >> 32)
. The bottom 32 bits of b*d
can't affect the results since all the other terms have 0 in those bits. Edit: oops that partial sum can overflow, so maybe it can't really be simplified. –
Jesusa Both gcc and clang support 128-bit integers as an extension.
Here's one way to do it demo
#include <iostream>
#include <cstdint>
//https://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/_005f_005fint128.html#_005f_005fint128
using u128 = unsigned __int128;
using u64 = uint64_t;
void mul64x64( u64 a, u64 b, u64 & hi, u64 & lo ) {
u128 product = u128(a) * b;
lo = product;
hi = product >> 64;
}
int main()
{
u64 hi, lo;
mul64x64( 40282220, u64{1} << 63, hi, lo );
std::cout << hi << std::endl;
}
Output is
set -x ; clang++ -std=c++11 -O0 -Wall -Werror main.cpp && ./a.out
+ clang++ -std=c++11 -O0 -Wall -Werror main.cpp
+ ./a.out
20141110
© 2022 - 2024 — McMap. All rights reserved.
rdx
, the 64 low bits go intorax
. – Swartha*c + ((b*c)>>32) + ((a*d)>>32)
as others have said, without worrying about the other parts that contribute to a possible carry. – Spoonerism