Do there exist intrinsics for ARM C compilers to do add-with-carry operations, or is it necessary to use assembly language?
On x86, there is _addcarry_u64 for add-with-carry. (There's also the newer _addcarryx_u64 for special purposes.)
From old documentation (going back as far as gcc 5!):
https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Integer-Overflow-Builtins.html
Both clang and GCC implement these builtins, and I verified that the generated code is optimal on both x86_64 and aarch64 targets:
#include <stdbool.h>
#include <stdint.h>

typedef unsigned __int128 uint128_t;

// carry_out = a + b + carry_in
uint8_t my_addcarry_u64(uint8_t carry_in, uint64_t a, uint64_t b, uint64_t *sum)
{
    bool c;
    unsigned long long res;  // same type the builtin writes to, so no pointer cast is needed
    c = __builtin_uaddll_overflow(a, b, &res);
    c |= __builtin_uaddll_overflow(res, carry_in, &res);
    *sum = res;
    return c;
}

// carry_out = a + b + carry_in
uint8_t my_addcarry_u128(uint8_t carry_in, uint128_t a, uint128_t b, uint128_t *sum)
{
    bool c;
    unsigned long long res_lo, res_hi;
    // low 64 bits: a_lo + b_lo + carry_in
    c = __builtin_uaddll_overflow((uint64_t)a, (uint64_t)b, &res_lo);
    c |= __builtin_uaddll_overflow(carry_in, res_lo, &res_lo);
    // high 64 bits: a_hi + b_hi + carry out of the low half
    c = __builtin_uaddll_overflow(a >> 64, c, &res_hi);
    c |= __builtin_uaddll_overflow(b >> 64, res_hi, &res_hi);
    *sum = ((uint128_t)res_hi << 64) + res_lo;
    return c;
}
Although the original post is old, I am providing a solution to the original question in case someone reads this thread again.
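As a usage sketch, a helper of this kind can be chained across limbs to add two multi-word integers. The names add_carry_u64 and add_n are hypothetical, limbs are assumed to be stored least significant first, and the type-generic __builtin_add_overflow (documented on the same GCC page) is used here in place of the ll variant so that no pointer casts are needed:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: *sum = a + b + carry_in, returns the carry out (0 or 1). */
static inline unsigned add_carry_u64(unsigned carry_in, uint64_t a, uint64_t b,
                                     uint64_t *sum)
{
    uint64_t t;
    bool c1 = __builtin_add_overflow(a, b, &t);
    bool c2 = __builtin_add_overflow(t, (uint64_t)(carry_in != 0), sum);
    /* At most one of the two additions can overflow. */
    return (unsigned)(c1 | c2);
}

/* Hypothetical helper: r = a + b over n limbs (least significant limb first).
   Returns the carry out of the most significant limb. */
static unsigned add_n(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++)
        carry = add_carry_u64(carry, a[i], b[i], &r[i]);
    return carry;
}
```

Whether the loop actually becomes a chain of adds/adcs depends on the compiler version and optimization level, so it is worth checking the generated assembly for your target.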
There is no intrinsic in current versions of gcc (gcc 5 was released the year this question was asked). One issue is how to communicate the 'carry flag'. However, the ARM backend does know about and define a set of ADC primitives, such as addsi3_carryin.
For example,

    unsigned long long big_inc(unsigned long long x)
    {
        return ++x;
    }

is translated to:

    big_inc(unsigned long long):
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        adds r0, r0, #1
        adc r1, r1, #0
        bx lr
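The same idea extends to a full two-word addition without any intrinsic. This is a minimal sketch, assuming the operands are packed into a uint64_t so that a 32-bit ARM compiler is free to emit an adds/adc pair for the addition; the function name add_2x32 is hypothetical, and the exact instructions depend on the compiler and flags:

```c
#include <stdint.h>

/* Hypothetical helper: adds two 64-bit values given as 32-bit halves.
   Writes the result halves and returns the carry out of the high word. */
static unsigned add_2x32(uint32_t a_lo, uint32_t a_hi,
                         uint32_t b_lo, uint32_t b_hi,
                         uint32_t *r_lo, uint32_t *r_hi)
{
    uint64_t a = ((uint64_t)a_hi << 32) | a_lo;
    uint64_t b = ((uint64_t)b_hi << 32) | b_lo;
    uint64_t s = a + b;            /* unsigned addition wraps modulo 2^64 */

    *r_lo = (uint32_t)s;
    *r_hi = (uint32_t)(s >> 32);
    return s < a;                  /* a wrapped sum means a carry out */
}
```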
It is always instructive to look at open source multi-precision libraries when you have a question like this. OpenSSL's bignum and GNU MP come to mind without any research. As the intrinsic doesn't exist, a more definitive answer (for your work) depends on exactly what you want to achieve: prime factors, multiply, add, etc. You can always use assembler or, more powerfully, use a script that generates assembler for your particular integer length.
adds/adc/adcs appropriately to do an operation. However, the inside of the loop can often be combined for efficiency gains, as you don't need to keep loading memory; as the comment above says, it depends on what you want to do... – Ilia
© 2022 - 2025 — McMap. All rights reserved.
int64_t on 32bit machines with adc, so you can cast to that. GNU C (gcc / clang) on 64bit platforms usually supports __int128_t. – Elfland
adds/adc to add two __int128_t args), but Aarch64 gcc 4.8 uses the cmov-and-increment csinc instruction. See the output on godbolt. IDK why aarch64 gcc 4.8 doesn't manage to just use adc. – Elfland
__builtin function for add-with-carry :( – Elfland
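To illustrate the 128-bit type mentioned in these comments, here is a minimal sketch for 64-bit GNU C targets (gcc or clang). The typedef u128 and the function add_u128 are hypothetical names; the carry out is recovered from unsigned wraparound rather than from any flag:

```c
#include <stdint.h>

typedef unsigned __int128 u128;    /* GNU C extension, 64-bit targets only */

/* Hypothetical helper: *sum = a + b (mod 2^128), returns the carry out. */
static unsigned add_u128(u128 a, u128 b, u128 *sum)
{
    *sum = a + b;
    return *sum < a;               /* a wrapped sum means a carry out */
}
```

The instructions chosen for the addition vary by compiler version, as the comments note, so inspect the output for your toolchain.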