Are there ARM intrinsics for add-with-carry in C?

Asked 9/5, 2016 at 23:21 Answered 19/8, 2024 at 23:54

Do there exist intrinsics for ARM C compilers to do add-with-carry operations, or is it necessary to use assembly language?

On x86, there is _addcarry_u64 for add-with-carry. (There's also the newer _addcarryx_u64 for special purposes.)

Nobel answered 9/5, 2016 at 23:21 Comment(5)

Compilers typically implement int64_t on 32bit machines with adc, so you can cast to that. GNU C (gcc / clang) on 64bit platforms usually supports __int128_t. – Elfland 10/5, 2016 at 1:59

Aarch64 has an adc instruction that gcc 5.2 uses (adds / adc to add two __int128_t args), but Aarch64 gcc 4.8 uses the cmov-and-increment csinc instruction instead. See the output on godbolt. IDK why aarch64 gcc 4.8 doesn't manage to just use adc. – Elfland 10/5, 2016 at 2:39

I don't see a gcc __builtin function for add-with-carry :( – Elfland 10/5, 2016 at 2:50

Note that for whatever it is you're trying to do which would need this, you'd equally need another intrinsic for setting the carry flag in the first place. I'm pretty certain neither exists. – Convector 10/5, 2016 at 8:32

The situation with x86 is not any better, at least for GCC and Clang #33691291 – Literary 16/5, 2016 at 7:43

From the GCC documentation (these builtins go back at least as far as gcc 5):

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Integer-Overflow-Builtins.html

Both clang and GCC implement these builtins, and I verified that the generated code is optimal on both x86_64 and aarch64 targets:

#include <stdbool.h>  // for bool (needed when compiling as C)
#include <stdint.h>

typedef unsigned __int128 uint128_t;


// carry_out = a + b + carry_in
uint8_t my_addcarry_u64(uint8_t carry_in, uint64_t a, uint64_t b, uint64_t * sum)
{
        bool c;
        uint64_t res;
        c = __builtin_uaddll_overflow (a, b, (long long unsigned *)&res);
        c |= __builtin_uaddll_overflow (res, carry_in, (long long unsigned *)&res);
        *sum = res;
        return c;
}

// carry_out = a + b + carry_in (128-bit operands, handled as two 64-bit limbs)
uint8_t my_addcarry_u128(uint8_t carry_in, uint128_t a, uint128_t b, uint128_t * sum)
{
        bool c;
        uint64_t res_lo, res_hi;
        // low limb: a_lo + b_lo + carry_in
        c = __builtin_uaddll_overflow ((uint64_t)a, (uint64_t)b, (long long unsigned *)&res_lo);
        c |= __builtin_uaddll_overflow (carry_in, res_lo, (long long unsigned *)&res_lo);
        // high limb: a_hi + b_hi + carry out of the low limb
        c = __builtin_uaddll_overflow ((uint64_t)(a >> 64), c, (long long unsigned *)&res_hi);
        c |= __builtin_uaddll_overflow ((uint64_t)(b >> 64), res_hi, (long long unsigned *)&res_hi);
        *sum = ((uint128_t)res_hi << 64) + res_lo;
        return c;
}
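
A single-limb add-with-carry like the one above is usually chained across an array of limbs. The following is only a sketch of that idea, assuming a compiler that provides the type-generic __builtin_add_overflow (GCC 5 or later, or a recent clang); the names addc_u64 and add_limbs are hypothetical and chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One limb: *sum = a + b + carry_in (carry_in must be 0 or 1).
   Returns the carry out (0 or 1). Hypothetical helper name. */
static inline unsigned addc_u64(unsigned carry_in, uint64_t a, uint64_t b,
                                uint64_t *sum)
{
    uint64_t t;
    unsigned c1 = __builtin_add_overflow(a, b, &t);
    unsigned c2 = __builtin_add_overflow(t, (uint64_t)carry_in, sum);
    return c1 | c2; /* at most one of the two additions can overflow */
}

/* r = a + b over n little-endian 64-bit limbs; returns the final carry. */
unsigned add_limbs(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++)
        carry = addc_u64(carry, a[i], b[i], &r[i]);
    return carry;
}
```

Whether the compiler turns the loop body into a single adcs is not guaranteed; it is worth checking the generated assembly for your target and compiler version.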

Even if the original post is old, I provide a solution to the original question, in case someone reads this thread again

Sedgewake answered 19/8, 2024 at 23:54 Comment(1)

This feature is now available in Visual Studio 2022 version 17.7 (e.g. _add_overflow_i8 ....) – Sedgewake 2/9, 2024 at 12:43

There is no intrinsic with current versions of gcc (gcc5 was released the year this question was asked). One issue is the communication of the 'carry flag'. However, the ARM backend does know about and define a set of ADC primitives, such as addsi3_carryin.

For example,

unsigned long long big_inc(unsigned long long x)
{
  return ++x;
}

is translated to,

big_inc(unsigned long long):
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        adds    r0, r0, #1
        adc     r1, r1, #0
        bx      lr
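
The same effect can be had for a full two-operand add by letting the compiler see one wide addition. This is a minimal sketch, assuming the operands arrive as 32-bit halves; add64_halves is a hypothetical name. On 32-bit ARM, compilers typically lower the 64-bit addition to an adds / adc pair like the one above, but that depends on the compiler and its version.

```c
#include <stdint.h>

/* Add two 64-bit values supplied as (hi, lo) 32-bit halves.
   The carry from the low half to the high half is handled by the
   compiler's own 64-bit addition. Hypothetical helper name. */
void add64_halves(uint32_t a_hi, uint32_t a_lo,
                  uint32_t b_hi, uint32_t b_lo,
                  uint32_t *r_hi, uint32_t *r_lo)
{
    uint64_t a = ((uint64_t)a_hi << 32) | a_lo;
    uint64_t b = ((uint64_t)b_hi << 32) | b_lo;
    uint64_t s = a + b; /* wraps modulo 2^64; the carry out is discarded */

    *r_lo = (uint32_t)s;
    *r_hi = (uint32_t)(s >> 32);
}
```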

It is always instructive to look at open source multi-precision libraries when you have a question like this. OpenSSL's bignum and the GNU MP library come to mind without any research. As the intrinsic doesn't exist, a more definitive answer (for your work) depends on exactly what it is you want to achieve: prime factors, multiply, add, etc. You can always use assembler or, more powerfully, use a script that generates assembler for your particular integer length.
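
When no builtin is available, portable C can recover the carry by comparing the wrapped sum with an operand. The following is only a sketch of that approach and is not taken from OpenSSL or GNU MP; mp_add is a hypothetical name and 32-bit limbs are an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n little-endian 32-bit limbs, in plain ISO C.
   Returns the final carry (0 or 1). Hypothetical function name. */
uint32_t mp_add(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s  = a[i] + carry; /* wraps to 0 only if a[i] is all ones and carry is 1 */
        uint32_t c1 = s < carry;    /* 1 if adding the carry wrapped */
        uint32_t t  = s + b[i];
        uint32_t c2 = t < s;        /* 1 if adding b[i] wrapped */
        r[i] = t;
        carry = c1 | c2;            /* both cannot be 1 at the same time */
    }
    return carry;
}
```

Compilers may or may not recognise this pattern and emit adcs; hand-written or generated assembler remains the fallback when the generated code is not good enough.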

Ilia answered 10/5, 2016 at 13:33 Comment(1)

A lot of general purpose libraries will do something like Duff's device and run through two 'int[]' arrays, using adds/adc/adcs as appropriate to do an operation. However, the inside of the loop can often be combined for efficiency gains, as you don't need to keep loading memory; hence the comment above that it depends on what you want to do... – Ilia 10/5, 2016 at 21:17