Get GCC To Use Carry Logic For Arbitrary Precision Arithmetic Without Inline Assembly?
Asked Answered
D

1

9

When working with arbitrary precision arithmetic (e.g. 512-bit integers), is there any way to get GCC to use ADC and similar instructions without using inline assembly?

A first glance at GMP's sourcecode shows that they simply have assembly implementations for every supported platform.

Here is the test code I wrote, which adds two 128-bit numbers from the command line and prints the result. (Inspired by mini-gmp's add_n):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    uint32_t a[4];
    uint32_t b[4];
    uint32_t c[4];
    uint32_t carry = 0;

    for (int i = 0; i < 4; ++i)
    {
        a[i] = strtoul (argv[i+1], NULL, 16);
        b[i] = strtoul (argv[i+5], NULL, 16);
    }

    for (int i = 0; i < 4; ++i)
    {
        uint32_t aa = a[i];
        uint32_t bb = b[i];
        uint32_t r = aa + carry;
        carry = (r < carry);
        r += bb;
        carry += (r < bb);
        c[i] = r;
    }

    printf ("%08X%08X%08X%08X + %08X%08X%08X%08X =\n", a[3], a[2], a[1], a[0], b[3], b[2], b[1], b[0]);
    printf ("%08X%08X%08X%08X\n", c[3], c[2], c[1], c[0]);

    return 0;
}

GCC -O3 -std=c99 Does not produce any adc instructions, as checked by objdump. My gcc version is i686-pc-mingw32-gcc (GCC) 4.5.2.

Decahedron answered 29/3, 2013 at 2:41 Comment(6)
Nope. Not gonna happen that easily (if it's at all possible). That's one of the reasons why GMP uses inline assembly.Braxy
@Mysticial: I agree, but I wanted to pose the question just in case. I saw little meaningful discussion on the topic during my Google searches.Decahedron
Yeah, I know exactly what you're talking about since I've tried to do the exact same before and failed miserably. Not only is GCC not able to do it, but neither will MSVC nor ICC. In short, it requires a very specialized compiler optimization pass to detect the intent - which is so niche that no compiler writer would waste time on such a thing.Braxy
@Mysticial, I got ICC to do this efficiently using the _addcarry_u64 intrinsic stackoverflow.com/questions/29029572/…Ivelisseivens
@Zboson ICC and MSVC didn't have that intrinsic until very recently. (~1 year ago)Braxy
@Mysticial, woah, I did not realized MSVC had _addcarry_u64. I just tried it and indeed it produces 1x add, and 3x adc ! What's up with GCC and Clang then?Ivelisseivens
D
1

GCC will use the carry flag if it can see that it needs to:
When adding two uint64_t values on a 32-bit machine, for example, this must result in one 32-bit ADD plus one 32-bit ADC. But apart from those cases, where the compiler is forced to use the carry, it probably cannot be persuaded to do so w/o assembler. Therefore, it may be beneficial to use the biggest integer type available to allow GCC to optimize operations by effectively letting it know that the single 'components' of the value belong together.

For the simple addition, another way to calculate the carry could be to look at the relevant bits in the operands, like:

uint32_t aa,bb,rr;
bool msbA, msbB, msbR, carry;
// ...

rr = aa+bb;

msbA = aa >= (1<<31); // equivalent: (aa & (1<<31)) != 0;
msbB = bb >= (1<<31);
msbR = rr >= (1<<31);


carry = (msbA && msbB) || ( !msbR && ( msbA || msbB) );
Dishonest answered 29/3, 2013 at 13:57 Comment(8)
Using a bigger integer type only pushes the problem to a higher level. Unless GCC has 512-bit integer types, you still need to mess with the carry logic.Braxy
Of course, you will definitely have to compute your own carry sooner or later. But when using uint64_t instead of uint32_t, for example, you will save yourself 50% of the 'manual' carry flag calculations and their processing time.Dishonest
If you are going for speed (as most people are in multiprecision packages), you might want to compute the carry using aa, bb, and rr and then right shiftng 31 bits, saving 2 shifts. You also probably want to non-branching operators & | ~ because these bit are presumably very hard to predict and so a branch predictor will likely guess wrong while executing the expression.Pfosi
Another cheap trick is to only use 31 bits of the 32 bit word (even better, 63 of 64 bits). You give up 3% or less of your storage capacity, but the carry bit gets computed for you in the MSB of the sum.Pfosi
By the way, for unsigned types carry == ((a+b) < a).Dishonest
If you need multiprecision arithmethic occasionally, then doing the carry computation klunkily in C won't have any real impact on performance of the application oveall. Getting the carry managed by the compiler is only going to matter if your computation uses the multiprecision arithmetic very heavily. Then you are likely to find out that divide-multiprecision is the real cost.Pfosi
@HannoBinder: ((a+b)<a) works in languages (like C) that don't disallow overflows. I work with a language in which the results after overflow are explicitly defined as "undefined behavior". (Life sometimes isn't easy being green).Pfosi
Maybe it is worth having a look at GCC's Integer Overflow Builtins: gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.htmlCody

© 2022 - 2024 — McMap. All rights reserved.