Get GCC To Use Carry Logic For Arbitrary Precision Arithmetic Without Inline Assembly? - McMap

About

Get GCC To Use Carry Logic For Arbitrary Precision Arithmetic Without Inline Assembly?

Asked 29/3, 2013 at 2:41 Answered 29/3, 2013 at 13:57

c optimization gcc compiler-optimization arbitrary-precision

D

1

9

When working with arbitrary precision arithmetic (e.g. 512-bit integers), is there any way to get GCC to use ADC and similar instructions without using inline assembly?

A first glance at GMP's sourcecode shows that they simply have assembly implementations for every supported platform.

Here is the test code I wrote, which adds two 128-bit numbers from the command line and prints the result. (Inspired by mini-gmp's add_n):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    uint32_t a[4];
    uint32_t b[4];
    uint32_t c[4];
    uint32_t carry = 0;

    for (int i = 0; i < 4; ++i)
    {
        a[i] = strtoul (argv[i+1], NULL, 16);
        b[i] = strtoul (argv[i+5], NULL, 16);
    }

    for (int i = 0; i < 4; ++i)
    {
        uint32_t aa = a[i];
        uint32_t bb = b[i];
        uint32_t r = aa + carry;
        carry = (r < carry);
        r += bb;
        carry += (r < bb);
        c[i] = r;
    }

    printf ("%08X%08X%08X%08X + %08X%08X%08X%08X =\n", a[3], a[2], a[1], a[0], b[3], b[2], b[1], b[0]);
    printf ("%08X%08X%08X%08X\n", c[3], c[2], c[1], c[0]);

    return 0;
}

GCC -O3 -std=c99 Does not produce any adc instructions, as checked by objdump. My gcc version is i686-pc-mingw32-gcc (GCC) 4.5.2.

Decahedron answered 29/3, 2013 at 2:41 Comment(6)

Nope. Not gonna happen that easily (if it's at all possible). That's one of the reasons why GMP uses inline assembly. – Braxy 29/3, 2013 at 2:43

@Mysticial: I agree, but I wanted to pose the question just in case. I saw little meaningful discussion on the topic during my Google searches. – Decahedron 29/3, 2013 at 2:47

Yeah, I know exactly what you're talking about since I've tried to do the exact same before and failed miserably. Not only is GCC not able to do it, but neither will MSVC nor ICC. In short, it requires a very specialized compiler optimization pass to detect the intent - which is so niche that no compiler writer would waste time on such a thing. – Braxy 29/3, 2013 at 2:49

@Mysticial, I got ICC to do this efficiently using the _addcarry_u64 intrinsic stackoverflow.com/questions/29029572/… – Ivelisseivens 23/3, 2015 at 14:23

@Zboson ICC and MSVC didn't have that intrinsic until very recently. (~1 year ago) – Braxy 23/3, 2015 at 14:48

@Mysticial, woah, I did not realized MSVC had _addcarry_u64. I just tried it and indeed it produces 1x add, and 3x adc ! What's up with GCC and Clang then? – Ivelisseivens 23/3, 2015 at 16:40

D

1

GCC will use the carry flag if it can see that it needs to:
When adding two uint64_t values on a 32-bit machine, for example, this must result in one 32-bit ADD plus one 32-bit ADC. But apart from those cases, where the compiler is forced to use the carry, it probably cannot be persuaded to do so w/o assembler. Therefore, it may be beneficial to use the biggest integer type available to allow GCC to optimize operations by effectively letting it know that the single 'components' of the value belong together.

For the simple addition, another way to calculate the carry could be to look at the relevant bits in the operands, like:

uint32_t aa,bb,rr;
bool msbA, msbB, msbR, carry;
// ...

rr = aa+bb;

msbA = aa >= (1<<31); // equivalent: (aa & (1<<31)) != 0;
msbB = bb >= (1<<31);
msbR = rr >= (1<<31);


carry = (msbA && msbB) || ( !msbR && ( msbA || msbB) );

Dishonest answered 29/3, 2013 at 13:57 Comment(8)

Using a bigger integer type only pushes the problem to a higher level. Unless GCC has 512-bit integer types, you still need to mess with the carry logic. – Braxy 29/3, 2013 at 14:27

Of course, you will definitely have to compute your own carry sooner or later. But when using uint64_t instead of uint32_t, for example, you will save yourself 50% of the 'manual' carry flag calculations and their processing time. – Dishonest 29/3, 2013 at 14:31

If you are going for speed (as most people are in multiprecision packages), you might want to compute the carry using aa, bb, and rr and then right shiftng 31 bits, saving 2 shifts. You also probably want to non-branching operators & | ~ because these bit are presumably very hard to predict and so a branch predictor will likely guess wrong while executing the expression. – Pfosi 19/3, 2015 at 14:15

Another cheap trick is to only use 31 bits of the 32 bit word (even better, 63 of 64 bits). You give up 3% or less of your storage capacity, but the carry bit gets computed for you in the MSB of the sum. – Pfosi 19/3, 2015 at 14:16

By the way, for unsigned types carry == ((a+b) < a). – Dishonest 19/3, 2015 at 15:27

If you need multiprecision arithmethic occasionally, then doing the carry computation klunkily in C won't have any real impact on performance of the application oveall. Getting the carry managed by the compiler is only going to matter if your computation uses the multiprecision arithmetic very heavily. Then you are likely to find out that divide-multiprecision is the real cost. – Pfosi 20/3, 2015 at 10:59

@HannoBinder: ((a+b)<a) works in languages (like C) that don't disallow overflows. I work with a language in which the results after overflow are explicitly defined as "undefined behavior". (Life sometimes isn't easy being green). – Pfosi 20/3, 2015 at 11:2

Maybe it is worth having a look at GCC's Integer Overflow Builtins: gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html – Cody 7/9, 2016 at 15:34

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.