In GCC-style extended inline asm, is it possible to output a "virtualized" boolean value, e.g. the carry flag?

If I have the following C++ code to compare two 128-bit unsigned integers, with inline x86-64 (AMD64) asm:

struct uint128_t {
    uint64_t lo, hi;
};
inline bool operator< (const uint128_t &a, const uint128_t &b)
{
    uint64_t temp;
    bool result;
    __asm__(
        "cmpq %3, %2;"
        "sbbq %4, %1;"
        "setc %0;"
        : // outputs:
        /*0*/"=r,1,2"(result),
        /*1*/"=r,r,r"(temp)
        : // inputs:
        /*2*/"r,r,r"(a.lo),
        /*3*/"emr,emr,emr"(b.lo),
        /*4*/"emr,emr,emr"(b.hi),
        "1"(a.hi));
    return result;
}

Then it will be inlined quite efficiently, but with one flaw. The result is passed through a general-purpose register as a 0 or 1 value, which adds two or three unnecessary instructions and detracts from a compare operation that would otherwise be fully optimized. The generated code will look something like this:

    mov    r10, [r14]
    mov    r11, [r14+8]
    cmp    r10, [r15]
    sbb    r11, [r15+8]
    setc   al
    movzx  eax, al
    test   eax, eax
    jnz    is_lessthan

If I use "sbb %0,%0" with an "int" return value instead of "setc %0" with a "bool" return value, there are still two extra instructions:

    mov    r10, [r14]
    mov    r11, [r14+8]
    cmp    r10, [r15]
    sbb    r11, [r15+8]
    sbb    eax, eax
    test   eax, eax
    jnz    is_lessthan

What I want is this:

    mov    r10, [r14]
    mov    r11, [r14+8]
    cmp    r10, [r15]
    sbb    r11, [r15+8]
    jc     is_lessthan

GCC extended inline asm is wonderful, otherwise. But I want it to be just as good as an intrinsic function would be, in every way. I want to be able to directly return a boolean value in the form of the state of a CPU flag or flags, without having to "render" it into a general register.

Is this possible, or would GCC (and the Intel C++ compiler, which also allows this form of inline asm to be used) have to be modified or even refactored to make it possible?

Also, while I'm at it — is there any other way my formulation of the compare operator could be improved?

Macassar answered 20/2, 2010 at 8:17 Comment(1)
As of 2013, this still doesn't seem possible to do directly. But here's a 2011 bug report discussing the desirability of such a feature: gcc.gnu.org/bugzilla/show_bug.cgi?id=49611. It links to a 2001 Linux kernel thread also wishing for such a thing: lkml.indiana.edu/hypermail/linux/kernel/0111.2/0256.html. - Hauteur

Here we are almost 7 years later, and YES, GCC finally added support for "outputting flags" (added in 6.1.0, released ~April 2016). The detailed docs are in the GCC manual's Extended Asm chapter under "Flag Output Operands", but in short, it looks like this:

/* Test if bit 0 is set in 'value' */
char a;

asm("bt $0, %1"
    : "=@ccc" (a)
    : "r" (value) );

if (a)
   blah;

To understand =@ccc: the output constraint (which requires =) has the form @cc followed by the condition code to test, in this case c for the carry flag. The variable receives 1 if the condition holds after the asm statement and 0 otherwise.
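
Applied to the 128-bit compare from the question, a flag output might look roughly like the sketch below. This is an illustration under assumptions, not a tested drop-in: the name less_than and the portable fallback branch are my own additions, and the asm path is only taken when the compiler defines __GCC_ASM_FLAG_OUTPUTS__ on x86-64.

```cpp
#include <cstdint>

struct uint128_t {
    std::uint64_t lo, hi;
};

// less_than is a hypothetical name chosen for this sketch.
inline bool less_than(const uint128_t &a, const uint128_t &b)
{
#if defined(__x86_64__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
    std::uint64_t hi = a.hi;  // scratch copy, overwritten by sbb
    bool carry;
    __asm__(
        "cmpq %[blo], %[alo]\n\t"  // CF = (a.lo < b.lo)
        "sbbq %[bhi], %[hi]"       // CF = borrow out of a.hi - b.hi - CF
        : "=@ccc"(carry),          // read the carry flag directly
          [hi] "+r"(hi)
        : [alo] "r"(a.lo),
          [blo] "erm"(b.lo),
          [bhi] "erm"(b.hi));
    return carry;
#else
    // Fallback (assumption): plain C++ when flag outputs are unavailable.
    return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
#endif
}
```

Whether the compiler then branches on the flag with a single jc depends on the compiler version and the surrounding code, so it is worth checking the generated assembly.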

OK, this may not be an issue for your specific case anymore (since GCC now supports comparing 128-bit data types directly), but (currently) 1,326 people have viewed this question. Apparently there's some interest in this feature.
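
For reference, the direct comparison could be sketched as follows, assuming a 64-bit target where GCC (or Clang) provides the unsigned __int128 extension. The helper names make_u128 and less_than_u128 are made up for this example.

```cpp
#include <cstdint>

// Hypothetical helper: pack two 64-bit halves into the built-in 128-bit type.
inline unsigned __int128 make_u128(std::uint64_t hi, std::uint64_t lo)
{
    return (static_cast<unsigned __int128>(hi) << 64) | lo;
}

// Hypothetical wrapper: the compiler chooses the instruction sequence itself,
// so no inline asm and no flag plumbing is needed.
inline bool less_than_u128(unsigned __int128 a, unsigned __int128 b)
{
    return a < b;
}
```

Note that unsigned __int128 is a compiler extension rather than standard C++, so building with -pedantic may produce a warning.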

Now I personally favor the school of thought that says don't use inline asm at all. But if you must, yes you can (now) 'output' flags.

FWIW.

Niche answered 16/1, 2017 at 3:20 Comment(0)

I don't know a way to do this. You may or may not consider this an improvement:

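inline bool operator< (const uint128_t &a, const uint128_t &b)
{
    // Caveat: if a.hi == 0 and a.lo < b.lo, the sbb below wraps temp
    // around to 2^64-1 and the final comparison wrongly yields false.
    uint64_t temp = a.hi;
    __asm__(
        "cmpq %2, %1;"   // CF = (a.lo < b.lo)
        "sbbq $0, %0;"   // temp = a.hi - CF
        : // outputs:
        /*0*/"=r"(temp)
        : // inputs:
        /*1*/"r"(a.lo),
        /*2*/"mr"(b.lo),
        "0"(temp));

    return temp < b.hi;
}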

It produces something like:

    mov    rdx, [r14]
    mov    rax, [r14+8]
    cmp    rdx, [r15]
    sbb    rax, 0
    cmp    rax, [r15+8]
    jc     is_lessthan

Pascual answered 21/2, 2010 at 8:26 Comment(1)
+1 Excellent improvement. This does benchmark better (for doing a multi-gigabyte sort operation) than what I posted in the question. I'm still holding out hope there might be a way to return the carry flag directly, so I won't mark this as "accepted answer" yet, but, nicely done. - Macassar
