What's So Difficult About `uint64_t`? (Conversion Assembly From `float`)

I am in a situation where I need to compute something like size_t s=(size_t)floorf(f);. That is, the argument is a float, but it has an integer value (assume floorf(f) is small enough to be represented exactly). While optimizing this, I discovered something interesting.

Here are some conversions from float to integers (GCC 5.2.0 -O3). For clarity, the conversion given is the return value of a test function.

Here's int32_t x=(int32_t)f:

    cvttss2si   eax, xmm0
    ret

Here's uint32_t x=(uint32_t)f:

    cvttss2si   rax, xmm0
    ret

Here's int64_t x=(int64_t)f:

    cvttss2si   rax, xmm0
    ret

Last, here's uint64_t x=(uint64_t)f;:

    ucomiss xmm0, DWORD PTR .LC2[rip]
    jnb .L4
    cvttss2si   rax, xmm0
    ret
.L4:
    subss   xmm0, DWORD PTR .LC2[rip]
    movabs  rdx, -9223372036854775808
    cvttss2si   rax, xmm0
    xor rax, rdx
    ret

.LC2:
    .long   1593835520

This last one is much more complex than the others. Moreover, Clang and MSVC behave similarly. For your convenience, I've translated it into pseudo-C:

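float lc2 = 9223372036854775808.0f; /* 2^63, i.e. the bit pattern 1593835520 (0x5F000000) */
if (f<lc2) {
    return (uint64_t)(int64_t)f; //cvttss2si is a signed conversion
} else {
    f -= lc2;
    uint64_t temp = (uint64_t)(int64_t)f; //signed conversion again
    temp ^= 0x8000000000000000; //Toggle highest bit (2^63)
    return temp;
}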

This looks like it is trying to compute the first overflow mod 64 correctly. That seems kind of bogus, since the documentation for cvttss2si tells me that if an overflow happens (at 2^32, not 2^64), "the indefinite integer value (80000000H) is returned".

My questions:

1. What is this really doing, and why?
2. Why wasn't something similar done for the other integer types?
3. How can I change the conversion so as to produce code similar to the other cases (only lines 3 and 4 of the last listing, i.e. the cvttss2si and the ret), again assuming that the value is exactly representable?

Since cvttss2si performs a signed conversion, it treats values in the interval [2^63, 2^64) as out of range, even though they are in range for an unsigned result. The compiler therefore detects this case, subtracts 2^63 from the float to map the value into the lower half of the range, converts it, and then restores the missing 2^63 by toggling the top bit of the result.
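As a rough illustration of that sequence, here is a minimal C sketch that spells out the same steps by hand. The function name float_to_u64_manual is made up for this example, and the sketch assumes 0 <= f < 2^64 (anything else is undefined behavior for the plain cast as well).

```c
#include <stdint.h>

/* Hypothetical helper mirroring the compiler's branchy sequence.
   Assumes 0 <= f < 2^64. */
uint64_t float_to_u64_manual(float f)
{
    const float two63 = 0x1p63f; /* 2^63, bit pattern 0x5F000000 */

    if (f < two63) {
        /* Value fits in int64_t, so one signed conversion is enough. */
        return (uint64_t)(int64_t)f;
    }
    /* Move the value into the signed range, convert it,
       then put the top bit (2^63) back. */
    int64_t low = (int64_t)(f - two63);
    return (uint64_t)low ^ (UINT64_C(1) << 63);
}
```

For in-range inputs this should agree with a plain (uint64_t)f; the subtraction is exact because both operands are multiples of the float spacing in that range.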

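As for the other cases, notice that the uint32_t conversion still uses a 64-bit destination (rax), whose signed range covers the full range of uint32_t. The truncation to 32 bits is implicit: according to the calling convention, only the low 32 bits of the result (eax) are used. The signed types need no fix-up at all, because the range of cvttss2si already matches theirs.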
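The uint32_t case can be written out in C as a widening signed conversion followed by a narrowing cast. This is only a sketch of the idea, with a made-up function name, and it assumes 0 <= f < 2^32.

```c
#include <stdint.h>

/* Hypothetical helper: convert through int64_t, then keep the low 32 bits.
   Assumes 0 <= f < 2^32, which always fits in int64_t. */
uint32_t float_to_u32_via_i64(float f)
{
    int64_t wide = (int64_t)f; /* 64-bit signed conversion */
    return (uint32_t)wide;     /* keep only the low 32 bits */
}
```

Values in [2^31, 2^32) would overflow a 32-bit signed conversion, but they pass through the 64-bit one unchanged.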

Whether you can avoid the extra code depends on whether your input may fall into the range [2^63, 2^64). If it can, there is no way around the fix-up. Otherwise, a double cast, first to signed and then to unsigned, could work, i.e. (uint64_t)(int64_t)f.
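A minimal sketch of the double cast, wrapped in a function whose name is invented for this example. It assumes the caller guarantees 0 <= f < 2^63; values of 2^63 and above overflow the signed conversion, which is undefined behavior in C.

```c
#include <stdint.h>

/* Hypothetical helper: branch-free conversion for inputs known to be
   in [0, 2^63). */
uint64_t float_to_u64_small(float f)
{
    return (uint64_t)(int64_t)f; /* signed conversion, then reinterpret */
}
```

With optimization enabled on x86-64, one would expect this to compile to a single cvttss2si followed by ret, but it is worth checking the output of your own compiler.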
