What is the modern, correct way to do type punning in C++?

It seems like there are two types of C++: the practical C++ and the language-lawyer C++. In certain situations, it can be useful to interpret the bit pattern of one type as if it were a different type. Floating-point tricks are a notable example. Let's take the famous fast inverse square root (taken from Wikipedia, which was in turn taken from here):

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

Setting aside details, it relies on certain properties of the IEEE-754 floating-point bit representation. The interesting part here is the *(long*) cast from float* to long*. C and C++ differ on which of these reinterpreting casts have defined behavior; in practice, however, such techniques are used often in both languages.

The thing is that, for such a simple problem, there are a lot of pitfalls that can occur with the approach presented above and with other approaches. To name some:

At the same time, there are a lot of ways of performing type punning and a lot of mechanisms related to it. These are all that I could find:

  • reinterpret_cast and C-style cast

    [[nodiscard]] float int_to_float1(int x) noexcept
    {
        return *reinterpret_cast<float*>(&x);
    }
    [[nodiscard]] float int_to_float2(int x) noexcept
    {
        return *(float*)(&x);
    }
    
  • static_cast and void*

    [[nodiscard]] float int_to_float3(int x) noexcept
    {
        return *static_cast<float*>(static_cast<void*>(&x));
    }
    
  • std::bit_cast

    [[nodiscard]] constexpr float int_to_float4(int x) noexcept
    {
        return std::bit_cast<float>(x);
    }
    
  • memcpy

    [[nodiscard]] float int_to_float5(int x) noexcept
    {
        float destination;
        memcpy(&destination, &x, sizeof(x));
        return destination;
    }
    
  • union

    [[nodiscard]] float int_to_float6(int x) noexcept
    {
        union {
            int as_int;
            float as_float;
        } destination{x};
        return destination.as_float;
    }
    
  • placement new and std::launder

    [[nodiscard]] float int_to_float7(int x) noexcept
    {
        new(&x) float;
        return *std::launder(reinterpret_cast<float*>(&x));
    }
    
  • std::byte

    [[nodiscard]] float int_to_float8(int x) noexcept
    {
        return *reinterpret_cast<float*>(reinterpret_cast<std::byte*>(&x));
    }
    

The question is which of these ways are safe, which are unsafe, and which are damned forever. Which one should be used and why? Is there a canonical one accepted by the C++ community? Why are new versions of C++ introducing even more mechanisms, such as std::launder and std::byte in C++17, or std::bit_cast in C++20?

To give a concrete problem: what would be the safest, most performant, and best way to rewrite the fast inverse square root function? (Yes, I know that there is a suggestion of one way on Wikipedia).

Edit: To add to the confusion, it seems that there is a proposal that suggests adding yet another type punning mechanism: std::start_lifetime_as, which is also discussed in another question.

(godbolt)

Cavazos answered 21/5, 2021 at 11:45 Comment(12)
What you call practical vs. language-lawyer is actually a matter of caring about portability or not. You can study what the compiler does when the standard says that it is UB, but then you are bound to that compiler, a particular set of compiler options, and one target platform. I am not a language lawyer, but I have to write code that, when it compiles here, also compiles there. And that is a very practical view. - Gratulate
I think only std::bit_cast and memcpy are not UB. - Moorland
The standard lawyery consensus is in favor of the memcpy variant. It is well-defined and the compiler can and will optimize it to do the right thing. - Penates
You need to read your compiler's documentation and find where they guarantee (going forward) a particular Standard-breaking behaviour. If you can't find this, then you are in the UB / works-for-now position. - Pretense
std::bit_cast is only C++20 and later... But is certainly the modern way. - Pyrogallol
"Very often there is a need to interpret a bit pattern of one type as if it were a different type" -- really? Change that to "occasionally there is an advantage to [...]" and I'd probably be on board. - Afrikaner
I appreciate your aim of getting an overview, but I am afraid this is too broad, at least for me. I focused on the parts I can answer, while others are purely opinion-based, imho. - Gratulate
Btw, "However, type punning through a union is considered bad practice in C++." from Wikipedia is not quite correct. Type punning through a union is undefined in C++, though many compilers offer it as an extension. - Gratulate
@JohnBollinger In rendering and game development it is rather often. - Cavazos
@janekb04, even if I accepted that, rendering and game development constitute a fairly small segment of all C++ development efforts. - Afrikaner
@JohnBollinger Ok, you convinced me. - Cavazos
Even if the motivation to do this for optimization purposes (e.g., fast inverse square roots) has lapsed over time, that doesn't mean that there is never a need for a modern-day C++ programmer to do type punning. It's quite common in embedded development. - Stoical
G
14

This is what I get from gcc 11.1 with -O3:

int_to_float4(int):
        movd    xmm0, edi
        ret
int_to_float1(int):
        movd    xmm0, edi
        ret
int_to_float2(int):
        movd    xmm0, edi
        ret
int_to_float3(int):
        movd    xmm0, edi
        ret
int_to_float5(int):
        movd    xmm0, edi
        ret
int_to_float6(int):
        movd    xmm0, edi
        ret
int_to_float7(int):
        mov     DWORD PTR [rsp-4], edi
        movss   xmm0, DWORD PTR [rsp-4]
        ret
int_to_float8(int):
        movd    xmm0, edi
        ret

I had to add an auto x = &int_to_float4; to force gcc to actually emit anything for int_to_float4; I guess that's the reason it appears first.

Live Example

I am not that familiar with std::launder, so I cannot tell why it is different. Otherwise they are identical. This is what gcc has to say about it (in this context, with those flags). What the standard says is a different story. That said, memcpy(&destination, &x, sizeof(x)); is well defined, and most compilers know how to optimize it. std::bit_cast was introduced in C++20 to make such casts more explicit. Note that the possible implementation shown on cppreference uses std::memcpy ;).
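To make the memcpy approach concrete, here is a minimal sketch of a generic helper for code that cannot use C++20 yet. The name bit_cast_memcpy is made up for this example (it is not a standard function), and the static_asserts are my own guess at the checks one would want; treat it as an illustration, not a drop-in replacement for std::bit_cast (for one thing, it is not constexpr).

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper, not part of the standard library:
// copies the object representation of `src` into a fresh object of type To.
template <typename To, typename From>
To bit_cast_memcpy(const From& src) noexcept
{
    // Reject mismatched sizes at compile time instead of copying garbage.
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination must have the same size");
    // memcpy between objects is only well defined for trivially copyable types.
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    // This sketch default-constructs the destination before overwriting it.
    static_assert(std::is_trivially_default_constructible<To>::value,
                  "To must be trivially default constructible");

    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}
```

With this in place, something like bit_cast_memcpy<std::uint32_t>(1.0f) would be expected to yield 0x3f800000 on a platform with 32-bit IEEE-754 floats.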


TL;DR

what would be the safest, most performant and best way to rewrite the fast inverse square root function?

std::memcpy and in C++20 and beyond std::bit_cast.

Gratulate answered 21/5, 2021 at 12:7 Comment(3)
GCC is not generating any code for int_to_float4 because the function doesn't do anything and you marked it constexpr. It's always going to inline it, and it's not used, so there's no need to emit code for the function. As an alternative to taking the address of the function, you can just remove the constexpr. (Note that the [[noinline]] attribute is ignored for constexpr functions.) That being said, I don't see how an assembly listing of what one particular compiler generates is anywhere close to being an answer to this question. - Stoical
Indeed, looking at compiler output for trivial examples is often misleading. Several of these approaches may look fine in isolation, but when slotted into more complex code, may break badly, due to things like strict aliasing violations. - Queer
@CodyGray It isn't. Out of curiosity I was looking at gcc's output for OP's code (no modifications, that's the reason for constexpr and [[noinline]]). Instead of sharing a godbolt link in a comment, this is what happened. Indeed, it is bad to give the impression that this example of gcc's output would help to answer any of OP's questions when there is nothing more to it than "use memcpy". My mistake was not recognizing a high-quality question and then trying to phrase something that isn't an answer as if it were one. This needs a serious rewrite, but it'll take some time. - Gratulate
B
18

First of all, you assume that sizeof(long) == sizeof(int) == sizeof(float). This is not always true: the sizes are implementation-defined (platform dependent). In fact, the assumption holds on my Windows setup using clang-cl and fails on Linux on the same 64-bit machine. Different compilers on the same OS/machine can give different results. At the very least, a static assert is required to avoid sneaky bugs.

Plain C casts, reinterpret casts, and static casts are invalid here because of the strict aliasing rule (to be pedantic, the behavior is undefined according to the C++ standard; the program is not ill-formed, so the compiler does not have to diagnose it). The union solution is not valid either (reading an inactive union member is only valid in C, not in C++). Only the std::bit_cast and std::memcpy solutions are "safe" (assuming the sizes of the types match on the target platform). std::memcpy is usually fast because most mainstream compilers optimize it (when optimizations are enabled, such as with -O3 for GCC/Clang): the std::memcpy call can be inlined and replaced by faster instructions. std::bit_cast is the new way of doing this (only since C++20). It is the cleaner choice for C++ code, as std::memcpy takes untyped void* pointers and thus bypasses the type system.

Blackandwhite answered 21/5, 2021 at 11:58 Comment(1)
I believe the OP knows all that. Repetition of Q is not an answer. - Fredfreda
