Safe (and costless) reinterpretation of sized data

I wanted to write my own "small vector" type, and the first hurdle has been figuring out how to implement the on-stack storage.

I stumbled upon std::aligned_storage, which seems purpose-designed for implementing arbitrary on-stack storage, but I'm very unclear as to what is and isn't safe to do. cppreference.com conveniently has an example of using std::aligned_storage, which I'll repeat here:

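#include <cstddef>      // std::size_t
#include <new>          // placement new, std::bad_alloc
#include <type_traits>  // std::aligned_storage
#include <utility>      // std::forward

template<class T, std::size_t N>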
class static_vector
{
    // properly aligned uninitialized storage for N T's
    typename std::aligned_storage<sizeof(T), alignof(T)>::type data[N];
    std::size_t m_size = 0;

public:
    // Create an object in aligned storage
    template<typename ...Args> void emplace_back(Args&&... args) 
    {
        if( m_size >= N ) // possible error handling
            throw std::bad_alloc{};

        // construct value in memory of aligned storage
        // using inplace operator new
        new(&data[m_size]) T(std::forward<Args>(args)...);
        ++m_size;
    }

    // Access an object in aligned storage
    const T& operator[](std::size_t pos) const 
    {
        // note: needs std::launder as of C++17
        return *reinterpret_cast<const T*>(&data[pos]);
    }

    // Delete objects from aligned storage
    ~static_vector() 
    {
        for(std::size_t pos = 0; pos < m_size; ++pos) {
            // note: needs std::launder as of C++17
            reinterpret_cast<T*>(&data[pos])->~T();
        }
    }
};

Almost everything here makes sense to me, except for those two comments saying:

note: needs std::launder as of C++17

The "as of" clause on its own is fairly confusing; does that mean that

(a) this code is incorrect or non-portable, and a portable version should use std::launder (which was introduced in C++17), or
(b) C++17 made a breaking change to memory aliasing / reinterpretation rules?

Moving past that, the use of std::launder concerns me from a performance perspective. My understanding is that, in most cases, the compiler is allowed to make very strong assumptions about memory aliasing (notably that pointers to different types do not refer to the same memory) in order to avoid redundant memory loads.

I'd like to keep that level of aliasing-certainty on the part of the compiler (i.e. have accesses to T from my small vector be as optimizable as those to a normal T[] or T *). From what I've read of std::launder, though, it sounds like a full aliasing barrier, i.e. the compiler has to assume it knows nothing about the origin of the laundered pointer. I'd be worried that using this at every operator[] would interfere with the usual load-store elimination.
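To make the concern concrete, here is my reading of what the two "needs std::launder" comments are asking for. This is only a sketch: it needs C++17, the name laundered_vector, the size() accessor, the bounds assert, and the deleted copy operations are my own additions rather than part of the cppreference example, and I am assuming that wrapping each reinterpret_cast in std::launder is all that is required.

```cpp
#include <cassert>
#include <cstddef>      // std::size_t
#include <new>          // placement new, std::launder (C++17), std::bad_alloc
#include <type_traits>  // std::aligned_storage
#include <utility>      // std::forward

// Hypothetical variant of the cppreference example with std::launder applied.
template<class T, std::size_t N>
class laundered_vector
{
    // properly aligned uninitialized storage for N T's
    typename std::aligned_storage<sizeof(T), alignof(T)>::type data[N];
    std::size_t m_size = 0;

public:
    laundered_vector() = default;
    // Assumption: a bytewise copy would be wrong for non-trivial T,
    // so this sketch simply forbids copying.
    laundered_vector(const laundered_vector&) = delete;
    laundered_vector& operator=(const laundered_vector&) = delete;

    template<typename ...Args> void emplace_back(Args&&... args)
    {
        if (m_size >= N)
            throw std::bad_alloc{};
        // construct the value in the raw storage
        ::new (static_cast<void*>(&data[m_size])) T(std::forward<Args>(args)...);
        ++m_size;
    }

    std::size_t size() const { return m_size; }

    const T& operator[](std::size_t pos) const
    {
        assert(pos < m_size);
        // launder the reinterpreted pointer before dereferencing it
        return *std::launder(reinterpret_cast<const T*>(&data[pos]));
    }

    ~laundered_vector()
    {
        for (std::size_t pos = 0; pos < m_size; ++pos)
            std::launder(reinterpret_cast<T*>(&data[pos]))->~T();
    }
};
```

With this, something like laundered_vector<int, 4> v; v.emplace_back(10); followed by v[0] goes through std::launder on every access, which is exactly the pattern whose cost I'm unsure about.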

Perhaps the compiler is smarter than that, or perhaps I'm misunderstanding how std::launder works in the first place. Regardless, I really don't feel like I know what I'm doing with this level of C++ memory-hacking. It would be great to know what I have to do for this particular use case, but if someone could enlighten me on the more general rules, that would be much appreciated.

Update (Further Exploration)

Reading up on this issue a bit more, my current understanding is that the example I pasted above has undefined behavior under the standard unless std::launder is used. That said, smaller experiments demonstrating what I believe to be undefined behavior don't show either Clang or GCC being as strict as the standard would seem to allow.

Let's start with something that is clearly unsafe in the case of aliasing pointers:

float definitelyNotSafe(float *y, int *z) {
    *y = 5.0;
    *z = 7;
    return *y;
}

As one might expect, both Clang and GCC (with optimizations and strict aliasing enabled) generate code that always returns 5.0; this function will not have the "desired" behavior if it is passed a y and z that alias:

.LCPI1_0:
        .long   1084227584              # float 5
definitelyNotSafe(float*, int*):              # @definitelyNotSafe(float*, int*)
        mov     dword ptr [rdi], 1084227584
        mov     dword ptr [rsi], 7
        movss   xmm0, dword ptr [rip + .LCPI1_0] # xmm0 = mem[0],zero,zero,zero
        ret

Things get a bit weirder, though, when the creation of aliasing pointers is visible to the compiler:

float somehowSafe(float x) {
    // Make some aliasing pointers
    auto y = &x;
    auto z = reinterpret_cast<int *>(y);

    *y = 5.0;
    *z = 7;

    return x;
}

In this case, both Clang and GCC (with -O3 and -fstrict-aliasing) generate code that observes the modification of x through z:

.LCPI0_0:
        .long   7                       # float 9.80908925E-45
somehowSafe(float):                       # @somehowSafe(float)
        movss   xmm0, dword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero,zero,zero
        ret

That said, it's not like the compiler is guaranteed to "take advantage" of undefined behavior; it is undefined, after all. And in that case, there was no profit in assuming *z = 7 had no effect. So what if we "motivated" the compiler to take advantage of strict aliasing?

int stillSomehowSafe(int x) {
    // Make some aliasing pointers
    auto y = &x;
    auto z = reinterpret_cast<float *>(y);

    auto product = float(x) * x * x * x * x * x;

    *y = 5;
    *z = product;

    return *y;
}

It would clearly be to the compiler's advantage to assume that *z = product has no effect on the value of *y; doing so would allow the compiler to simplify this function to one which simply always returns 5. Nonetheless, the generated code makes no such assumption:

stillSomehowSafe(int):                  # @stillSomehowSafe(int)
        cvtsi2ss        xmm0, edi
        movaps  xmm1, xmm0
        mulss   xmm1, xmm0
        mulss   xmm1, xmm0
        mulss   xmm1, xmm0
        mulss   xmm1, xmm0
        mulss   xmm1, xmm0
        movd    eax, xmm1
        ret

I'm rather puzzled by this behavior. I understand that we're given zero guarantees about what a compiler will do in the presence of undefined behavior, but I'm also surprised that neither Clang nor GCC is more aggressive with these sorts of optimizations. It makes me wonder if I misunderstand the standard, or if both Clang and GCC have weaker (and documented) definitions of "strict aliasing".
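For comparison, the one way I know of to get the same bit-level reinterpretation without relying on any of this is to copy the bytes with std::memcpy. The sketch below is not taken from any of the experiments above: float_bits and float_from_bits are hypothetical helper names, and it assumes float is a 32-bit IEEE 754 type, as it is on the x86-64 targets used here.

```cpp
#include <cstdint>  // std::uint32_t
#include <cstring>  // std::memcpy

// Assumption: float is 32 bits wide on this target.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: return the object representation of f as an integer.
inline std::uint32_t float_bits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy bytes; no aliasing pointer is formed
    return bits;
}

// Hypothetical helper: build a float from a raw 32-bit pattern.
inline float float_from_bits(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

On an IEEE 754 target, float_bits(5.0f) gives 1084227584, the same constant that appears in the definitelyNotSafe listing. This covers reinterpreting a value, not constructing objects in raw storage, so it doesn't by itself answer the small-vector question.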
