I wanted to write my own "small vector" type, and the first hurdle has been figuring out how to implement the on-stack storage.
I stumbled upon std::aligned_storage
, which seems purpose-designed for implementing arbitrary on-stack storage, but I'm very unclear as to what is and isn't safe to do. cppreference.com conveniently has an example of using std::aligned_storage
, which I'll repeat here:
template<class T, std::size_t N>
class static_vector
{
// properly aligned uninitialized storage for N T's
typename std::aligned_storage<sizeof(T), alignof(T)>::type data[N];
std::size_t m_size = 0;
public:
// Create an object in aligned storage
template<typename ...Args> void emplace_back(Args&&... args)
{
if( m_size >= N ) // possible error handling
throw std::bad_alloc{};
// construct value in memory of aligned storage
// using inplace operator new
new(&data[m_size]) T(std::forward<Args>(args)...);
++m_size;
}
// Access an object in aligned storage
const T& operator[](std::size_t pos) const
{
// note: needs std::launder as of C++17
return *reinterpret_cast<const T*>(&data[pos]);
}
// Delete objects from aligned storage
~static_vector()
{
for(std::size_t pos = 0; pos < m_size; ++pos) {
// note: needs std::launder as of C++17
reinterpret_cast<T*>(&data[pos])->~T();
}
}
};
Almost everything here makes sense to me, except for those two comments saying:
note: needs
std::launder
as of C++17
The "as of" clause on its own is fairly confusing; does that mean that
This code is incorrect or non-portable, and a portable version should use
std::launder
(which was introduced in C++17), orC++17 made a breaking change to memory aliasing / reinterpretation rules?
Moving past that, the use of std::launder
concerns me from a performance perspective. My understanding is that, in most cases, the compiler is allowed to make very strong assumptions about memory aliasing (notably that pointers to different types do not refer to the same memory) in order to avoid redundant memory loads.
I'd like to keep that level of aliasing-certainty on the part of the compiler (i.e. have accesses to T
from my small vector be equally optimizable as those to a normal T[]
or T *
), though from what I've read of std::launder
, it sounds like a full aliasing barrier, i.e. the compiler has to assume it knows nothing of the origin of the laundered pointer. I'd be worried that using this at every operator[]
would interfere with the usual load-store elimination.
Perhaps the compiler is smarter than that, or perhaps I'm misunderstanding how std::launder
works in the first place. Regardless, I really don't feel like I know what I'm doing with this level of C++ memory-hacking. It would be great to know what I have to do for this particular use case, but if someone could enlighten me on the more general rules, that would be much appreciated.
Update (Further Exploration)
Reading up on this issue a bit more, my current understanding is that the example I pasted here has undefined behavior under the standard unless std::launder
is used. That said, smaller experiments which demonstrate what I would believe to be undefined behavior, aren't showing either Clang or GCC to be as strict as the standard would seem to allow.
Let's start with something that is clearly unsafe in the case of aliasing pointers:
float definitelyNotSafe(float *y, int *z) {
*y = 5.0;
*z = 7;
return *y;
}
As one might expect, both Clang and GCC (with optimizations and strict aliasing enabled) generate code that always returns 5.0
; this function will not have the "desired" behavior if it is passed a y
and z
that alias:
.LCPI1_0:
.long 1084227584 # float 5
definitelyNotSafe(float*, int*): # @definitelyNotSafe(float*, int*)
mov dword ptr [rdi], 1084227584
mov dword ptr [rsi], 7
movss xmm0, dword ptr [rip + .LCPI1_0] # xmm0 = mem[0],zero,zero,zero
ret
Things get a bit weirder, though, when the creation of aliasing pointers is visible to the compiler:
float somehowSafe(float x) {
// Make some aliasing pointers
auto y = &x;
auto z = reinterpret_cast<int *>(y);
*y = 5.0;
*z = 7;
return x;
}
In this case, both Clang and GCC (with -O3
and -fstrict-aliasing
) generate code that observes the modification of x
through z
:
.LCPI0_0:
.long 7 # float 9.80908925E-45
somehowSafe(float): # @somehowSafe(float)
movss xmm0, dword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero,zero,zero
ret
That said, it's not like the compiler is guaranteed to "take advantage" of undefined behavior; it is undefined, after all. And in that case, there was no profit in assuming *z = 7
had no effect. So what if we "motivated" the compiler to take advantage of strict aliasing?
int stillSomehowSafe(int x) {
// Make some aliasing pointers
auto y = &x;
auto z = reinterpret_cast<float *>(y);
auto product = float(x) * x * x * x * x * x;
*y = 5;
*z = product;
return *y;
}
It would clearly be to the compiler's advantage to assume that *z = product
has no effect on the value of *y
; doing so would allow the compiler to simplify this function to one which simply always returns 5
. Nonetheless, the generated code makes no such assumption:
stillSomehowSafe(int): # @stillSomehowSafe(int)
cvtsi2ss xmm0, edi
movaps xmm1, xmm0
mulss xmm1, xmm0
mulss xmm1, xmm0
mulss xmm1, xmm0
mulss xmm1, xmm0
mulss xmm1, xmm0
movd eax, xmm1
ret
I'm rather puzzled by this behavior. I understand that we're given zero guarantees about what a compiler will do in the presence of undefined behavior, but I'm also surprised that neither Clang nor GCC is more aggressive with these sorts of optimizations. It makes me wonder if I misunderstand the standard, or if both Clang and GCC have weaker (and documented) definitions of "strict aliasing".
std::launder
. But as the linked answer mentions, as far as I am also aware, no compiler optimizes based on that. – Kennykeno