Assume that you have to implement a static_vector<T, N>
class, which is a fixed capacity container that entirely lives on the stack and never allocates, and exposes an std::vector
-like interface. (Boost provides boost::static_vector
.)
Considering that we must have uninitialized storage for maximum N
instances of T
, there are multiple choices that can be made when designing the internal data layout:
Single-member
union
:union U { T _x; }; std::array<U, N> _data;
Single
std::aligned_storage_t
:std::aligned_storage_t<sizeof(T) * N, alignof(T)> _data;
Array of
std::aligned_storage_t
:using storage = std::aligned_storage_t<sizeof(T), alignof(T)>; std::array<storage, N> _data;
Regardless of the choice, creating the members will require the use of "placement new
" and accessing them will require something along the lines of reinterpret_cast
.
Now assume that we have two very minimal implementations of static_vector<T, N>
:
with_union
: implemented using the "single-memberunion
" approach;with_storage
: implemented using the "singlestd::aligned_storage_t
" approach.
Let's perform the following benchmark using both g++
and clang++
with -O3
. I used quick-bench.com for this task:
void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
void clobber() { asm volatile("" : : : "memory"); }
template <typename Vector>
void test()
{
for(std::size_t j = 0; j < 10; ++j)
{
clobber();
Vector v;
for(int i = 0; i < 123456; ++i) v.emplace_back(i);
escape(&v);
}
}
(escape
and clobber
are taken from Chandler Carruth's CppCon 2015 talk: "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!")
- Results for
g++ 7.2
(live here):
- Results for
clang++ 5.0
(live here):
As you can see from the results, g++
seems to be able to aggressively optimize (vectorization) the implementation that uses the "single std::aligned_storage_t
" approach, but not the implementation using the union
.
My questions are:
Is there anything in the Standard that prevents the implementation using
union
from being aggressively optimized? (I.e. does the Standard grant more freedom to the compiler when usingstd::aligned_storage_t
- if so, why?)Is this purely a "quality of implementation" issue?
aligned_storage
is itself a union, though it's to anunsigned char
array + a struct with an__attribute__((__align__((T))))
. – Terzas-fstrict-aliasing
enabled. This might be the reason here. – Anchorage