Static vector internal data layout - `union` vs `std::aligned_storage_t` - huge performance difference
Asked Answered
A

1

21

Assume that you have to implement a static_vector<T, N> class, which is a fixed capacity container that entirely lives on the stack and never allocates, and exposes an std::vector-like interface. (Boost provides boost::static_vector.)

Considering that we must have uninitialized storage for maximum N instances of T, there are multiple choices that can be made when designing the internal data layout:

  • Single-member union:

    union U { T _x; };
    std::array<U, N> _data;
    
  • Single std::aligned_storage_t:

    std::aligned_storage_t<sizeof(T) * N, alignof(T)> _data;
    
  • Array of std::aligned_storage_t:

    using storage = std::aligned_storage_t<sizeof(T), alignof(T)>;
    std::array<storage, N> _data;
    

Regardless of the choice, creating the members will require the use of "placement new" and accessing them will require something along the lines of reinterpret_cast.


Now assume that we have two very minimal implementations of static_vector<T, N>:

  • with_union: implemented using the "single-member union" approach;

  • with_storage: implemented using the "single std::aligned_storage_t" approach.

Let's perform the following benchmark using both g++ and clang++ with -O3. I used quick-bench.com for this task:

void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
void clobber()       { asm volatile("" : :        : "memory"); }

template <typename Vector>
void test()
{
    for(std::size_t j = 0; j < 10; ++j)
    {
        clobber();
        Vector v;
        for(int i = 0; i < 123456; ++i) v.emplace_back(i);
        escape(&v);
    }
}

(escape and clobber are taken from Chandler Carruth's CppCon 2015 talk: "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!")

g++ results

clang++ results


As you can see from the results, g++ seems to be able to aggressively optimize (vectorization) the implementation that uses the "single std::aligned_storage_t" approach, but not the implementation using the union.

My questions are:

  • Is there anything in the Standard that prevents the implementation using union from being aggressively optimized? (I.e. does the Standard grant more freedom to the compiler when using std::aligned_storage_t - if so, why?)

  • Is this purely a "quality of implementation" issue?

Aim answered 6/2, 2018 at 14:22 Comment(6)
Even more interesting is that gcc's aligned_storage is itself a union, though it's to an unsigned char array + a struct with an __attribute__((__align__((T)))).Terzas
Changing your union to store an array of unsigned char instead gives it the same timing as the aligned_storage implementation.Terzas
GCC is more permissive than required by the standard when coming to type-punning through unions, even with -fstrict-aliasing enabled. This might be the reason here.Anchorage
May caused by the same issue as #43652423Foxe
"clang++ fails to aggressively optimize either of them." You sure? It looks like both clang times are comparable with the [faster] gcc aligned storage time.Anglicize
@Barry: nope, I just failed at reading graphs.Aim
A
11

xskxzr is right, this is the same issue as in this question. Fundamentally, gcc is missing an optimization opportunity in forgetting that std::array's data is aligned. John Zwinck helpfully reported bug 80561.

You can verify this in your benchmark by making one of two changes to with_union:

  1. Change _data from a std::array<U, N> to simply a U[N]. Performance becomes identical

  2. Remind gcc that _data is actually aligned by changing the implementation of emplace_back() to:

    template <typename... Ts> 
    T& emplace_back(Ts&&... xs)
    {
        U* data = static_cast<U*>(__builtin_assume_aligned(_data.data(), alignof(U)));
        T* ptr = &data[_size++]._x;
        return *(new (ptr) T{std::forward<Ts>(xs)...});
    }
    

Either of those changes with the rest of your benchmark gets me comparable results between WithUnion and WithAlignedStorage.

Anglicize answered 6/2, 2018 at 18:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.