The following question is related; however, its answers are old, and a comment from user Marc Glisse suggests that there are new approaches to this problem since C++17 that might not be adequately discussed.
I'm trying to get aligned memory working properly for SIMD, while still having access to all of the data.
On Intel, if I create a float vector of type __m256, and reduce my size by a factor of 8, it gives me aligned memory.
E.g. std::vector<__m256> mvec_a((N*M)/8);
In a slightly hacky way, I can cast pointers to vector elements to float*, which allows me to access individual float values.
Instead, I would prefer to have a std::vector<float> which is correctly aligned, and thus can be loaded into __m256 and other SIMD types without segfaulting.
I've been looking into aligned_alloc.
This can give me a C-style array that is correctly aligned:
auto align_sz = static_cast<std::size_t> (32);
float* marr_a = (float*)aligned_alloc(align_sz, N*M*sizeof(float));
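As a hedged aside, the raw pointer above has to be released with free(), and depending on the C standard revision and library, aligned_alloc may expect the size to be a multiple of the alignment. A minimal sketch that wraps both concerns is below; FreeDeleter, AlignedFloats, make_aligned_floats and is_aligned are hypothetical names introduced for illustration, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical deleter: memory from aligned_alloc is released with free().
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Hypothetical alias: an owning pointer to an aligned float array.
using AlignedFloats = std::unique_ptr<float[], FreeDeleter>;

// Allocate `count` floats aligned to `alignment` bytes.
// Assumption: `alignment` is a power of two supported by the platform.
inline AlignedFloats make_aligned_floats(std::size_t count,
                                         std::size_t alignment = 32) {
    std::size_t bytes = count * sizeof(float);
    // Round the request up to a non-zero multiple of the alignment.
    bytes = (bytes + alignment - 1) / alignment * alignment;
    if (bytes == 0) bytes = alignment;
    void* p = std::aligned_alloc(alignment, bytes);
    if (p == nullptr) throw std::bad_alloc{};
    return AlignedFloats(static_cast<float*>(p));
}

// Check whether a pointer sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With this, auto marr_a = make_aligned_floats(N*M); gives an array that frees itself, though it is still not a std::vector<float>.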
However, I'm unsure how to do this for std::vector<float>. Giving the std::vector<float> ownership of marr_a doesn't seem to be possible.
I've seen some suggestions that I should write a custom allocator, but this seems like a lot of work. Perhaps with modern C++ there is a better way?
_mm256_loadu_ps(&vec[i]). (Although note that with default tuning options, GCC splits not-guaranteed-aligned 256-bit loads/stores into vmovups xmm / vinsertf128. So there is an advantage to using _mm256_load over loadu if you care about how your code compiles on GCC if someone forgets to use -mtune=... or -march= options.) – Therapist
boost::alignment::aligned_allocator. Then I could allocate the vector with std::vector<T, aligned_allocator<float>>. It does make normal std::vectors not directly compatible with this type of aligned vector, but you can always write ways around that. – Loveinidleness
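For illustration, the allocator approach from the comment above can be quite short in C++17, because the aligned forms of operator new and operator delete do the actual work. The following is only a sketch under stated assumptions: AlignedAllocator and AlignedFloatVector are hypothetical names, this is not Boost's implementation, and the 32-byte default is chosen to match __m256.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical minimal allocator; assumes Alignment is a power of two
// that is at least alignof(T).
template <class T, std::size_t Alignment = 32>
struct AlignedAllocator {
    static_assert((Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");

    using value_type = T;

    // An explicit rebind is needed because of the non-type parameter.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
            throw std::bad_array_new_length{};
        // C++17 aligned allocation function.
        return static_cast<T*>(
            ::operator new(n * sizeof(T), std::align_val_t(Alignment)));
    }

    void deallocate(T* p, std::size_t) noexcept {
        // Must be paired with the aligned form of operator delete.
        ::operator delete(p, std::align_val_t(Alignment));
    }
};

// The allocator is stateless, so all instances compare equal.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&,
                const AlignedAllocator<U, A>&) noexcept { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&,
                const AlignedAllocator<U, A>&) noexcept { return false; }

// Hypothetical alias for a float vector whose buffer is 32-byte aligned.
using AlignedFloatVector = std::vector<float, AlignedAllocator<float, 32>>;
```

Only the start of the buffer is guaranteed to be aligned, so &v[i] sits on a 32-byte boundary only when i is a multiple of 8. As the comment notes, this type is distinct from std::vector<float>; copying between the two through an iterator range is one way around that.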