How is a vector's data aligned?
P

8

62

If I want to process data in a std::vector with SSE, I need 16 byte alignment. How can I achieve that? Do I need to write my own allocator? Or does the default allocator already align to 16 byte boundaries?

Peal answered 10/12, 2011 at 11:38 Comment(3)
In C++11, there's an aligned_storage. Maybe there's also an aligned_allocator? Lemme check.Aweless
possible duplicate of SSE and C++ containersMomentous
see C++ STL data structure alignment, algorithm vectorizationUkulele
O
50

The C++ standard requires allocation functions (malloc() and operator new()) to return memory suitably aligned for any standard type. Because these functions do not receive the alignment requirement as an argument, in practice every allocation gets the same alignment: that of the standard type with the largest alignment requirement, which is often long double and/or long long (see Boost's max_align union).

Vector instructions, such as SSE and AVX, have stronger alignment requirements (16-byte alignment for 128-bit access and 32-byte alignment for 256-bit access) than those provided by the standard C++ allocation functions. posix_memalign() or memalign() can be used to satisfy allocations with such stronger alignment requirements.
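As a rough illustration of the posix_memalign() route, here is a minimal sketch. It assumes a POSIX platform (the function is not part of ISO C++), and the helper names alloc_aligned_floats and is_aligned are made up for this example:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>   // std::free
#include <stdlib.h>  // posix_memalign (POSIX only, an assumption of this sketch)

// Hypothetical helper: returns `count` floats starting on an `alignment`-byte
// boundary, or nullptr on failure. posix_memalign requires `alignment` to be a
// power of two and a multiple of sizeof(void*).
inline float* alloc_aligned_floats(std::size_t count, std::size_t alignment) {
    assert(alignment % sizeof(void*) == 0);
    void* p = nullptr;
    if (posix_memalign(&p, alignment, count * sizeof(float)) != 0)
        return nullptr;
    return static_cast<float*>(p);
}

// Hypothetical helper: true if the address `p` is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Memory obtained this way is released with free(), for example: float* p = alloc_aligned_floats(64, 32); ... std::free(p);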


In C++17 the allocation functions accept an additional argument of type std::align_val_t.

You can make use of it like:

#include <immintrin.h>
#include <memory>
#include <new>

int main() {
    std::unique_ptr<__m256i[]> arr{new(std::align_val_t{alignof(__m256i)}) __m256i[32]};
}

Moreover, in C++17 the standard allocators have been updated to respect the type's alignment, so you can simply do:

#include <immintrin.h>
#include <vector>

int main() {
    std::vector<__m256i> arr2(32);
}

Or (no heap allocation involved and supported in C++11):

#include <immintrin.h>
#include <array>

int main() {
    std::array<__m256i, 32> arr3;
}
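
To check the C++17 std::vector behaviour described above without depending on <immintrin.h>, something like the following sketch could be used. Float8 is a hypothetical stand-in for a 256-bit SIMD type, and the check only means something when compiled as C++17 or later:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 256-bit SIMD type: eight floats, 32-byte aligned.
struct alignas(32) Float8 {
    float lane[8];
};
static_assert(sizeof(Float8) == 32, "no padding expected in this sketch");

// True if every element of `v` starts on an alignof(Float8) boundary.
inline bool all_elements_aligned(const std::vector<Float8>& v) {
    for (const Float8& e : v) {
        if (reinterpret_cast<std::uintptr_t>(&e) % alignof(Float8) != 0)
            return false;
    }
    return true;
}
```

With a C++17 standard library, all_elements_aligned(std::vector<Float8>(32)) is expected to return true; before C++17 there is no such guarantee.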
Obelisk answered 10/12, 2011 at 12:27 Comment(4)
Brilliant answer, love it!Traduce
The (std::align_val_t{alignof(__m256i)}) part in new(std::align_val_t{alignof(__m256i)}) is superfluous. The new-expression will call the operator new[] overload with a correct align_val_t argument anyway if the array type has a stricter alignment requirement than the non-aligning operator new[] guarantees. Specifying it explicitly is a bit misleading, because alignments stricter than the given array type are not guaranteed to work correctly.Origen
@Origen Would you like to amend this 2011 answer with your C++17 addition?Obelisk
Ah, I was just fiddling with my aligned_allocator class, not realising C++17 respects the type alignment regardless. Thanks.Football
S
28

You should use a custom allocator with standard containers such as std::vector. I can't remember who wrote the following one, but I have used it for some time and it seems to work (you might have to change _aligned_malloc to _mm_malloc, and _aligned_free to _mm_free, depending on compiler/platform):

#ifndef ALIGNMENT_ALLOCATOR_H
#define ALIGNMENT_ALLOCATOR_H

#include <cstddef>   // std::size_t, std::ptrdiff_t
#include <new>       // placement new, std::bad_alloc
#include <stdlib.h>
#include <malloc.h>  // _aligned_malloc / _aligned_free (MSVC)

template <typename T, std::size_t N = 16>
class AlignmentAllocator {
public:
  typedef T value_type;
  typedef std::size_t size_type;
  typedef std::ptrdiff_t difference_type;

  typedef T * pointer;
  typedef const T * const_pointer;

  typedef T & reference;
  typedef const T & const_reference;

  public:
  inline AlignmentAllocator () throw () { }

  template <typename T2>
  inline AlignmentAllocator (const AlignmentAllocator<T2, N> &) throw () { }

  inline ~AlignmentAllocator () throw () { }

  inline pointer address (reference r) {
    return &r;
  }

  inline const_pointer address (const_reference r) const {
    return &r;
  }

  // Allocators must report failure by throwing, not by returning a null pointer.
  inline pointer allocate (size_type n) {
    void * p = _aligned_malloc(n*sizeof(value_type), N);
    if (!p) throw std::bad_alloc();
    return (pointer)p;
  }

  inline void deallocate (pointer p, size_type) {
    _aligned_free (p);
  }

  inline void construct (pointer p, const value_type & wert) {
     new (p) value_type (wert);
  }

  inline void destroy (pointer p) {
    p->~value_type ();
  }

  inline size_type max_size () const throw () {
    return size_type (-1) / sizeof (value_type);
  }

  template <typename T2>
  struct rebind {
    typedef AlignmentAllocator<T2, N> other;
  };

  bool operator!=(const AlignmentAllocator<T,N>& other) const  {
    return !(*this == other);
  }

  // Returns true if and only if storage allocated from *this
  // can be deallocated from other, and vice versa.
  // Always returns true for stateless allocators.
  bool operator==(const AlignmentAllocator<T,N>& other) const {
    return true;
  }
};

#endif

Use it like this (change the 16 to another alignment, if needed):

std::vector<T, AlignmentAllocator<T, 16> > bla;

This, however, only makes sure that the memory block std::vector uses is 16-byte aligned. If sizeof(T) is not a multiple of 16, some of your elements will not be aligned. Depending on your data type, this might be a non-issue. If T is int (4 bytes), only load from elements whose index is a multiple of 4. If it's double (8 bytes), only from multiples of 2, etc.
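
The index rule above is just address arithmetic, which the following sketch tries to make concrete. The helper element_is_16_aligned and the two demo buffers are hypothetical names, and the sketch assumes the usual 4-byte int and 8-byte double:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(int) == 4 && sizeof(double) == 8,
              "sketch assumes 4-byte int and 8-byte double");

// True if element `index` of an array of T starting at `base` begins on a
// 16-byte boundary.
template <typename T>
bool element_is_16_aligned(const T* base, std::size_t index) {
    return reinterpret_cast<std::uintptr_t>(base + index) % 16 == 0;
}

// Hypothetical demo buffers, each forced onto a 16-byte boundary, standing in
// for the storage an aligned allocator would hand to std::vector.
alignas(16) static int    ints[8]    = {};
alignas(16) static double doubles[4] = {};
```

Here element_is_16_aligned(ints, 4) and element_is_16_aligned(doubles, 2) hold, while indices 1, 2 and 3 of ints, or index 1 of doubles, fall between boundaries.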

The real issue is if you use classes as T, in which case you will have to specify your alignment requirements in the class itself (again, depending on compiler, this might be different; the example is for GCC):

class __attribute__ ((aligned (16))) Foo {
    __attribute__ ((aligned (16))) double u[2];
};
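
For compilers that support C++11, the same requirement can probably be written portably with the standard alignas specifier instead of a vendor attribute. A minimal sketch (Foo16 is a hypothetical name, not part of the original answer):

```cpp
#include <cassert>
#include <cstddef>

// C++11 spelling of a 16-byte aligned class; Foo16 is a hypothetical example type.
class alignas(16) Foo16 {
public:
    double u[2];
};

// sizeof is always a multiple of alignof, so consecutive array or vector
// elements stay aligned once the first element is.
static_assert(alignof(Foo16) == 16, "alignas should raise the alignment to 16");
static_assert(sizeof(Foo16) % 16 == 0, "size is padded to a multiple of 16");
```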

We're almost done! If you use Visual C++ (at least, version 2010), you won't be able to use an std::vector with classes whose alignment you specified, because of std::vector::resize.

When compiling, if you get the following error:

C:\Program Files\Microsoft Visual Studio 10.0\VC\include\vector(870):
error C2719: '_Val': formal parameter with __declspec(align('16')) won't be aligned

You will have to hack your std::vector header file:

  1. Locate the vector header file [C:\Program Files\Microsoft Visual Studio 10.0\VC\include\vector]
  2. Locate the void resize( _Ty _Val ) method [line 870 on VC2010]
  3. Change it to void resize( const _Ty& _Val ).
Systematic answered 17/12, 2011 at 14:28 Comment(2)
error: ‘_aligned_malloc’ was not declared in this scopeWarily
Be warned that the "hack" presented at the end could introduce real bugs regarding lifetime of the referenced object. e.g. vector<T> v(1); v.resize(v[0]); was legal with the pass-by-value parameter. After changing to a reference, it might be broken.Cockpit
U
22

Instead of writing your own allocator, as suggested before, you can use boost::alignment::aligned_allocator for std::vector like this:

#include <vector>
#include <boost/align/aligned_allocator.hpp>

template <typename T>
using aligned_vector = std::vector<T, boost::alignment::aligned_allocator<T, 16>>;
Underclothes answered 23/8, 2017 at 7:37 Comment(0)
A
3

Write your own allocator. allocate and deallocate are the important ones. Here is one example:

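// Needs <stdlib.h> (malloc) and <stdint.h> (uintptr_t, SIZE_MAX).
// As in the original sketch, size is treated as a byte count.
pointer allocate( size_type size, const void * pBuff = 0 )
{
    if( size > SIZE_MAX - 16 )
        return NULL;

    char * p = (char*)malloc( size + 16 );

    if( !p )
        return NULL;

    // Bytes to skip (1..16) to reach the next 16-byte boundary. uintptr_t
    // keeps the pointer from being truncated on 64-bit platforms.
    size_t difference = 16 - ( (uintptr_t)p & 15 );

    p += difference;
    p[ -1 ] = (char)difference;   // remembered for deallocate()

    return (T*)p;
}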

void deallocate( pointer p, size_type num )
{
    char * pBuffer = (char*)p;

    free( (void*)(((char*)p) - pBuffer[ -1 ] ) );
}
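
The same offset-byte trick can be tried outside an allocator class. The sketch below is a standalone variant that uses std::uintptr_t, as suggested in the comments; the function names aligned_malloc16 and aligned_free16 are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: malloc `bytes` bytes and return a 16-byte aligned pointer.
inline void* aligned_malloc16(std::size_t bytes) {
    if (bytes > SIZE_MAX - 16) return nullptr;            // overflow guard
    char* raw = static_cast<char*>(std::malloc(bytes + 16));
    if (!raw) return nullptr;
    // Skip 1..16 bytes, so there is always room for the offset byte.
    std::size_t offset = 16 - (reinterpret_cast<std::uintptr_t>(raw) & 15);
    assert(offset >= 1 && offset <= 16);
    char* user = raw + offset;
    user[-1] = static_cast<char>(offset);                 // stash the offset
    return user;
}

// Hypothetical helper: release a pointer obtained from aligned_malloc16.
inline void aligned_free16(void* p) {
    if (!p) return;
    char* user = static_cast<char*>(p);
    std::free(user - user[-1]);                           // walk back to malloc's pointer
}
```

The cost of this design is up to 16 wasted bytes per allocation; in exchange it needs nothing beyond malloc and free.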
Afroasiatic answered 10/12, 2011 at 14:17 Comment(3)
Your code won't probably work on a 64-bit platform. You should rather use intptr_t (which is guaranteed to have the size of a pointer) instead of int and remove this INT_MAX (size is most probably unsigned anyway).Strath
@Christian, it's just an idea of how to solve the problem. I can explain better with C/C++ while the rest of the folks are just commenting. That's why I wrote that. Fred is the only one who knows exactly how he is going to solve it. I hope this will lead him in the right direction.Afroasiatic
While I understand your argument it isn't that hard to turn this code into something more robust with just a few changes that won't make it more complicated. But you got my +1.Strath
S
2

Short Answer:

If sizeof(T)*vector.size() > 16 then Yes.
Assuming your vector uses the normal allocators.

Caveat: As long as alignof(std::max_align_t) >= 16 as this is the max alignment.

Long Answer:

Updated 25/Aug/2017 new standard n4659

If it is aligned for anything that is greater than 16 it is also aligned correctly for 16.

6.11 Alignment (Paragraph 4/5)

Alignments are represented as values of the type std::size_t. Valid alignments include only those values returned by an alignof expression for the fundamental types plus an additional implementation-defined set of values, which may be empty. Every alignment value shall be a non-negative integral power of two.

Alignments have an order from weaker to stronger or stricter alignments. Stricter alignments have larger alignment values. An address that satisfies an alignment requirement also satisfies any weaker valid alignment requirement.

new and new[] return values that are aligned so that objects are correctly aligned for their size:

8.3.4 New (paragraph 17)

[ Note: when the allocation function returns a value other than null, it must be a pointer to a block of storage in which space for the object has been reserved. The block of storage is assumed to be appropriately aligned and of the requested size. The address of the created object will not necessarily be the same as that of the block if the object is an array. — end note ]

Note most systems have a maximum alignment. Dynamically allocated memory does not need to be aligned to a value greater than this.

6.11 Alignment (paragraph 2)

A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t) (21.2). The alignment required for a type might be different when it is used as the type of a complete object and when it is used as the type of a subobject.

Thus as long as your vector memory allocated is greater than 16 bytes it will be correctly aligned on 16 byte boundaries.
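
Whether that holds on a given platform can be checked rather than assumed. The following sketch exposes the two relevant values; alignment_of_address, kMaxAlign and kDefaultNewAlign are hypothetical names, and the __STDCPP_DEFAULT_NEW_ALIGNMENT__ macro requires C++17:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Largest power of two that divides the address (capped at 4096).
inline std::size_t alignment_of_address(const void* p) {
    assert(p != nullptr);
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    std::size_t align = 1;
    while (align < 4096 && a % (align * 2) == 0) align *= 2;
    return align;
}

// The two values that decide whether plain new is aligned enough for SSE.
constexpr std::size_t kMaxAlign = alignof(std::max_align_t);
constexpr std::size_t kDefaultNewAlign = __STDCPP_DEFAULT_NEW_ALIGNMENT__;  // C++17
```

If kDefaultNewAlign is below 16 on the target platform, plain new gives no 16-byte guarantee and one of the aligned-allocation approaches from the other answers is needed.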

Safier answered 10/12, 2011 at 18:18 Comment(12)
What is the reference for these quotations?Thoraco
And yet std::vector<__m256> will segfault on most implementations due to using vmovaps loads/stores that require 32-byte alignment on memory that's only 16-byte (or less) aligned. SIMD vectors are not considered fundamental types, so new doesn't return memory aligned enough to hold them, on existing x86 C++ implementations. In some implementations (especially 32-bit), new only returns 8B-aligned memory, and even std::vector<__m128> will fault.Splenomegaly
sizeof(T)*vector.size() has nothing to do with it. First, T could be a struct type, and second vector.size() is unrelated to how aligned the memory is. (Except for implementation details, like large allocations usually allocating new whole pages.) What the OP wants is (for example) std::vector<float> with 16-byte alignment, but most implementations don't guarantee that without a custom allocator. (It's dumb that std::vector<__m128> doesn't either, but that's not the usual use-case.)Splenomegaly
@PeterCordes The compiler aligns the __m256, __m256d, and __m256i local and global data to 32-byte boundaries on the stack. But vector does not use dynamic allocation so no guarantees broken. So the question becomes what is the maximum alignment of your system. If this is only 16 (as they are powers of 2) then the vector is not breaking rules. To find out try: alignof(std::max_align_t)Safier
@PeterCordes Added a new paragraph just for you. See section 6.11 The problem is because your maximum alignment required by the allocator is only 16 (or less) and not the required 32 you need for those types. My original statements still ALL hold.Safier
@LokiAstari: Right, I didn't think any actual rules were being broken by gcc/clang/icc/msvc, but the end result is an inconvenient system for SIMD with standard containers. If there was a std:: allocator that took an alignment as a template parameter, it would be a lot more convenient.Splenomegaly
I'm not following your argument for why the normal new[] gives you 16B alignment. Wouldn't that only be true on a platform with a 16B-aligned type, like x86-64 SystemV where sizeof(long double) == 16? If you're relying on that, your answer should say so. I don't use Windows, but I think MSVC's implementation only gives you 8B-aligned memory because no fundamental types have alignment guarantees stronger than that in the Windows ABI. And maybe 32-bit Linux as well.Splenomegaly
related: in C++11 I think it's really poor design that a type declared with alignas(64) or whatever doesn't have that alignment information respected by std::vector<T>, only by automatic and static storage.Splenomegaly
@PeterCordes Added a caveat for you. Now you can get the standard containers to align to other values by using aligned_storage See: en.cppreference.com/w/cpp/types/aligned_storageSafier
Thanks for clearing up my questions / concerns, and for that link.Splenomegaly
Err, std::aligned_storage is just a buffer. The example there implements a container on top of it (with the storage array as a member array, not dynamically allocated). There's no obvious way to get standard containers to use it for anything. The "sample implementation" says it could just be built on top of alignas, which doesn't help with dynamic storage.Splenomegaly
BTW, std::vector<__m256> does in fact work with -std=gnu++17 or -std=c++17 on correct implementations. Only with C++14 and earlier is it broken, and my earlier comments from 2017 were probably based on only having tried g++ without -std=gnu++17 if that was even supported at the time. But you would then have to use vector<__m256> instead of vector<float> that can easily be accessed by other code, and that can have a size that is not a multiple of 8 floats.Splenomegaly
D
1

Contemporary answer to a dated (but important) question.

Writing your own allocator class template immediately comes to mind, as others have said. From C++11 until C++17, a standard-conforming implementation was mostly limited to alignas and placement new. C++17 adopts C11's aligned_alloc (as std::aligned_alloc), which is convenient. Furthermore, C++17's std::pmr namespace (header <memory_resource>) introduces the polymorphic_allocator class template and the memory_resource abstract interface for polymorphic allocation, heavily inspired by Boost. Besides allowing truly generic, dynamic code, these have been shown to improve speed in some cases, which would benefit SIMD code as well.
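
A C++17 allocator along these lines can be quite short. The sketch below is one possible shape, not a reference implementation: AlignedAllocator is a hypothetical name, and it uses the aligned overloads of operator new and operator delete rather than std::aligned_alloc, on the assumption that those overloads are the more widely available option.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical minimal C++17 allocator: every block starts on an Align-byte boundary.
template <typename T, std::size_t Align>
struct AlignedAllocator {
    static_assert((Align & (Align - 1)) == 0, "Align must be a power of two");
    using value_type = T;

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    // rebind must be spelled out because Align is a non-type template parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Align}));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Align});
    }
};

// Stateless allocators of the same alignment are interchangeable.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }
```

It would be used as std::vector<float, AlignedAllocator<float, 32>>, which keeps the element type a plain float while aligning the buffer for 256-bit loads. Align should not be smaller than alignof(T).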

Dorrie answered 27/5, 2022 at 0:51 Comment(0)
A
-2

Use __declspec(align(x,y)) as explained in Intel's vectorization tutorial, http://d3f8ykwhia686p.cloudfront.net/1live/intel/CompilerAutovectorizationGuide.pdf

Aquamarine answered 13/7, 2015 at 15:26 Comment(1)
Pretty sure that doesn't work on a std::vector: it would just align the control block, not the .data() it points to. The doc you linked doesn't mention std::vector or vector< so I assume any mention of the word vector is talking about SIMD vectors, not C++ std::vector<T>.Splenomegaly
O
-5

The Standard mandates that new and new[] return data aligned for any data type, which should include SSE. Whether or not MSVC actually follows that rule is another question.

Obtund answered 10/12, 2011 at 11:41 Comment(7)
Where did @Fred say anything about MSVC?Aweless
"which should include SSE" - but typically doesn't. Last I checked, both Windows and Linux only guaranteed 8-alignment for allocated memory, and neither MSVC nor glibc did anything to increase the alignment. I believe the weasel-wording is that since SSE operations are non-standard, they can have any behavior the implementer chooses, including it being undefined to perform SSE ops on memory that is not 16-aligned. You can store extended SSE types in 8-aligned memory, so lip service is paid to the standard, but in practice you have to read it as not applying to non-standard types.Pterosaur
Really for any type? So if I myself come up with a data type that needs 4096-alignment that would be supported, too? Of course this example is rubbish, but I hope you see that the any is a bit misplaced. I guess it's any standard type, to which I'm pretty sure SSE types don't belong as the C++ standard doesn't make any mentioning of SSE.Strath
@ChristianRau: It means any alignment requirement imposed by the implementation on a type. Yes, you could write a library which takes a char* or MYTHING* pointer parameter, and checks whether that pointer is 4096-aligned, and aborts, throws, or does something undefined if it isn't. This does not imply that char or MYTHING has a 4096 alignment requirement within the meaning of the standard. I think the standard did intend that any alignment requirement imposed by the implementation would be catered for by malloc and new, but implementers deemed it impractical due to wasted space.Pterosaur
And it means any type in the sense that struct {char c; long long ll; }; is a type, it is not a standard type, but the standard guarantees that memory returned from malloc or new (and that is big enough, in the case of new) is sufficiently aligned to store it. They could have expressed this by saying that malloc/new return memory aligned for any standard type, and that object types involving only standard types never have an alignment requirement greater than that of at least one standard type. But it's not expressed that way.Pterosaur
Strictly, it's perfectly accurate- that is how the Standard is worded. However, it's not true that MSVC implements it this way.Obtund
The x86-64 SystemV ABI has alignof(long double) = 16, as recommended by Intel's optimization manual for 80-bit x87. malloc/new do give you 16B-aligned memory on 64-bit Linux (with glibc). That doesn't help for AVX or AVX512, though. (And as you say, 64-bit Windows and 32-bit anything only give you 8B aligned memory.)Splenomegaly
