Why is dynamically allocated memory always 16 bytes aligned?

Asked 29/11, 2019 at 2:58 Answered 29/11, 2019 at 13:57

c++ dynamic-memory-allocation memory-alignment

I wrote a simple example:

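#include <cstdlib>   // std::malloc, std::free
#include <iostream>
#include <new>       // ::operator new, ::operator delete

int main() {
    // Three independent one-byte allocations.
    void* byte1 = ::operator new(1);
    void* byte2 = ::operator new(1);
    void* byte3 = std::malloc(1);
    std::cout << "byte1: " << byte1 << std::endl;
    std::cout << "byte2: " << byte2 << std::endl;
    std::cout << "byte3: " << byte3 << std::endl;
    // Release each block with its matching deallocation function.
    ::operator delete(byte1);
    ::operator delete(byte2);
    std::free(byte3);
    return 0;
}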

Running the example, I get the following results:

byte1: 0x1f53e70

byte2: 0x1f53e90

byte3: 0x1f53eb0

Each time I allocate a single byte of memory, the returned address is always 16-byte aligned. Why does this happen?

I tested this code on GCC 5.4.0 as well as GCC 7.4.0, and got the same results.

Sportive answered 29/11, 2019 at 2:58 Comment(4)

@MosheRabaev As far as I know, the alignas is used on specific variable or type. How can I set the default alignas to every object? – Sportive 29/11, 2019 at 3:7

@MosheRabaev If there is a default alignment, does it apply to objects on the stack too? – Clodhopper 29/11, 2019 at 3:13

There is no global alignas, I don't know what @MosheRabaev wants to say with the comment. – Elephus 29/11, 2019 at 13:30

I have no clue why by default it's aligning to 16 bytes. I phrased it wrongly, I mean to say use alignas for custom behavior. – Jules 29/11, 2019 at 16:8

Why does this happen?

Because the standard says so. More specifically, it says that the dynamic allocations¹ are aligned to at least the maximum fundamental² alignment (it may have stricter alignment). There is a pre-defined macro (since C++17) just for the purpose of telling you exactly what this guaranteed alignment is: __STDCPP_DEFAULT_NEW_ALIGNMENT__. Why this might be 16 in your example... that is a choice of the language implementation, restricted by what is allowed by the target hardware architecture.

This is (was) a necessary design, considering that there is (was) no way to pass information about the needed alignment to the allocation function (until C++17 which introduced aligned-new syntax for the purpose of allocating "over-aligned" memory).

malloc doesn't know anything about the types of objects that you intend to create in that memory. One might think that new could in theory deduce the alignment since it is given a type... but what if you wanted to reuse that memory for other objects with stricter alignment, as for example in an implementation of std::vector? And once you know the API of operator new, void* operator new ( std::size_t count ), you can see that neither the type nor its alignment is an argument that could affect the alignment of the allocation.

¹ Made by the default allocator, or malloc family of functions.

² The maximum fundamental alignment is alignof(std::max_align_t). No fundamental type (arithmetic types, pointers) has stricter alignment than this.

Kipkipling answered 29/11, 2019 at 3:8 Comment(7)

Is there any synonym for __STDCPP_DEFAULT_NEW_ALIGNMENT__ in C++11? – Sportive 29/11, 2019 at 5:52

According to your explanation, __STDCPP_DEFAULT_NEW_ALIGNMENT__ is 16, which is consistent with my test result in gcc 7.4 with C++17. But I found the value of sizeof(std::max_align_t) is 32 in gcc 5.4 with C++11 and gcc 7.4 with C++17. – Sportive 29/11, 2019 at 5:59

@Sportive interesting. Then I may have gotten something wrong about their relation. I thought __STDCPP_DEFAULT_NEW_ALIGNMENT__ would have been bigger. – Kipkipling 29/11, 2019 at 11:33

@Kipkipling Since C++17 [new.delete.single]/1 says that this overload of operator new only needs to return a pointer suitably aligned for any complete object type of the given size given that it doesn't have new-extended alignment, where new-extended means larger than __STDCPP_DEFAULT_NEW_ALIGNMENT__. I didn't find anything requiring this to be at least as large as the largest fundamental alignment, which is alignof(std::max_align_t) (I think you mixed up sizeof and alignof.). – Elephus 29/11, 2019 at 13:16

Before C++17 this overload of operator new did have to return a pointer suitably aligned for any object with fundamental alignment, but if I understand correctly, since C++17, there can be alignments in-between for which the operator new(std::size_t, std::align_val_t) overload would be called by a new-expression. – Elephus 29/11, 2019 at 13:19

See [basic.align]/3 for the definition of extended vs new-extended alignment in C++17. – Elephus 29/11, 2019 at 13:23

@Sportive Try alignof(std::max_align_t) instead of sizeof(std::max_align_t) and you will get the same result as for __STDCPP_DEFAULT_NEW_ALIGNMENT__. As I mentioned in the comments above, this was probably a mistake by eerorika, but as I also mentioned I don't think the two values are required to be ordered in a certain way (I don't know for sure though.) – Elephus 29/11, 2019 at 13:41

There are actually two reasons. The first reason is that some kinds of objects have alignment requirements. Usually, these alignment requirements are soft: A misaligned access is "just" slower (possibly by orders of magnitude). They can also be hard: On the PPC, for instance, you simply could not access a vector in memory if that vector was not aligned to 16 bytes. Alignment is not something optional, it is something that must be considered when allocating memory. Always.

Note that there is no way to specify an alignment to malloc(). There's simply no argument for it. As such, malloc() must be implemented to provide a pointer that is correctly aligned for any purposes on the platform. The ::operator new() in C++ follows the same principle.

How much alignment is needed is fully platform dependent. On a PPC, there is no way that you can get away with less than 16 bytes alignment. X86 is a bit more lenient in this, afaik.

The second reason is the inner workings of an allocator function. Typical implementations have an allocator overhead of at least 2 pointers: Whenever you request a byte from malloc() it will usually need to allocate space for at least two additional pointers to do its own bookkeeping (the exact amount depends on the implementation). On a 64 bit architecture, that's 16 bytes. As such, it is not sensible for malloc() to think in terms of bytes, it's more efficient to think in terms of 16 byte blocks. At least. You see that with your example code: The resulting pointers are actually 32 bytes apart. Each memory block occupies 16 bytes payload + 16 bytes internal bookkeeping memory.

Since the allocators request entire memory pages from the kernel (4096 bytes, 4096 bytes aligned!), the resulting memory blocks are naturally 16 bytes aligned on a 64 bit platform. It's simply not practical to provide less aligned memory allocations.

So, taking these two reasons together, it is both practical and required to provide seriously aligned memory blocks from an allocator function. The exact amount of alignment depends on the platform, but will usually not be less than the size of two pointers.

Dunnage answered 29/11, 2019 at 13:57 Comment(0)

It's probably the way the memory allocator manages to get the necessary information to the deallocation function: the issue of the deallocation function (like free or the general, global operator delete) is that there is exactly one argument, the pointer to the allocated memory and no indication of the size of the block that was requested (or the size that was allocated if it's larger), so that indication (and much more) needs to be provided in some other form to the deallocation function.

The most simple yet efficient approach is to allocate room for that additional information plus the requested bytes, and return a pointer to the end of the information block, let's call it IB. The size and alignment of IB automatically aligns the address returned by either malloc or operator new, even if you allocate a minuscule amount: the real amount allocated by malloc(s) is sizeof(IB)+s.

For such small allocations the approach is relatively wasteful and other strategies might be used, but having multiple allocation methods complicates deallocation, as the function must first determine which method was used.

Clodhopper answered 29/11, 2019 at 3:23 Comment(0)

Why does this happen?

Because, in the general case, the library does not know what kind of data you are going to store in that memory, so the memory has to be aligned for the data type with the strictest alignment requirement on that platform. If you store data unaligned, you will pay a significant hardware performance penalty. On some platforms you will even get a segfault if you try to access unaligned data.

Kora answered 29/11, 2019 at 3:18 Comment(2)

And on other platforms you may even read/write the wrong data because the CPU simply ignores the last few bits of the address... (That's even worse than a SEGFAULT, imho.) – Dunnage 29/11, 2019 at 14:1

@cmaster In some cases, an incorrect address is even decoded as a shift instruction on the one word at the correct address. That is you get a diff result, w/o an error indication. – Clodhopper 30/11, 2019 at 2:10

It is due to the platform. On x86, alignment isn't required, but it improves the performance of the operations. As far as I know, it makes no difference on newer models, but the compiler still goes for the optimum. On an m68k processor, for example, accessing a long that is not properly aligned will crash.

Ascidian answered 29/11, 2019 at 3:23 Comment(2)

Here are some tests: lemire.me/blog/2012/05/31/… – Ascidian 29/11, 2019 at 3:33

Also, alignment makes the memory-allocator more general purpose and a bit more efficient. It always returns values that are correctly aligned for anything that might need alignment, and that are always, internally, some multiple of the size needed to maintain that alignment. "Memory is plentiful now." – Martinmas 3/12, 2019 at 15:5


It isn't. It depends on the OS/CPU requirements. In the case of 32bit version of linux/win32, the allocated memory is always 8 byte aligned. In the case of 64bit versions of linux/win32, since all 64bit CPUs have SSE2 at a minimum, it kinda made sense at the time to align all memory to 16bytes (because working with SSE2 was less efficient when using unaligned memory). With the latest AVX based CPUs, this performance penalty for unaligned memory has been removed, so really they could allocate on any boundary.

If you think of it, aligning the addresses for memory allocations to 16bytes gives you 4bits of blank space in the pointer address. This may be useful internally for storing some additional flags (e.g. readable, writable, executable, etc).

At the end of the day, the reasoning is entirely dictated by the OS and/or hardware requirements. It's nothing to do with the language.

Warwick answered 29/11, 2019 at 3:12 Comment(5)

"aligning the addresses for memory allocations to 16bytes gives you 4bits of blank space in the pointer address" this is not the reason. Main reason - penalty of unaligned data stored in that memory. – Kora 29/11, 2019 at 3:14

What does this sentence mean? "aligning the addresses for memory allocations to 16bytes gives you 4bits of blank space in the pointer address" – Sportive 29/11, 2019 at 5:50

@Sportive Knowing a priori that all addresses will be aligned means that there is exactly zero information in some bits of the address. These bits are effectively "unused" in the stored value and could be attributed to something else, as with a bitfield. – Clodhopper 30/11, 2019 at 2:9

Cache line splits are still slower with AVX, only misalignment within a cache line is free on Intel CPUs. Some AMD CPUs with AVX do care about boundaries narrower than 64B. It would be more accurate to say that AVX made it free to use unaligned-capable instructions for the common case where they are in fact aligned at run-time. (Actually Nehalem did that, making movups cheap, but AVX allows folding loads into memory source operands, because the VEX-encoded versions don't require alignment.) – Gaye 28/6, 2020 at 6:11

The real source of the alignment requirement is the ABI, which is designed for the ISA's hardware at the time (e.g. early 2000s for the x86-64 System V ABI which has alignof(max_align_t) = 16) – Gaye 28/6, 2020 at 6:13
