Why is dynamically allocated memory always 16 bytes aligned?

I wrote a simple example:

#include <cstdlib>
#include <iostream>
#include <new>

int main() {
    void* byte1 = ::operator new(1);
    void* byte2 = ::operator new(1);
    void* byte3 = std::malloc(1);
    std::cout << "byte1: " << byte1 << std::endl;
    std::cout << "byte2: " << byte2 << std::endl;
    std::cout << "byte3: " << byte3 << std::endl;
    std::free(byte3);
    ::operator delete(byte2);
    ::operator delete(byte1);
    return 0;
}

Running the example, I get the following results:

byte1: 0x1f53e70
byte2: 0x1f53e90
byte3: 0x1f53eb0

Each time I allocate a single byte of memory, it's always 16-byte aligned. Why does this happen?

I tested this code on GCC 5.4.0 as well as GCC 7.4.0, and got the same results.

Sportive asked 29/11/2019 at 2:58 (4 comments)
Sportive: @MosheRabaev As far as I know, alignas is applied to a specific variable or type. How can I set the default alignas for every object?
Clodhopper: @MosheRabaev If there is a default alignment, does it apply to objects on the stack too?
Elephus: There is no global alignas; I don't know what @MosheRabaev wants to say with this comment.
Jules: I have no clue why it aligns to 16 bytes by default. I phrased it wrongly; I meant to say use alignas for custom behavior.

Why does this happen?

Because the standard says so. More specifically, it says that dynamic allocations[1] are aligned to at least the maximum fundamental[2] alignment (they may be more strictly aligned). Since C++17 there is a predefined macro whose purpose is to tell you exactly what this guaranteed alignment is: __STDCPP_DEFAULT_NEW_ALIGNMENT__. Why it might be 16 in your example is a choice of the language implementation, restricted by what the target hardware architecture allows.

This is (was) a necessary design, considering that there is (was) no way to pass information about the needed alignment to the allocation function (until C++17, which introduced aligned operator new overloads taking a std::align_val_t argument for the purpose of allocating "over-aligned" memory).

malloc doesn't know anything about the types of objects that you intend to create in the memory. One might think that new could, in theory, deduce the alignment since it is given a type... but what if you wanted to reuse that memory for other objects with stricter alignment, as, for example, in the implementation of std::vector? And once you look at the API of operator new, void* operator new ( std::size_t count ), you can see that neither the type nor its alignment is an argument that could affect the alignment of the allocation.

[1] Made by the default allocator or by the malloc family of functions.

[2] The maximum fundamental alignment is alignof(std::max_align_t). No fundamental type (arithmetic types, pointers) has a stricter alignment than this.

Kipkipling answered 29/11/2019 at 3:08 (7 comments)
Sportive: Is there any equivalent of __STDCPP_DEFAULT_NEW_ALIGNMENT__ in C++11?
Sportive: According to your explanation, __STDCPP_DEFAULT_NEW_ALIGNMENT__ is 16, which is consistent with my test result in GCC 7.4 with C++17. But I found that sizeof(std::max_align_t) is 32 in both GCC 5.4 with C++11 and GCC 7.4 with C++17.
Kipkipling: @Sportive Interesting. Then I may have gotten something wrong about their relation. I thought __STDCPP_DEFAULT_NEW_ALIGNMENT__ would have been bigger.
Elephus: @Kipkipling Since C++17, [new.delete.single]/1 says that this overload of operator new only needs to return a pointer suitably aligned for any complete object type of the given size, provided that it doesn't have new-extended alignment, where new-extended means larger than __STDCPP_DEFAULT_NEW_ALIGNMENT__. I didn't find anything requiring this to be at least as large as the largest fundamental alignment, which is alignof(std::max_align_t). (I think you mixed up sizeof and alignof.)
Elephus: Before C++17, this overload of operator new did have to return a pointer suitably aligned for any object with fundamental alignment, but if I understand correctly, since C++17 there can be alignments in between for which the operator new(std::size_t, std::align_val_t) overload would be called by a new-expression.
Elephus: See [basic.align]/3 for the definition of extended vs. new-extended alignment in C++17.
Elephus: @Sportive Try alignof(std::max_align_t) instead of sizeof(std::max_align_t) and you will get the same result as for __STDCPP_DEFAULT_NEW_ALIGNMENT__. As I mentioned in the comments above, this was probably a mistake by eerorika, but as I also mentioned, I don't think the two values are required to be ordered in a certain way (I don't know for sure, though).

There are actually two reasons. The first reason is that there are alignment requirements for some kinds of objects. Usually, these alignment requirements are soft: a misaligned access is "just" slower (possibly by orders of magnitude). They can also be hard: on the PPC, for instance, you simply could not access a vector in memory if that vector was not aligned to 16 bytes. Alignment is not optional; it is something that must be considered when allocating memory. Always.

Note that there is no way to specify an alignment to malloc(). There's simply no argument for it. As such, malloc() must be implemented to provide a pointer that is correctly aligned for any purpose on the platform. The ::operator new() in C++ follows the same principle.

How much alignment is needed is fully platform dependent. On a PPC, there is no way you can get away with less than 16-byte alignment. x86 is a bit more lenient in this, as far as I know.


The second reason is the inner workings of an allocator function. Typical implementations have an allocator overhead of at least two pointers: whenever you request a byte from malloc(), it will usually need to allocate space for at least two additional pointers to do its own bookkeeping (the exact amount depends on the implementation). On a 64-bit architecture, that's 16 bytes. As such, it is not sensible for malloc() to think in terms of bytes; it's more efficient to think in terms of 16-byte blocks, at least. You can see that in your example code: the resulting pointers are actually 32 bytes apart. Each memory block occupies 16 bytes of payload plus 16 bytes of internal bookkeeping memory.

Since allocators request entire memory pages from the kernel (4096 bytes, 4096-byte aligned!), the resulting memory blocks are naturally 16-byte aligned on a 64-bit platform. It's simply not practical to provide less-aligned memory allocations.


So, taking these two reasons together, it is both practical and required to provide seriously aligned memory blocks from an allocator function. The exact alignment depends on the platform, but it will usually not be less than the size of two pointers.

Dunnage answered 29/11/2019 at 13:57 (no comments)

It's probably due to the way the memory allocator gets the necessary information to the deallocation function. The issue with the deallocation function (like free or the general, global operator delete) is that it takes exactly one argument, the pointer to the allocated memory, with no indication of the size of the block that was requested (or the size that was allocated, if it's larger). So that information (and much more) has to reach the deallocation function in some other form.

The simplest yet efficient approach is to allocate room for that additional information plus the requested bytes, and return a pointer to the end of the information block; let's call it IB. The size and alignment of IB automatically align the address returned by either malloc or operator new, even if you allocate a minuscule amount: the real amount allocated by malloc(s) is sizeof(IB)+s.

For such small allocations, the approach is relatively wasteful and other strategies might be used, but having multiple allocation methods complicates deallocation, as the function must first determine which method was used.

Clodhopper answered 29/11/2019 at 3:23 (no comments)

Why does this happen?

Because in the general case the library does not know what kind of data you are going to store in that memory, so it has to be aligned for the biggest data type on that platform. If you store data unaligned, you can pay a significant hardware performance penalty. On some platforms you will even get a segfault if you try to access unaligned data.

Kora answered 29/11/2019 at 3:18 (2 comments)
Dunnage: And on other platforms you may even read/write the wrong data, because the CPU simply ignores the last few bits of the address... (That's even worse than a SEGFAULT, IMHO.)
Clodhopper: @cmaster In some cases, an incorrect address is even decoded as a shift instruction on the one word at the correct address. That is, you get a different result without any error indication.

It's due to the platform. On x86, alignment isn't necessary, but it improves the performance of the operations. As far as I know, on newer models it doesn't make a difference, but the compiler goes for the optimum. When data is not aligned properly, for example a misaligned long on an m68k processor, the program will crash.

Ascidian answered 29/11/2019 at 3:23 (2 comments)
Ascidian: Here are some tests: lemire.me/blog/2012/05/31/…
Martinmas: Also, alignment makes the memory allocator more general-purpose and a bit more efficient. It always returns values that are correctly aligned for anything that might need alignment, and that are always, internally, some multiple of the size needed to maintain that alignment. "Memory is plentiful now."

It isn't. It depends on the OS/CPU requirements. On 32-bit versions of Linux/Win32, allocated memory is always 8-byte aligned. On 64-bit versions of Linux/Win32, since all x86-64 CPUs support at least SSE2, it made sense at the time to align all memory to 16 bytes (because working with SSE2 was less efficient on unaligned memory). With the latest AVX-based CPUs, this performance penalty for unaligned memory has been removed, so really they could allocate on any boundary.

If you think about it, aligning the addresses of memory allocations to 16 bytes gives you 4 bits of blank space in the pointer address. This may be useful internally for storing some additional flags (e.g. readable, writable, executable).

At the end of the day, the reasoning is entirely dictated by the OS and/or hardware requirements. It has nothing to do with the language.

Warwick answered 29/11/2019 at 3:12 (5 comments)
Kora: "aligning the addresses for memory allocations to 16bytes gives you 4bits of blank space in the pointer address" This is not the reason. The main reason is the penalty for unaligned data stored in that memory.
Sportive: What does this sentence mean? "aligning the addresses for memory allocations to 16bytes gives you 4bits of blank space in the pointer address"
Clodhopper: @Sportive Knowing a priori that all addresses will be aligned means that there is exactly zero information in some bits of the address. These bits are effectively "unused" in the stored value and could be used for something else, as with a bitfield.
Gaye: Cache-line splits are still slower with AVX; only misalignment within a cache line is free on Intel CPUs. Some AMD CPUs with AVX do care about boundaries narrower than 64B. It would be more accurate to say that AVX made it free to use unaligned-capable instructions for the common case where the data is in fact aligned at run time. (Actually, Nehalem did that, making movups cheap, but AVX allows folding loads into memory source operands, because the VEX-encoded versions don't require alignment.)
Gaye: The real source of the alignment requirement is the ABI, which is designed for the ISA's hardware at the time (e.g. the early 2000s for the x86-64 System V ABI, which has alignof(max_align_t) = 16).

© 2022 - 2024 — McMap. All rights reserved.