Judging by the answers provided, there seems to be some confusion about what alignment actually is. The confusion probably arises because there are 2 kinds of alignment.
1. Member alignment
This is a quantitative measure that determines how large an instance is, in bytes, for a specific ordering of the members within the structure/class type. Generally, structure/class instances come out more compact if the members are ordered by their byte size in descending order (i.e. largest members first, smallest members last) within the structure. Consider:
struct A
{
char c; float f; short s;
};
struct B
{
float f; short s; char c;
};
Both structures contain exactly the same information. For the sake of this example, the float type takes 4 bytes, the short type takes 2 bytes and the char type takes 1 byte. However, the first structure A has its members in random order, while the second structure B orders its members by byte size (sizes may differ on certain architectures; I'm assuming an x86 Intel CPU architecture with 4-byte alignment in this example). Now consider the size of the structures:
printf("size of A: %zu\n", sizeof (A)); // size of A: 12;
printf("size of B: %zu\n", sizeof (B)); // size of B: 8;
If you expected the size to be 7 bytes, you were assuming that the members are packed into the structure using a 1-byte alignment. While some compilers allow this, by default most compilers use 4-byte or even 8-byte alignments, because most CPUs work with DWORD (double-word) or QWORD (quad-word) general-purpose registers.
There are 2 padding mechanisms at work to achieve the packing.
First, each member is placed at the next offset that is a multiple of its own alignment. In effect, a member whose byte size is smaller than the byte-alignment is 'merged' with the next member(s) if the combined byte size is smaller than or equal to the byte-alignment. In structure B, members s and c can be merged in this way; their combined size is 2 bytes for s + 1 byte for c == 3 bytes <= 4-byte alignment. For structure A, no such merging can occur: c is followed by 3 padding bytes so that f starts at offset 4, and s is followed by 2 padding bytes, so each member effectively consumes 4 bytes in the structure's layout.
Second, the total size of the structure is padded so that the next instance can start at the alignment boundary. In structure B the members occupy 7 bytes. The next 4-byte boundary lies at byte 8, hence the structure is padded with 1 byte so that arrays can be allocated as a tight sequence of instances.
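To see both padding mechanisms on your own compiler, a small sketch along these lines can help. It reuses structures A and B from above; the helper print_layouts() is a name I made up for illustration. The static_asserts only check the general rules (member offsets and total size are multiples of the relevant alignment), since the exact numbers depend on the platform.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <cstdio>   // std::printf

struct A { char c; float f; short s; };
struct B { float f; short s; char c; };

// Mechanism 1: every member starts at a multiple of its own alignment.
static_assert(offsetof(A, f) % alignof(float) == 0, "f is aligned in A");
static_assert(offsetof(A, s) % alignof(short) == 0, "s is aligned in A");
static_assert(offsetof(B, s) % alignof(short) == 0, "s is aligned in B");

// Mechanism 2: the total size is padded to a multiple of the struct's alignment,
// so that every element of an array of instances is aligned as well.
static_assert(sizeof(A) % alignof(A) == 0, "A has tail padding if needed");
static_assert(sizeof(B) % alignof(B) == 0, "B has tail padding if needed");

// Hypothetical helper: prints size, alignment and member offsets of A and B.
void print_layouts()
{
    std::printf("A: size %zu, align %zu, offsets c=%zu f=%zu s=%zu\n",
                sizeof(A), alignof(A),
                offsetof(A, c), offsetof(A, f), offsetof(A, s));
    std::printf("B: size %zu, align %zu, offsets f=%zu s=%zu c=%zu\n",
                sizeof(B), alignof(B),
                offsetof(B, f), offsetof(B, s), offsetof(B, c));
}
```

Calling print_layouts() from main() prints the layout for your own platform, so you can compare it against the offsets described above.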
Note that Visual C++ and GCC allow different packing alignments of 1 byte, 2 bytes and higher powers of 2. Understand that this works against your compiler's ability to produce optimal code for your architecture. Indeed, in the following example, each byte would be read as a single byte using a single-byte instruction for each read operation. In practice, the hardware would still fetch the entire cache line that contains each byte, and execute the instruction 4 times, even if the 4 bytes sit in the same DWORD and could be loaded into a CPU register with 1 instruction.
#pragma pack(push,1)
struct Bad
{
char a,b,c,d;
};
#pragma pack(pop)
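The effect of packing is easier to see on a structure with mixed member sizes. The following is only a sketch, assuming a compiler that supports #pragma pack (Visual C++, GCC and Clang do); the names Natural and Packed are mine, and the members are those of structure A above.

```cpp
#include <cstddef>  // std::size_t

// Default layout: the compiler inserts padding to align each member.
struct Natural { char c; float f; short s; };

#pragma pack(push, 1)
// 1-byte packing: members are laid out back to back, without padding.
struct Packed { char c; float f; short s; };
#pragma pack(pop)

// With 1-byte packing the size is exactly the sum of the member sizes.
static_assert(sizeof(Packed) == sizeof(char) + sizeof(float) + sizeof(short),
              "no padding in the packed structure");
static_assert(alignof(Packed) == 1, "packed structure has 1-byte alignment");
static_assert(sizeof(Natural) >= sizeof(Packed),
              "padding can only make the default layout larger");

// Reading a packed member by value is fine; the compiler emits code that
// tolerates the possibly unaligned address of f.
float read_f(const Packed& p) { return p.f; }
```

The price for the smaller size is that f and s may sit at unaligned addresses, so the compiler can no longer assume aligned loads and stores for them.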
2. Allocation alignment
This is closely related to the 2nd padding mechanism explained in the previous section. However, allocation alignments can be specified explicitly through variants of the malloc() / memalign() allocation functions, e.g. std::aligned_alloc(). Hence, it is possible to allocate an object at a different (typically a higher power of 2) alignment boundary than the structure/object type's byte-alignment suggests.
size_t blockAlignment = 4*1024; // 4K page block alignment
// note: the requested size should be a multiple of blockAlignment
void* block = std::aligned_alloc(blockAlignment, sizeof(T) * count);
The code will place the block of count instances of type T at an address that is a multiple of 4096.
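As a slightly more complete sketch, assuming a C++17 standard library that provides std::aligned_alloc (Visual C++ does not): the helpers round_up(), alloc_page_aligned() and is_aligned() are names I made up for illustration. The size is rounded up first, because std::aligned_alloc expects the size to be an integral multiple of the alignment.

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uintptr_t
#include <cstdlib>  // std::aligned_alloc, std::free

// Hypothetical helper: rounds n up to the next multiple of alignment.
constexpr std::size_t round_up(std::size_t n, std::size_t alignment)
{
    return (n + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: allocates at least `bytes` bytes on an `alignment`
// boundary (4K by default). Returns nullptr on failure.
// Release the block with std::free().
void* alloc_page_aligned(std::size_t bytes, std::size_t alignment = 4 * 1024)
{
    return std::aligned_alloc(alignment, round_up(bytes, alignment));
}

// Hypothetical helper: checks whether p is a multiple of alignment.
bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For example, alloc_page_aligned(sizeof(T) * count) returns a block for which is_aligned(block, 4096) holds, provided the allocation succeeds.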
The reasons for using such allocation alignments are again purely architectural. For instance, reading and writing blocks from page-aligned addresses is faster because the range of addresses fits nicely into the cache layers. Ranges that are split over different 'pages' thrash the cache when crossing the page boundary. Different media (bus architectures) have different access patterns and may benefit from different alignments. Generally, alignments to page sizes of 4, 16, 32 and 64 K are not uncommon.
Note that the language version and platform will usually provide a specific variant of such aligned allocation functions. E.g. the Unix/Linux compatible posix_memalign() function returns the memory through its memptr argument and returns a non-zero error value in case of failure.
- int posix_memalign(void **memptr, size_t alignment, size_t size); // POSIX (Linux/UX)
- void *aligned_alloc( size_t alignment, size_t size ); // C11
- void *std::aligned_alloc( size_t alignment, size_t size ); // C++17
- void *_aligned_malloc( size_t size, size_t alignment ); // Microsoft Visual C++ (VS2019)
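Because posix_memalign() reports failure through its return value rather than through the pointer, a thin wrapper makes it behave like the other functions. This is only a sketch for POSIX systems; alloc_aligned_posix() is a name I made up. Note that posix_memalign() requires the alignment to be a power of two and a multiple of sizeof(void*).

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical wrapper: returns an aligned block, or nullptr on failure.
// Release the block with free().
void* alloc_aligned_posix(std::size_t alignment, std::size_t size)
{
    void* p = nullptr;
    // posix_memalign returns 0 on success and an error number (for example
    // EINVAL for an invalid alignment, ENOMEM when out of memory) otherwise.
    const int rc = posix_memalign(&p, alignment, size);
    return rc == 0 ? p : nullptr;
}
```

With this wrapper, alloc_aligned_posix(64, 1000) yields a 64-byte aligned block, while an invalid alignment such as 3 yields nullptr.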
Returns alignment in bytes (an integer power of two) required for any instance of the given type - en.cppreference.com/w/cpp/language/alignof. sizeof just gives the size, in bytes, of course. – Unspoiled