Why is the "alignment" the same on 32-bit and 64-bit systems?

4

22

I was wondering whether the compiler would use different padding on 32-bit and 64-bit systems, so I wrote the code below in a simple VS2019 C++ console project:

#include <iostream>

struct Z
{
    char s;
    __int64 i;
};

int main()
{
    std::cout << sizeof(Z) <<"\n"; 
}

What I expected on each "Platform" setting:

x86: 12
X64: 16

Actual result:

x86: 16
X64: 16

Since the memory word size on x86 is 4 bytes, this means it has to store the bytes of i in two different words. So I thought the compiler would do padding this way:

struct Z
{
    char s;
    char _pad[3];
    __int64 i;
};

So may I know what the reason behind this is?

  1. For forward-compatibility with the 64-bit system?
  2. Due to the limitation of supporting 64-bit numbers on the 32-bit processor?
Dilator answered 30/4, 2019 at 11:38 Comment(1)
C
15

Size and alignof() (minimum alignment that any object of that type must have) for each primitive type is an ABI1 design choice separate from the register width of the architecture.

Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.

MSVC targeting 32-bit x86 gives __int64 a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T)) relative to the start of the struct (for non-aggregate types only). That's not a direct quote; it's my paraphrase of the MSVC docs link from @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction of the pragma and the command-line option?)

(An 8-byte struct containing a char[8] still only gets 1-byte alignment inside another struct, or a struct containing an alignas(16) member still gets 16-byte alignment inside another struct.)
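The two aggregate cases in that parenthetical can be checked at compile time. This is a minimal sketch; the struct names (Bytes8, HoldsBytes8, Over16, HoldsOver16) are made up for illustration, and the asserted values are what I'd expect on mainstream ABIs rather than something the standard promises.

```cpp
#include <cstddef>

// An 8-byte aggregate built only from chars: size 8, but alignment 1.
struct Bytes8 { char data[8]; };

// Nesting it after a char needs no padding, because the inner
// struct's alignment is 1, not its size.
struct HoldsBytes8 { char c; Bytes8 b; };

// A small aggregate with an over-aligned member: alignment 16.
struct Over16 { alignas(16) char data[4]; };

// Nesting it after a char pads up to the next 16-byte offset.
struct HoldsOver16 { char c; Over16 o; };

static_assert(alignof(Bytes8) == 1, "char array does not raise alignment");
static_assert(sizeof(HoldsBytes8) == 9, "no padding before the inner struct");
static_assert(alignof(Over16) == 16, "alignas propagates to the struct");
static_assert(offsetof(HoldsOver16, o) == 16, "member padded to 16");
static_assert(sizeof(HoldsOver16) == 32, "size is a multiple of 16");
```

The point is that an aggregate's placement follows the alignment of its most-aligned member, not its size.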

Note that ISO C++ doesn't guarantee that primitive types have alignof(T) == sizeof(T). Also note that MSVC's definition of alignof() doesn't match the ISO C++ standard: MSVC says alignof(__int64) == 8, but some __int64 objects have less than that alignment2.


So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with alignas() on the variable, or on a struct member to imply that for the type. (e.g. a local struct Z tmp on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like and esp, -8 to round the stack pointer down to an 8-byte boundary.)

However, new / malloc does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically-allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.


This might also be designed to get 32 and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is min(16, sizeof(T)), so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an alignas.)


The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.

In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for double vars on the stack which actually can be manipulated with single instructions, or maybe also for int64_t with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an int64_t* or double* being naturally aligned.

(I'm not sure if MSVC will ever create even less aligned int64_t or double objects on its own. Certainly yes if you use #pragma pack(1) or -Zp1, but that changes the ABI. But otherwise probably not, unless you carve space for an int64_t out of a buffer manually and don't bother to align it. But assuming alignof(int64_t) is still 8, that would be C++ undefined behaviour.)

If you use alignas(8) int64_t tmp, MSVC emits extra instructions to and esp, -8. If you don't, MSVC doesn't do anything special, so it's down to luck whether tmp ends up 8-byte aligned.
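If you want to see which case you got at run time, you can test the low bits of the address. This is a rough sketch with hypothetical helper names (is_aligned_addr, is_aligned, aligned_local_is_8_aligned). It assumes that converting a pointer to std::uintptr_t preserves the low address bits, which holds on mainstream flat-memory targets but isn't guaranteed by ISO C++.

```cpp
#include <cstddef>
#include <cstdint>

// True if the numeric address is a multiple of align (a power of 2).
constexpr bool is_aligned_addr(std::uintptr_t addr, std::size_t align) {
    return (addr & (align - 1)) == 0;
}

// Same check for a real pointer. Assumption: the integer value of the
// pointer reflects the machine address.
inline bool is_aligned(const void* p, std::size_t align) {
    return is_aligned_addr(reinterpret_cast<std::uintptr_t>(p), align);
}

// With an explicit alignas(8), the local should always pass the check.
// Without it, 32-bit MSVC may or may not place tmp on an 8-byte boundary.
inline bool aligned_local_is_8_aligned() {
    alignas(8) std::int64_t tmp = 0;
    return is_aligned(&tmp, 8);
}
```

Calling is_aligned(&x, 8) on a plain int64_t local in a 32-bit MSVC build is one way to observe the under-alignment, although the result can vary from one call site to another.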


Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has alignof(long long) = 4 but sizeof(long long) = 8.

Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align int64_t to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 fild to do int64_t -> double conversion).

This is one reason why modern versions of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible.


When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data busses, so its FPU really can load a 64-bit double in a single cache access if it's 64-bit aligned.

Or for fild / fistp, load/store a 64-bit integer when converting to/from double. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86, since Pentium: Why is integer assignment on a naturally aligned variable atomic on x86?


Footnote 1: An ABI also includes a calling convention (or in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like __fastcall), but the sizes and alignment-requirements for primitive types like long long are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)

Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise s.x = 10; foo(&s); might write to a different offset relative to the base of the struct than separately-compiled foo() (maybe in a DLL) was expecting to read it at.
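When a struct really does have to match across separately-built binaries, one defensive option is to spell out the padding and pin the layout with static_assert. This is only a sketch: SharedRecord is a hypothetical type, not something from the question, and it assumes every compiler involved honours alignas(8) on the member.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record meant to have one layout under every ABI discussed here.
struct SharedRecord {
    char s;
    char pad[7];                 // explicit padding instead of relying on packing rules
    alignas(8) std::int64_t i;   // request 8-byte alignment even where the ABI minimum is 4
};

// If any of these fire, the build that disagrees fails to compile
// instead of silently reading the wrong offset.
static_assert(offsetof(SharedRecord, i) == 8, "i must sit at offset 8");
static_assert(sizeof(SharedRecord) == 16, "record must be 16 bytes");
static_assert(alignof(SharedRecord) == 8, "record must be 8-byte aligned");
```

This turns a silent layout mismatch into a compile error in whichever build disagrees.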


Footnote 2:

GCC had this C++ alignof() bug, too, until it was fixed in 2018 for g++8 some time after being fixed for C11 _Alignof(). See that bug report for some discussion based on quotes from the standard which conclude that alignof(T) should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance. i.e. that using an int64_t* with less than alignof(int64_t) alignment is undefined behaviour.

(It will usually work fine on x86, but vectorization that assumes a whole number of int64_t iterations will reach a 16 or 32-byte alignment boundary can fault. See Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for an example with gcc.)

The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates int64_t and double objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.

Kind of the opposite from 32-bit Windows with MSVC where the struct-packing rules are compatible with an alignof(int64_t) == 8 but locals on the stack are always potentially under-aligned unless you use alignas() to specifically request alignment.

32-bit MSVC has the bizarre behaviour that alignas(int64_t) int64_t tmp is not the same as int64_t tmp;, and emits extra instructions to align the stack. That's because alignas(int64_t) is like alignas(8), which is more aligned than the actual minimum.

void extfunc(int64_t *);

void foo_align8(void) {
    alignas(int64_t) int64_t tmp;
    extfunc(&tmp);
}

(32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):

_tmp$ = -8                                          ; size = 8
void foo_align8(void) PROC                       ; foo_align8, COMDAT
        push    ebp
        mov     ebp, esp
        and     esp, -8                             ; fffffff8H  align the stack
        sub     esp, 8                                  ; and reserve 8 bytes
        lea     eax, DWORD PTR _tmp$[esp+8]             ; get a pointer to those 8 bytes
        push    eax                                     ; pass the pointer as an arg
        call    void extfunc(__int64 *)           ; extfunc
        add     esp, 4
        mov     esp, ebp
        pop     ebp
        ret     0

But without the alignas(), or with alignas(4), we get the much simpler

_tmp$ = -8                                          ; size = 8
void foo_noalign(void) PROC                                ; foo_noalign, COMDAT
        sub     esp, 8                             ; reserve 8 bytes
        lea     eax, DWORD PTR _tmp$[esp+8]        ; "calculate" a pointer to it
        push    eax                                ; pass the pointer as a function arg
        call    void extfunc(__int64 *)           ; extfunc
        add     esp, 12                             ; 0000000cH
        ret     0

It could just push esp instead of LEA/push; that's a minor missed optimization.

Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an int64_t* as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.

If alignof(int64_t) was really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like _mm_load_si128() that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.

But with MSVC's actual behaviour, it's possible that none of the int64_t array elements are aligned by 16, because they all span an 8-byte boundary.
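The "handle 0 or 1 elements" prologue can be written out as arithmetic on the address. This is an illustrative sketch only; elements_until_aligned16 is a made-up helper, not part of any library.

```cpp
#include <cstdint>

// Number of 8-byte elements to process one at a time before addr reaches a
// 16-byte boundary, or -1 if stepping by 8 bytes can never reach one.
constexpr int elements_until_aligned16(std::uintptr_t addr) {
    return (addr % 16 == 0) ? 0     // already on a 16-byte boundary
         : (addr % 16 == 8) ? 1     // one element gets us there
         : -1;                      // only 4-byte aligned: never reachable
}

static_assert(elements_until_aligned16(0x1000) == 0, "already aligned");
static_assert(elements_until_aligned16(0x1008) == 1, "peel one element");
static_assert(elements_until_aligned16(0x1004) == -1, "under-aligned array");
```

The -1 case is the one a function written under the assumption alignof(int64_t) == 8 would never expect to see, and it is exactly what an under-aligned stack local can produce.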


BTW, I wouldn't recommend using compiler-specific types like __int64 directly. You can write portable code by using int64_t from <cstdint>, aka <stdint.h>.

In MSVC, int64_t will be the same type as __int64.

On other platforms, it will typically be long or long long. int64_t is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++ require long long to be at least 64-bit, and on machines with 8-bit bytes and registers that are a power of 2, long long is normally exactly 64 bits and can be used as int64_t. Or if long is a 64-bit type, then <cstdint> might use that as the typedef.)

I assume __int64 and long long are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway so it doesn't matter whether they're the exact same type or not, just that they use the same representation.
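If you'd rather check than assume, the properties mentioned above can be tested at compile time. This is a small sketch; the two constants are names I made up, and which one is true depends on the platform's <cstdint>.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Guaranteed whenever int64_t exists: exactly 64 bits, signed.
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "int64_t is exactly 64 bits");
static_assert(std::is_signed<std::int64_t>::value, "int64_t is a signed type");

// Which underlying type the typedef names is implementation-defined.
constexpr bool int64_is_long_long = std::is_same<std::int64_t, long long>::value;
constexpr bool int64_is_long      = std::is_same<std::int64_t, long>::value;
```

On MSVC I'd expect int64_is_long_long to be true, and on x86-64 Linux int64_is_long, but only the two static_asserts are portable guarantees.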

Castiron answered 1/5, 2019 at 1:33 Comment(2)
OK, I must admit I had never suspected I could get an answer like this which covered so many aspects. Thanks!Dilator
@ShenYuan: I was surprised how non-simple a sufficient explanation turned out to be. MSVC's struct-packing rules (using preferred alignment instead of actual minimum alignment) are a lot different from what I'm familiar with (i386 System V), so the sizeof(Z) == 16 in your question got me curious. I knew 32-bit Windows only keeps the stack 4-byte aligned, so there was a real mystery to solve in terms of when it does do extra stack alignment for locals and what the actual minimum alignments were.Castiron
T
13

The padding is not determined by the word size, but by the alignment of each data type.

In most cases, the alignment requirement is equal to the type's size. So for a 64 bit type like int64 you will get an 8 byte (64 bit) alignment. Padding needs to be inserted into the struct to make sure that the storage for the type ends up at an address that is properly aligned.

You may see a difference in padding between 32 bit and 64 bit when using built-in datatypes that have different sizes on both architectures, for instance pointer types (int*).
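As a small illustration of that last point (WithPointer is a hypothetical struct, and the assertion assumes an ABI where a pointer's alignment equals its size, which is the usual case on x86 and x64):

```cpp
#include <cstddef>

// A char followed by a pointer: the pointer's size and alignment both
// change between 32-bit and 64-bit targets, so the padding changes too.
struct WithPointer {
    char tag;
    int* p;
};

// Typically 8 bytes on 32-bit x86 (3 bytes of padding) and
// 16 bytes on x64 (7 bytes of padding).
constexpr std::size_t with_pointer_size = sizeof(WithPointer);
static_assert(with_pointer_size == 2 * sizeof(int*), "tag is padded to pointer size");
```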

Tsunami answered 30/4, 2019 at 11:43 Comment(7)
Default alignment is determined by word size, however. The reason being that words are addressed in memory so that they fit into registers perfectly. On x86(_64) unaligned data requires a shift operation to work with it. On other platforms like Sun SPARC unaligned data will cause a bus exception. If you want to remove padding try adding __attribute__((packed)) (GCC) to the struct definition.Aun
@Aun Do you have a reference for that? I am not aware of any such behavior for the C or C++ built-in datatypes.Tsunami
I experienced this behaviour while porting hand-aligned memory sections from x86 to 64-bit. Back then I found a good explanation here. (And I mixed up SPARC and Itanium, my bad.) I also learned the details of why in my computer science studies (which I can't reference here).Aun
Also this is not C/C++ Language behaviour but rather compiler behaviourAun
@Nefrin: x86-64 has efficient unaligned loads that handle the required shift in hardware. Yes it's normal for types wider than an integer register to only get aligned to the register width in some ABIs (like i386 System V, and 32-bit Windows). But for x86-64 System V, alignof(__int128) = 16 so it can be copied with SSE vectors, or for lock cmpxchg16b. But as far as the C++ standard is concerned, that's all up to the implementation. And the struct-packing rules are allowed to be different from what you'd expect based on just alignof(member), as P.W's answer shows is the case for MSVC.Castiron
@ComicSansMS: alignof(int64_t) == 8 in 32-bit MSVC, but it doesn't actually bother to ensure that for locals on the stack so that's not really the minimum alignment necessary for any int64_t object. If you use alignas(8) int64_t tmp;, you get extra instructions to align the stack pointer which you don't get with just int64_t tmp. godbolt.org/z/lsuXAQ. The struct-packing rules are allowed to be more complicated than just padding to alignof(T) relative to the start of the struct, as @P.W's answer shows.Castiron
Expanded on that last comment in my own answer (which ended up way longer than I planned >.<) The struct-packing rules make sense because many structs are dynamically or statically allocated, not in automatic storage (on the stack), so it does actually help a lot of the time.Castiron
P
9

This is a matter of alignment requirement of the data type as specified in Padding and Alignment of Structure Members

Every data object has an alignment-requirement. The alignment-requirement for all data except structures, unions, and arrays is either the size of the object or the current packing size (specified with either /Zp or the pack pragma, whichever is less).

And the default value for structure member alignment is specified in /Zp (Struct Member Alignment)

The available packing values are described in the following table:

/Zp argument Effect
1 Packs structures on 1-byte boundaries. Same as /Zp.
2 Packs structures on 2-byte boundaries.
4 Packs structures on 4-byte boundaries.
8 Packs structures on 8-byte boundaries (default for x86, ARM, and ARM64).
16 Packs structures on 16-byte boundaries (default for x64).

Since the default for x86 is /Zp8 which is 8 bytes, the output is 16.
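To make the arithmetic explicit, here is a simplified model of the rule quoted above: each member is placed at a multiple of min(packing size, its alignment requirement). This is a sketch of the documented rule only, not MSVC's actual implementation. Member, round_up and model_sizeof are hypothetical names, the model ignores bit-fields, alignas and nested aggregates, and it needs C++14 for the constexpr loop.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical description of one non-aggregate member.
struct Member {
    std::size_t size;
    std::size_t align;   // the alignment requirement before the /Zp cap
};

// Round offset up to the next multiple of align (a power of 2).
constexpr std::size_t round_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Place each member at a multiple of min(pack, align), then pad the
// total size to the largest effective alignment that was used.
constexpr std::size_t model_sizeof(const Member* m, std::size_t n, std::size_t pack) {
    std::size_t offset = 0, struct_align = 1;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t a = std::min(pack, m[i].align);
        offset = round_up(offset, a) + m[i].size;
        struct_align = std::max(struct_align, a);
    }
    return round_up(offset, struct_align);
}

constexpr Member z_members[] = {{1, 1}, {8, 8}};   // char s; __int64 i;
static_assert(model_sizeof(z_members, 2, 8) == 16, "/Zp8: i at offset 8");
static_assert(model_sizeof(z_members, 2, 4) == 12, "/Zp4: i at offset 4");
```

With a packing size of 8 the __int64 member lands at offset 8, giving 16. Capping the packing size at 4 puts it at offset 4, giving the 12 the question expected.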

However, you can specify a different packing size with /Zp option.
Here is a Live Demo with /Zp4 which gives the output as 12 instead of 16.
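The same effect can be had for a single struct without changing the command line, using the pack pragma (supported by MSVC, and also by GCC and Clang). A minimal sketch, with ZPack4 as a made-up name for a packed copy of the question's struct:

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 4)    // cap member alignment at 4 bytes, like /Zp4
struct ZPack4 {
    char s;
    std::int64_t i;      // placed at offset 4 instead of 8
};
#pragma pack(pop)        // restore the previous packing size

static_assert(offsetof(ZPack4, i) == 4, "only 3 bytes of padding");
static_assert(sizeof(ZPack4) == 12, "matches the /Zp4 result");
```

Keep in mind this changes the layout that other code sees, so every translation unit using the struct has to include the same definition with the same pragma.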

Pacian answered 30/4, 2019 at 11:57 Comment(0)
W
-3

A struct's alignment is the size of its largest member.

That means if you have an 8-byte(64bit) member in the struct, then the struct will align to 8 bytes.

In the case that you are describing, if the compiler allows the struct to align to 4 bytes, it possibly leads to an 8-byte member lying across the cache line boundary.


Say we have a CPU that has a 16-byte cache line. Consider a struct like this:

struct Z
{
    char s;      // 1-4 byte
    __int64 i;   // 5-12 byte
    __int64 i2;  // 13-20 byte, need two cache line fetches to read this variable
};
Wreath answered 30/4, 2019 at 15:2 Comment(1)
Nope, a struct's alignment is the alignment of its most-aligned member. Not all C types have a size that matches their alignment, notably other composite types like a struct or array inside another struct, but primitive types aren't guaranteed to have alignof(T) == sizeof(T). On an ABI like i386 System V (32-bit x86 Linux), alignof(int64_t) == 4, so the OP would see their expected sizeof(struct)==12 and alignof(struct)==4.Castiron
