Why is the default alignment for `int64_t` 8 bytes on the 32-bit x86 architecture?

Why is the default alignment 8 bytes for int64_t (e.g. long long) in 32-bit x86 ABIs? 4-byte alignment would appear to be fine, because with 32-bit registers it can only be accessed as two 4-byte halves.

Botzow answered 29/12, 2015 at 4:27 Comment(6)
Possible duplicate of stackoverflow.com/questions/1054657: "The usual rule of thumb (straight from Intel's and AMD's optimization manuals) is that every data type should be aligned by its own size. An int32 should be aligned on a 32-bit boundary, an int64 on a 64-bit boundary, and so on. A char will fit just fine anywhere." A long long is 64 bits in size, so it is best aligned using 8-byte alignment. - Nathan
@RemyLebeau yes, I know that. I was looking for the reasoning behind the rule of thumb. The question that you linked mentions that 32-bit processors have a 64-bit data bus, so 8-byte alignment makes sense. Thanks! - Botzow
All 32-bit C and C++ compilers I know (but I don't get out much) make an effort to keep 64-bit types aligned to 8. It matters most of all for double: accessing one when it is not aligned is very expensive, fat x3 when it straddles a cache line. Whether yours does as well is unclear; very odd that SO users keep the name of their compiler a secret. - Tynan
@HansPassant what do you mean by fat x 3? I have a 64-bit compiler. I read the Wikipedia article about the alignment requirements, and this was mentioned there. - Botzow
Over 3 times as slow. If you use a 64-bit compiler then a question about a 32-bit x86 architecture is quite irrelevant. - Tynan
If you allowed splitting an ordinary int64_t, then you'd need special, new alignment rules for std::atomic<int64_t>. - Desist

Interesting point: if you only ever load it as two halves into 32-bit general-purpose registers, then 4-byte alignment means each of those accesses is naturally aligned.

However, it's probably best if both halves of the variable are in the same cache line, since almost all accesses will read / write both halves. Aligning to the natural alignment of the whole thing takes care of that, even ignoring the other reasons below.
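To make the cache-line argument concrete, here is a minimal sketch that checks whether an object at a given address would straddle a line boundary. The 64-byte line size is an assumption (typical for current x86 CPUs, but not guaranteed), and `crosses_cache_line` is a hypothetical helper written for this illustration, not a library function.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes is typical on current x86 CPUs.
constexpr std::uintptr_t kCacheLine = 64;

// Hypothetical helper: true if the `size` bytes starting at `addr`
// span two cache lines. Requires size >= 1.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
    return (addr / kCacheLine) != ((addr + size - 1) / kCacheLine);
}

// An 8-byte object at an 8-byte-aligned address never straddles a line...
static_assert(!crosses_cache_line(56, 8), "bytes 56..63 stay in one line");
// ...but with only 4-byte alignment it can: bytes 60..67 span two lines.
static_assert(crosses_cache_line(60, 8), "bytes 60..67 cross a boundary");
```

With 4-byte alignment, one out of every sixteen possible positions in a 64-byte line (offset 60) would split the two halves across lines; 8-byte alignment rules that case out.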


32-bit x86 can load a 64-bit integer with a single 64-bit load using MMX or SSE2 movq. Handling 64-bit add/sub/shift and bitwise boolean operations with vector instructions is more efficient (a single instruction each), as long as you don't need immediate constants, mul, or div. The vector instructions with 64-bit elements are still available in 32-bit mode.
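As a rough sketch of what that looks like from C++, the SSE2 intrinsics below do a 64-bit load, add, and store without splitting the value into halves. `add64` and the `HAVE_SSE2_SKETCH` macro are hypothetical names for this example; the intrinsics themselves come from `<emmintrin.h>`. A 32-bit build would need SSE2 enabled (e.g. `-msse2` with GCC/Clang), and the fallback branch is only there so the sketch still compiles elsewhere.

```cpp
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2_SKETCH 1
#endif

// Hypothetical helper: *dst = *a + *b (wrapping) using 64-bit vector ops.
void add64(std::int64_t* dst, const std::int64_t* a, const std::int64_t* b) {
#ifdef HAVE_SSE2_SKETCH
    // movq load: 64 bits into the low half of an XMM register.
    __m128i va = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(b));
    // paddq: one instruction instead of an add/adc pair on two halves.
    __m128i sum = _mm_add_epi64(va, vb);
    // movq store: write the low 64 bits back to memory.
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dst), sum);
#else
    // Portable fallback with the same wrapping behaviour.
    *dst = static_cast<std::int64_t>(static_cast<std::uint64_t>(*a) +
                                     static_cast<std::uint64_t>(*b));
#endif
}
```

Whether a compiler picks this strategy on its own for plain `int64_t` arithmetic depends on the compiler and flags; the point is only that single 64-bit memory accesses exist in 32-bit mode.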


Atomic 64-bit compare-and-exchange is also available in 32-bit mode: lock CMPXCHG8B m64 works just like 64-bit mode's lock CMPXCHG16B m128, using implicit register pairs (it compares memory against edx:eax and stores ecx:ebx on a match). I don't know what kind of penalty it has for crossing a cache-line boundary.
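In portable C++ this surfaces as a compare-exchange on `std::atomic<std::int64_t>`, which compilers targeting 32-bit x86 typically implement with lock cmpxchg8b. The sketch below is illustrative only: `try_swap` and `kAtomic64Align` are hypothetical names, and the actual instruction and alignment chosen depend on the compiler and target.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: atomically replace `expected` with `desired`.
// Returns true if the stored value matched and the swap happened.
bool try_swap(std::atomic<std::int64_t>& value,
              std::int64_t expected, std::int64_t desired) {
    return value.compare_exchange_strong(expected, desired);
}

// The alignment the implementation picked for the atomic type. If plain
// int64_t were only 4-byte aligned, this type would need its own stricter
// rule to keep the object inside a single cache line.
constexpr std::size_t kAtomic64Align = alignof(std::atomic<std::int64_t>);
```

Printing `kAtomic64Align` next to `alignof(std::int64_t)` on a given toolchain is a quick way to see whether the two already agree.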


Modern x86 CPUs have essentially no penalty for misaligned loads/stores unless they cross a cache-line boundary, which is why I'm only talking about cache-line splits rather than saying that misaligned 64-bit accesses would be bad in general. See the links in the wiki, esp. Agner Fog's guides.

Hagen answered 29/12, 2015 at 5:29 Comment(0)
