__int128 alignment segmentation fault with gcc -O SSE optimization

I use __int128 as a struct member. It works fine with -O0 (no optimization).

However, it crashes with a segmentation fault when optimization is enabled (-O1).

It crashes at a movdqa instruction, which requires its memory operand to be 16-byte aligned, while the address was allocated by malloc(), which aligns only to 8 bytes.

I tried to disable SSE optimization with -mno-sse, but then it fails to compile:

/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:27:1: error: SSE register return with SSE disabled

So what can I do if I want to use both __int128 and -O1?

Thanks in advance, Wu

BTW, it seems OK if __int128 is used only on the stack (not on the heap).

==== EDIT ====

Sorry, what I wrote above was not accurate.

In fact I did not use malloc(). I used a memory pool library that returns addresses aligned to 8 bytes. I said malloc() only to keep things simple.

After testing, I now know that malloc() aligns to 16 bytes, and that the __int128 member is also aligned to 16 bytes within the struct.

So the problem is only in my memory pool library.
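
For illustration, here is a minimal sketch of how a pool allocator could round each returned address up to the alignment the caller needs. `struct pool` and `pool_alloc_aligned` are hypothetical names invented for this example; they are not the API of the memory pool library mentioned above, which is not shown in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump-style pool, for illustration only. */
struct pool {
    unsigned char *base;  /* start of the backing buffer */
    size_t size;          /* total bytes in the buffer */
    size_t used;          /* bytes handed out so far, including padding */
};

/* Return `size` bytes whose address is a multiple of `align`
   (which must be a power of two), or NULL if the pool is exhausted. */
static void *pool_alloc_aligned(struct pool *p, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t start = (uintptr_t)p->base + p->used;
    /* Round the next free address up to the requested alignment. */
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t padding = (size_t)(aligned - start);
    size_t remaining = p->size - p->used;

    if (padding > remaining || size > remaining - padding)
        return NULL;

    p->used += padding + size;
    return p->base + (p->used - size);
}
```

With a helper like this, a struct containing an __int128 would be requested with `pool_alloc_aligned(&p, sizeof(struct S), _Alignof(struct S))`, so the 16-byte requirement comes from the type rather than from a fixed assumption in the pool.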

Thanks very much.

Franni answered 27/9, 2018 at 7:29 Comment(4)
Check aligned malloc() in GCC? – Pauwles
For x86-64, alignof(max_align_t) == 16 so malloc always returns 16-byte aligned pointers. It sounds like your malloc is broken, and would violate the ABI if used for long double as well. malloc is guaranteed to be aligned enough to hold any standard type. This can't be 32-bit code, because gcc doesn't support __int128 in 32-bit targets. – Tobacconist
Can you provide a minimal reproducible example? What compiler and library versions are you using, and on what Linux distro? – Tobacconist
I was wrong, I took it for granted that malloc() aligns to 8 bytes. Thanks. – Franni

For x86-64 System V, alignof(max_align_t) == 16 so malloc always returns 16-byte aligned pointers. It sounds like your allocator is broken, and would violate the ABI if used for long double as well. (Reposting this as an answer because it turns out it was the answer).

Memory returned by malloc is guaranteed to be suitably aligned for any standard type that fits in the allocation, so a large enough block is always aligned enough.
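
A quick way to check this for a given allocator is to test the returned pointer against alignof(max_align_t). The following is a minimal sketch; `is_aligned` and `malloc_is_max_aligned` are helper names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if `p` is a multiple of `align`, which must be a power of two. */
static bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Allocate `size` bytes with malloc and report whether the pointer meets
   the strictest fundamental alignment, alignof(max_align_t). */
static bool malloc_is_max_aligned(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return false;
    bool ok = is_aligned(p, _Alignof(max_align_t));
    free(p);
    return ok;
}
```

The same `is_aligned` check can be pointed at whatever a custom allocator returns; if it reports false for 16, that allocator cannot safely hold a struct with an __int128 member.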

This can't be 32-bit code, because gcc doesn't support __int128 in 32-bit targets. (32-bit glibc malloc only guarantees 8-byte alignment I think. Actually on current systems, alignof(max_align_t) == 16 in 32-bit mode as well.)


In general, the compiler is allowed to make code that faults if you violate the alignment requirements of types. On x86, misaligned memory typically just works until the compiler uses alignment-required SIMD instructions. Even auto-vectorization with a misaligned uint16_t* can fault (see "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?"), so don't assume that narrow types are always safe. Use memcpy if you need to express an unaligned load in C.
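
As a minimal sketch of that memcpy idiom (the function names here are made up for the example): copying into a local of the right type tells the compiler nothing about the alignment of the source, so it must emit a load that is safe at any address. Compilers normally inline a fixed-size memcpy like this rather than calling the library function.

```c
#include <stdint.h>
#include <string.h>

/* Read a uint16_t from an address that may not be 2-byte aligned. */
static uint16_t load_u16_unaligned(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

#ifdef __SIZEOF_INT128__
/* Same idea for __int128: safe even if `p` is only 8-byte aligned. */
static unsigned __int128 load_u128_unaligned(const void *p)
{
    unsigned __int128 v;
    memcpy(&v, p, sizeof v);
    return v;
}
#endif
```

By contrast, dereferencing `*(const uint16_t *)p` or `*(const __int128 *)p` on a misaligned pointer is undefined behavior, even if it happens to work at -O0.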


Apparently alignof(__int128) is 16. So they aren't repeating the weirdness from i386 System V where even 8-byte objects are only guaranteed 4-byte alignment, and struct-packing rules mean that compilers can't always give them natural alignment.
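
The alignment and its effect on struct layout can be queried directly. This is a minimal sketch assuming GCC or Clang on x86-64; `struct record` and the three helper functions are hypothetical names for this example.

```c
#include <stddef.h>

/* A small member followed by an __int128, similar in spirit to the
   struct described in the question. */
struct record {
    int id;          /* 4 bytes, followed by padding up to the next 16-byte boundary */
    __int128 value;  /* requires 16-byte alignment on x86-64 */
};

static size_t int128_alignment(void)    { return _Alignof(__int128); }
static size_t record_alignment(void)    { return _Alignof(struct record); }
static size_t record_value_offset(void) { return offsetof(struct record, value); }
```

On x86-64 all three functions return 16 and sizeof(struct record) is 32, so any allocator handing out storage for this struct has to return 16-byte aligned addresses.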

This is a Good Thing, because it makes it efficient to copy with SSE, and means _Atomic __int128 doesn't need any extra special handling to avoid cache-line splits that would make lock cmpxchg16b very slow.

Tobacconist answered 27/9, 2018 at 8:17 Comment(7)
Isn't __int128 always represented as a pair of 64-bit integers? I find it weird for it to have 16-byte alignment instead of 8-byte alignment, which is what a struct containing two 64-bit integers would have. Also, AFAICT, because the calling convention requires passing it in 2 x 64-bit registers, it is (almost) never worth it to use SIMD instructions for copying. You would have to move the type into SIMD registers first, and then back, but there aren't any SIMD operations for working with 128-bit integers, so this is unlikely to be worth it. – Maher
@gnzlbg: That's true for function args. But __int128 can be used anywhere, including structs, arrays, globals, etc. There are cases (like in the question) where copying a __int128 from memory to memory is what the compiler decides to do, and it doesn't need the value in integer registers later to actually do arithmetic on it. But yes, in most use-cases for __int128, you aren't just copying them without also doing math, so compilers would use a pair of integer loads/stores. Or they'd already be in registers as args or return values, if there's no indirection. – Tobacconist
If you are copying these by memory, e.g., because you are copying a struct or an array, then that's going to happen via memcpy, which is going to use SIMD internally when appropriate, and the fewer bytes it has to copy the better. I don't see how raising the alignment to 16 helps here. That basically raises the alignment of anything containing a __int128, and might introduce padding, which in turn might end up making memcpy slower. – Maher
@gnzlbg: For copying a single __int128, gcc inlines movdqa; it doesn't literally call memcpy. Back in ~2001 when x86-64 was still a paper spec and the ABI was being designed, the designers assumed that movdqa would always be faster than movdqu. (That was a correct assumption until Nehalem / Bulldozer.) Remember that SSE2 is baseline for x86-64, so the ABI is set up to take advantage. Even on modern CPUs, a cache-line split still has some cost. Using movdqa requires a compile-time guarantee that the __int128 object is 16-byte aligned; the ABI chose to do that everywhere. – Tobacconist
@gnzlbg: giving __int128 full alignment doesn't increase the size of an array, or (if you put it first) a struct. You often want to arrange your structs to avoid padding, often by ordering from most-aligned to least-aligned. – Tobacconist
@gnzlbg: higher alignment requirements can increase the space lost to fragmentation between objects, but giving natural alignment to a 16-byte object doesn't have to hurt. It does waste space on 10-byte long double vs. i386 System V storing long double as a 12-byte object with 4-byte alignment. (More padding bytes inside the object itself). – Tobacconist
@gnzlbg: related: How do I organize members in a struct to waste the least space on alignment? – Tobacconist
