Clang runtime fault when throwing aligned type. Compiler bug?

I have a type that is declared with __attribute__((aligned(16))). When building with clang on OS X on x86_64, the following code causes a GP fault when attempting to throw a value containing this type. The fault happens because the compiler generates a 128-bit move instruction which must be aligned on a 16-byte boundary, but the address is not correctly aligned.

Here is a program that reproduces the problem:

#include <stdint.h>
#include <stdio.h>

struct __attribute__((aligned(16))) int128 {
    uint64_t w[2];
};

int main()
{
    try {
        int128 x;
        throw x;
    } catch (int128 &e) {
        printf("%p %zu\n", (void *)&e, sizeof(e));
    }
}

And the disassembly with the fault location marked with ->:

a.out`main:
    0x100000db0 <+0>:   pushq  %rbp
    0x100000db1 <+1>:   movq   %rsp, %rbp
    0x100000db4 <+4>:   subq   $0x40, %rsp
    0x100000db8 <+8>:   movl   $0x10, %eax
    0x100000dbd <+13>:  movl   %eax, %edi
    0x100000dbf <+15>:  callq  0x100000e8c               ; symbol stub for: __cxa_allocate_exception
    0x100000dc4 <+20>:  movaps -0x10(%rbp), %xmm0
->  0x100000dc8 <+24>:  movaps %xmm0, (%rax)
    0x100000dcb <+27>:  movq   0x23e(%rip), %rsi         ; (void *)0x0000000100001058
    0x100000dd2 <+34>:  xorl   %ecx, %ecx
    0x100000dd4 <+36>:  movl   %ecx, %edx
    0x100000dd6 <+38>:  movq   %rax, %rdi
    0x100000dd9 <+41>:  callq  0x100000e9e               ; symbol stub for: __cxa_throw

Current register:

(lldb) register read rax
       rax = 0x0000000100905b08

It looks like __cxa_allocate_exception has no knowledge of the alignment requirements of the type it is allocating storage for. On my system it happens to return an address ending in 8, which is not 16-byte aligned. When the movaps instruction tries to store to that address, the CPU faults because movaps requires a 16-byte-aligned memory operand.
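
To make the arithmetic concrete, here is a minimal sketch that checks the address lldb reported in %rax against 8-byte and 16-byte alignment. The names is_aligned, kReportedRax and check_reported_address are made up for illustration; only the address itself comes from the register dump above.

```cpp
#include <cassert>
#include <cstdint>

// True when addr is a multiple of alignment (alignment must be a power of two).
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t alignment)
{
    return (addr & (alignment - 1)) == 0;
}

// Value of %rax reported by lldb in the question.
constexpr std::uint64_t kReportedRax = 0x0000000100905b08ULL;

// Hypothetical helper: re-checks the reported address at run time.
inline void check_reported_address()
{
    assert(is_aligned(kReportedRax, 8));    // low nibble is 8, so 8-byte aligned
    assert(!is_aligned(kReportedRax, 16));  // not 16-byte aligned, which movaps needs
}
```

The same is_aligned check can be pointed at any address seen in a debugger to tell whether an aligned SSE store to it would fault.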

Compiler info (clang from Xcode 6.3.2):

$ clang --version
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.3.0
Thread model: posix

Is this a compiler bug? What might be a way to work around this?

UPDATE: I have submitted this to the LLVM bug database: https://llvm.org/bugs/show_bug.cgi?id=23868

Adjunction answered 17/6, 2015 at 8:19 Comment(8)
Reproduced on my machine with clang, but no error with GCC 4.9. Smells like a fairly obscure little compiler bug... - Lauralauraceous
The obvious workaround would be to declare the struct field as uint16_t w[8] and manage the uint64_t values using accessor member functions, if needed. - Allysonalma
@rodrigo: The int128 struct in my example is actually the BID_UINT128 structure from the Intel Decimal Floating-Point Math Library, so it's not practical to change its definition. - Adjunction
Also reproduced using the SSE intrinsic type __m128i. I think you ought to post this to llvm.org/bugs. - Lauralauraceous
@rodrigo: I think the alignment is a critical part of the structure here, especially if it needs to interoperate with e.g. SSE instructions. - Lauralauraceous
And it turns out that I just got "lucky" with GCC 4.9. Looks like their allocator happens to return values aligned to 16, and GCC doesn't output SSE instructions for this particular sequence. It would probably still break if it used 256-bit instructions with 32-byte alignment requirements, though I'm not convinced GCC ever does. - Lauralauraceous
Ok... a not-so-obvious workaround is to throw a std::unique_ptr<int128> ;-) - Allysonalma
@rodrigo: Yeah, I'm considering throwing a pointer here. Just so I can make this build for now. :) - Adjunction
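
The pointer idea from the comments could look roughly like the sketch below. The function name throw_by_pointer is hypothetical, and the sketch assumes that operator new returns storage suitably aligned for a 16-byte-aligned type, which is typical on x86-64 but not something this thread confirms. Only the std::unique_ptr is copied into the exception storage, so no aligned 128-bit store into that storage is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

struct __attribute__((aligned(16))) int128 {
    std::uint64_t w[2];
};

// Hypothetical helper: throws a heap-allocated int128 through a unique_ptr
// and returns the low word observed by the handler.
// Assumption: operator new hands back 16-byte-aligned storage on this target.
inline std::uint64_t throw_by_pointer(std::uint64_t lo, std::uint64_t hi)
{
    std::uint64_t seen = 0;
    try {
        std::unique_ptr<int128> p(new int128{{lo, hi}});
        throw std::move(p);  // the exception object is the pointer, not the int128
    } catch (std::unique_ptr<int128> &e) {
        assert(e != nullptr);
        seen = e->w[0];
    }
    return seen;
}
```

The handler owns the int128 through the caught unique_ptr, so it is freed when the exception object is destroyed.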

Looking into this a bit further, it seems that __cxa_allocate_exception was never specified to take alignment into account (for Clang or GCC), so throwing over-aligned objects is effectively UB (well, the alignment attribute is a compiler-specific extension anyway...). The only alignment it appears to guarantee is 8 bytes, since that is the largest alignment required by any built-in type (double).

The easiest workaround I can think of would be simply to use an unaligned type in throw:

struct unaligned_int128 {
    uint64_t w[2];
    unaligned_int128(const int128 &x) { w[0] = x.w[0]; w[1] = x.w[1]; }
};

int main()
{
    try {
        int128 x;
        throw unaligned_int128(x);
    } catch (unaligned_int128 &e) {
        printf("%p %zu\n", (void *)&e, sizeof(e));
    }
}
Lauralauraceous answered 17/6, 2015 at 8:39 Comment(8)
In my situation, the aligned type in question is actually a field of a field of a class used in an exception hierarchy. So it would be pretty awkward to declare an unaligned version of the type just to throw the exception, and then have to copy it back to a properly aligned instance to actually use it in the exception handler. Thanks for the idea, though. - Adjunction
This raises the question: why are you using aligned objects inside an exception hierarchy anyway? Throwing aligned objects looks like it's UB basically no matter what... - Lauralauraceous
The aligned object is a 128-bit decimal floating point number as noted above. The implementation comes from a library I didn't write. A field of my exception contains one of these objects which happens to be declared as aligned. (I didn't even know it was declared aligned until I ran into this bug.) - Adjunction
Yeah, I figured it was something convoluted like that. Maybe try -mno-sse then for any compilation unit that catches or handles one of these? It's a hack but that workaround should definitely work (although it may hurt your performance a bit). - Lauralauraceous
Awesome. -mno-sse causes the compiler itself to crash! (not on the example code above, but on my real code.) - Adjunction
What, just what... well, that at least is definitely worth a trip to the LLVM bugs page. - Lauralauraceous
I'm submitting that one directly to Apple as requested by clang when it crashed. The preprocessed source is 4.2 MB. - Adjunction
x86-64 System V has alignof(max_align_t) == 16. So does i386 System V, at least these days: godbolt.org/z/gBi2PR. But MSVC has only 8, even on Windows x64, not just in 32-bit code. - Defloration
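
For the case where the aligned value is buried inside an exception class, one possible variation on the unaligned-copy idea is to keep the value as raw bytes inside the exception and rebuild an aligned copy on demand. This is only a sketch: decimal_error, value() and throw_and_read_low_word are hypothetical names, and the int128 struct stands in for the library type from the question.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <exception>

struct __attribute__((aligned(16))) int128 {
    std::uint64_t w[2];
};

// Hypothetical exception type: the payload is stored as plain bytes, so the
// class itself does not inherit the 16-byte alignment of int128.
class decimal_error : public std::exception {
public:
    explicit decimal_error(const int128 &v) { std::memcpy(bytes_, &v, sizeof v); }

    // Rebuilds a properly aligned copy for use inside the handler.
    int128 value() const
    {
        int128 v;
        std::memcpy(&v, bytes_, sizeof v);
        return v;
    }

    const char *what() const noexcept override { return "decimal_error"; }

private:
    unsigned char bytes_[sizeof(int128)];
};

static_assert(alignof(decimal_error) < alignof(int128),
              "the exception type must not require 16-byte alignment");

// Hypothetical helper: throws the wrapper and returns the low word seen by the handler.
inline std::uint64_t throw_and_read_low_word(std::uint64_t lo, std::uint64_t hi)
{
    int128 x = {{lo, hi}};
    std::uint64_t seen = 0;
    try {
        throw decimal_error(x);
    } catch (const decimal_error &e) {
        seen = e.value().w[0];
    }
    return seen;
}
```

The cost is one extra 16-byte copy per access in the handler, in exchange for an exception type whose alignment the runtime can satisfy.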

According to the LLVM bug tracking this issue, the internal definition of alignment used by __cxa_allocate_exception was updated to match GCC, which assumes a "maximum useful alignment".

The Clang and libunwind unwind.h headers state:

The Itanium ABI requires that _Unwind_Exception objects are "double-word aligned". GCC has interpreted this to mean "use the maximum useful alignment for the target"; so do we.

https://bugs.llvm.org/show_bug.cgi?id=23868

So it's technically an ABI issue, resolved by Clang and GCC both adopting a looser reading of the ABI's alignment requirement.

Margaux answered 25/2, 2019 at 20:44 Comment(4)
x86-64 System V has alignof(max_align_t) == 16. So does i386 System V, at least these days: godbolt.org/z/gBi2PR. But MSVC has only 8, even on Windows x64, not just in 32-bit code. So alignment up to alignof(max_align_t) is fine, but that depends on target platform? - Defloration
We did not experience the issue using types with 16-byte alignment requirements with Visual Studio on Windows. Have you? - Margaux
No, I haven't tried to repro this, and I don't have a Windows machine to test on anyway. MSVC's exception machinery is presumably different from clang's, and MSVC avoids alignment-required vector loads/stores when it can (e.g. it will use movups instead of movaps for _mm_store_ps). And the Windows ABI for exceptions is different-ish, I think. - Defloration
The real issue here seems to be that the exception object is in unspecified storage. Would be nice if the expectation was clearer. en.cppreference.com/w/cpp/language/throw - Margaux
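
If the rule of thumb from these comments holds (alignment up to alignof(max_align_t) is fine, anything larger is not), it can be turned into a compile-time guard. The sketch below assumes exactly that and nothing more: the thread suggests the runtimes now align exception storage to the "maximum useful alignment", but neither the thread nor the standard spells out a guarantee. fits_exception_alignment, throw_checked and cache_line_block are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: exception storage is aligned to at least alignof(std::max_align_t).
template <typename T>
constexpr bool fits_exception_alignment()
{
    return alignof(T) <= alignof(std::max_align_t);
}

// Hypothetical wrapper around throw that rejects over-aligned types at compile time.
template <typename T>
[[noreturn]] void throw_checked(const T &value)
{
    static_assert(fits_exception_alignment<T>(),
                  "type is more strictly aligned than the assumed exception storage");
    throw value;
}

// Example of a type the guard is meant to reject.
struct __attribute__((aligned(64))) cache_line_block {
    unsigned char bytes[64];
};

static_assert(fits_exception_alignment<double>(), "double is never over-aligned");
static_assert(!fits_exception_alignment<cache_line_block>(),
              "64-byte alignment exceeds max_align_t on common targets");
```

Whether a 16-byte-aligned type passes the guard depends on the target, which matches the platform differences described in the comments.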
