Prohibit unaligned memory accesses on x86/x86_64
Q

4

21

I want to emulate, on x86/x86_64, a system that prohibits unaligned memory accesses. Is there a debugging tool or a special mode that does this?

I want to run many (CPU-intensive) tests on several x86/x86_64 PCs while working on software (C/C++) designed for SPARC or a similar CPU, because my access to SPARC hardware is limited.

As far as I know, SPARC always checks that memory reads and writes are naturally aligned (a byte can be read from any address, but a 4-byte word can be read only when the address is divisible by 4).
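
To make the rule concrete, here is a minimal C sketch of the natural-alignment test described above. The helper name is_naturally_aligned is hypothetical (it is not part of any tool mentioned here), and the sketch assumes the access size is a power of two:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if an access of `size` bytes at
 * address `p` is naturally aligned, i.e. the address is a multiple of
 * the access size. Assumes `size` is a power of two (1, 2, 4, 8, ...). */
static inline bool is_naturally_aligned(const void *p, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    /* For a power-of-two size, "address % size == 0" is the same as
     * "the low bits of the address are all zero". */
    return ((uintptr_t)p & (size - 1)) == 0;
}
```

With a 4-byte-aligned buffer buf, is_naturally_aligned(buf + 1, 1) holds (a byte may live anywhere), while is_naturally_aligned(buf + 1, 4) does not, which is the kind of access a strict-alignment CPU would trap on.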

Maybe Valgrind or PIN has such a mode? Or is there a special compiler mode? I'm looking for a non-commercial Linux tool, but Windows tools are acceptable too.

Or maybe there is a secret CPU flag in EFLAGS?

Quiz answered 6/8, 2012 at 23:57 Comment(1)
bugzilla.mozilla.org/show_bug.cgi?id=476122 turns up after some googlingLura
D
8

It's tricky and I haven't done it personally, but I think you can do it in the following way:

x86_64 CPUs (I've checked the Intel Core i7 specifically, but I guess others as well) have a performance counter, MISALIGN_MEM_REF, which counts misaligned memory references.

So, first of all, you can run your program under the Linux "perf" tool to count the misaligned accesses your code performs.

A trickier and more interesting hack would be to write a kernel module that programs the performance counter to generate an interrupt on overflow and sets it up to overflow on the first unaligned load/store. Respond to this interrupt in your kernel module by sending a signal to your process.

This will, in effect, turn the x86_64 into a core that doesn't support unaligned access.

This won't be simple, though: besides your code, the system libraries also make unaligned accesses, so it will be tricky to separate them from your own code.

Debut answered 7/8, 2012 at 6:15 Comment(4)
"kernel module that programs the performance counter to generate an interrupt" - isn't that what perf/oprofile already do when profiling? (perf record -e MISALIGN_MEM_REF:u -c 1.) And perf already has code to separate libraries from user code. The interrupt from perf will not stop the program, but perf will record where each unaligned access happened. I think this mode can be more helpful than killing the program and fixing accesses one by one.Quiz
@Quiz you are correct. If it is not important to generate an exception-like interrupt in the same way it would happen on a CPU that does not support unaligned load/store, "perf record -e MISALIGN_MEM_REF:u -c 1" can be used to find every location in the program that makes such accesses, I agree.Debut
@osgx, for what version of perf does your above command work? I have to use -e alignment-faults on my perf_3.13 (Ubuntu 14.04), but it never records any actual faults for my test code with explicit faults in it.Mouflon
Nathan Kidd, don't use perf's high-level "alignment-faults" event (it is not mapped to anything on x86); find a raw hardware perf event for your CPU. Not every Intel CPU has the MISALIGN_MEM_REF event.Quiz
R
13

I've just read the question "Does unaligned memory access always cause bus errors?", which links to the Wikipedia article on Segmentation Fault.

In the article, there's a wonderful reminder of a rather uncommon Intel processor flag: AC, aka Alignment Check.

And here's how to enable it (from Wikipedia's Bus Error example, with a red-zone clobber bug fixed for x86-64 System V so that it is safe on Linux and macOS, and converted from Basic asm, which is never a good idea inside functions: you want changes to AC to be ordered with respect to memory accesses):

#if defined(__GNUC__)
# if defined(__i386__)
    /* Enable Alignment Checking on x86 */
    __asm__("pushf\n orl $0x40000,(%%esp)\n popf" ::: "memory");
# elif defined(__x86_64__)
    /* Enable Alignment Checking on x86_64 */
    __asm__("add $-128, %%rsp \n"    // skip past the red-zone, in case there is one and the compiler has local vars there.
            "pushf\n"
            "orl $0x40000,(%%rsp)\n"
            "popf \n"
            "sub $-128, %%rsp"       // and restore the stack pointer.
           ::: "memory");       // ordered wrt. other mem access
# endif
#endif

Once enabled, it works a lot like the ARM alignment settings in /proc/cpu/alignment; see the answer to "How to trap unaligned memory access?" for examples.

Additionally, if you're using GCC, I suggest you enable -Wcast-align warnings. When building for a target with strict alignment requirements (ARM for example), GCC will report locations that might lead to unaligned memory access.

But note that libc's handwritten asm for memcpy and other functions will still make unaligned accesses, so setting AC is often not practical on x86 (including x86-64). GCC will sometimes emit asm that makes unaligned accesses even if your source doesn't, e.g. as an optimization to copy or zero two adjacent array elements or struct members at once.
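
Given that caveat, one possible workaround is to keep AC set only around your own code and clear it again before calling into libc. The following is a minimal sketch under stated assumptions, not a tested recipe: it assumes GCC or Clang on x86-64 (on other targets the helpers compile to no-ops), and the names EFLAGS_AC, set_alignment_check and alignment_check_enabled are hypothetical helpers invented for this illustration.

```c
#include <stdbool.h>

#define EFLAGS_AC 0x40000UL /* bit 18 of EFLAGS/RFLAGS: Alignment Check */

#if defined(__GNUC__) && defined(__x86_64__)

/* Set (enable != 0) or clear the AC flag for the calling thread. */
static inline void set_alignment_check(int enable)
{
    if (enable) {
        __asm__ volatile("add $-128, %%rsp\n\t"      /* step over the red zone */
                         "pushf\n\t"
                         "orl $0x40000, (%%rsp)\n\t" /* set AC in the saved flags */
                         "popf\n\t"
                         "sub $-128, %%rsp"          /* restore the stack pointer */
                         : : : "memory", "cc");
    } else {
        __asm__ volatile("add $-128, %%rsp\n\t"
                         "pushf\n\t"
                         "andl $~0x40000, (%%rsp)\n\t" /* clear AC in the saved flags */
                         "popf\n\t"
                         "sub $-128, %%rsp"
                         : : : "memory", "cc");
    }
}

/* Read the flags register back and report whether AC is currently set. */
static inline bool alignment_check_enabled(void)
{
    unsigned long flags;
    __asm__ volatile("add $-128, %%rsp\n\t"
                     "pushf\n\t"
                     "pop %0\n\t"
                     "sub $-128, %%rsp"
                     : "=r"(flags) : : "memory", "cc");
    return (flags & EFLAGS_AC) != 0;
}

#else /* not x86-64 with GNU-style asm: nothing to toggle in this sketch */

static inline void set_alignment_check(int enable) { (void)enable; }
static inline bool alignment_check_enabled(void) { return false; }

#endif
```

The intended use is set_alignment_check(1); ...code under test...; set_alignment_check(0); with no library calls in between. Even then, the compiler may still emit unaligned accesses of its own inside the checked region, so this narrows the problem rather than removing it.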

Reptant answered 19/7, 2013 at 14:20 Comment(5)
A note for anyone using this on recent Linux: the C library will crash in strcmp, which is used in the dynamic loader. So do export LD_BIND_NOW=1 before running, so that ld.so will resolve all library symbols at startup instead of on demand.Cephalo
There is also 'STAC' instruction "0F 01 CB" - felixcloutier.com/x86/STAC.html "Sets the AC flag bit in EFLAGS register. This may enable alignment checking of user-mode data accesses."Quiz
@osgx: stac was new with the SMAP (Supervisor Mode Access Prevention) feature (Broadwell?), and is illegal at privilege level > 0. i.e. it faults in user-space. User-space has to continue using pushf/popf to set AC for itself. IDK why they decided not to let stac/clac decode in user-space since it's something user-space can do using the stack.Aguilar
Warning: Your code to enable it on x86_64 will clobber the red zone under gcc's nose.Tedesco
@JosephSible-ReinstateMonica: good point, I fixed that here. The code on wikipedia is now on the Bus Error article; I might get around to updating that page, too.Aguilar
S
6

Both GCC and Clang have UndefinedBehaviorSanitizer built in. One of its checks, alignment, can be enabled with -fsanitize=alignment. It emits code that checks pointer alignment at runtime and reports an error when a misaligned pointer is dereferenced; add -fno-sanitize-recover=alignment if you want the program to abort on the first report.
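
As an illustration of what this check is aimed at, here is a small C sketch contrasting a cast-and-dereference load with a memcpy-based one. The function names load_u32_cast and load_u32_memcpy are hypothetical, and the sketch assumes the bytes being read belong to an array of uint32_t, so that only alignment (not aliasing) is in question:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Dereferences a cast pointer. If `p` is not suitably aligned for
 * uint32_t, this is undefined behavior in C, and it is the kind of
 * access that a build with -fsanitize=alignment reports at runtime. */
static inline uint32_t load_u32_cast(const unsigned char *p)
{
    assert(p != NULL);
    return *(const uint32_t *)p;
}

/* Alternative with no alignment requirement: memcpy may read from any
 * address, and compilers typically turn this into a single load on
 * targets where unaligned loads are safe. */
static inline uint32_t load_u32_memcpy(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Given uint32_t words[2] and const unsigned char *bytes = (const unsigned char *)words, a sanitized build should flag load_u32_cast(bytes + 1) but stay silent for load_u32_cast(bytes + 4) and for load_u32_memcpy(bytes + 1).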

See online documentation at:

Samirasamisen answered 20/12, 2020 at 23:25 Comment(2)
Nice, that should catch C source-level misaligned pointers without tripping over potentially-unaligned accesses that compilers generate on purpose when optimizing narrow aligned accesses for a platform with fast known-safe unaligned access (like x86). Also, memcpy and other libc functions use unaligned accesses in hand-written asm, (e.g. for small non-power-of-2 sized copies in glibc). So enabling x86's AC flag generally isn't usableAguilar
As an example of compilers generating potentially-misaligned loads from safe C, godbolt.org/z/Y3WM1e1ba shows GCC and Clang copying two adjacent bytes of a char array with two separate assignments in the C++, but coalescing them into word load/store when compiling for x86-64 or other ISAs with safe and usually-cheap unaligned accesses.Aguilar
M
0

Perhaps you could somehow compile to SSE with all aligned moves. Unaligned accesses with movaps are illegal and would probably behave like illegal unaligned accesses on other architectures.

Mouldon answered 7/8, 2012 at 0:14 Comment(3)
Not every operation in my code is vectorizable, I think. And the task is to find all unaligned accesses.Quiz
You don't need to vectorize code to use SSE, it can do scalar arithmetic.Parental
@JensBjörnhager. Yes, but only 16-byte loads and stores have alignment-required versions like movaps and movdqa. Narrow instructions like movss (scalar single), movsd (scalar double) and movd/movq are just like regular GP-integer mov, not requiring any alignment. (Unless you enable the AC flag.) Of course, if GCC knows a pointer may not be aligned by 16, it will auto-vectorize with movups instead. Even if it's known to be aligned by 4 or 8.Aguilar
