Debugging SIGBUS on x86 Linux
S

9

23

What can cause SIGBUS (bus error) on a generic x86 userland application in Linux? All of the discussion I've been able to find online is regarding memory alignment errors, which from what I understand doesn't really apply to x86.

(My code is running on a Geode, in case there are any relevant processor-specific quirks there.)

Spiceberry answered 18/1, 2010 at 20:58 Comment(0)
E
16

You can get a SIGBUS from an unaligned access if you turn on the unaligned access trap, but normally that's off on an x86. You can also get it from accessing a memory mapped device if there's an error of some kind.

Your best bet is using a debugger to identify the faulting instruction (SIGBUS is synchronous), and trying to see what it was trying to do.
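When a debugger is not attached at the time of the crash, a signal handler can at least record where the fault happened. The following is a minimal sketch, not a definitive implementation: the names `install_sigbus_reporter`, `bus_code_name`, `put_hex` and `put_str` are made up for illustration. It installs a `SA_SIGINFO` handler that prints the faulting address (`si_addr`) and the reason (`si_code`), then lets the default action run so a core dump is still produced.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Map a SIGBUS si_code (constants from <signal.h>) to a short description. */
const char *bus_code_name(int code)
{
    switch (code) {
    case BUS_ADRALN: return "BUS_ADRALN (invalid address alignment)";
    case BUS_ADRERR: return "BUS_ADRERR (nonexistent physical address)";
    case BUS_OBJERR: return "BUS_OBJERR (object-specific hardware error)";
    default:         return "unknown si_code";
    }
}

/* Hand-rolled formatting helpers: stdio is not async-signal-safe. */
static size_t put_str(char *out, const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') { out[n] = s[n]; n++; }
    return n;
}

static size_t put_hex(char *out, uintptr_t v)
{
    static const char digits[] = "0123456789abcdef";
    size_t n = 0;
    out[n++] = '0';
    out[n++] = 'x';
    for (int shift = (int)(sizeof v * 8) - 4; shift >= 0; shift -= 4)
        out[n++] = digits[(v >> shift) & 0xf];
    return n;
}

static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    char buf[128];
    size_t n = 0;
    n += put_str(buf + n, "SIGBUS at ");
    n += put_hex(buf + n, (uintptr_t)info->si_addr);
    n += put_str(buf + n, ": ");
    n += put_str(buf + n, bus_code_name(info->si_code));
    buf[n++] = '\n';
    ssize_t ignored = write(STDERR_FILENO, buf, n);
    (void)ignored;
    /* SA_RESETHAND restored the default action before this handler ran, so
     * returning re-executes the faulting instruction and the process then
     * dies with the usual SIGBUS (and core dump, if enabled). */
}

/* Returns 0 on success, -1 on failure (as sigaction does). */
int install_sigbus_reporter(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

Calling `install_sigbus_reporter()` early in `main` is enough. The printed address can then be compared against `/proc/<pid>/maps` or looked up in the core dump with gdb.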

Epizootic answered 18/1, 2010 at 21:10 Comment(2)
The debugger showed that the SIGBUS occurred immediately upon entering the function. Maybe I have some memory corruption, or maybe one of the function parameters is bad? I'll have to check the disassembly in the debugger for more details if the error occurs again.Spiceberry
@Josh -- check to see what the actual failing instruction is -- if it's a push or pop, then your stack pointer is corrupted. If it's something else, then the address in the instruction is the issue.Epizootic
W
24

SIGBUS can happen in Linux for quite a few reasons other than memory alignment faults - for example, if you attempt to access an mmap region beyond the end of the mapped file.
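The mmap case can be reproduced with a small sketch like the one below. It is an illustration under assumptions, not code from the question: the function name `sigbus_past_eof` and the scratch file under `/tmp` are made up. It maps two pages over a file that is only one page long and touches the second page, recovering from the signal with `sigsetjmp`/`siglongjmp`.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;

static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Returns 1 if touching a mapped page that lies wholly beyond the end of
 * the file raised SIGBUS, 0 if the access succeeded, -1 on setup failure. */
int sigbus_past_eof(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    long page = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)page * 2;

    /* The file is one page long, but two pages are mapped. */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }
    volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    volatile int result = 0;
    if (sigsetjmp(recover_point, 1) == 0) {
        p[0] = 'a';       /* first page is backed by the file: fine */
        p[page] = 'b';    /* second page has no file behind it: SIGBUS */
    } else {
        result = 1;       /* reached via siglongjmp from the handler */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap((void *)p, len);
    close(fd);
    return result;
}
```

The same thing happens if another process truncates the file after it has been mapped, which is one way a shared memory region can start faulting long after setup.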

Are you using anything like mmap, shared memory regions, or similar?

Wrangler answered 19/1, 2010 at 0:10 Comment(4)
Yes, we're using shared memory regions. I'll investigate that possibility the next time this error comes up. Thanks.Spiceberry
mmap is necessarily used by any program calling malloc, since today malloc is a forward to mmap.Bova
@v.oddou: That's anonymous mmap, which doesn't have a concept of "beyond the end of the mapped file".Wrangler
@Wrangler ooooh. ok.Bova
F
12

SIGBUS on x86 (including x86_64) Linux is a rare beast. It may result from an attempt to access past the end of an mmapped file, or from some other situations described by POSIX.

But it's not easy to get SIGBUS from hardware faults. Namely, a faulting unaligned access from any instruction, be it SIMD or not, usually results in SIGSEGV. Stack overflows result in SIGSEGV. Even accesses to addresses not in canonical form result in SIGSEGV. All of this is due to #GP being raised, which almost always maps to SIGSEGV.

Now, here are some ways to get SIGBUS due to a CPU exception:

  1. Enable the AC bit in EFLAGS, then perform an unaligned access with any memory read or write instruction. See this discussion for details.

  2. Cause a canonical-address violation via a stack pointer register (rsp or rbp), generating #SS. Here's an example for GCC (compile with gcc test.c -o test -masm=intel):

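int main(void)
{
    /* Put a non-canonical address in rbp and read through it. Memory
       operands based on rbp use the SS segment, so the CPU raises #SS. */
    __asm__("mov rbp,0x400000000000000\n"
            "mov rax,[rbp]\n"   /* faults here */
            "ud2\n");           /* not reached */
}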
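To see why the address in the example above is not canonical, here is a small sketch of the check. It assumes 48-bit virtual addresses (4-level paging), where bits 63 through 47 must all be equal. The function name `is_canonical48` is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumes 48-bit virtual addresses: an address is canonical when
 * bits 63..47 (17 bits) are all zeros or all ones. */
bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}
```

With this definition, `0x400000000000000` has bit 58 set while bit 47 is clear, so it fails the check, whereas ordinary user-space addresses such as `0x00007fffffffffff` pass.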
Freidafreight answered 20/9, 2016 at 10:59 Comment(0)
B
9

Oh yes there's one more weird way to get SIGBUS.

If the kernel fails to page in a code page, either because of memory pressure (the OOM killer must be disabled) or because of a failed I/O request, the process gets SIGBUS.

Beheld answered 19/1, 2010 at 18:32 Comment(0)
K
6

You may see SIGBUS when you're running the binary off NFS (network file system) and the file is changed. See https://rachelbythebay.com/w/2018/03/15/core/.

Kristoforo answered 17/1, 2020 at 9:24 Comment(0)
P
5

This was briefly mentioned above as a "failed IO request", but I'll expand upon it a bit.

A frequent case is when you lazily grow a file using ftruncate, map it into memory, start writing data and then run out of space in your filesystem. Physical space for a mapped file is allocated on page faults; if there is none left, the process receives a SIGBUS.

If you need your application to recover correctly from this error, it makes sense to explicitly reserve space prior to mmap using fallocate. Handling ENOSPC in errno after the fallocate call is much simpler than dealing with signals, especially in a multi-threaded application.
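A minimal sketch of the reserve-then-map approach follows. It uses `posix_fallocate` rather than the Linux-specific `fallocate`; note that `posix_fallocate` returns the error number directly instead of setting errno. The function name `map_new_file` and its signature are assumptions made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or open) `path`, reserve `size` bytes of backing store, and map
 * the file read/write. Returns the mapping, or NULL with *err set to the
 * error number (for example ENOSPC when the filesystem is full). */
void *map_new_file(const char *path, size_t size, int *err)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        *err = errno;
        return NULL;
    }

    /* Allocates the blocks now and extends the file to `size` if needed,
     * so a full disk is reported here instead of as SIGBUS later. */
    int rc = posix_fallocate(fd, 0, (off_t)size);
    if (rc != 0) {
        close(fd);
        *err = rc;
        return NULL;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int saved = errno;
    close(fd);                 /* the mapping stays valid after close */
    if (p == MAP_FAILED) {
        *err = saved;
        return NULL;
    }
    *err = 0;
    return p;
}
```

The design choice here is to surface the out-of-space condition as an ordinary return value at setup time, so that later writes through the mapping do not need a signal handler for that case.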

Pectoralis answered 5/12, 2017 at 11:10 Comment(0)
B
1

If you request a mapping backed by hugepages with mmap and the MAP_HUGETLB flag, you can get SIGBUS if the kernel runs out of allocated huge pages and thus cannot handle a page fault.

In this case, you'll need to raise the number of allocated huge pages via

  • /sys/kernel/mm/hugepages/hugepages-<size>/nr_hugepages or
  • /sys/devices/system/node/nodeX/hugepages/hugepages-<size>/nr_hugepages on NUMA systems.
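Before relying on a MAP_HUGETLB mapping, a program can check how many huge pages are currently free. The sketch below reads the `HugePages_Free` field, which `/proc/meminfo` reports for the default huge page size. The function name `hugepages_free` is made up for illustration, and the path is a parameter only so the parser can be exercised on a sample file.

```c
#include <stdio.h>

/* Returns the HugePages_Free count read from `path` (normally
 * "/proc/meminfo"), or -1 if the file or the field is missing. */
long hugepages_free(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[256];
    long n = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Matches e.g. "HugePages_Free:        3" */
        if (sscanf(line, "HugePages_Free: %ld", &n) == 1)
            break;
    }
    fclose(f);
    return n;
}
```

A call such as `hugepages_free("/proc/meminfo")` returning 0 suggests the pool needs to be raised first. A nonzero value is only a hint, since some of those pages may already be reserved by other mappings.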
Bicameral answered 13/8, 2019 at 14:52 Comment(0)
I
0

A common cause of a bus error on x86 Linux is attempting to dereference something that is not really a pointer, or is a wild pointer. For example, failing to initialize a pointer, or assigning an arbitrary integer to a pointer and then attempting to dereference it will normally produce either a segmentation fault or a bus error.

Alignment does apply to x86. Even though memory on an x86 is byte-addressable (so you can have a char pointer to any address), if you have for example a pointer to a 4-byte integer, that pointer must be aligned.

You should run your program in gdb and determine which pointer access is generating the bus error to diagnose the issue.

Irrepressible answered 18/1, 2010 at 21:7 Comment(3)
All SSE load/store instructions have aligned and unaligned versions. For SSE (128-bit) accesses, they actually run at full-speed on contemporary Intel architectures, so there's no real penalty to just using unaligned moves unconditionally (unless you're optimizing to the level that the shorter length of aligned move instructions is significant, which is unlikely).Impromptu
In any case, with bad unaligned access I get SIGSEGV, not SIGBUS.Freidafreight
For me recently -- function pointer in recently freed/reused memory was intermittently SIGSEGV, SIGBUS.Klaus
T
-1

It's a bit off the beaten path, but you can get SIGBUS from an unaligned SSE2 (m128) load.

Talkfest answered 18/4, 2012 at 19:59 Comment(1)
Can you? It normally results in #GP, which maps to SIGSEGV.Freidafreight
