Motivation
I am trying to improve my understanding of a SIGBUS error in Xwayland. This has been seen by several Fedora Linux users since around the 20th of February 2018, with Xwayland 1.19.6-5.fc27.x86_64 and Linux kernel 4.15.3-300.fc27.x86-64.
Sadly I do not have the kernel "segfault" log message (or equivalent for SIGBUS). Xwayland has some pointless code which traps the fatal signal. But I can see siginfo by debugging the coredump, and this seems to be nearly as good.
Definition
I understand that a "major page fault" occurs when a page of virtual memory is not available in RAM, and must be read from disk. I think I'm specifically interested in pages backed by an ext4 filesystem (e.g. no direct access to block devices) for this question.
Therefore a "minor page fault" is when no disk access is necessary. I assume the difference is fairly well-defined because Linux exposes counters for major and minor page faults.
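For what it's worth, the counters I mean can be read per process through getrusage(2). A minimal sketch in Python, assuming a Linux system; the helper names fault_counts and touch_anonymous are made up for this post. It writes to freshly mapped anonymous memory, which I would expect to be satisfied without disk access and so to be counted as minor faults:

```python
import mmap
import resource


def fault_counts():
    """Return (minor, major) page fault counts for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def touch_anonymous(n_pages):
    """Map n_pages of anonymous memory and write one byte to each page."""
    size = n_pages * mmap.PAGESIZE
    with mmap.mmap(-1, size) as mem:
        for offset in range(0, size, mmap.PAGESIZE):
            mem[offset] = 1  # first touch of a page is expected to fault


if __name__ == "__main__":
    before = fault_counts()
    touch_anonymous(256)
    after = fault_counts()
    print("minor faults:", after[0] - before[0])
    print("major faults:", after[1] - before[1])
```

The same two numbers are what tools like ps and /usr/bin/time report as min_flt/maj_flt, so the distinction does look well-defined from user space.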
My question
If the kernel sends a program SIGBUS, I wonder if I should generally expect that this would have been a major page fault.
According to the coredump and disassembly, the program is reading memory when it receives SIGBUS, not writing it. The fault address in siginfo->si_addr is within a mapped system executable, which is not writeable by the user, and the address seems within the bounds of the current file length. In fact when debugging the coredump, I have read very convincing values from the memory address. It seems the coredump generation process had no difficulty reading this address :-(.
I'm also confident in ruling out the "invalid address alignment" case (BUS_ADRALN), because siginfo->si_code is 2, i.e. BUS_ADRERR, "non-existent physical address". Also because I'm on x86, which permits unaligned accesses in most cases, and the trap isn't in any SSE extended instruction.
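To make the si_code reading explicit, here is a small lookup sketch. The numeric values are the SIGBUS codes from Linux's asm-generic/siginfo.h; describe_sigbus is a hypothetical helper, not part of any library:

```python
import signal

# si_code values for SIGBUS as defined in Linux's asm-generic/siginfo.h.
BUS_CODES = {
    1: ("BUS_ADRALN", "invalid address alignment"),
    2: ("BUS_ADRERR", "non-existent physical address"),
    3: ("BUS_OBJERR", "object-specific hardware error"),
    4: ("BUS_MCEERR_AR", "hardware memory error consumed, action required"),
    5: ("BUS_MCEERR_AO", "hardware memory error detected, action optional"),
}


def describe_sigbus(si_signo, si_code):
    """Translate a (si_signo, si_code) pair from a coredump into a name."""
    if si_signo != int(signal.SIGBUS):
        raise ValueError("not a SIGBUS siginfo")
    name, meaning = BUS_CODES.get(si_code, ("unknown", "code not in table"))
    return f"{name}: {meaning}"


if __name__ == "__main__":
    # si_signum = 7 is SIGBUS on x86 Linux; si_code = 2 is what gdb showed me.
    print(describe_sigbus(int(signal.SIGBUS), 2))
```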
I considered what the kernel is normally responsible for, when it handles a page fault which it determines is "minor". I suppose minor faults could fail to allocate memory, and hence raise SIGBUS. However, I believe I would have noticed such an allocation failure:
I have plenty of free swap to evict user pages to, and I did not notice the usual obvious slowdown that occurs when my system starts swapping. The crash happened a few seconds after waking a laptop from suspend to ram, which would not have been long enough to fill 8GB of swap even at ~100MB/s. Nor did I see the dread Out Of Memory (OOM) killer appear in kernel logs, as I would expect if the kernel failed allocating a page frame or page table.
Is there some other possibility that a minor page fault could have failed and caused the SIGBUS? I.e. is there some cause which I would not have noticed, when looking for errors in the kernel log? And which could have a quick onset?
Again, multiple coredumps are showing this as a page fault triggered by reading from a mapped file on the filesystem.
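For comparison, the one cause of SIGBUS on a read from a file mapping that I know how to reproduce is an access beyond the end of the file, e.g. after the file has been truncated (mmap(2) documents SIGBUS for this case). A rough sketch in Python, run in a child process because the signal is fatal; read_past_truncation and the temporary file are only for illustration. This is not a claim about what happens in Xwayland, where the address appears to be within the file length:

```python
import mmap
import signal
import subprocess
import sys
import tempfile

# Child script: map two pages of a file, shrink the file to one page,
# then read from the second page of the mapping.
CHILD = r"""
import mmap, os, sys
fd = os.open(sys.argv[1], os.O_RDWR)
mem = mmap.mmap(fd, 2 * mmap.PAGESIZE, prot=mmap.PROT_READ)
os.ftruncate(fd, mmap.PAGESIZE)   # second page is now beyond end of file
mem[mmap.PAGESIZE]                # read fault on the second page
"""


def read_past_truncation():
    """Return the child's exit status; a negative value means killed by that signal."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * (2 * mmap.PAGESIZE))
        tmp.flush()
        result = subprocess.run([sys.executable, "-c", CHILD, tmp.name])
        return result.returncode


if __name__ == "__main__":
    status = read_past_truncation()
    print(signal.Signals(-status).name if status < 0 else status)
```

On my understanding of Linux I would expect the child to die with SIGBUS here, with no I/O error logged, which is why I checked the file length first.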
Ulterior motivation
I would really like to have missed a case for minor page faults. Because the horrifying flipside of this is that I don't see how this SIGBUS could have been caused by the hard page fault side of things. Several of us users have very similar-looking errors, starting a few months ago. There is no IO error in my kernel logs. During normal operation, I have no IO errors when reading the indicated file. I have no errors when running rpm --verify --all, or when running an extended SMART test on the HDD. Unfortunately I seem to have very few suspects. The closest suspect I have is a kernel upgrade, which I would obviously prefer to rule out; the dates don't exactly prove it but it's not entirely ruled out. Next closest in the dates is this year's microcode update; this seems like it would be even harder to nail down.
Known causes of minor page faults
- Logically, it sounds like minor page faults occur when implementing copy-on-write for MAP_PRIVATE mappings.
- It should also include read faults on /dev/zero or MAP_ANONYMOUS, assuming a kernel did not implement them as reading a shared zero page and did not implement them to allocate pages for the entire mapping immediately.
But more generally, it could be any first access to a page. This is because it seems that the page tables for memory mappings are generally populated on-demand. (Which would be done by a page fault, and if the file page was already in cache, it would only be a minor page fault).
MAP_NONBLOCK (since Linux 2.5.46)
This flag is meaningful only in conjunction with MAP_POPULATE. Don't perform read-ahead: create page tables entries only for pages that are already present in RAM. Since Linux 2.6.23, this flag causes MAP_POPULATE to do nothing. One day, the combination of MAP_POPULATE and MAP_NONBLOCK may be reimplemented.
EDIT: Further excerpts detailing the above
A commenter asked for more concrete details, to clarify the faulting address and instruction. There are many excerpts in the initial link https://bugzilla.redhat.com/show_bug.cgi?id=1557682
The fault varies as described in the bug link. Here are fresh excerpts from a recent instance.
$ gdb 2018-03-21.core
...
Core was generated by `/usr/bin/Xwayland :0 -rootless -terminate -core -listen 4 -listen 5 -displayfd'.
Program terminated with signal SIGBUS, Bus error.
#0 _dl_fixup (l=0x7fc0be2e0130, reloc_arg=203) at ../elf/dl-runtime.c:73
73 const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
[Current thread is 1 (Thread 0x7fc0be29fa80 (LWP 1918))]
(gdb) p $_siginfo.si_signum
$1 = 7
(gdb) p $_siginfo.si_code
$2 = 2
(gdb) p $_siginfo._sifields._sigfault.si_addr
$3 = (void *) 0x41bd80
(gdb) disassemble
Dump of assembler code for function _dl_fixup:
0x00007fc0be0c8bd0 <+0>: push %rbx
0x00007fc0be0c8bd1 <+1>: mov %rdi,%r10
0x00007fc0be0c8bd4 <+4>: mov %esi,%esi
0x00007fc0be0c8bd6 <+6>: lea (%rsi,%rsi,2),%rdx
0x00007fc0be0c8bda <+10>: sub $0x10,%rsp
0x00007fc0be0c8bde <+14>: mov 0x68(%rdi),%rax
0x00007fc0be0c8be2 <+18>: mov 0x8(%rax),%rdi
0x00007fc0be0c8be6 <+22>: mov 0xf8(%r10),%rax
0x00007fc0be0c8bed <+29>: mov 0x8(%rax),%rax
0x00007fc0be0c8bf1 <+33>: lea (%rax,%rdx,8),%r8
0x00007fc0be0c8bf5 <+37>: mov 0x70(%r10),%rax
=> 0x00007fc0be0c8bf9 <+41>: mov 0x8(%r8),%rcx
(gdb) p/x $r8
$4 = 0x41bd78
(gdb) p/x $r8 + 8
$5 = 0x41bd80
Note this instruction is fetching the value reloc->r_info as per the highlighted source line.
(gdb) p reloc
$6 = (const Elf64_Rela * const) 0x41bd78
(gdb) p &reloc->r_info
$7 = (Elf64_Xword *) 0x41bd80
(gdb) p *reloc
$8 = {r_offset = 8443504, r_info = 936302870535, r_addend = 0}
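As a sanity check on the values gdb printed, the Elf64_Rela fields can be decoded by hand. A small sketch, assuming the standard ELF64 layout (24-byte entries, symbol index in the upper 32 bits of r_info, relocation type in the lower 32 bits); decode_rela is a name made up for this post:

```python
RELA_SIZE = 24            # sizeof(Elf64_Rela): three 8-byte fields
R_X86_64_JUMP_SLOT = 7    # relocation type used for PLT entries on x86-64


def decode_rela(r_offset, r_info):
    """Split r_info the way the ELF64_R_SYM / ELF64_R_TYPE macros do."""
    return {
        "r_offset": hex(r_offset),
        "sym_index": r_info >> 32,
        "type": r_info & 0xFFFFFFFF,
    }


# Values copied from `p *reloc` above.
entry = decode_rela(8443504, 936302870535)
print(entry)

# Byte offset of entry number reloc_arg=203 (from frame #0) within the
# PLT relocation table.
print(hex(203 * RELA_SIZE))
```

The type comes out as 7, i.e. R_X86_64_JUMP_SLOT, and r_offset is 0x80d670, which lies inside the rw-p mapping of /usr/bin/Xwayland. That is what I mean by the values read from the coredump looking convincing.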
The faulting address falls within the text mapping below (from maps file captured by abrtd):
00400000-0060b000 r-xp 00000000 fd:00 1708508 /usr/bin/Xwayland
0080a000-0080d000 r--p 0020a000 fd:00 1708508 /usr/bin/Xwayland
0080d000-00817000 rw-p 0020d000 fd:00 1708508 /usr/bin/Xwayland
$ size -x /usr/bin/Xwayland
text data bss dec hex filename
0x209ffb 0xbe9d 0x1f3e0 2314872 235278 /usr/bin/Xwayland
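The arithmetic behind that claim, as a short sketch. It parses the three maps lines quoted above and converts the fault address to a file offset; locate is a hypothetical helper, and the text size is the figure from the size output:

```python
MAPS = """\
00400000-0060b000 r-xp 00000000 fd:00 1708508 /usr/bin/Xwayland
0080a000-0080d000 r--p 0020a000 fd:00 1708508 /usr/bin/Xwayland
0080d000-00817000 rw-p 0020d000 fd:00 1708508 /usr/bin/Xwayland
"""

TEXT_SIZE = 0x209FFB  # "text" column from `size -x /usr/bin/Xwayland`


def locate(addr, maps_text):
    """Return (permissions, file offset, path) for the mapping containing addr."""
    for line in maps_text.splitlines():
        fields = line.split()
        start, end = (int(part, 16) for part in fields[0].split("-"))
        if start <= addr < end:
            map_offset = int(fields[2], 16)
            path = fields[5] if len(fields) > 5 else ""
            return fields[1], map_offset + (addr - start), path
    return None


perms, file_offset, path = locate(0x41BD80, MAPS)
print(perms, hex(file_offset), path)
print("within text size:", file_offset < TEXT_SIZE)
```

So si_addr corresponds to file offset 0x1bd80 in the r-xp mapping, well below the 0x209ffb bytes of text.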
sigbus, and what values were in the registers it used for an addressing mode? – Roesler
x86, which doesn't care about alignment in most cases
I beg to differ. How about an unaligned access crossing a page boundary? – Endbrain
mov 0x8(%r8),%rcx looks normal, and r8 is 8-byte aligned (not that it matters), and pointing to a valid part of the mapping. Did you check your kernel log for I/O errors? Have you verified the XWayland file itself to see if it reads ok? Maybe try copying it to /tmp and running from there to avoid problems with a bad disk. Does XWayland fault every time, or just occasionally? – Roesler
rpm --verify to verify all system files and Xwayland specifically against package checksums, as specifically mentioned; I don't see anything. It appears to fault about 10ms after the system wakes from suspend, but not every time. @Sommelier I think it is extremely unlikely that a privileged process is truncating /usr/bin/Xwayland, and then resetting the size with the correct data before I run gdb on the coredump. – Glendoraglendower