Why does an "empty" loop cause a bus error when compiling a C program with clang -O2 on macOS?

I'm on macOS High Sierra.

$ uname -v
Darwin Kernel Version 17.2.0: Fri Sep 29 18:27:05 PDT 2017; root:xnu-4570.20.62~3/RELEASE_X86_64

I have the following synthetic test program.

void nop1() {
  for (;;);
}

void nop2() {
  while (1);
}

void nop3() {
  int i = 0;
  while(1) {
    i++;
  }
}

void nop4() {
  static int i = 0;
  while(1) {
    i++;
  };
}

int main() {
  nop1();
  return 0;
}

Edit 2: I've now explicitly compiled with clang in the examples below.

When I compile and run the program above with clang -O2, I get a bus error when main() calls nop1(), nop2(), or nop3(), but not when it calls nop4().

$ ./a.out
[1]    93029 bus error (core dumped)  ./a.out

When compiling without -O2, all versions run without a bus error. I guess the optimizer transforms nop3() into nop2(). I would like to understand what causes the bus error in each case and why using a static variable in nop4() does not cause one.

This is my clang version:

$ clang -v
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin17.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

I've also tested with gcc on Linux:

$ uname -a
Linux trygger 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

On that machine the program runs fine with every nop function, both with and without -O2.

This is my gcc version on Linux.

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.6' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.6)

Edit 4

Maybe the output from otool is easier to analyse. First, with -O2:

$ clang -O2 segfault.c
$ otool -vt a.out
a.out:
(__TEXT,__text) section
_nop1:
0000000100000f30    pushq   %rbp
0000000100000f31    movq    %rsp, %rbp
0000000100000f34    nopw    %cs:(%rax,%rax)
0000000100000f40    jmp 0x100000f40
0000000100000f42    nopw    %cs:(%rax,%rax)
_nop2:
0000000100000f50    pushq   %rbp
0000000100000f51    movq    %rsp, %rbp
0000000100000f54    nopw    %cs:(%rax,%rax)
0000000100000f60    jmp 0x100000f60
0000000100000f62    nopw    %cs:(%rax,%rax)
_nop3:
0000000100000f70    pushq   %rbp
0000000100000f71    movq    %rsp, %rbp
0000000100000f74    nopw    %cs:(%rax,%rax)
0000000100000f80    jmp 0x100000f80
0000000100000f82    nopw    %cs:(%rax,%rax)
_nop4:
0000000100000f90    pushq   %rbp
0000000100000f91    movq    %rsp, %rbp
0000000100000f94    nopw    %cs:(%rax,%rax)
0000000100000fa0    jmp 0x100000fa0
0000000100000fa2    nopw    %cs:(%rax,%rax)
_main:
0000000100000fb0    pushq   %rbp
0000000100000fb1    movq    %rsp, %rbp

And without -O2.

$ clang segfault.c
$ otool -vt a.out
a.out:
(__TEXT,__text) section
_nop1:
0000000100000f30    pushq   %rbp
0000000100000f31    movq    %rsp, %rbp
0000000100000f34    jmp 0x100000f39
0000000100000f39    jmp 0x100000f39
0000000100000f3e    nop
_nop2:
0000000100000f40    pushq   %rbp
0000000100000f41    movq    %rsp, %rbp
0000000100000f44    jmp 0x100000f49
0000000100000f49    jmp 0x100000f49
0000000100000f4e    nop
_nop3:
0000000100000f50    pushq   %rbp
0000000100000f51    movq    %rsp, %rbp
0000000100000f54    movl    $0x0, -0x4(%rbp)
0000000100000f5b    movl    -0x4(%rbp), %eax
0000000100000f5e    addl    $0x1, %eax
0000000100000f61    movl    %eax, -0x4(%rbp)
0000000100000f64    jmp 0x100000f5b
0000000100000f69    nopl    (%rax)
_nop4:
0000000100000f70    pushq   %rbp
0000000100000f71    movq    %rsp, %rbp
0000000100000f74    jmp 0x100000f79
0000000100000f79    movl    0x81(%rip), %eax
0000000100000f7f    addl    $0x1, %eax
0000000100000f82    movl    %eax, 0x78(%rip)
0000000100000f88    jmp 0x100000f79
0000000100000f8d    nopl    (%rax)
_main:
0000000100000f90    pushq   %rbp
0000000100000f91    movq    %rsp, %rbp
0000000100000f94    subq    $0x10, %rsp
0000000100000f98    movl    $0x0, -0x4(%rbp)
0000000100000f9f    callq   0x100000f40
0000000100000fa4    xorl    %eax, %eax
0000000100000fa6    addq    $0x10, %rsp
0000000100000faa    popq    %rbp
0000000100000fab    retq

Edit 3

As requested by @Olaf, I've added the assembly generated by clang -S (without -O2).

    .section    __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 12
    .globl  _nop1
    .p2align    4, 0x90
_nop1:                                  ## @nop1
    .cfi_startproc
## BB#0:
    pushq   %rbp
Ltmp0:
    .cfi_def_cfa_offset 16
Ltmp1:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp2:
    .cfi_def_cfa_register %rbp
    jmp LBB0_1
LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
    jmp LBB0_1
    .cfi_endproc

    .globl  _nop2
    .p2align    4, 0x90
_nop2:                                  ## @nop2
    .cfi_startproc
## BB#0:
    pushq   %rbp
Ltmp3:
    .cfi_def_cfa_offset 16
Ltmp4:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp5:
    .cfi_def_cfa_register %rbp
    jmp LBB1_1
LBB1_1:                                 ## =>This Inner Loop Header: Depth=1
    jmp LBB1_1
    .cfi_endproc

    .globl  _nop3
    .p2align    4, 0x90
_nop3:                                  ## @nop3
    .cfi_startproc
## BB#0:
    pushq   %rbp
Ltmp6:
    .cfi_def_cfa_offset 16
Ltmp7:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp8:
    .cfi_def_cfa_register %rbp
    movl    $0, -4(%rbp)
LBB2_1:                                 ## =>This Inner Loop Header: Depth=1
    movl    -4(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -4(%rbp)
    jmp LBB2_1
    .cfi_endproc

    .globl  _nop4
    .p2align    4, 0x90
_nop4:                                  ## @nop4
    .cfi_startproc
## BB#0:
    pushq   %rbp
Ltmp9:
    .cfi_def_cfa_offset 16
Ltmp10:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp11:
    .cfi_def_cfa_register %rbp
    jmp LBB3_1
LBB3_1:                                 ## =>This Inner Loop Header: Depth=1
    movl    _nop4.i(%rip), %eax
    addl    $1, %eax
    movl    %eax, _nop4.i(%rip)
    jmp LBB3_1
    .cfi_endproc

    .globl  _main
    .p2align    4, 0x90
_main:                                  ## @main
    .cfi_startproc
## BB#0:
    pushq   %rbp
Ltmp12:
    .cfi_def_cfa_offset 16
Ltmp13:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp14:
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    movl    $0, -4(%rbp)
    callq   _nop1
    xorl    %eax, %eax
    addq    $16, %rsp
    popq    %rbp
    retq
    .cfi_endproc

.zerofill __DATA,__bss,_nop4.i,4,2      ## @nop4.i

.subsections_via_symbols
Asked by Managing, 16/2/2018 at 22:20. Comments (25):
@user2357112 You are probably right; it makes sense for the behaviour of a non-terminating loop with no side effects to be undefined. - Managing
Still a bit odd. 'Bus error' is... unusual for this. - Describe
You might be interested in this and the linked question. - Moore
Get the tags right. This is clearly not about gcc, but clang, a completely different compiler. - Centrifugal
A loop with a constant or missing terminating condition, no side effects, and no exit is defined: it loops. Something else is going on here. It is okay for nop3 to fail, since it overflows int. But I do not see why nop1 or nop2 fail. C++ has a rule allowing the compiler to assume loops terminate or do something observable, but, for C, that applies only to loops with non-constant terminating conditions. (An empty terminating condition counts as a constant.) - Null
@Olaf You were hiding for a while... :) - Moore
There is information missing, namely the assembly code. - Centrifugal
@EugeneSh.: I just kept away from the C tag for a long time. And this is hopefully just an intermezzo. Too frustrating. - Centrifugal
Sorry @Olaf, I was not aware of the differences between gcc and clang. - Managing
@EricPostpischil: Thank you for the info. Kindly notice I already did. - Centrifugal
@KarlMarklund: "Apple LLVM version 8.1.0 (clang-802.0.42)" is pretty clear. And the line before might indicate you indeed don't compile as C, but as C++, a different language. Please get the basic information correct before asking. - Centrifugal
@Olaf: I confirmed the described behavior compiled as C, not C++. - Null
@Olaf We all need to learn the basics at some point :) I'm here to learn and thankful for all your efforts. - Managing
@KarlMarklund: Well, in that case, as you start digging deep from the start, you should get comfortable with the assembly language for your target architecture/platform and check the assembly code generated by clang. Also learn how to use the debugger; that would have shown instantly what was going on. - Centrifugal
@Olaf: Margherita is a version of pizza. One cannot tell from the name alone what a thing is. A novice cannot be blamed for believing the version string printed by a gcc command tells what version of GCC it is. - Null
When all is said and done, and all relevant standards read, it's very, umm, 'unfriendly' for a compiler to generate a function prologue with no return and so generate a bus error at runtime :( - Describe
@Olaf: I do not see how the assembly answers this. It shows what the compiler did, not why. - Null
Looks like it is clearly violating the C standard. So I would call it a compiler bug if it is actually compiled as C. - Moore
@EricPostpischil: "Why does “empty” loop cause bus error" would very well be answered by this. At least it would be the first step to find out what's going on. Next would be to find out why the compiler does this. But if that is compiled as C, the compiler would indeed be wrong here. That's why I'm asking to thoroughly check it is not compiled as C++, especially as the code uses discouraged legacy (in C) non-prototype function declarators. - Centrifugal
@EugeneSh.: That's why I would like to see the assembler code in the question. - Centrifugal
@Olaf I've edited the question and added the assembly generated by gcc -S. - Managing
@KarlMarklund: Is that generated by gcc or clang? You should remove the gcc part and make clear in the text that you use clang, too. It was one of the most stu**d decisions by Apple to create a gcc link to clang when they changed the compiler platform. - Centrifugal
@Olaf I've now edited the question and explicitly used clang. - Managing
@Olaf I've now added the output from otool -vt and will try to analyze the differences between compiling with and without -O2. - Managing
I removed my downvote. Are you sure you pasted all the code for -O2? If yes, compare the code for _main to that without optimisation and you should see the problem instantly. That should give you enough of a pointer to see what's going on. The why (known clang bug) has already been answered. Btw, I'm sure I saw a dupe about this here some time ago. Too weekendy to search, though. - Centrifugal

This is a known bug in LLVM. The behavior you see is valid for C++ but not for C.

See LLVM bug report #965, from 2006, here.

Recently, the problem resurfaced because Rust was hit by it.

There is a patch with a fix here, which was merged in Nov 2017, but I am not sure which version it will be released in.

See also a discussion in the mailing list here.
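Why this is valid for C++ but not for C comes down to C11 6.8.5p6: only an iteration statement whose controlling expression is not a constant expression, and that performs no I/O, no volatile accesses, and no synchronization or atomic operations, may be assumed to terminate. An omitted condition counts as a nonzero constant, so `for (;;);` and `while (1);` are not covered and must keep looping. When a loop is meant to spin until something happens, reading a volatile object in it is a side effect the optimizer has to preserve regardless of that rule. A minimal sketch, assuming a hypothetical `stop_requested` flag that a signal handler would normally set:

```c
#include <signal.h>

/* Hypothetical flag; in a real program a signal handler would set it.
   Each read of a volatile object is a side effect (C11 5.1.2.3), and a
   loop that accesses one is excluded from the "may be assumed to
   terminate" permission in C11 6.8.5p6. */
static volatile sig_atomic_t stop_requested = 0;

/* Spins until stop_requested becomes nonzero; returns the number of
   iterations performed. */
unsigned long wait_for_stop(void)
{
    unsigned long iterations = 0;
    while (!stop_requested) {
        iterations++;
    }
    return iterations;
}

/* Hypothetical helper that stands in for the signal handler, so the loop
   can be exercised without real signals. */
void request_stop(void)
{
    stop_requested = 1;
}
```

Calling `request_stop()` first makes `wait_for_stop()` return 0 immediately; without it, the loop spins until the flag changes, and the compiler cannot delete it.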

Gramicidin answered 16/2/2018 at 23:18. Comments (1):
An article with a nice explanation of this: blog.regehr.org/archives/140 - Grimmett

It looks like clang generates the prologue of main() and nothing else, letting execution fall through into another, unrelated bit of code. On my machine, it produces:

0000000100000fa0 <_main>:
   100000fa0:   55                      push   rbp
   100000fa1:   48 89 e5                mov    rbp,rsp

Disassembly of section __TEXT.__unwind_info:

0000000100000fa4 <__TEXT.__unwind_info>:
   100000fa4:   01 00                   add    DWORD PTR [rax],eax
   100000fa6:   00 00                   add    BYTE PTR [rax],al
   100000fa8:   1c 00                   sbb    al,0x0
   100000faa:   00 00                   add    BYTE PTR [rax],al
   100000fac:   00 00                   add    BYTE PTR [rax],al

The bus error is caused by the first add instruction: rax points to _main, so the instruction tries to write to read-only memory.
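The fault described above can be reproduced on purpose: a store into a function's machine code traps because code pages are mapped read-only. This is a minimal POSIX sketch, assuming a hypothetical `probe_target()` as the victim. In the question, macOS reported the fault as SIGBUS; Linux usually delivers SIGSEGV, so the sketch catches both and recovers with `siglongjmp`.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf recover_point;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);   /* escape from the faulting store */
}

/* Hypothetical victim: its machine code lives in the read-only,
   executable text segment (__TEXT on macOS). */
int probe_target(void)
{
    return 42;
}

/* Returns 1 if writing one byte of probe_target's code faulted,
   0 if the write unexpectedly succeeded. */
int write_to_code_faults(void)
{
    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);   /* typical on Linux */
    sigaction(SIGBUS, &sa, &old_bus);     /* what the question saw on macOS */

    volatile int faulted = 1;
    if (sigsetjmp(recover_point, 1) == 0) {
        /* Function-pointer-to-integer conversion is implementation-defined;
           it works on the POSIX systems discussed here. */
        volatile unsigned char *code =
            (volatile unsigned char *)(uintptr_t)&probe_target;
        *code = *code;   /* the read is allowed; the write should trap */
        faulted = 0;     /* only reached if the page was writable */
    }

    sigaction(SIGSEGV, &old_segv, NULL);   /* restore previous handlers */
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

`write_to_code_faults()` should return 1 and leave `probe_target()` untouched, since the store never completes.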

Interestingly enough, placing __asm__ volatile("nop\n"); as the first line in nop1 gives correct behavior.
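Applied to the question's `nop1()`, the workaround reported above looks like the sketch below. The names `nop1_with_asm` and `spins_until_alarm` are hypothetical. `__asm__ volatile` is a GNU C extension accepted by GCC and clang; the compiler must emit it and cannot treat the function as free of side effects. The helper runs a function in a child process and checks that an alarm, not a bus error, is what eventually stops it. This is a sketch of the reported behaviour, not a guaranteed fix for every compiler version.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The question's nop1() with the asm statement as its first line. */
void nop1_with_asm(void)
{
    __asm__ volatile("nop\n");
    for (;;);
}

/* Hypothetical test helper: runs fn in a child with an alarm armed.
   Returns 1 if SIGALRM killed the child while fn was still running,
   0 if the child exited or died from another signal (e.g. SIGBUS),
   and -1 if fork() or waitpid() failed. */
int spins_until_alarm(void (*fn)(void), unsigned seconds)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGALRM, SIG_DFL);   /* default action terminates the child */
        alarm(seconds);
        fn();
        _exit(0);                   /* only reached if fn() returned */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM;
}
```

`spins_until_alarm(nop1_with_asm, 1)` should return 1 after about one second: the loop kept spinning instead of falling through into unrelated code.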

Doctrine answered 16/2/2018 at 22:34. Comments (3):
How did you disassemble the unwind_info section? With otool -s __TEXT __unwind_info a.out I only get raw hex bytes. - Managing
If you have GNU binutils installed: gobjdump -d filename -M intel - Doctrine
Thanks, I'm now updating Xcode, and then I'll hopefully be able to brew install binutils. - Managing
