C code with undefined results, compiler generates invalid code (with -O3)

I know that when you do certain things in a C program, the results are undefined. However, the compiler should not be generating invalid (machine) code, right? It would be reasonable if the code did the wrong thing, or if the code generated a segfault or something...

Is this supposed to happen according to the compiler spec, or is it a bug in the compiler?

Here's the (simple) program I'm using:

int main() {
    char *ptr = 0;
    *(ptr) = 0;
}

I'm compiling with -O3. That shouldn't generate invalid hardware instructions though, right? With -O0, I get a segfault when I run the code. That seems a lot more sane.

Edit: It's generating a ud2 instruction...

Garret answered 10/10, 2014 at 23:10 Comment(7)
UB means all bets are off, sanity was discarded long ago, no grounds to complain left! - Lib
possible duplicate of Undefined, unspecified and implementation-defined behavior - Lib
I didn't know GCC generated a ud2 with undefined behavior, but Clang does: blog.llvm.org/2011/05/what-every-c-programmer-should-know.html - Pannier
In a situation where the compiler knows there is undefined behaviour, it's absolutely fine and arguably the most secure thing to do to generate an instruction that just crashes your app. - Ilka
How do you define "generate invalid code"? A wild branch can easily lead to trying to execute data, or the middle of a multibyte instruction, either of which would be highly likely to eventually produce something completely nonsensical. - Humanly
It's not generating an invalid instruction. It's generating a valid instruction whose effect is to trigger an invalid opcode exception. Does this cause you any problems? - Septal
As I stated in my answer, it is a valid instruction, and the compiler is in some ways being helpful, since it would probably be worse for a serious problem to go unnoticed for a long time. - Prostitute

The ud2 instruction is a "valid instruction": it stands for Undefined Instruction and generates an invalid opcode exception. Clang, and apparently gcc, can generate this instruction when a program invokes undefined behavior.

In the Clang link above, the rationale is explained as follows:

Stores to null and calls through null pointers are turned into a __builtin_trap() call (which turns into a trapping instruction like "ud2" on x86). These happen all of the time in optimized code (as the result of other transformations like inlining and constant propagation) and we used to just delete the blocks that contained them because they were "obviously unreachable".

While (from a pedantic language lawyer standpoint) this is strictly true, we quickly learned that people do occasionally dereference null pointers, and having the code execution just fall into the top of the next function makes it very difficult to understand the problem. From the performance angle, the most important aspect of exposing these is to squash downstream code. Because of this, clang turns these into a runtime trap: if one of these is actually dynamically reached, the program stops immediately and can be debugged. The drawback of doing this is that we slightly bloat code by having these operations and having the conditions that control their predicates.
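To make the quoted scenario concrete, here is a minimal sketch of how a store to null can appear after inlining and constant propagation even though nobody wrote `*(char *)0 = ...` by hand. The names `flag_at`, `set_flag` and `set_flag_checked` are hypothetical and only for illustration. Whether a particular compiler version actually emits a trap for the unchecked variant depends on the compiler and its flags, so treat the comments as a description of what the quote says, not a guarantee.

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer to flags[index],
 * or NULL when index is out of range. */
static char *flag_at(char *flags, size_t count, size_t index) {
    return index < count ? &flags[index] : NULL;
}

/* Sets one flag without checking the helper's result.
 * The behavior is defined only when index < count. If a caller passes a
 * constant out-of-range index and flag_at is inlined, the optimizer can
 * see that this is a store to null, which is the case the quote describes. */
void set_flag(char *flags, size_t count, size_t index) {
    char *slot = flag_at(flags, count, index);
    *slot = 1;
}

/* The same operation with the trap written out by hand, so the
 * out-of-range case stops the program instead of being undefined.
 * __builtin_trap() is a GCC/Clang builtin, not standard C. */
void set_flag_checked(char *flags, size_t count, size_t index) {
    char *slot = flag_at(flags, count, index);
    if (slot == NULL) {
        __builtin_trap();
    }
    *slot = 1;
}
```

Calling either function with an in-range index simply sets the flag. Only the out-of-range call differs: undefined in `set_flag`, an immediate trap in `set_flag_checked`.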

At the end of the day, once you are invoking undefined behavior, the behavior of your program is unpredictable. The philosophy here is that it is probably better to crash hard, giving the developer an indication that something is seriously wrong and letting them debug from the right point, than to produce a program that seems to work but is actually broken.

As Ruslan notes, it is "valid" in the sense that it is guaranteed to raise an invalid opcode exception, as opposed to other unused byte sequences, which may become valid in the future.
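If you want to see the trap without relying on undefined behavior at all, one option is to call `__builtin_trap()` directly in a child process and look at how the child died. This is only a sketch, assuming a POSIX system and GCC or Clang. The helper names `trap_now`, `do_nothing` and `signal_from_child` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Deliberately executes the compiler's trap builtin (GCC and Clang).
 * On x86 this is the builtin that the quoted text says becomes "ud2". */
void trap_now(void) {
    __builtin_trap();
}

/* Control case: returns normally. */
void do_nothing(void) {
}

/* Runs fn in a forked child process.
 * Returns the number of the signal that killed the child,
 * 0 if the child exited normally, or -1 if fork or waitpid failed. */
int signal_from_child(void (*fn)(void)) {
    assert(fn != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        fn();      /* child: may never return */
        _exit(0);  /* child: reached only if fn returned */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

`signal_from_child(do_nothing)` should return 0, while `signal_from_child(trap_now)` should return a signal number. On x86 Linux I would expect that to be SIGILL, since that is how the invalid opcode exception is normally reported to the process, but other targets may use a different signal.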

Prostitute answered 11/10, 2014 at 2:49 Comment(1)
Well, ud2 is as valid as is dw 0xffff. The only thing which makes it "valid" is that it's guaranteed to always be invalid, while other invalid byte sequences can be thought of as reserved and may become valid in future CPU implementations. - Maurist
