In C++, a compiler can assume that no UB will happen, affecting behaviour (even visible side-effects like I/O) in paths of execution that will encounter UB but haven't yet, if I understand the phrasing correctly.
Does C have any requirement to execute a program "correctly" up to the last visible side-effect before the abstract machine encounters UB? Compilers seem to behave this way, but do so in C++ mode as well as C, so it could just be a missed optimization or an intentional choice to be less "programmer-hostile".
Would such an optimization be allowed by the ISO C standard? (Compilers might still reasonably choose not to do so for various reasons including difficulty of implementation without mis-compiling any other cases, or "quality of implementation" factors.)
The ISO C++ standard is fairly explicit about this point
This question is (primarily) about C, but C++ is at least an interesting point of comparison because the concept of UB is at least similar in both languages. I don't see any similarly explicit language in ISO C, hence this question.
ISO C++ [intro.abstract]/5 says this (and has since at least C++11, probably earlier):
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
I think the intended meaning of "places no requirement on the implementation executing that program with that input" is that even visible side-effects sequenced before the abstract machine encounters UB (such as a `volatile` access, or I/O including an unbuffered `fprintf(stderr, ...)`) aren't required to happen.
The phrasing "executing that program with that input" is talking about whole program, right from the start of its execution. (Some people talk about "time travel", but it's really a matter of things like later code allowing value-range assumptions (such as non-null) that affect earlier branching in compile-time decision making, as others have put it in a previous SO question. Compilers are allowed to assume that an execution of the whole program won't encounter UB.)
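A minimal illustration of the kind of program in question (a hypothetical function I'm inventing here, not one from any of the linked Q&As): the I/O is sequenced before the undefined operation, and under the ISO C++ wording an execution with `p == NULL` contains an undefined operation, so even that earlier I/O isn't guaranteed to appear. (Real compilers currently keep it, as the test cases below show.)

```c
#include <stdio.h>

/* Hypothetical example: the fprintf is a visible side-effect sequenced
 * before the store.  If p is a null pointer, the execution contains UB,
 * and the C++ wording places no requirement on the implementation,
 * "not even with regard to operations preceding the first undefined
 * operation" - i.e. the fprintf could legally be omitted. */
void report_then_store(int *p) {
    fprintf(stderr, "about to store\n"); /* visible side-effect */
    *p = 1;                              /* UB if p == NULL */
}
```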
Test cases for real compiler behaviour
I tried to get a compiler to do the optimization I was wondering about. That would pretty definitively indicate it was allowed according to the compiler developers' interpretation of the standard. (Unless it was actually a compiler bug.) But everything I've tried so far has shown compilers preserving visible side-effects.
I've only tried with `volatile` accesses (not `putchar` or `std::cout<<` or whatever), on the assumption that those should be easier for the optimizer to see around and understand. Calls to non-inline functions like `printf` are generally black boxes for optimizers, unless they're special-cased by name, as a few very important functions such as `memcpy` are. Also, a call to an I/O function could hypothetically block forever or even abort, and thus never reach the UB in later code.
Actually I've only tried with `volatile` stores, not `volatile` reads. Compilers might handle those differently for some reason, although you'd hope not.

Compilers do assume that `volatile` accesses don't trap, e.g. they do dead-store elimination around them (Godbolt). So a `volatile` load or store shouldn't stop the optimizer from seeing that UB in this path of execution will happen. (Update: this may not have proved as much as I thought: if the access did trap to a signal handler inside this program, ISO C and C++ both say that only `volatile sig_atomic_t` variables will have their "expected" values in a signal handler. So dead-store elimination of a non-`volatile` global across something that might raise a signal, and then resume or not, would still be allowed. But it still shows that `volatile` accesses are assumed not to be too weird.)
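The dead-store-elimination test alluded to above presumably has roughly this shape (a sketch; the actual Godbolt source isn't reproduced in the question): the first store to a non-volatile global is deleted at `-O2`/`-O3` even though a `volatile` access sits between it and the overwriting store, showing the compiler assumes the `volatile` access doesn't trap to anything inside the program that could observe the intermediate value.

```c
/* Sketch (assumed shape) of the dead-store-elimination test: */
volatile int vsink;
int g;

void dse_around_volatile(void) {
    g = 1;     /* dead store: eliminated despite the volatile access below */
    vsink = 2; /* volatile store: must appear in the asm */
    g = 3;     /* the store that survives */
}
```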
Some previous examples (such as Undefined behavior causing time travel) revolve around if/else examples where UB would be encountered in one side so compilers can assume the other side is taken.
But those have no visible side effects in the path of execution that definitely does lead to UB, only in the other path. This example does have that:
volatile int sink; // same code-gen with plain int sink;
void foo(int *p) {
if (p) // null pointer check *could* be deleted due to unconditional deref later.
sink = 1; // but GCC / clang / MSVC don't
*p = 2;
}
GCC 13 and Clang 16 compile it the same way for x86-64 (with `-O3`). (Godbolt: I'm compiling with `-xc++` to tell them to treat it as C++.) So does MSVC 19.37, but with the `p` arg in RCX instead of RDI.
foo(int*):
test rdi, rdi
je .LBB0_2 # if (!p) goto .LBB0_2, skipping the if body
mov dword ptr [rip + sink], 1 # then fall-through, rejoining the other path
.LBB0_2:
mov dword ptr [rdi], 2
ret
Using `if(!p)` as the condition, MSVC's code-gen is the same except for `jne` instead of `je`. GCC and Clang do tail-duplication, making two blocks that each end with a `ret`: the first does just `*p=2;`, the second does both stores. (Which is interesting, since Clang compiles `*(int*)0` to zero instructions, but with tail-duplication it creates a block where it has proved `p` is null yet still emits an actual store instruction.)
If we put `*p = 2;` before the `if()`, the null-pointer check will indeed be deleted. (`baz()` in the Godbolt link compiles to two unconditional stores.)
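The `baz()` source isn't reproduced in the question, only linked; presumably it's just `foo()` with the statements reordered, something like this sketch:

```c
volatile int sink; /* same global as in the question's foo() */

/* Sketch (assumed shape) of baz() from the Godbolt link: the
 * unconditional deref comes first, so the compiler may assume p is
 * non-null afterwards, delete the later null check, and emit two
 * unconditional stores. */
void baz(int *p) {
    *p = 2;       /* deref first: p assumed non-null from here on */
    if (p)        /* provably true now; the branch folds away */
        sink = 1;
}
```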
The fact that the "expected" optimization doesn't happen even with non-volatile (with `-xc++` or `-xc`) could be a sign that compilers try to avoid retroactive effects in general, as a way to avoid changing visible side-effects before UB is reached. Or it could just tell us that compilers aren't aggressive enough to demonstrate my point. Inventing stores in non-UB cases is a tricky thread-safety violation, so I could imagine compilers being cautious about it.
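To spell out the invented-store hazard (my own illustrative function, not one from the question): when the condition is false the abstract machine performs no store at all, so materializing one anyway can race with another thread.

```c
#include <stdbool.h>

int x; /* potentially shared with other threads */

void conditional_store(bool c) {
    if (c)
        x = 1;
    /* A conforming compiler must NOT rewrite this as the branchless
     *     x = c ? 1 : x;   // load + cmov + store
     * because when c is false the abstract machine does no store, and
     * an invented store to x could race with a concurrent writer -
     * introducing a data race (UB) into a program that had none. */
}
```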
One example of some success, at least for a non-`volatile` store, is:
volatile int sink;
void bar_nv(int *p) {
/*volatile*/ int sink2;
if (p) {
sink = 3; // volatile
}else{
sink2 = 4; // non-volatile
*p = 4; // reachable only with p == NULL, so compilers can assume it's *not* reached. Only clang takes advantage
}
}
Clang 16 `-O3`, compiling as either C or C++ (unlike GCC, which still branches):
bar_nv(int*):
mov dword ptr [rip + sink], 3
ret
This optimizes away the entire branch containing the `sink2` non-volatile side-effect.

If we make `sink2` also `volatile`, then it branches and still does the visible side-effect of storing to `sink2` in that path of execution before falling off the end of the function (not actually dereferencing `p`, which is known to be null on that side of the `if`). See `bar_v` in the Godbolt link.
Another case I was playing around with: https://godbolt.org/z/vjqeb59TG puts `*p` derefs in both sides of an if/else, leading to similar results to `bar_nv` vs. `bar_v`.
So I wasn't able to get compilers to optimize away a volatile side-effect from a path of execution that definitely leads to UB even in C++. But that doesn't prove the ISO C++ standard doesn't allow it. (I'm still somewhat curious if this is intentional, or if there is a case where such optimization happens.)
Doing a visible side-effect without actually faulting on the null deref is a different matter: null deref is UB, so nothing is guaranteed, not even the fault itself. Anything can happen, including doing nothing, or doing random I/O.
Earlier Q&As (mostly I found C++ questions, not C):
This question was motivated by discussion in comments on a recent Q&A with @user541686, who claimed that even the C++ wording doesn't permit a compiler to ignore visible side-effects (especially `printf` or `volatile` accesses) before an undefined operation is reached. In later discussion, they may have narrowed their argument to a claim that such optimization is impossible because I/O might fault or block forever, thus never actually reaching the undefined operation. But I was able to show that GCC and Clang do assume that `volatile` operations won't fault, or at least that they won't trap to other code within this program that could observe the state of other global variables.

So I think they're wrong about C++, but I find it plausible that ISO C could at least be interpreted to require all visible side-effects before the undefined operation to actually happen. (Which is what compilers actually do for C and C++.) But is that interpretation common, or is it normally read as not requiring that?
Undefined behavior causing time travel - c++, asking about Raymond Chen's article Undefined behavior can result in time travel. That example doesn't have any visible side-effects before the UB in the path of execution which encounters UB and thus is assumed not to be reached by the earlier branch. Answers on that question describe the compiler being allowed to assume that UB is not reachable, but in that context it's not discussing omitting a visible side-effect that would have happened before the undefined operation.
C++ What is the earliest undefined behavior can manifest itself? - c++, most answers agree that the whole execution of the program is undefined, not just after UB is reached.
Are there any barriers that a time travelling undefined behavior may not cross? - A c++ version of this question, with a similar litmus test. Answered only in comments, but opinions are that the visible side-effect is not guaranteed to happen.
If a part of the program exhibits undefined behavior, would it affect the remainder of the program? - c, c++. haccks's answer quotes the C standard (N1570 3.4.3 p2) about the consequences of UB, and then asserts without justification that it applies to the whole program. That's not obvious from that wording in the C standard, and IDK if there's anything else relevant. Bathsheba's answer says "Paradoxically, the behaviour of statements that have ran prior to that are undefined too." but doesn't specify whether that's talking about C or C++ or both, and doesn't cite any standardese to support it.
Does an expression with undefined behaviour that is never actually executed make a program erroneous? - c++ question, but @supercat posted a C answer saying:
A C compiler is allowed to do anything it likes as soon as a program enters a state via which there is no defined sequence of events which would allow the program to avoid invoking Undefined Behavior at some point in the future
They don't support that with a citation from the standard, but they commented on another question:
Don't use the term "once Undefined Behavior occurs", but rather "Once conditions have been established which would make Undefined Behavior inevitable". Language in the C Standard which may have been intended to make Undefined Behavior be unsequenced relative to other code has instead been interpreted by some compiler writers to imply that it should be bound by laws of neither time nor causality.
So it sounds like C is a lot less explicit than C++ about a retroactive lack of requirements on executions that will encounter UB. Which language specifically in the ISO C standard supports that reading, and what's the argument for this interpretation of it, assuming it's actually what compiler writers think, even though they still choose not to make their compilers optimize away visible side-effects along paths that are already headed for UB?
(@supercat is notable for the opinion that modern C and C++ aggressive optimization based on the assumption of no UB has missed the intent of the original authors of the standard, especially when that includes things like signed-integer overflow or comparing unrelated pointers, which aren't a problem in asm on the machines we're compiling for. It's certainly not great, but promoting `int` variables in loops to pointer width is a fairly important optimization for 64-bit machines, so there was obvious justification to start down this road, which left modern C and C++ full of land-mines for programmers.)
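That widening optimization can be sketched like this (my own example function; the parenthetical above doesn't give one): because signed overflow is UB, the compiler may assume `i * stride` never wraps, so it can keep the index in a 64-bit register instead of redoing a 32-bit sign extension on every iteration of the addressing arithmetic.

```c
/* Why assuming no signed-overflow UB matters on 64-bit targets:
 * since i * stride overflowing would be UB, the compiler may widen
 * the int index to pointer width once, outside the loop, rather than
 * sign-extending a 32-bit result each iteration when forming the
 * address a + (long)(i * stride). */
long sum_strided(const long *a, int n, int stride) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i * stride]; /* i * stride assumed not to wrap */
    return total;
}
```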
In this question, I'm asking what the ISO C standard as written allows, either explicitly or per any commonly agreed-on interpretations. Especially whether that's even more permissive than what compilers actually did in my test cases. I'm not arguing whether or not real compilers should optimize even more; it seems reasonable not to.
Comments (fragments, truncated at the start by extraction):

… `volatile` access that was followed by a deref they could see was a null pointer, but none of them did it. – Wun

… `main`, and there's a division by 0 or null deref there, the C++ standard allows the entire program to compile to one illegal instruction, or to `main(){return 42;}`. This is normally implausible to prove, e.g. real progs tend to call library functions that real compilers don't make such strong assumptions about. (Like that they definitely return.) But in theory ISO C++ allows that. The question is whether ISO C also allows that, and if so what wording in the standard allows that. – Wun

… `sink` itself changes the flow of control, such that the later UB is not reached? You mention it in terms of trapping; I was thinking of something like a memory-mapped hardware register that resets the machine and effectively starts the program over. I am not sure if they are fundamentally different. – Ministerial

… `volatile` globals if it happens, let alone private locals. e.g. I wondered about a case where writing a `volatile` object modified the machine state including the program counter, e.g. creating a loop between the two `volatile` writes in the function. (That's why the dead-store-elim test-case has two volatile stores, but I didn't end up writing about that because it seemed too much of a tangent.) – Wun

… `-ffreestanding` (although that option doesn't change anything for the test-case). Still, restarting would mean existing non-`volatile` state is blown away, so dead-store elim around it would still be allowed. But visible side-effects would have to stay. So yeah, a compiler that wants to support reset via MMIO stores shouldn't optimize away `volatile` accesses in paths that would otherwise lead to UB. Good point. – Wun

… `volatile int x`, meaning that the store doesn't happen if division traps, even though it would in the abstract machine with the right `_Bool` arg. (Clang does the division after the conditional store.) – Wun

… `bool` but didn't on all paths, and testing the return value of that function happened after writing some output to a file, to determine whether more output should be written. The compiler (gcc) decided to not only optimize away parts of the `bool` function, but to not generate any file output at all. Adding `return true;` fixed everything. – Lustful

… `void` function, and the I/O that didn't happen was in the caller, after the call? So you had already "used" the return value by assigning it to a `bool` variable, copying the retval. (Or if C++, falling off the end of a non-`void` function is itself UB, and compilers will omit the `ret` instruction in that case, letting execution fall off into whatever code is next in memory.) Unless you mean something different, those are both cases of weird behaviour after the abstract machine reaches UB. (Thanks for sharing the details.) – Wun

… `bool`-returning function without returning a value (where it should have returned `true`), and that caused code like `write_stuff_to_file(); if (bool_returning_function()) { write_more_stuff(); }` to not even write the initial stuff to the file. As if `write_stuff_to_file();` had not even been called, even though that should have written something to the file regardless of what `bool_returning_function()` returned. – Lustful

… `volatile` accesses, because I/O in practice is done by opaque library functions that compilers can't assume will even return. Which is why I find your report interesting, since it contradicts expectations from n3128. Could have been a compiler bug, or making valid(?) assumptions about I/O functions not faulting. – Wun

… `if (c) x = 1;` into load/cmov/store like MSVC did until 19.36 for the example function in my question, because if `c` is false, another thread could be writing `x` without any UB happening. – Wun

… `x++` on `int x`, which is subject to the as-if rule, not guaranteed to ever actually happen in the asm. – Wun

… `printf` and `fflush`, so the output is visible even when the division faults in the non-inlined version. (In Clang's `main`, it doesn't do the division at all. In the GCC executor you linked, it is printing before SIGFPE, so whatever code, different from clang's, is still executing that way.) – Wun