Why is optimization forbidden if a C compiler cannot prove lack of UB?

If a C program has undefined behavior, anything can happen. Therefore compilers may assume that any given program does not contain UB. So, suppose our program contains the following:

x += 5;
/* Do something else without x in the meantime. */ 
x += 7;

Of course, this can be optimized to

/* Do something without x. */
x += 12;

or, similarly, the other way around (with the combined x += 12 placed before the intervening code).

If x has type unsigned int then there is no possibility of UB in the above program. On the other hand, if x has type signed int then there is a chance of overflow and hence UB. Since the compiler may assume that our program contains no UB, we can do the same optimization above. In fact, in this case the compiler can even assume that x - 12 <= MAX_INT.
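
For concreteness, here is a minimal compilable version of the scenario (the function name and the volatile store standing in for the intervening work are just illustrative):

volatile int other;

int bump(int x) {        /* the same question applies if x is unsigned */
    x += 5;
    other = 1;           /* do something else without x in the meantime */
    x += 7;              /* may the compiler fold the two adds into x += 12? */
    return x;
}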

However, this seems to contradict Jens Gustedt's famous "Modern C" (pg 42):

But such an optimization can also be forbidden because the compiler can’t prove that a certain operation will not force program termination. In our example, much depends on the type of x. If the current value of x could be close to the upper limit of the type, the innocent-looking operation x += 7 may produce an overflow. Such overflows are handled differently according to the type. As we have seen, overflow of an unsigned type is not a problem, and the result of the condensed operation will always be consistent with the two separate ones. For other types, such as signed integer types (signed) and floating-point types (double), an overflow may raise an exception and terminate the program. In that case, the optimization cannot be performed.

(Emphasis mine). If the compiler can (and does) assume our program has no UB, why can this optimization not be performed?

[1]: Jens Gustedt. Modern C. Manning, 2019. ISBN 9781617295812. hal-02383654

Scuppernong answered 18/9, 2023 at 22:45 Comment(15)
"If x has type unsigned int then there is no possibility of UB in the above program." — false if you allow arbitrary code in between, one can also get an array out-of-bounds there and UB in theoretical terms, overwrite of x in practical terms. However you cannot really talk about "practical terms" if you're talking about "UB not happening".Blen
Good question. I'm not sure what Jens Gustedt meant here either. For the record, it's in chapter 5 "Basic values and data", section 5.1.4 "Optimization", currently available on manning.com/books/modern-c if anyone else would like to check the full context.Blen
Sometimes books are wrong.Linell
Surely if x += 5 would cause undefined behaviour it doesn't matter what the compiler does next?Jobie
@Linell Of course! However since he appears to be one of the editors of the C standard, I didn't want to be too presumptuous in my understanding. :-)Scuppernong
There is insufficient context here. Is the only criterion for optimization conformance with the C standard, or is the compiler writer concerned about additional features? It may be Jens Gustedt is concerned that we want x += 5; to raise an exception if it overflows and thus terminate the program before doing the “something else,” even though this is not required by the C standard alone.Kerf
@EricPostpischil After this discussion the author provides the takeaway: "Type determines optimization opportunities." So, presumably, the implication is that a compiler can only make this optimization if x is unsigned and not signed. But my contention is that this distinction is irrelevant since the compiler can safely assume our program has no UB.Scuppernong
Of course, the larger message here is absolutely correct (i.e., the types we use to declare data in our program affect what kind of optimizations the compiler can make). Just this particular example isn't clear to me.Scuppernong
I think he's simply wrong. The compiler doesn't have to prove that program termination doesn't happen, because it's allowed to assume it. This often enables optimizations, like the one you show, that would not be possible without the assumption.Fsh
@Joshua: That is not relevant to my comment. You are reasoning about what the compiler may do given the rules of the C standard. That reasoning is not determinative if the compiler writer has other goals. This is not an appropriate question for Stack Overflow without sufficient context from the book provided directly in the question, not links to external material, especially material only available at a cost.Kerf
@EricPostpischil Sorry for the misunderstanding - to answer your question "Is the only criterion for optimization conformance with the C standard, or is the compiler writer concerned about additional features?", I believe it is the former. There is no suggestion that we want x += 5; to "raise an exception" in this case. The point of his example was to demonstrate what a (spec compliant) C compiler can and cannot do with our code.Scuppernong
By the way the book is available for free at the link provided at the end of the question. A moderator seems to have added the link to the publisher website but it is also available for free (legally) on HAL.Scuppernong
@Scuppernong With "In fact, in this case the compiler can even assume that x - 12 <= MAX_INT.", did you mean x <= MAX_INT - 12?Plumber
If you generalize this to 2 signed values (which may be constants) then you can get something like x += 7; x += -3;. In this case, it may be that x += 7 would cause overflow, but x += 4 won't. I'm not sure if it affects the rules, just pointing out that addition of signed int is not associative in terms of causing overflow.Kalmia
Your first couple of sentences have the implication backwards; it should be: "Compilers may assume that any given program does not contain UB. Therefore if a C program has undefined behavior, anything can happen." due to the principle of explosion.Imposition

TL;DR: You're right, such an optimization is not forbidden for signed int, only for float/double, and not just because of exceptions in that case.

One reason for things to be UB is that some obscure machine could raise an exception. But hitting UB is not guaranteed to raise an exception on all machines (unless you're compiling with gcc -fsanitize=undefined, for types of UB it or clang can reliably detect, or gcc -ftrapv to define the behaviour of signed int overflow as trapping). When a compiler treats UB as an optimization opportunity by assuming something won't happen, things are very different: UB is not a synonym for "fault" or "trap".
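
For example, a deliberately overflowing program (the value here is just for illustration) behaves differently depending on those options: with plain gcc -O2 it typically just wraps, with gcc -ftrapv the addition aborts at run time, and with -fsanitize=undefined a runtime diagnostic is printed:

#include <limits.h>
#include <stdio.h>

int main(void) {
    int x = INT_MAX - 3;
    x += 5;                  /* signed overflow: UB in the abstract machine */
    printf("%d\n", x);       /* whatever this prints is not guaranteed */
    return 0;
}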

There are operations that might trap on normal CPUs, such as deref of unknown pointers, and integer division on some ISAs (such as x86 but not ARM). These would work as examples if you're looking for an operation that compilers potentially need to be careful of to avoid introducing an exception before side-effects that need to happen, or before a branch that might cause the abstract machine not to reach an undefined operation at all.
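
For instance (an illustrative helper, not from the original post), a compiler must not hoist a possibly-trapping operation above the branch that guards it:

int safe_div(int a, int b) {
    if (b == 0)
        return 0;        /* the compiler must not introduce a / b on this path, */
    return a / b;        /* since the division can trap (e.g. #DE on x86) */
}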


Signed integer overflow is UB so anything can happen at any point in the execution of a program where that happens (in C++, and according to some interpretations of the C standard), even when compiling for a machine with non-trapping add instructions (like all modern ISAs).

Some implementations might define the behaviour as raising an exception. If they define where that exception gets raised, then it would prevent the optimization; each addition needs to happen as written so it can trap there if that operation in the abstract machine overflows. But that would be defining the behaviour, the exact opposite of UB meaning absolutely zero guarantees about what your program actually does.

In C, if n3128 is accepted (footnote 1), any visible side-effects sequenced before the abstract machine encounters UB must happen. But after UB is encountered, literally anything is allowed, including doing I/O. UB doesn't have to fault and stop execution.

If a compiler were compiling the += operations with MIPS signed-overflow-trapping add instructions instead of the usual addu, it would be legal to optimize to x+=12 after the intervening code even if it contained I/O operations or other visible side-effects (like a volatile read or write). Even if the x+=5 caused signed overflow UB in the abstract machine, it's fine if the actual behaviour is to trap later (for example when the abstract machine would have done the x+=7 part). As long as it's at or after the point where the abstract machine hit UB, literally anything is allowed.

(In C++, it would also be legal to do the possibly-trapping addi $s0, $s0, 12 even before a printf or something, due to the explicit lack of requirements on behaviour even before the first undefined operation, for an execution that does encounter UB. But only if the compiler can prove that printf definitely returns, so in practice this optimization can usually only happen for volatile accesses, if at all. But even without retroactive effects, we can either do x+=5 before and x+=7 after, or x+=12 after. Not faulting is valid behaviour for signed overflow, but the abstract machine has done an undefined operation, so anything that happens later, like printing and then trapping, or just having the addition wrap, is allowed.)

The compiler just has to avoid introducing exceptions on paths of execution that shouldn't have any. (Which isn't a problem for integer addition on mainstream ISAs; most don't even have a trapping signed-add instruction, and compilers targeting MIPS use addu even for signed math so they can optimize freely, and because historically programmers didn't want trapping on int math.)

Footnote 1: C vs. C++, and whether C UB should be "concrete" or "abstract"

See Does undefined behaviour retroactively mean that earlier visible side-effects aren't guaranteed? and n3128: Taming the Demons -- Undefined Behavior and Partial Program Correctness, a proposal to have ISO C clearly specify that visible side-effects (like I/O) before the abstract machine reaches an undefined operation must still happen. (Common interpretations of the current ISO C standard treat UB like in C++, where the C++ standard explicitly allows "breaking" stuff along an inevitable path to UB.)


Practical example of compilers doing this

Both int and unsigned can do this optimization; it's only FP types that can't, and that's (also) because of rounding, even if you compile with gcc -fno-trapping-math (an FP math option). See it in action on Godbolt with GCC 13 and Clang 16:

int sink;    // volatile int sink doesn't make a difference
int foo_signed(int x) {
    x += 5;
    sink = 1;
    x += 7;
    return x;
}
// also unsigned and float versions
# GCC -O3 -fno-trapping-math
foo_signed:                               # input in EDI, retval in EAX
        mov     DWORD PTR sink[rip], 1
        lea     eax, [rdi+12]             # x86 can use LEA as a copy-and-add
        ret
foo_unsigned:
        mov     DWORD PTR sink[rip], 1
        lea     eax, [rdi+12]
        ret
foo_float:                    # first arg and retval in XMM0
        addss   xmm0, DWORD PTR .LC0[rip]     # add Scalar Single-precision
        mov     DWORD PTR sink[rip], 1
        addss   xmm0, DWORD PTR .LC1[rip]     # two separate 5.0f and 7.0f adds
        ret

Earlier version of an answer, making some different points for the same conclusion

You're correct; assuming x is a local variable so literally nothing can possibly use the x += 5 result, it's safe to optimize x+=5; ... ; x+=7 to x+=12 for both signed and unsigned integer types.

Unsigned integer math is of course fine.

Signed integer math has to produce the right result in any case where the abstract machine doesn't encounter UB. x+=12 does that. There's no guarantee that signed overflow raises an exception at any specific point in your program, that's the whole point of optimization in modern C based on the assumption that undefined behaviour won't happen. For an execution that will encounter UB, literally anything can happen anywhere before or after that point (but see footnote 1 above re: "breaking" stuff along an inevitable path to UB).

This optimization would be safe even for turning x-=5; x+=7 into x+=2, where the abstract machine could wrap twice (encountering UB) but the asm doesn't wrap, since "happens to work" is an allowed behaviour, and common in practice. (Even using MIPS trapping add instructions, for example.)

If you use compiler options like gcc -fwrapv, that defines the behaviour of signed integer math to be 2's complement wrapping, removing UB and making the situation identical to unsigned.

GCC does sometimes miss optimizations with signed integer math because of some reluctance for GCC internals to create signed overflow in a temporary in the asm where none would have existed in the abstract machine. This is a missed optimization when compiling for a machine that allows non-trapping integer math (i.e. all modern ISAs). For example, GCC will optimize a+b+c+d+e+f into (a+b)+(c+d)+(e+f) for unsigned int but not for signed int without -fwrapv. Clang does it for both on AArch64 and RISC-V, although it chooses not to for x86 (Godbolt). Again, this is a missed optimization due to GCC being over-cautious for some unknown reason; it's perfectly valid. 2's complement signed math is identical to unsigned binary math, so it is associative; the final result will be correct even in cases where the optimized computation wrapped back and forth but the abstract machine didn't.
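
A minimal test case for that reassociation point (the function names are mine; compare the generated code for the two at -O3):

/* Test case for the reassociation example described above. */
int sum6_signed(int a, int b, int c, int d, int e, int f) {
    return a + b + c + d + e + f;          /* GCC reportedly keeps this as a serial chain */
}

unsigned sum6_unsigned(unsigned a, unsigned b, unsigned c,
                       unsigned d, unsigned e, unsigned f) {
    return a + b + c + d + e + f;          /* GCC is willing to reassociate here */
}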

Signed overflow UB is only a thing in the abstract machine, not asm; most mainstream ISAs don't even have integer addition instructions that trap on overflow. (MIPS does, but compilers don't use them for int math, so they can do optimizations that produce values that didn't exist in the abstract machine.)

Semi related: Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)? (answers show that compilers do optimize to three multiplies for integer math, even for signed int.)


FP exceptions aren't the only issue for float/double

Floating-point math can't do this optimization because it could give a different result in non-overflowing cases, due to different rounding. Two smaller numbers could both round down, vs. one larger number overcoming the threshold.

e.g. for a number large enough that the nearest representable double values are 16 apart from each other, 8 would get half-way and round to nearest-even (assuming the default rounding mode). But any less, like 7 or 5, will always round back down; x + 7 == x, so both the 5 and the 7 would be lost, but x+12 all in one go would get over the hump to the next representable float or double, producing x+16.

(The magnitude of 1 unit-in-the-last-place (of the mantissa) depends on the exponent of a float/double. For large enough FP values, it's 1.0. For even larger values, e.g. for double from 2^53 to 2^54, only even numbers are representable, and so on with larger exponents.)
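
A quick way to see that rounding effect (2^56 is my choice of starting value; any double whose ULP is 16 works):

#include <stdio.h>

int main(void) {
    double x = 72057594037927936.0;      /* 2^56: the ULP of a double is 16 here */
    printf("%.1f\n", (x + 5) + 7);       /* prints x: both adds round back down */
    printf("%.1f\n", x + 12);            /* prints x + 16: 12 > 8 == half a ULP, rounds up */
    return 0;
}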

If you compile with GCC's buggy default of -ftrapping-math, it will try to respect FP exception semantics. It doesn't reliably generate 2 FP exceptions if overflow happens twice, so it might not care about that.

But yes, with #pragma STDC FENV_ACCESS ON, every separate FP operation is supposed to have an observable effect (https://en.cppreference.com/w/c/numeric/fenv). But if you don't call fetestexcept to actually observe FP exception flags between two operations, they could in theory still be optimized if we can prove that rounding would be the same, since I don't think even ISO C's FENV_ACCESS ON is supposed to support actually running exception / signal handlers for each trapping operation.
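
A sketch of what "observing the flags between two operations" looks like, assuming an implementation that actually honours FENV_ACCESS (GCC's support is incomplete):

#include <fenv.h>
#include <stdio.h>
#pragma STDC FENV_ACCESS ON

double scale(double x) {
    feclearexcept(FE_ALL_EXCEPT);
    x *= 2.0;
    if (fetestexcept(FE_OVERFLOW))           /* observing the flag here is what */
        puts("first multiply overflowed");   /* forces the operations apart */
    x *= 2.0;
    return x;
}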

For example two identity operations like x *= 1.0; could be collapsed to one, which will raise exceptions on NaN. Or x *= 2; x *= 2; could be optimized to x *= 4; because multiplying by exact powers of 2 doesn't change the mantissa and thus doesn't cause rounding. It doesn't matter if the first or second multiply overflowed to +-Inf, that will still be the final result. (Unless Inf * 2 raises exception flags that an overflowing multiply wouldn't have already raised? I don't think so.)

And they're both changing the exponent in the same direction, unlike x *= 4; x *= 0.5; which could overflow to +Inf for large numbers, so isn't equivalent to x *= 2. Also, if x *= 0.5; x *= 0.5; produces subnormal results, it actually could round twice when right-shifting the mantissa; IEEE FP has gradual underflow (subnormals with a special encoding for the exponent) but non-gradual overflow to +Inf.

Figuring out whether it's safe to optimize x * 0.5 * 0.5 to x *= 0.25 is beyond the scope of this answer. GCC and clang don't optimize x *= 2.0f; x *= 2.0f; into x *= 4.0f; even with -fno-trapping-math, but I think that's a missed optimization.
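
For reference, that missed-optimization test case looks like this (the function name is mine):

/* Two exact power-of-two multiplies that could safely be fused into one;
   GCC and clang currently emit two multiplies even with -fno-trapping-math. */
float quadruple(float x) {
    x *= 2.0f;
    x *= 2.0f;       /* a single x *= 4.0f would give the same result, since
                        multiplying by powers of 2 doesn't cause rounding */
    return x;
}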

Sinusoidal answered 18/9, 2023 at 23:40 Comment(51)
Re “But that would be defining the behaviour, the exact opposite of UB meaning absolutely zero guarantees”: “Undefined behavior” has a specific meaning in the C standard, and it is not “absolutely” free rein. It is very definitely a qualified free rein; the standard is clear that “undefined behavior” means only that the C standard does not impose any requirements. It does not negate or nullify any requirements from sources outside the C standard.…Kerf
… You’ve acknowledged in this answer the compiler may define behavior beyond the standard, but it is not clear from your writing what you are saying about that with regard to this statement about undefined behavior.Kerf
@EricPostpischil: What I meant was that the optimizer doesn't have to care what happens for cases that involve UB. The quote in the question seems to be saying that signed-integer overflow UB means the program needs to produce an exception at a certain point in its execution (before or after the intervening code depending on the initial value of x). But that's of course not the case for UB, only for an implementation that defines int math as trapping on signed overflow, a bit like clang -fsanitize=something.Sinusoidal
I'm pretty sure my answer could make the same point with much less text, and that I ended up with at least 3 different versions of an answer all crammed into one :/Sinusoidal
@EricPostpischil: I rephrased that paragraph; I think that helps some (with avoiding the implication of "free rein" applying more broadly than I intended, not with the answer being too long and redundantly repeating its points :P)Sinusoidal
@EricPostpischil: It's a shame the Standard failed to articulate a principle articulated in the Rationale: the term UB was used in many situations where some implementations would behave usefully, but others wouldn't, and the standard waived jurisdiction over whether to behave usefully as a Quality of Implementation issue. If the normal answer to questions of the form "Would the Standard allow the compiler I'm writing to perform this nonsensical transformation" had been "It would probably allow garbage-quality implementations to do so", the language would be in a much better state.Cradlesong
//that's the whole point of C's concept of undefined behaviour.// Can you cite a primary source for that claim? The C99 Rationale states that one of the purposes was to avoid requiring that implementations diagnose certain error conditions, but hardly suggests that's the "whole point". Another purpose stated by the Rationale was to allow implementations to, as a form of "conforming language extension", define how they would process structures or cases over which the Standard waives jurisdiction (typically by behaving "in a documented manner characteristic of the environment").Cradlesong
"...UB in ISO C so anything can happen at any point in the execution of a program where that happens"... pretty sure this is a false yet common misconception. If you do int main() { printf("Hi"); *(void **)0 = 0; } the compiler cannot avoid the printing merely because it believes you will execute UB in the future. "Time travel" is only possible via the as-if rule, and side effects like I/O can't be reordered under that.Monotint
@user541686: It's true the GCC and clang don't avoid the call to printf in that case. Or to make it something simpler that definitely can't fail to return, an assignment to a volatile global. godbolt.org/z/sav53o4r7 I don't know if that's guaranteed or not; this path of execution will definitely encounter UB, so the compiler can assume this path of execution is unreachable, I thought. That would allow skipping code-gen for that whole path of execution. But maybe I'm wrong about that. The ISO C standard's description of UB doesn't talk about optimization.Sinusoidal
@PeterCordes: Optimization is beside the point. The important thing is the description of UB talks about executions (not about programs -- it's UB, not UP). The thing is, the compiler can't change the semantics of a program, and the semantics of an execution are defined by its side effects on its inputs (the as-if rule). Executions that include UB are undefined, so the compiler can do whatever it wants for those, but executions of the program until the final side-effect that occurs prior to UB are (by definition) well-defined, so the compiler can't alter those semantics.Monotint
@user541686: That makes sense. Absent some specific language that allows otherwise, yeah the default assumption should be that all visible side-effects before the abstract machine encounters UB must actually happen when the program executes. (I had thought such language did exist, but didn't find any when I went looking; I think you're probably right that this was a common misconception.) That appears to be how GCC and Clang behave in practice. I should probably edit this answer, but probably by shrinking it, not adding yet more text :PSinusoidal
@user541686: Raymond Chen (devblogs.microsoft.com/oldnewthing/20140627-00/?p=633) quotes one of the standards as saying However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation). That's still in the current C++ draft standard (eel.is/c++draft/intro.abstract#5). It's not in the ISO C standard with that phrasing; and maybe not at all; if not, C++ is the source of the C misconceptionSinusoidal
@PeterCordes: Maybe, but I don't even think in C++ that sentence is saying what it sounds like on the surface. The precise meaning of the word "execution" is critical here. An "execution" is not merely an invocation (program + inputs); it also includes any observable behaviors -- because those include run-time interactions with the environment (like I/O), which affect the execution of the program (and they can even provide further inputs). So if your program has observable effects A, B, C in that order, then it has 3 possible executions: A, AB, and ABC. Let's say C has UB. (cont'd)Monotint
(cont'd) C cannot be executed until B has. And whether or not the program even continues past B in the first place is something that can only be known after B is executed. (For all you know, the program might block infinitely on I/O, or be terminated by the host, etc.) If the program never gets past B, then C hasn't even had a chance to occur, so there is no UB. Thus what the sentence is really saying is that it places no constraints anywhere on the execution path of ABC if C is undefined. That is... true! But it does place constraints on what the executions A and AB can do! (cont'd)Monotint
(cont'd) The implication of all these is the following subtle point: if the implementation has control over the effects of A and B, such that it can guarantee that there exists a C that would "undo" (or mess with) them, then it can ignore/mess with A and B however it wishes. For example: if B is a printf() call, the implementation can only mess with it if it can guarantee that the effect of printf will in fact not be observable to anyone. (e.g., perhaps because it can prove the effect at that point is merely writing to internal memory, vs. to the network or a printer.) (cont'd)Monotint
(cont'd) But that's a pretty damn high bar, because it requires having guarantees about the behaviors of & effects on the environment (the OS, hardware, etc.) that most implementations don't claim to have. (e.g., most implementations cannot guarantee printf won't write to the network, block indefinitely, possibly abort on I/O failure, or whatever else can happen besides "returning with externally unobservable effects".) Does that make sense?Monotint
@PeterCordes: Alternatively, here's a less pedantic way to phrase it. The standard is basically saying, "We'll execute every legal order you give us, but if you tell us to do something illegal, then we can do anything in response. In fact, you free us from any obligations we may still have in response to your past orders." Which is not the same as: "If you tell us you will provide an illegal order in the future, then we can do whatever we want in response to any orders you will place before then." (The crux is that you need to first "execute" UB for any of these disclaimers to kick in.)Monotint
@PeterCordes: Also, while this isn't proof, it's worth pondering what's happening here: godbolt.org/z/a6T15vGrq (Interestingly, Clang seems to optimize the code if you only read from that variable, but not if you write to it! I'm inclined to say that's a bug, since volatile loads can have side effects just like volatile stores. I'm guessing Clang is only modeling the memory values as side effects and forgetting to include potential trapping, etc.)Monotint
@user541686: I was playing around with godbolt.org/z/eGovKEoGz , with a store to a global variable inside an if(!p) null-pointer check that could be deleted due to a later deref. Even when the global variable isn't volatile, GCC, Clang, ICC, and MSVC weren't willing to delete the earlier branch, even compiling as C++ (-xc++), so I wasn't able to get them to change behaviour that would happen before the UB.Sinusoidal
It's hard to read the C++ standard any way other than giving permission for that, though, whether compilers choose to exercise it or not. The phrasing about executing the program (i.e. from the start) with that input clearly sidesteps any need to actually reach the eventual but inevitable UB before requirements on behaviour are lifted. Your point about printf maybe blocking or aborting is why I used assignment to volatile int sink; as a visible side-effect instead of a call to a non-inline library function. The UB must indeed be inevitable with the current inputs to the program.Sinusoidal
@user541686: Oh, I used printf in my answer as an example. I only used volatile assignment in my Godbolt link when attempting to get compilers to actually do it. I'm inclined to keep the printf in my answer as a possibly-incorrect over-simplification that makes the point more clearly. Especially for the part about optimizing so the possible trap is after instead of before a printf, that's definitely allowed even if the printf ends up not returning.Sinusoidal
@PeterCordes: thing is, if you're thinking of the trapping argument, volatiles can trap just the same way. The UB isn't inevitable with them any more than with printf. (See segfault handlers, guard pages for stack extension, writing to device mapped memory, etc.) As far as "reading the standard goes", what I would point out is that "input" in that sentence is not just argv[]; it's all the interactions that dynamically happen with the environment (which includes volatile accesses and side effects etc.). It cannot logically be otherwise, since those are semantically just as relevant as argv.Monotint
Let us continue this discussion in chat. (Edit: Ugh, I didn't mean to click this.)Monotint
@PeterCordes: but if you had an implementation where volatile memory access was genuinely unobservable, then yes, such volatile accesses could get optimized out.Monotint
@user541686: I think a volatile access that traps (or resets the machine or powers it off) would itself be UB. But you're right, my argument does depend on the volatile access not causing a jump to somewhere else. And yes, of course all volatile read results and other I/O ops are part of the "input". godbolt.org/z/ThdMfdddT shows if (p) { sink = 3; }else{ sink2 = 4; *p = 4;} - with non-volatile sink2, clang will optimize away the branch by assuming that p is non-NULL. But with volatile sink2, it makes the volatile side-effect still happen, like you're arguing C++ requiresSinusoidal
@PeterCordes: Yup.Monotint
@user541686: godbolt.org/z/ndEbxhha7 shows that compilers do dead-store elimination of non-volatile stores across volatile accesses, which is only safe if volatile can be assumed not to trap, or that it would be UB if they do. (e.g. if the user ran it under a debugger and set a watchpoint on the volatile's address, and modified things behind the compiler's back, including the program counter. Possible on x86-64 for example but not supported by GCC/Clang unless you compile with -O0).Sinusoidal
Interestingly, MSVC doesn't do dead-store elimination (even with int*p replaced with a int global godbolt.org/z/zMa1Kx99K), but it still reorders the plain stores with the volatile ones. So that object would have the "wrong" value if a volatile write trapped to other code. Anyway, this proves that compilers do assume execution comes out the other side of a volatile access in non-UB cases, so that's not why they're reluctant to optimize them out ahead of UB in paths that definitely reach UB.Sinusoidal
@PeterCordes: I think you're misreading what's happening there. What's funny there isn't how compilers treat UB, but about how they treat volatile automatic locals. I've seen msvc in past versions optimize away volatile automatic locals even without any UB. Volatile on an automatic local is practically meaningless as far as I can tell, because there isn't really anything you can say about them - they don't have a fixed location (the implementation can put them where ever it wants), so it can just put them in a place that it (pretends to?) guarantee they can't trap. Statics are different.Monotint
@user541686: GCC, Clang, and MSVC don't think globals are different: godbolt.org/z/deafG1hT6 shows identical code-gen except for the addressing modes in the stores to v1 and v2. Automatic volatile variables are usable for watching with a debugger, or for taming the optimizer in a microbenchmark if you don't have GNU C asm volatile("" : "r"(var)) to force the compiler to materialize a value in a register without storing anywhere. Or for hand-rolled atomics between threads, but that would imply taking their address escapes the function.Sinusoidal
@PeterCordes I was referring to volatile globals. My comment didn't have enough character left for me to type volatile again.Monotint
@user541686: I assumed that's what you meant. Did you look at my Godbolt link? I just moved the volatile int v1, v2; declaration from inside the function to outside, so we have volatile globals and compilers are still doing dead-store elimination around those accesses.Sinusoidal
@PeterCordes: as for debuggers, that kind of misses my point. Debuggers aren't part of the implementation. The OS and CPU are. (BTW sorry I'm on my phone so replies are sporadic and godbolt isn't phone friendly, I'll look again later)Monotint
@user541686: Understood about Godbolt links on a phone, especially with 3 compiler panes open. I only mentioned debuggers as a way to simulate a trapping volatile write even on a variable the compiler knew was backed by normal RAM (like a local var). But since behaviour is the same with global vars (I should have tried that myself before your prodding), a linker script could give a global var an address that's backed by an MMIO register or an unmapped page. Yet GCC/clang still do dead-store elim around it, even with -ffreestanding -mcmodel=kernel.Sinusoidal
@user541686: I posted a new question (Does undefined behaviour retroactively mean that earlier visible side-effects aren't guaranteed?) based on this discussion. It's asking about C; I found enough previous questions about C++ that I'm convinced that my reading of the standard's wording is the "correct" and widely-agreed reading. But for ISO C it's less clear what gives permission to not produce older visible side-effects in cases where a compiler has proved that UB is inevitable. (Mainstream compilers seem not to do that, regardless.)Sinusoidal
@PeterCordes: Okay, so coming back to your replies -- I think you just proved Clang and GCC are buggy in your example, because the programmer has every right to expect a guard-page handler that traps on v2 to see the most recent writes to global. And notice there's no UB needed for this (mis?)compilation to happen; the optimization you're seeing (whether correct or not) relates to an aspect of volatile handling that is completely unrelated to UB.Monotint
@user541686: global isn't volatile sig_atomic_t, so no, a trap / signal handler can't expect to see any particular value in it. The point of that test case with volatile v1 and v2 is to prove that compilers are willing to reorder non-volatile accesses around volatile accesses. I earlier thought I was proving something about compilers assuming that accesses to volatile globals don't trap (since they're definitely in statically allocated space), but I was forgetting the lack of guarantees for anything except volatile sig_atomic_t.Sinusoidal
@PeterCordes: Oh interesting, cool, I wasn't aware of sig_atomic_t. But yeah, this isn't UB related.Monotint
@user541686: I know it's not UB-related. The question of whether compilers assume that execution comes out the other side of a volatile access was part of the reasoning for something we discussed earlier. But see also Nate's comment on the new question: he points out the possibility of a volatile write restarting the whole program or something (like triggering a reset). That discards non-volatile program state so NV reordering wouldn't matter.Sinusoidal
@PeterCordes: Note I never claimed compilers always model volatile as trapping. (I thought they should, but I never claimed they always correctly did so.) In fact, quite the opposite -- the very first time I used the word trap in this discussion, I said Clang sometimes seems to model it properly on stores, but not loads. i.e. I fully expect bugs in this area, given it isn't a well-trodden area of the language. (This is what happened with restrict, where Rust relied on it aggressively and exposed longstanding bugs.)Monotint
@user541686: forgot to mention, user17732522 linked open-std.org/jtc1/sc22/wg14/www/docs/n3128.pdf in comments on my new question. That basically answers it: the "concrete" interpretation of UB is what you're advocating for, and what that paper proposes the ISO C standard should be changed to more explicitly require. There are rare cases which violate that with current C compilers, all involving volatile since non-inline function calls to I/O functions don't in practice reorder, since compilers don't assume they'll return and thus can't prove UB will be reached.Sinusoidal
@PeterCordes: Thanks, yeah I saw that; I hadn't had a chance to finish a reply. To touch on that & your point about C++ vs. C -- I see no logically consistent way to allow "time-traveling" of UB past an observable effect without further changes to either language. Even if both standards explicitly said this was allowed, it still couldn't be done in general, because it would be inconsistent with the rest of the text. Because UB is about an "execution" of a program, and you don't even have an execution until you know the results of all the observable behaviors.Monotint
@PeterCordes: (cont'd) To me, claiming otherwise is equivalent to claiming that you can know the result of the execution of int main() { return getchar(); } before the user has provided any input on stdin. Like, even if the committee went out of its way tomorrow to add such a sentence to each standard claiming it's permitted, that wouldn't magically make it possible or coherent with the rest of the standard. It would just be a nonsensical assertion as far as I can see, unless you have a time machine.Monotint
@PeterCordes: Also, I'm impressed you managed to get such a relevant link so quickly to a question like that. If I'd asked it it would've gotten shut down and downvoted to oblivion immediately, with everybody telling me time travel is obviously possible and nasal demons etc.Monotint
@user541686: I did have to fend off a couple of early "of course it's retroactively possible" comments, but I made sure to babysit my question for the critical first half hour so my replies were there to clarify why it's not obvious and that I was asking for an answer justifying it in terms of wording in the ISO C standard. Probably editing to remove the [c++] tag helped. But yeah, I was also pleasantly surprised to get such a useful link so soon.Sinusoidal
@user541686: you're still missing the point about how it can be possible. You don't have to know the result of input if UB-reaching doesn't depend on it. e.g. if it only depends on the visible side-effect (such as a volatile access) eventually finishing, UB can provably happen, which at least in C++ justifies removal of previous stuff in that path of execution, if compilers are willing to do so. If you still disagree, post an answer on Does undefined behaviour retroactively mean that earlier visible side-effects aren't guaranteed?Sinusoidal
@PeterCordes: I think we're either agreeing or speaking past each other. Refer to my earlier comment about when I believe this is possible. It's similar to what you seem to be trying to say. The crucial bit here is that the compiler not only needs to know what the side effect does do, but also needs to know what it doesn't do. It needs to know the "observable" side effect is actually unobservable. (cont'd)Monotint
@PeterCordes: (cont'd) For example, if (say) instead of getchar(); UB(); you had sleep(100s); UB();, the user would be entirely justified in believing that program would sleep for 100s before any UB occurred, and thus (say) decide to write down the prior output during that time. OTOH, if the compiler somehow knew that the program results wouldn't be flushed anywhere for the next 200s anyway, then it could just optimize the sleep away entirely. Again, I say this this because the sleep side-effect has not "executed" yet, hence the UB (which is a property of executions) has not executed yet.Monotint
@PeterCordes: There's admittedly a bit of weasel-wording I'm brushing under the rug here, which is that what is considered "observable" is something that's... left up to the implementation to define, basically. For example, the standard says printf is observable as far as the standard is concerned, but leaves it up to the implementation to actually define what happens to the output. So if the implementation just routes it to /dev/null... then it could reasonably claim the behavior is nothing, i.e. unobservable. (cont'd)Monotint
(cont'd) the upshot of all this is that, in a very pedantic sense, I think a compiler could also say something like: "it's up to me to define what volatile means on my implementation, and I hereby declare that it has no observable effect" despite the fact that your debugger, hardware, etc. might beg to differ. Critically, though, the justification wouldn't be that UB can time-travel past observable side effects, but rather the fact that the implementation can dictate the meanings of those effects.Monotint
(It kind of becomes a philosophical question once you dig deep enough into it, I think.) As to posting an answer there... I've engaged in this argument far too much on the internet and wasted far too much time on it in the past to repeat it yet again with a large crowd. The fact that there's a recent paper on clarifying it is as much closure as I can see myself ever getting on this, so I'd rather leave it at that and hope the committees will address it in the future!Monotint

IMO, you are right. UB is UB; there is no obligation to raise an exception and terminate the program, so the optimization should be allowed.

Anyway, if the compiler is set to cause a trap on signed overflow, the trap must be honored where it occurs and the merging of the additions is not possible.

Note that this is not a C requirement (UB can be anything), but a compiler requirement (if the behaviour on overflow is set to be a trap, which effectively turns the UB into defined behaviour).
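
A sketch of that point, assuming GCC's -ftrapv (which defines signed overflow as aborting at run time):

/* With -ftrapv, each addition must be performed as written so that an
   overflow aborts at the operation that actually overflowed. */
volatile int progress;

int f(int x) {
    x += 5;           /* may abort here ... */
    progress = 1;     /* ... before this visible side effect happens */
    x += 7;           /* ... or only here, after it */
    return x;         /* folding into x += 12 would move the trap point */
}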

Mitigate answered 19/9, 2023 at 19:27 Comment(0)

A compiler is allowed to perform optimizations which will not observably affect the behavior of any defined program execution. An unfortunate corollary of this is that the only way to allow useful optimizations that might observably affect the behavior of some program executions is to categorize those executions as invoking undefined behavior. Compiler writers' crazy fascination with Undefined Behavior stems from the unwillingness of standards writers to specify particular ways in which compilers may behave inconsistently with processing the individual steps of a program sequentially, in cases where program execution would be defined within the bounds allowed by such deviation.

If a compiler is targeting a language or platform which specifies how it will behave in more circumstances than are mandated by the Standard, it may perform optimizations which would not be valid in the absence of such guarantees. If, for example, a compiler is given a construct like:

extern int x[],y[];
int i,*p;
...
if (p+i==x+4)
  p[i] = 1;

and its output language guarantees that operations which dereference pointers will be processed in a manner agnostic to how they were computed, it could transform the construct p[i] into x[4] even in situations where the Standard might define the behavior of the former but not the latter.

Such optimizations may cease to be valid if the downstream processing does not handle all of the expected cases in the manner relied upon by the upstream optimization. In the above construct, for example, the optimization would be unsound if downstream code were to assume that because the address of x[4] is computed by adding an integer displacement to the base address of x, dereferencing it cannot possibly access the storage at y[0] even though the Standard would specify program behavior as doing precisely that if x had four elements, y immediately followed x in memory, and p had been formed by taking the address of y.
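
A hypothetical layout illustrating that conflict (whether y actually follows x is up to the implementation; the names match the construct above):

/* Suppose x has four elements and y happens to be placed immediately after
   it in memory. The Standard defines the comparison below as possibly true
   in that case, and defines p[i] as a store to y[0]. */
int x[4];
int y[1];

void demo(void) {
    int *p = y;               /* p points at y[0] */
    int i = 0;
    if (p + i == x + 4) {     /* may compare equal if y directly follows x */
        p[i] = 1;             /* defined: stores to y[0]; rewriting it as
                                 x[4] = 1 is the transformation in question */
    }
}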

Note that either of the above described optimizations would have been legitimate in isolation, even though the combination is not. An upstream pass that knows the downstream pass won't perform certain transforms may legitimately perform optimizations that would be unsound without such knowledge. The Standard relies upon implementations to recognize what combinations would be invalid, and devise whatever means would be most convenient for avoiding them.

Cradlesong answered 19/9, 2023 at 17:59 Comment(15)
C++ says UB, so any behavior is allowed and "the optimization cannot be performed" is wrong according to the standard. In fact, the book discusses the case where the UB is taken to be program termination.Mitigate
@YvesDaoust: If x has length 4, computation of the pointer x+4 is defined behavior, and while I'm not sure about the C++ Standard, the C Standard expressly defines the behavior of an equality comparison between a "just past" pointer and a pointer to the immediately following object. What UB are you talking about?Cradlesong
The OP's example is integer overflow.Mitigate
@YvesDaoust: The C Standard uses the term "Undefined Behavior" as a catch-all for many purposes, including constructs which most implementations were expected to process "in a documented manner characteristic of the environment", except when doing otherwise would offer a clear benefit to their customers; the C++ inherits such usage from the C Standard in many cases. The main point with my answer was that a transformation may be valid if downstream processing guarantees more corner-cases behaviors than the Standard would mandate, but invalid when the downstream process doesn't do so.Cradlesong
"downstream processing guarantees more corner-cases behavior": I have no idea what you mean.Mitigate
@YvesDaoust: Translation and optimization usually take place in multiple phases. If some particular phase of processing guarantees that signed integer addition and subtraction will use truncating two's-complement behavior in all cases, a preceding phase may be able to consolidate the sequence of events x+=100; x+=a; x+=b; x+=c; x-=100; into x+=(a+b+c); without regard for whether combinations of values that would not have caused integer overflow in the code as written might yield integer overflow in the revised version. If the later phase might do something weird in case of overflow, ...Cradlesong
...then such a transform would not be valid, since it could fail in cases where a+b is between INT_MIN-100 and INT_MIN-1, and a+b+c would not be less than INT_MIN.Cradlesong
I disagree with this. C++ will allow any transformation. But if you set the compiler to stop on overflow, the compiled code must stop on the first overflow.Mitigate
@YvesDaoust Why are you mentioning C++ here?Pris
@YvesDaoust: The Standard does not specify any means of setting compilers to stop on overflow. If an implementation documents that it will stop on overflow even if the result of a computation is unused, then an implementation should behave as documented, though the Standard would have nothing to do with that. If an implementation documents that it will stop before a program can produce any output inconsistent with mathematically correct computations, but that it might produce mathematically correct output even in cases where temporary results would exceed the range of int, ...Cradlesong
...then many useful optimizations could yield behavior inconsistent with immediate overflow trapping but still satisfy the weaker requirement. For example, replacing a*30/15 with a*2 and only overflow-checking the latter multiplication would cause the calculation not to trap in cases where a is between INT_MAX/30 and INT_MAX/2, but to yield mathematically correct results instead which would likely be at least as useful.Cradlesong
Mh, you don't want to understand what I say, presumably. I made the distinction between the standard and the compiler settings.Mitigate
@YvesDaoust: A compiler setting would need to preclude optimizations that would cause behavior contrary to the compiler's documentation. If a compiler's documentation were to specify that a compiler may apply specific optimizing transforms that may yield behavior that is observably inconsistent with what the programmer wrote, but still consistent with common application requirements, then the settings would not preclude such optimizations.Cradlesong
If the setting is "trap on overflow", the behavior will be "trap on overflow". No need for more literature.Mitigate
@YvesDaoust: An implementation might usefully support synchronous traps on overflow or asynchronous traps on overflow, and the phrase "trap on overflow" doesn't necessarily imply either meaning. Further, on some implementations the fastest way of processing e.g. int1 = (int2*int3)/int4; may be to behave in a fashion equivalent to int1 = ((long long)int2*int3)/int4;. I don't think it's clear that "trap on overflow" would imply a trap in all cases where int2*int3 exceeds the range of int, including those where the alternative calculation wouldn't overflow.Cradlesong
