Optimizing away a "while(1);" in C++0x
Updated, see below!

I have heard and read that C++0x allows a compiler to print "Hello" for the following snippet

#include <iostream>

int main() {
  while(1) 
    ;
  std::cout << "Hello" << std::endl;
}

It apparently has something to do with threads and optimization capabilities. This seems likely to surprise many people, though.

Does someone have a good explanation of why this was necessary to allow? For reference, the most recent C++0x draft says at 6.5/5

A loop that, outside of the for-init-statement in the case of a for statement,

  • makes no calls to library I/O functions, and
  • does not access or modify volatile objects, and
  • performs no synchronization operations (1.10) or atomic operations (Clause 29)

may be assumed by the implementation to terminate. [ Note: This is intended to allow compiler transformations, such as removal of empty loops, even when termination cannot be proven. — end note ]
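To make the three bullets concrete, here is a small, terminating sketch (not from the question): a loop that makes no I/O calls, touches no volatile objects, and performs no synchronization or atomic operations, so it falls squarely under the quoted wording.

```cpp
// A side-effect-free loop covered by the quoted 6.5/5 wording: the
// implementation may assume it terminates (here it actually does) and
// may therefore optimize across it, or elide it if the result is unused.
long triangular(long n) {
    long s = 0;
    for (long i = 1; i <= n; ++i)  // no I/O, no volatile, no atomics
        s += i;
    return s;
}
```

If a caller discards the return value, the entire call may legally disappear from the generated code.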

Edit:

This insightful article says about that Standards text

Unfortunately, the words "undefined behavior" are not used. However, anytime the standard says "the compiler may assume P," it is implied that a program which has the property not-P has undefined semantics.

Is that correct, and is the compiler allowed to print "Bye" for the above program?


There is an even more insightful thread here, which is about an analogous change to C, started off by the guy who wrote the above-linked article. Among other useful facts, they present a solution that seems to also apply to C++0x (Update: this won't work anymore with n3225 - see below!)

endless:
  goto endless;

A compiler is not allowed to optimize that away, it seems, because it's not a loop, but a jump. Another guy summarizes the proposed change in C++0x and C201X

By writing a loop, the programmer is asserting either that the loop does something with visible behavior (performs I/O, accesses volatile objects, or performs synchronization or atomic operations), or that it eventually terminates. If I violate that assumption by writing an infinite loop with no side effects, I am lying to the compiler, and my program's behavior is undefined. (If I'm lucky, the compiler might warn me about it.) The language doesn't provide (no longer provides?) a way to express an infinite loop without visible behavior.


Update on 3.1.2011 with n3225: the committee moved the text to 1.10/24, which now says

The implementation may assume that any thread will eventually do one of the following:

  • terminate,
  • make a call to a library I/O function,
  • access or modify a volatile object, or
  • perform a synchronization operation or an atomic operation.

The goto trick will not work anymore!
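Under the n3225 wording, the remaining portable way to keep a spin loop alive is to give it one of the four listed behaviors; an atomic operation qualifies. A sketch, with a hypothetical flag name:

```cpp
#include <atomic>

std::atomic<bool> stop_flag{false};  // hypothetical: set elsewhere

void spin_until_stopped() {
    // Each iteration performs an atomic operation, so the implementation
    // may not assume this loop terminates and cannot remove it.
    while (!stop_flag.load(std::memory_order_relaxed)) { }
}
```

A volatile access in the loop body (as one answer below shows) works the same way.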

Canard answered 28/8, 2010 at 21:32 Comment(19)
Auto-parallelization maybe? I wish I knew more about that… but eliminating that loop is certainly equivalent to executing it on a parallel thread that never reports back. On the other hand, a thread that does generate information for the caller would be imbued with some kind of synchronization. And a thread with proper side effects couldn't be parallelized.Bertie
Well, in the general case, that loop would be a programming error.Espouse
while(1) { MyMysteriousFunction(); } must be independently compilable without knowing the definition of that mysterious function, right? So how can we determine if it makes calls to any library I/O functions? In other words: surely that first bullet could be phrased makes no calls to functions.Seventeenth
@Daniel: If it has access to the function's definition, it can prove a lot of things. There is such a thing as interprocedural optimization.Bertie
The section you have quoted gives an explanation for why it's allowed. Did you not understand it, disagree with it, or did it not satisfy you in some other way?Fewness
@Philip: loops which do absolutely nothing, yet don't have an easy termination case, are uncommon and not a major issue for optimization.Bertie
@Philip i do not understand it. What are those compiler transformations that this optimization makes easier? And why was it only for C++0x that this is allowed? Now I'm a big threading noob, so I've no clue about that.Canard
Right now, in C++03, is a compiler allowed to change int x = 1; for(int i = 0; i < 10; ++i) do_something(&i); x++; into for(int i = 0; i < 10; ++i) do_something(&i); int x = 2;? Or possibly the other way, with x being initialized to 2 before the loop. It can tell do_something doesn't care about the value of x, so its a perfectly safe optimization, if do_something doesn't cause the value of i to change such that you end up in an infinite loop.Jacinthe
So does this mean that main() { start_daemon_thread(); while(1) { sleep(1000); } } might just exit immediately instead of running my daemon in a background thread?Arrive
"This insightful article" assumes that a specific behavior is Undefined Behavior merely because there is no explicit, defined behavior. That's an incorrect assumption. In general, when the standard leaves open a finite number of behaviors, an implementation must choose any of those (Unspecified behavior). This need not be deterministic. Whether a do-nothing loop terminates is arguably a boolean choice; either it does or it does not. Doing something else isn't allowed.Fleischer
@Gabe. I doubt it. What the optimization means is that anything that happens after your while(1) loop could be moved to before the loop by the compiler. The while(1) still executes, it just may execute before or after other instructions. The real question is what happens if you throw exit(EXIT_SUCCESS) after the loop.Lancewood
@kts no it means the compiler can assume the loop terminates. If a loop that has no side effect terminates you cannot notice how many iterations it had, apart from some delay in execution which is entirely specific to how performant your computer is. Under the as-if rule, the compiler is then allowed to optimize away the loop entirely. So i think, in the above comment, if sleep's definition does any I/O calls or if the implementation defines "sleep" as an I/O function itself, the loop in @Gabe's comment cannot be assumed to terminate.Canard
@kts: Code after the loop which neither affects nor is affected by the loop may be moved before it. Code after the loop which would affect or be affected by the loop must wait for the loop to terminate. So I would expect the throw to happen when the loop terminates, if it ever does.Ceballos
This is fascinating but is there a practical case of a compiler throwing away a loop that actually does something? Imagine filing a bug with a compiler team and getting a will-not-fix because the standard says it has to throw it away. Not likely, compilers will flow analyze and will treat unknowns as the bad case.Davidadavidde
Can I suggest reorgnizing this post to just contain C++11 (or C++14) quotes; the historical progression of the rule through drafts is not crucialBossism
@Gabe: No. sleep() expands to a system call so it won't optimize it away.Thant
This seems dangerous/problematic. I don't want the compiler "optimizing away" my loops if they were intentional, even if they perhaps included a logical bug. I should be responsible for identifying and resolving my bug, and not reliant on the compiler to make assumptions about the behavior of my algorithm.Elastance
@h0r53: If e.g. a program defines a function like unsigned normalize_lsb(unsigned x) { while(!(x & 1)) x>>=1; return x;}, and a compiler can ascertain that the return value from some particular call is never used, should a compiler be required to generate code that hangs if x is zero? Being able to ignore the value of x in cases where the return value isn't used (and use the fact that x is ignored to in turn omit code whose sole purpose would be to compute it) is a useful optimization, on compilers whose behavior is limited to cleanly omitting or deferring such loops.Ceballos
THIS IS INSANE! When will optimisation freaks stop breaking innocent looking code? At least the C Standard only allows this assumption if the controlling expression is not a constant expression, so while(1); will still run forever in C.Protraction
Y
41

Does someone have a good explanation of why this was necessary to allow?

Yes, Hans Boehm provides a rationale for this in N1528: Why undefined behavior for infinite loops?. Although this is a WG14 document, the rationale applies to C++ as well, and the document refers to both WG14 and WG21:

As N1509 correctly points out, the current draft essentially gives undefined behavior to infinite loops in 6.8.5p6. A major issue for doing so is that it allows code to move across a potentially non-terminating loop. For example, assume we have the following loops, where count and count2 are global variables (or have had their address taken), and p is a local variable, whose address has not been taken:

for (p = q; p != 0; p = p -> next) {
    ++count;
}
for (p = q; p != 0; p = p -> next) {
    ++count2;
}

Could these two loops be merged and replaced by the following loop?

for (p = q; p != 0; p = p -> next) {
        ++count;
        ++count2;
}

Without the special dispensation in 6.8.5p6 for infinite loops, this would be disallowed: If the first loop doesn't terminate because q points to a circular list, the original never writes to count2. Thus it could be run in parallel with another thread that accesses or updates count2. This is no longer safe with the transformed version which does access count2 in spite of the infinite loop. Thus the transformation potentially introduces a data race.

In cases like this, it is very unlikely that a compiler would be able to prove loop termination; it would have to understand that q points to an acyclic list, which I believe is beyond the ability of most mainstream compilers, and often impossible without whole program information.

The restrictions imposed by non-terminating loops are a restriction on the optimization of terminating loops for which the compiler cannot prove termination, as well as on the optimization of actually non-terminating loops. The former are much more common than the latter, and often more interesting to optimize.

There are clearly also for-loops with an integer loop variable in which it would be difficult for a compiler to prove termination, and it would thus be difficult for the compiler to restructure loops without 6.8.5p6. Even something like

for (i = 1; i != 15; i += 2)

or

for (i = 1; i <= 10; i += j)

seems nontrivial to handle. (In the former case, some basic number theory is required to prove termination, in the latter case, we need to know something about the possible values of j to do so. Wrap-around for unsigned integers may complicate some of this reasoning further.)

This issue seems to apply to almost all loop restructuring transformations, including compiler parallelization and cache-optimization transformations, both of which are likely to gain in importance, and are already often important for numerical code. This appears likely to turn into a substantial cost for the benefit of being able to write infinite loops in the most natural way possible, especially since most of us rarely write intentionally infinite loops.
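Boehm's fusion example from the quote above can be made runnable on a small acyclic list (the list setup and function name are hypothetical stand-ins):

```cpp
struct Node { Node* next; };

int count = 0, count2 = 0;  // globals, as in the quoted text

// The fused form of the two loops. The transformation is justified only
// because the implementation may assume each original loop terminates;
// if q pointed into a circular list, fusion would introduce writes to
// count2 that the original program never performed.
void fused(Node* q) {
    for (Node* p = q; p != nullptr; p = p->next) {
        ++count;
        ++count2;
    }
}
```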

The one major difference from C++ is that C11 provides an exception for controlling expressions that are constant expressions, which makes your specific example well-defined in C11.

Trivial infinite loops are no longer UB in C++

With the adoption of P2809 at the 2024-03 Tokyo ISO C++ committee meeting, trivial infinite loops are no longer undefined behavior in C++.

This does not change the essence of the answer, but it does move C++ more in line with C here. It allows users to write what most would consider to be idiomatic infinite loops safely.
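So under P2809's wording, a loop like the following (never called here, so this sketch still terminates) is guaranteed to spin rather than being removed:

```cpp
// Post-P2809, a trivial infinite loop is defined behavior: the
// implementation must keep it spinning and may not delete it.
[[noreturn]] void wait_for_reset() {
    while (true) { }  // trivially empty; now well-defined
}
```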

Yokoyokohama answered 22/6, 2015 at 19:22 Comment(1)
Are there any safe and useful optimizations that are facilitated by the present language that would not be facilitated just as well by saying "If the termination of a loop depends upon the state of any objects, the time required to execute the loop is not considered an observable side-effect, even if such time happens to be infinite". Given do { x = slowFunctionWithNoSideEffects(x);} while(x != 23); Hoisting code after the loop which would not depend upon x would seem safe and reasonable, but allowing a compiler to assume x==23 in such code seems more dangerous than useful.Ceballos
F
50

To me, the relevant justification is:

This is intended to allow compiler transformations, such as removal of empty loops, even when termination cannot be proven.

Presumably, this is because proving termination mechanically is difficult, and the inability to prove termination hampers compilers which could otherwise make useful transformations, such as moving nondependent operations from before the loop to after or vice versa, performing post-loop operations in one thread while the loop executes in another, and so on. Without these transformations, a loop might block all other threads while they wait for the one thread to finish said loop. (I use "thread" loosely to mean any form of parallel processing, including separate VLIW instruction streams.)

EDIT: Dumb example:

while (complicated_condition()) {
    x = complicated_but_externally_invisible_operation(x);
}
complex_io_operation();
cout << "Results:" << endl;
cout << x << endl;

Here, it would be faster for one thread to do the complex_io_operation while the other is doing all the complex calculations in the loop. But without the clause you have quoted, the compiler has to prove two things before it can make the optimisation: 1) that complex_io_operation() doesn't depend on the results of the loop, and 2) that the loop will terminate. Proving 1) is pretty easy, proving 2) is the halting problem. With the clause, it may assume the loop terminates and get a parallelisation win.
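The parallelisation the clause licenses can be written out by hand; the names below are hypothetical stand-ins for the functions in the example above:

```cpp
#include <thread>

// Stands in for the while loop: an externally invisible computation.
int expensive_loop(int x) {
    for (int i = 0; i < 100000; ++i)
        x = (x * 31 + 7) % 1000003;
    return x;
}

// Runs the loop on this thread while the independent I/O-like work
// (here just a stub) proceeds on another, as a compiler assuming loop
// termination could arrange.
int run_overlapped(int x, int& io_result) {
    std::thread io([&io_result] { io_result = 42; });  // complex_io_operation()
    int r = expensive_loop(x);
    io.join();  // both results are ready after the join
    return r;
}
```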

I also imagine that the designers considered that the cases where infinite loops occur in production code are very rare and are usually things like event-driven loops which access I/O in some manner. As a result, they have pessimised the rare case (infinite loops) in favour of optimising the more common case (noninfinite, but difficult to mechanically prove noninfinite, loops).

It does, however, mean that infinite loops used in learning examples will suffer as a result, and will raise gotchas in beginner code. I can't say this is entirely a good thing.

EDIT: with respect to the insightful article you now link, I would say that "the compiler may assume X about the program" is logically equivalent to "if the program doesn't satisfy X, the behaviour is undefined". We can show this as follows: suppose there exists a program which does not satisfy property X. Where would the behaviour of this program be defined? The Standard only defines behaviour assuming property X is true. Although the Standard does not explicitly declare the behaviour undefined, it has declared it undefined by omission.

Consider a similar argument: "the compiler may assume a variable x is only assigned to at most once between sequence points" is equivalent to "assigning to x more than once between sequence points is undefined".

Fewness answered 28/8, 2010 at 22:1 Comment(19)
"Proving 1) is pretty easy" - in fact doesn't it follow immediately from the 3 conditions for the compiler to be allowed to assume loop termination under the clause Johannes is asking about? I think they amount to, "the loop has no observable effect, except perhaps spinning forever", and the clause ensures that "spinning forever" isn't guaranteed behaviour for such loops.Tyche
@Steve: it's easy if the loop doesn't terminate; but if the loop does terminate then it may have nontrivial behaviour which affects the processing of the complex_io_operation.Fewness
Oops, yes, I missed that it might modify non-volatile locals/aliases/whatever which are used in the IO op. So you're right: although it doesn't necessarily follow, there are many cases in which compilers can and do prove that no such modification occurs.Tyche
"It does, however, mean that infinite loops used in learning examples will suffer as a result, and will raise gotchas in beginner code. I can't say this is entirely a good thing." Just compile with optimizations off and it should still workLancewood
@Philip Potter: I don't think Undefined Behavior is indicated. I don't think the standard allows for a compiler to "assume" something it knows to be false, and a compiler which doesn't know a loop won't terminate must generate code whose ultimate behavior will end up being correct if the loop ever does.Ceballos
@supercat: what then does it mean to allow the compiler to assume something if the compiler still has to check if it's actually true? To me the very definition of assumption is to take something as true without having to check.Fewness
@Philip Potter: Suppose a some code generates strings to see if it can find one which matches a particular 64-bit hash, and will run until it finds a match. A compiler looking at that code probably can't know whether it will ever terminate. The compiler's behavior must be such that if and when the code does find a match, the sequence of visible events prescribed by the code will end up occurring. The rule allows the visible events may occur before the match is found, even if they're coded to occur after, provided that the final sequence will be correct if/when a match is found.Ceballos
@supercat: I'm not sure what your point is. I don't disagree with anything you say. If the loop does terminate, there is no UB.Fewness
@Philip Potter: The compiler may only meaningfully "assume" the loop will terminate if it doesn't know. If the compiler doesn't know whether or not the loop will terminate, its code must be confined to those operations which would not prove to be erroneous if the loop does terminate. Even if the loop does not terminate, I see no permission for the compiler to do anything that would be erroneous if by some miracle the loop did. The rules grant to compiler some latitude, but not nearly as much as UB. UB means the compiler is allowed to do ANYTHING and still comply with the spec.Ceballos
@supercat: i agree to the extent that this is what will happen in practice. However, if we assume the loop terminates, but it does not terminate, then we have a contradiction, and from a contradiction anything follows. xkcd.com/704Fewness
@Philip Potter: I guess the question would be whether a compiler is allowed to "assume" something it "knows" to be false or, to put it another way, whether permission to "assume" something implies permission to deliberately act adversely if it's KNOWN to be false. Frankly, I think the standard would be clearer and more useful if it said that the compiler is required to maintain all visible effects and side-effects of a piece of code, but time required to execute it--even if infinite--shall not in and of itself be considered an effect the compiler is required to maintain.Ceballos
@supercat: Of course the compiler can assume false things. That's what assuming is. It wouldn't be an assumption if the things were necessarily true. As for "knowing", the compiler's knowledge is irrelevant. The standard doesn't require the compiler to "know" anything about loop termination, and it would be really stupid if it did.Fewness
@Philip Potter: Suppose a program is supposed to perform a lengthy computation, print "The answer is", then print the answer, then print some more stuff. If the loops is going to terminate, then if the program's printout can't start with anything other than "The answer is" and the correct answer. Is there any way in which a conforming compiler that doesn't know whether a loop will terminate or what it will compute could print out anything other than "The answer is", prior to the termination of the loop? Even if the loop doesn't terminate, the behavior of the program is highly constrained.Ceballos
@supercat: What you describe is what will happen in practice, but it's not what the draft standard requires. We can't assume the compiler doesn't know whether a loop will terminate. If the compiler does know the loop won't terminate, it can do whatever it likes. The DS9K will create nasal demons for any infinite loop with no I/O etc. (Therefore, the DS9K solves the halting problem.)Fewness
@supercat. If the programmer cannot be certain whether a deliberate infinite loop will be removed, that's undefined behavior. One compiler may eliminate the loop, while another leaves it in - or worse, the same compiler on different optimization settings. While restricted to 2 options, the specific compiler behavior is undefined by the standard.Crack
@phkahler: undefined behaviour is totally unrestricted. Those 2 options are not the only valid options for a compiler. That's kind of what "undefined behaviour" means: behaviour that the standard doesn't define. If the implementation must choose from a set of known options, that's "unspecified" rather than undefined behaviour.Fewness
@Philip: I personally dislike language like "the compiler may assume", unless the context makes clear what specifically the compiler may do on the basis of that assumption. Frankly, I think the whole "endless loop" thing would have been more clearly dealt with by saying that the execution of any piece of code may be deferred until its first visible side-effect, and a program may be considered to have finished execution once all visible side-effects that are ever going to occur have done so.Ceballos
@Philip: If the standard were written that way, there would be no confusion about "compiler assumptions". If a loop has no side-effects other than the time required to execute it, its "execution" may be deferred until the program has performed all required side-effect-inducing actions. At that point, even if there are still one or more side-effect-less endless loops that haven't yet executed, one could perfectly well pretend that those loops will execute in parallel with the execution of the next program to run.Ceballos
@PhilipPotter: Under the as-if rule, if an optimization would cause a compiler to produce code for a construct whose observable behavior would be inconsistent with sequentially processing the code as written, the only way to allow the optimization is to "undefine" program behavior in any cases where such inconsistencies might arise. Unfortunately, the Standard uses the same terminology to allow generally-harmless deviations from what would otherwise be defined sequential program behavior, as it uses to describe constructs that are genuinely meaningless.Ceballos
S
17

I think the correct interpretation is the one from your edit: empty infinite loops are undefined behavior.

I wouldn't say it's particularly intuitive behavior, but this interpretation makes more sense than the alternative one, that the compiler is arbitrarily allowed to ignore infinite loops without invoking UB.

If infinite loops are UB, it just means that non-terminating programs aren't considered meaningful: according to C++0x, they have no semantics.

That does make a certain amount of sense too. They are a special case, where a number of side effects just no longer occur (for example, nothing is ever returned from main), and a number of compiler optimizations are hampered by having to preserve infinite loops. For example, moving computations across the loop is perfectly valid if the loop has no side effects, because eventually, the computation will be performed in any case. But if the loop never terminates, we can't safely rearrange code across it, because we might just be changing which operations actually get executed before the program hangs. Unless we treat a hanging program as UB, that is.
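The "moving computations across the loop" point can be sketched concretely: both orderings below give the same result precisely when the loop terminates, which is what the implementation is allowed to assume.

```cpp
#include <utility>

// Original order: side-effect-free loop first, then an independent
// computation.
std::pair<long, long> original_order(long n) {
    long sum = 0;
    for (long i = 0; i < n; ++i) sum += i;  // no visible side effects
    long y = 2 * n;                          // independent of the loop
    return {sum, y};
}

// What a compiler might produce: y hoisted above the loop. If the loop
// never terminated, this would change which operations actually run
// before the program hangs.
std::pair<long, long> reordered(long n) {
    long y = 2 * n;
    long sum = 0;
    for (long i = 0; i < n; ++i) sum += i;
    return {sum, y};
}
```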

Squinch answered 29/8, 2010 at 4:44 Comment(7)
“empty infinite loops are undefined behavior”? Alan Turing would beg to differ, but only when he gets over spinning in his grave.Sphere
@Donal: I never said anything about its semantics in a Turing machine. We're discussing the semantics of an infinite loop with no side effects in C++. And as I read it, C++0x chooses to say that such loops are undefined.Squinch
Empty infinite loops would be silly, and there'd be no reason to have special rules for them. The rule is designed to deal with useful loops of unbounded (hopefully not infinite) duration, which calculate something that will be needed in future but not immediately.Ceballos
Does this mean that C++0x is not suited for embedded devices? Almost all embedded devices are non-terminating and do their job inside a big fat while(1){...}. They even routinely use while(1); to invoke a watchdog-assisted reset.Hamnet
@vsz: the first form is fine. Infinite loops are perfectly well defined, as long as they have some sort of observable behavior. The second form is trickier, but I can think of two very easy ways out: (1) a compiler targeting embedded devices could just choose to define stricter behavior in that case, or (2) you create a body which calls some dummy library function. As long as the compiler doesn't know what that function does, it has to assume that it may have some side effect, and so it can't mess with the loop.Squinch
@Squinch : I asked it and got a pretty good answer for it: stackoverflow.com/questions/24298314/…Hamnet
@jalf: There are many situations where cleanly omitting the code for a loop that may or may not terminate, but will definitely have no side effects if it does, may be a useful optimization. If programmers must include dummy side effects in such loops to prevent completely arbitrary behavior in response to inputs that might cause them to hang, such useful optimizations would be precluded and the only effect of the rule would be to require that programmers expend needless effort to achieve performance which is no better than would have been achieved if compilers processed loops as written.Ceballos
S
10

The relevant issue is that the compiler is allowed to reorder code whose side effects do not conflict. The surprising order of execution could occur even if the compiler produced non-terminating machine code for the infinite loop.

I believe this is the right approach. The language spec defines ways to enforce order of execution. If you want an infinite loop that cannot be reordered around, write this:

volatile int dummy_side_effect;

while (1) {
    dummy_side_effect = 0;
}

printf("Never prints.\n");
Silence answered 29/8, 2010 at 5:20 Comment(2)
@JohannesSchaub-litb: If a loop--endless or not--doesn't read or write any volatile variables during execution, and does not call any functions that might do so, a compiler is free to defer any portion of the loop until the first effort to access something computed therein. Given unsigned int dummy; while(1){dummy++;} fprintf(stderror,"Hey\r\n"); fprintf(stderror,"Result was %u\r\n",dummy);, the first fprintf could execute, but the second could not (the compiler could move the computation of dummy between the two fprintf, but not past the one that prints its value).Ceballos
BTW, since responding to this question, I've discovered that in some cases where nothing in the code as written will rely upon any post-conditions that would be guaranteed by successful completion of a loop, clang will sometimes combine optimizations that rely upon such post-conditions with optimizations that remove the loop. The Standard doesn't forbid such nonsense, but it means that inputs that would cause code to get stuck in an infinite loop may cause memory corruption, in ways not bound by ordinary laws of causality.Ceballos
S
8

I think this is along the lines of this type of question, which references another thread. Optimization can occasionally remove empty loops.

Slambang answered 28/8, 2010 at 21:47 Comment(14)
Nice question. Seems like that guy had exactly the problem that this paragraph allows the compiler to cause. In the discussion linked in one of the answers, it is written that "Unfortunately, the words 'undefined behavior' are not used. However, anytime the standard says 'the compiler may assume P,' it is implied that a program which has the property not-P has undefined semantics." This surprises me. Does this mean my example program above has undefined behavior, and may just segfault out of nowhere?Canard
@Johannes: the text "may be assumed" doesn't occur anywhere else in the draft I have to hand, and "may assume" only occurs a couple of times. Although I checked this with a search function which fails to match across line breaks, so I may have missed some. So I'm not sure the author's generalisation is warranted on the evidence but as a mathematician I have to concede the logic of the argument, that if the compiler assumes something false then in general it may deduce anything...Tyche
...Permitting a contradiction in the compiler's reasoning about the program certainly hints very strongly at UB, since in particular it allows the compiler, for any X, to deduce that the program is equivalent to X. Surely permitting the compiler to deduce that is permitting it to do that. I agree with the author, too, that if UB is intended it should be explicitly stated, and if it's not intended then the spec text is wrong and should be fixed (perhaps by the spec-language equivalent of, "the compiler may replace the loop with code that has no effect", I'm not sure).Tyche
@SteveJessop: What would you think of simply saying that execution of any piece of code--including infinite loops--may be postponed until such time as something the piece of code did would affect an observable program behavior, and that for purposes of that rule, the time required to execute a piece of code--even if infinite--is not an "observable side-effect". If a compiler can demonstrate that a loop cannot exit without a variable holding a certain value, the variable may be deemed to hold that value, even it could also be shown that the loop could not exit with it holding that value.Ceballos
@supercat: as you've stated that conclusion, I don't think it improves things. If the loop provably never exits then for any object X and bit-pattern x, the compiler can demonstrate the loop does not exit without X holding bit-pattern x. It's vacuously true. So X could be deemed to hold any bit pattern, and that's as bad as UB in the sense that for the wrong X and x it will swiftly cause some. So I believe you need to be more precise in your standardese. It's difficult to talk about what happens "at the end of" an infinite loop, and show it equivalent to some finite operation.Tyche
@SteveJessop: E.g. after unsigned int blah=0; while(blah != 42 || blah != 24601) blah++; printf("%d",blah); could legally print either 42 or 24601; after unsigned int blah=1; while(blah != 0 && blah != 2) blah+=2; printf("%d",blah & mask); a compiler could legally print 0 if (mask & 2) was zero, but would otherwise have to wait for the loop to complete to find the value of bit 1 of mask.Ceballos
@SteveJessop: Perhaps things need to be better worded to say that a if a compiler deduces that a loop cannot end, it cannot reschedule it, but otherwise the compiler may legitimately make any combination of inferences. Thus, a compiler could in the second code example deem that bit 0 of blah to be set (since that's a loop invariant) or clear (required by exit condition), but it could not deduce that some unrelated variable was equal to 42, since the only way to deduce that the loop couldn't end without that variable being 42 would be to deduce that it couldn't end at all.Ceballos
@SteveJessop: Do you think that rule would seem workable? I see no basis for allowing endless loops to trigger nasal demons (very few programs could be statically analyzed and shown not to cause nasal demons were that the case), and I think that allowing compilers to postpone loops while making any combination of deductions which would not require a deduction that a loop cannot terminate at all would accommodate all useful inferences about loops which might or might not terminate, without allowing nasal demons for loops which happen not to do so.Ceballos
@supercat: I don't think it's helpful to say "if a compiler deduces X then it cannot do Y", unless the standard specifies what every compiler must do in order to attempt to deduce X. And I don't think the authors want to do that. Otherwise you're allowing bad optimizers to do something that good optimizers are forbidden, which doesn't make sense. Sounds like you could define certain kinds of loop invariants and exit conditions and allow the re-ordered code to rely on them. Since no compiler would be required to deduce these things I'd guess it wouldn't be an implementation burden...Tyche
... and the desired optimization in Philip Potter's answer would be permitted provided the compiler can prove that the "externally visible operation" really doesn't affect the observable behavior of the "complex io operation". Which it has to prove at the moment anyway before parallelizing them, just in case the loop does terminate. So yeah, give it a go and see what you come up with :-) I suspect it's academic for C++ though, since the standard has been published allowing UB and so narrowing the permitted behavior now would be an incompatible change from the implementer's POV.Tyche
@SteveJessop: I think the goal of the standards committee is (or at least should be) to allow compiler writers to make any and all useful inferences that would hold if the loop in fact terminates, without authorizing nasal demons if it doesn't; thus, there must be some limit on what compilers are or are not allowed to do with endless loops. How about, instead of saying "if the compiler deduces...", saying "The compiler may not reschedule a loop if it makes use of any inference that the loop cannot exit". Given uint32_t x=0,y=0; while (foo(y) || (y != 42)) y++;...Ceballos
...a compiler that can't figure out anything about foo(y) beyond its lack of side-effects could (but would not be required to) deduce that the loop cannot exit when y isn't 42, and thus deduce that y must be 42 when it exits. If the compiler determined that foo could never return zero when y is 42, that would imply that the loop cannot terminate when x isn't 57 (vacuously true, since it can't terminate at all), but couldn't make that inference without making use of the inference that the loop never terminates. Can you see any way a compiler could infer anything about the value of x...Ceballos
...when the loop terminates, other than through an inference that the loop doesn't terminate? The idea that endless loops should be unqualified UB seems massively overkill, and makes static verifiers essentially useless on any program outside of those which can be mathematically proven to be incapable of getting stuck in an endless loop.Ceballos
Note, I added a new answer to the second question you mention ... C11 provided an explicit exception for loops with controlling expressions that are constant expressions.Yokoyokohama
C
1

I think the issue could perhaps best be stated as: "If a later piece of code does not depend on an earlier piece of code, and the earlier piece of code has no side-effects on any other part of the system, the compiler's output may execute the later piece of code before, after, or intermixed with, the execution of the former, even if the former contains loops, without regard for when or whether the former code would actually complete." For example, the compiler could rewrite:

void testfermat(int n)
{
  int a=1,b=1,c=1;
  while(pow(a,n)+pow(b,n) != pow(c,n))
  {
    if (b > a) a++; else if (c > b) {a=1; b++;} else {a=1; b=1; c++;}
  }
  printf("The result is ");
  printf("%d/%d/%d", a,b,c);
}

as

void testfermat(int n)
{
  if (fork_is_first_thread())
  {
    int a=1,b=1,c=1;
    while(pow(a,n)+pow(b,n) != pow(c,n))
    {
      if (b > a) a++; else if (c > b) {a=1; b++;} else {a=1; b=1; c++;}
    }
    signal_other_thread_and_die();
  }
  else // Second thread
  {
    printf("The result is ");
    wait_for_other_thread();
  }
  printf("%d/%d/%d", a,b,c);
}

Generally not unreasonable, though I might worry that:

  int total=0;
  for (int i=0; num_reps > i; i++)
  {
    update_progress_bar(i);
    total+=do_something_slow_with_no_side_effects(i);
  }
  show_result(total);

would become

  int total=0;
  if (fork_is_first_thread())
  {
    for (int i=0; num_reps > i; i++)
      total+=do_something_slow_with_no_side_effects(i);
    signal_other_thread_and_die();
  }
  else
  {
    for (int i=0; num_reps > i; i++)
      update_progress_bar(i);
    wait_for_other_thread();
  }
  show_result(total);

By having one CPU handle the calculations and another handle the progress bar updates, the rewrite would improve efficiency. Unfortunately, it would make the progress bar updates rather less useful than they should be.

Ceballos answered 1/9, 2010 at 19:20 Comment(3)
I think that your progress bar case could not be separated, because displaying a progress bar is a library I/O call. Optimisations should not change visible behaviour in this way.Fewness
@Philip Potter: If the slow routine had side-effects, that would certainly be true. In my example before, it would be meaningless if it didn't, so I changed it. My interpretation of the spec is that the system is allowed to defer the execution of the slow code until such time as its effects (other than the time it takes to execute) would become visible, i.e. the show_result() call. If the progress-bar code made use of the running total, or at least pretended to do so, that would force it to sync up with the slow code.Ceballos
This explains all of those progress bars that go fast from 0 to 100 and then hang for ages ;)Ekg
S
1

Does someone have a good explanation of why this was necessary to allow?

I have an explanation of why it is necessary not to allow, assuming that C++ still has the ambition to be a general-purpose language suitable for performance-critical, low-level programming.

This used to be a working, strictly conforming freestanding C++ program:

int main()
{
  setup_interrupts();
  for(;;)
  {}
}

The above is a perfectly fine and common way to write main() in many low-end microcontroller systems. The whole program execution is done inside interrupt service routines (or in case of RTOS, it could be processes). And main() may absolutely not be allowed to return since these are bare metal systems and there is nobody to return to.

Typical real-world examples of where the above design can be used are PWM/motor driver microcontrollers, lighting control applications, simple regulator systems, sensor applications, simple household electronics and so on.

Changes to C++ have unfortunately made the language impossible to use for this kind of embedded systems programming. Existing real-world applications like the ones above will break in dangerous ways if ported to newer C++ compilers.

C++20 6.9.2.3 Forward progress [intro.progress]

The implementation may assume that any thread will eventually do one of the following:
(1.1) — terminate,
(1.2) — make a call to a library I/O function,
(1.3) — perform an access through a volatile glvalue, or
(1.4) — perform a synchronization operation or an atomic operation.

Let's address each bullet for the above, previously strictly conforming freestanding C++ program:

  • 1.1. As any embedded systems beginner can tell the committee, bare metal/RTOS microcontroller systems never terminate. Therefore the loop cannot terminate either or main() would turn into runaway code, which would be a severe and dangerous error condition.
  • 1.2 N/A.
  • 1.3 Not necessarily. One may argue that the for(;;) loop is the proper place to feed the watchdog, which in turn is a side effect as it writes to a volatile register. But there may be implementation reasons why you don't want to use a watchdog. At any rate, it is not C++'s business to dictate that a watchdog should be used.
  • 1.4 N/A.

Because of this, C++ is made even more unsuitable for embedded systems applications than it already was before.


Another problem here is that the standard speaks of "threads", which are higher-level concepts. On real-world computers, threads are implemented through a low-level concept known as interrupts. Interrupts are similar to threads in terms of race conditions and concurrent execution, but they are not threads. On low-level systems there is just one core, and therefore typically only one interrupt at a time is executed (kind of like how threads used to work on single-core PCs back in the day).

And you can't have threads if you can't have interrupts. So threads would have to be implemented by a lower-level language more suitable for embedded systems than C++. The options being assembler or C.


By writing a loop, the programmer is asserting either that the loop does something with visible behavior (performs I/O, accesses volatile objects, or performs synchronization or atomic operations), or that it eventually terminates.

This is completely misguided and clearly written by someone who has never worked with microcontroller programming.


So what should those few remaining C++ embedded programmers who refuse to port their code to C "because of reasons" do? You have to introduce a side effect inside the for(;;) loop:

int main()
{
  setup_interrupts();
  for(volatile int i=0; i==0;)
  {}
}

Or if you have the watchdog enabled as you ought to, feed it inside the for(;;) loop.

Shanly answered 19/1, 2023 at 14:27 Comment(8)
(Although this C++ loop repeatedly checking a volatile is more dangerous than a C for(;;){} loop. In case of RAM memory corruption, the C loop will still boil down to machine code like van_halen: jmp van_halen, whereas the C++ loop will be reading something from RAM and main() will therefore turn into runaway code in case the memory location of i ever changes.)Shanly
Which C++ compiler in the field actually does optimize away an empty infinite loop? I tried to get gcc/clang on godbolt to optimize the loop away but failed. Do I have to add special command line options? godbolt.org/z/rv4zcr7jcReuben
@DanielK. The clang compiler fails spectacularly. It doesn't optimize the loop away, it runs into the UB woods screaming. Nothing special required, latest clang for C++ and -O3. godbolt.org/z/h45d8hTP3 Comment in the #define and it segfaults. Look at the generated machine code, it's nonsense. clang for C behaves, as does gcc/g++. Add the volatile hack from this answer and then it works.Shanly
@DanielK. Although it isn't just the C++ standard that's broken. clang is similarly broken even in C since forever, see How do I make an infinite empty loop that won’t be optimized away? So it is best to stay away from C++ in embedded systems and stay away from the clang compiler entirely. Neither of them is likely to get fixed, too much prestige invested in broken designs.Shanly
Thanks for the insight. I figured that godbolt seems to give ambiguous results (or I may be doing something wrong): The same program (as I linked above) with the same C++ compiler (clang 16.0.0) and the same options (-Wall -O3) now gives a completely different result: now the infinite loop is optimized away and as such the program terminates. Further, it outputs "hello" twice and nothing else. Strange.Reuben
@DanielK. Just stop using clang, it is a broken PoS. Using any version past 13.0.0, look at the machine code for all those examples, do you see an x86 ret instruction anywhere? It did not only optimize out the loop, it laid down to die halfway through code generation. Newer versions of clang like to do this, giving you a half-baked executable. Which is why your example not only prints twice but also returns code 139 = seg fault. clang has evidently decided to be the infamous "ISO compliant death station" - they don't care the slightest about quality of implementation.Shanly
Similar "lets lay down to die in silence" behavior from clang here (C question): clang 15 miscompiles code accessing indeterminate values. In order to get a warning "oops btw I only generated half an executable", you have to use clang with very specific options. Useless tool, port to gcc/g++ asap.Shanly
I'm using gcc not clang. I was just fiddling with different compilers on godbolt.org. Thanks.Reuben
W
0

In non-trivial cases, it is not decidable for the compiler whether a loop is infinite at all.

In some cases, the optimiser can reach a better complexity class for your code (e.g. it was O(n^2) and you get O(n) or O(1) after optimisation).

So, including a rule in the C++ standard that disallows removing infinite loops would make many optimisations impossible. And most people don't want this. I think this quite answers your question.


Another thing: I have never seen a valid example where you need an infinite loop which does nothing.

The one example I have heard about was an ugly hack that really should be solved otherwise: It was about embedded systems where the only way to trigger a reset was to freeze the device so that the watchdog restarts it automatically.

If you know any valid/good example where you need an infinite loop which does nothing, please tell me.

Wellpreserved answered 3/9, 2010 at 18:36 Comment(3)
Example of where you might want an infinite loop: an embedded system where you don't want to sleep for performance reasons and all the code is hanging off an interrupt or two?Subtrahend
@Subtrahend in Standard C the interrupts should set a flag which the main loop checks, therefore the main loop would have observable behaviour in the case of the flags being set. Running substantial code in interrupts is non-portable.Bossism
@M.M: On embedded systems with prioritized interrupts, it's not terribly uncommon to have systems where, after initialization, everything is done within some kind of interrupt. Many ARM-family controllers even have a mode explicitly for such designs which won't need an endless loop in main, because once a flag is set the system won't execute any more code that isn't in interrupts.Ceballos
D
-1

I think it's worth pointing out that loops which would be infinite except for the fact that they interact with other threads via non-volatile, non-synchronised variables can now yield incorrect behaviour with a new compiler.

In other words, make your globals volatile -- as well as arguments passed into such a loop via pointer/reference.

Disinfest answered 25/8, 2011 at 10:13 Comment(15)
If they're interacting with other threads, you shouldn't make them volatile, make them atomics or protect them with a lock.Califate
Thjs is terrible advice. Making them volatile is neither necessary nor suffiicent, and it hurts performance dramatically.Mussolini
@DavidSchwartz: The semantics of volatile are "implementation-defined", and the sufficiency of that qualifier for various purposes will depend upon how an implementation chooses to define it. There are many freestanding implementations for which volatile is both necessary and sufficient to accomplish what programs will need to do.Ceballos
@Ceballos Can you really name a single implementation where volatile is necessary?Mussolini
@DavidSchwartz: On many commercial compilers, consecutive accesses to static-duration objects will be consolidated unless either the objects are volatile, the accesses are separated by a volatile write, or one of the accesses is a write and they are separated by a volatile read. For platforms that use a consistent memory model (including single-core platforms), except in cases where volatile reads trigger actions, volatile will often be necessary and sufficient.Ceballos
@Ceballos Again, can you name a single platform where volatile is necessary? Every one I know of includes better ways to do this. (And volatile is truly awful because it is not going to be clear what it is doing or why it is there while specific functions (such as an atomic fetch function used where the atomic fetch is needed) are self-documenting.)Mussolini
@DavidSchwartz: Most compilers I've used for embedded platforms worked as I described, and for many of them in the 1990s the two ways of achieving the proper semantics would be to use volatile with standard syntax, or asm with non-standard syntax. Using a volatile qualifier on objects that are e.g. manipulated in ISRs and inspected in main-line code, or vice versa, seems cleaner than using a non-standard asm syntax to force an "optimization barrier", and I'm not sure what alternative you think would be better on platforms without native compare-and-swap semantics.Ceballos
@Ceballos What would be better is to code what you actually want. If you actually want an atomic read, then use the platform's atomic read function. Can you name a platform where volatile is really the best way? Anything in the past 15 years?Mussolini
@DavidSchwartz: What kind of "atomic" read function does a Motorola 65HC11 or 68000 have? Or an Intel 8088? Or an ARM Cortex-M0, the latter of whic is still used in new designs?Ceballos
@Ceballos It depends on the compiler. GCC, for example, has built in atomic intrinsics that provide you all, and only, the semantics you require. They're much better than volatile because they don't impose any costs you don't need and they're self-documenting while volatile is always a mystery. You need to check the compiler's documentation anyway to determine if volatile is sufficient since we agreed it was platform-specific, right?Mussolini
@DavidSchwartz: Commercial compiler vendors of the 1980s and 1990s processed volatile in such a manner as to be sufficient, and there was never any reason for any compiler intended for use with user-code interrupt routines not to support such semantics, at least as an option. Volatile semantics will be adequate for any compiler that makes a bona fide effort to make them so, and the fact that a program to accomplish some task is incompatible with compilers that make no effort to support such a task is only a defect if no better compiler is available.Ceballos
@DavidSchwartz: If every compiler but one supports a construct using the same syntax, but the author of one oddball compiler requires a non-standard syntax, code using the commonly-supported sequence should be viewed as portable, and the oddball compiler should be recognized as aberrant. The Standard doesn't require that all implementations be suitable for all tasks, and suitability for tasks involving interrupts is outside the Standard's jurisdiction. As a consequence, it has no authority to say that such suitability should not be viewed as requiring the common treatment of volatile.Ceballos
@DavidSchwartz: If all compilers will process a program to accomplish a certain task in the same semantically-useful fashion with optimizations disabled, without requiring non-standard syntax, compilers that can efficiently accomplish such task without jumping through hoops or mandating vendor-specific syntax should be recognized as being more suitable for that task than those which can't, regardless of whether the Standard mandates such support.Ceballos
Almost by definition, it's what the standard mandates that you can rely on being processed the same. Otherwise, you're just throwing your hands up in the air and hoping for the best. And there are the huge disadvantages I mentioned -- you pay for much more than you need and your code is not self-documenting. In any event, if there are other options, it's not necessary. If no standard or compiler-specific document says it's sufficient, you're just hoping it's sufficient.Mussolini
Non-embedded programmers cannot understand why variables shared with an ISR need to be volatile for, because they don't even know what an interrupt is to begin with. And then those PC programmers end up in the C++ committee. And there you go.Shanly
