In C++11, does `i += ++i + 1` exhibit undefined behavior?

This question came up while I was reading (the answers to) So why is i = ++i + 1 well-defined in C++11?

I gather that the subtle explanation is that (1) the expression ++i returns an lvalue but + takes prvalues as operands, so a conversion from lvalue to prvalue must be performed; this involves obtaining the current value of that lvalue (rather than one more than the old value of i) and must therefore be sequenced after the side effect from the increment (i.e., updating i); (2) the LHS of the assignment is also an lvalue, so its value computation does not involve fetching the current value of i; while this value computation is unsequenced w.r.t. the value computation of the RHS, this poses no problem; (3) the value computation of the assignment itself involves updating i (again), but is sequenced after the value computation of its RHS, and hence after the previous update to i; no problem.

Fine, so there is no UB there. Now my question is: what if one changed the assignment operator from = to += (or a similar operator)?

Does the evaluation of the expression i += ++i + 1 lead to undefined behavior?

As I see it, the standard seems to contradict itself here. Since the LHS of += is still an lvalue (and its RHS still a prvalue), the same reasoning as above applies as far as (1) and (2) are concerned; there is no undefined behavior in the evaluation of the operands of +=. As for (3), the operation of the compound assignment += (more precisely the side effect of that operation; its value computation, if needed, is in any case sequenced after its side effect) now must both fetch the current value of i, and then (obviously sequenced after it, even if the standard does not say so explicitly, or otherwise the evaluation of such operators would always invoke undefined behavior) add the RHS and store the result back into i. Both these operations would have given undefined behavior if they were unsequenced w.r.t. the side effect of the ++, but as argued above (the side effect of the ++ is sequenced before the value computation of + giving the RHS of the += operator, which value computation is sequenced before the operation of that compound assignment), that is not the case.

But on the other hand the standard also says that E += F is equivalent to E = E + F, except that (the lvalue) E is evaluated only once. Now in our example the value computation of i (which is what E is here) as lvalue does not involve anything that needs to be sequenced w.r.t. other actions, so doing it once or twice makes no difference; our expression should be strictly equivalent to E = E + F. But here's the problem; it is pretty obvious that evaluating i = i + (++i + 1) would give undefined behaviour! What gives? Or is this a defect of the standard?

Added. I have slightly modified my discussion above, to do more justice to the proper distinction between side effects and value computations, and to use "evaluation" (as does the standard) of an expression to encompass both. I think my main question is not just whether behavior is defined or not in this example, but how one must read the standard in order to decide this. Notably, should one take the equivalence of E op= F to E = E op F as the ultimate authority for the semantics of the compound assignment operation (in which case the example clearly has UB), or merely as an indication of what mathematical operation is involved in determining the value to be assigned (namely the one identified by op, with the lvalue-to-rvalue converted LHS of the compound assignment operator as left operand and its RHS as right operand)? The latter option makes it much harder to argue for UB in this example, as I have tried to explain. I admit that it is tempting to make the equivalence authoritative (so that compound assignments become a kind of second-class primitives, whose meaning is given by rewriting in terms of first-class primitives; thus the language definition would be simplified), but there are rather strong arguments against this:

  • The equivalence is not absolute, because of the "E is evaluated only once" exception. Note that this exception is essential: without it, any use where the evaluation of E involves a side effect would be undefined behavior, for instance the fairly common a[i++] += b; usage. In fact I think no absolutely equivalent rewriting to eliminate compound assignments is possible; using a fictive ||| operator to designate unsequenced evaluations, one might try to define E op= F; (with int operands for simplicity) as equivalent to { int& L=E ||| int R=F; L = L op R; }, but then the example no longer has UB. In any case the standard gives us no rewriting recipe.

  • The standard does not treat compound assignments as second-class primitives for which no separate definition of semantics is necessary. For instance in 5.17 (emphasis mine)

    The assignment operator (=) and the compound assignment operators all group right-to-left. [...] In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation.

  • If the intention were to let compound assignments be mere shorthands for simple assignments, there would be no reason to include them explicitly in this description. The final phrase even directly contradicts what would be the case if the equivalence was taken to be authoritative.
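The "E is evaluated only once" exception from the first bullet can be observed directly. The following is a minimal sketch, not taken from the standard: `Slot` and the three functions are hypothetical names, and a counting member function stands in for the `i++` in `a[i++] += b` so that even the expanded form stays well-defined (function calls are indeterminately sequenced, not unsequenced).

```cpp
#include <cassert>

// Hypothetical helper: counts how often the left operand expression
// is evaluated.
struct Slot {
    int value = 10;
    int evaluations = 0;
    int& get() { ++evaluations; return value; }
};

// E op= F: the left operand expression is evaluated once.
int evaluations_compound() {
    Slot s;
    s.get() += 5;
    assert(s.value == 15);
    return s.evaluations;
}

// E = E op F: the left operand expression is evaluated twice.
// Both calls return a reference to the same int, so the stored value
// is the same, but the number of evaluations differs.
int evaluations_expanded() {
    Slot s;
    s.get() = s.get() + 5;
    assert(s.value == 15);
    return s.evaluations;
}

// The reference-based rewrite from the text, with the two operands
// evaluated in one fixed order (an assumption of this sketch; the
// fictive ||| operator leaves that order open).
int evaluations_reference_rewrite() {
    Slot s;
    int& L = s.get();
    int R = 5;
    L = L + R;
    assert(s.value == 15);
    return s.evaluations;
}
```

Under these assumptions the compound form and the reference-based rewrite each evaluate the left operand once, while the expanded form evaluates it twice.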

If one admits that compound assignments have a semantics of their own, then the point arises that their evaluation involves (apart from the mathematical operation) more than just a side effect (the assignment) and a value evaluation (sequenced after the assignment), but also an unnamed operation of fetching the (previous) value of the LHS. This would normally be dealt with under the heading of "lvalue-to-rvalue conversion", but doing so here is hard to justify, since there is no operator present that takes the LHS as an rvalue operand (though there is one in the expanded "equivalent" form). It is precisely this unnamed operation whose potential unsequenced relation with the side effect of ++ would cause UB, but this unsequenced relation is nowhere explicitly stated in the standard, because the unnamed operation is not. It is hard to justify UB using an operation whose very existence is only implicit in the standard.
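To see why the position of this unnamed fetch matters, here is a small sketch that spells out two possible orderings as separate, fully sequenced statements. The function names are hypothetical. Note the limits of the illustration: if the fetch is really unsequenced relative to the side effect of ++, the standard makes the behavior undefined rather than offering a choice between these two outcomes.

```cpp
// Fetch sequenced after the value computation of the right operand
// (the reading under which i += ++i + 1 has no UB).
int fetch_after_rhs(int i) {
    int rhs = ++i + 1;   // side effect of ++ completes, then + is computed
    int old = i;         // the unnamed fetch of the left operand
    i = old + rhs;       // the assignment proper
    return i;
}

// Fetch performed before the right operand is evaluated
// (one of the orders an implementation might pick if nothing
// constrains the fetch).
int fetch_before_rhs(int i) {
    int old = i;         // fetch first
    int rhs = ++i + 1;   // then the right operand, including ++
    i = old + rhs;
    return i;
}
```

Starting from 0, the first ordering stores 3 and the second stores 2, so the question of how the fetch is sequenced is observable, not merely formal.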

Gynaeco answered 12/6, 2014 at 21:17 Comment(18)
Unfortunately, the Standard fails to define properly which operators require an lvalue-to-rvalue conversion in which circumstances. See CWG active issue 1642. - Whitmer
Why would it? Let i = 0. It would yield 2 because 0 + 1 + 1 = 2. - Twinkle
"and must therefore be sequenced after the side effect from the increment" I think that "must" might be misleading here: The Standard does require for increment that the side effect is sequenced before the value computation of the result of the expression ++i, and + requires the value computations of the operands to be sequenced before the value computation of the + expression. - Whitmer
IIRC the value computation of the LHS is also pretty obscure, maybe it even appears in a DR (I'll check). - Whitmer
I think that from "E += F is equivalent to E = E + F" you can't conclude "i += ++i + 1 is equivalent to i = i + (++i + 1)", exactly because of what's intended with the "E is evaluated only once" condition. - Jakoba
@Rhymoid The "evaluated once" part is for things like foo.bar() += 5, where foo.bar() is not executed twice, for example (AFAIK). - Whitmer
@πάνταῥεῖ Reading comprehension fail. - Tletski
@LightnessRacesinOrbit I think it's people trying to understand what's going on, testing their understanding with complicated examples. - Whitmer
@LightnessRacesinOrbit: This question did not come up in my programming practice. It is merely an intellectual exercise in trying to understand the limits the standard sets (or does not set) for meaningful programs, and beyond that to understand in what way the standard should be read. But such exercises are not useless; they can be vital to implementors, and what implementors (can) do is ultimately significant to programmers. - Gynaeco
@MarcvanLeeuwen This indicates that that piece of code is garbage indeed, as it is way too complex for anyone to properly understand. - Leanneleanor
@BЈовић: The citation you linked to is unrelated to the piece of code above (which by the way I have no intention of writing, let alone debugging), but it does show that you missed the purpose of my question. And although you are completely free to consider it beyond your intellectual capabilities (or simply totally uninteresting), it is a bit presumptuous to assert that this must be equally true for anyone. As for complexity, I've seen quite a bit worse than this example. - Gynaeco
@Whitmer And just why do you need to evolve a deeper understanding for pointless code which is always incorrect? You will gain no knowledge of value from it. For some reason people have muddy ideas about how they will learn important things about the low level stuff by exploring the results of undefined behavior. What they fail to see is that nothing is stopping them from exploring such things at any point, without some weird UB scenario looming in the background. - Defendant
@Lundin: It may be pointless to write such explicit code, but equivalent cases may arise in code that looks much more reasonable, for instance if some different looking subexpressions happen to be aliases for the same variable. And understanding the model used in the standard is important to know the basis on which showing correctness of your program can be founded. It also serves as a contract that implementors have to fulfill. Implementors love the small print that frees them from having to do what you intend to do, if they can hold some subtle UB you never thought of against you. - Gynaeco
@MarcvanLeeuwen Knowing about what cases of UB there are in the language, and exploring the outcome of UB, are two different things. The former you are expected to know as a C++ programmer (tough luck, there are hundreds of cases of UB, enough to fill a book). The latter is just a waste of time. - Defendant
@Lundin: This question is not about the outcome of UB; as we all know it just means you give your implementor a blank cheque. It is about whether UB arises in certain circumstances. As a side note I am somewhat surprised by the apparent general enthusiasm in favour of UB (gleaned from the answers), which might be similar to sentiments in favor of capital punishment. We all think it will never happen to us. (Other aside, when D.E.Knuth was told that in C null pointers need not be represented by a zero value, he just decided he would refuse to let any compiler that stupid touch his code.) - Gynaeco
I'm voting this question down, not because I think it's bad, silly or useless, but because you have failed to separate the question from explanations for possible answers. Good questions on SO are succinctly stated with only so much information so as to "set up" the question. You have a number of questions here, mixed with justifications and explanations. - Moldavia
@ThomasMcLeod: You are right of course about length. And maybe this is just not the right forum. But this is a subtle question that I cannot state succinctly, and it generates a lot of questions as the comments show. As the added part shows it is not so much about knowing UB for one particular case, but about how one should think about the semantics of compound assignment. But for an even longer-winded investigation of the possible interpretations, I've separated those out in a self-answer. - Gynaeco
@MarcvanLeeuwen: I would really like to see some official recognized split between what language the C89 authors intended to invent (augment pre-existing forms of C with a common core that could be used to achieve portability at the expense of both readability and performance) with what one would have if one read only the Standard and ignored pre-existing forms of C. - Notify

There is no clear case for Undefined Behavior here

Sure, an argument leading to UB can be given, as I indicated in the question, and which has been repeated in the answers given so far. However this involves a strict reading of 5.17:7 that is both self-contradictory and in contradiction with explicit statements in 5.17:1 about compound assignment. With a weaker reading of 5.17:7 the contradictions disappear, as does the argument for UB. Whence my conclusion is neither that there is UB here, nor that there is clearly defined behaviour, but that the text of the standard is inconsistent, and should be modified to make clear which reading prevails (and I suppose this means a defect report should be written). Of course one might invoke here the fall-back clause in the standard (the note in 1.3.24) that evaluations for which the standard fails to define the behavior [unambiguously and self-consistently] are Undefined Behavior, but that would make any use of compound assignments (including prefix increment/decrement operators) into UB, something that might appeal to certain implementors, but certainly not to programmers.

Instead of arguing for the given problem, let me present a slightly modified example that brings out the inconsistency more clearly. Assume one has defined

int& f (int& a) { return a; }

a function that does nothing and returns its (lvalue) argument. Now modify the example to

n += f(++n) + 1;

Note that while some extra conditions about sequencing of function calls are given in the standard, this would at first glance not seem to affect the example, since there are no side effects at all from the function call (not even locally inside the function), as the incrementation happens in the argument expression for f, whose evaluation is not subject to those extra conditions. Indeed, let us apply the Crucial Argument for Undefined Behavior (CAUB), namely 5.17:7 which says that the behavior of such a compound assignment is equivalent to that of (in this case)

n = n + f(++n) + 1;

except that n is evaluated only once (an exception that makes no difference here). The evaluation of the statement I just wrote clearly has UB (the value computation of the first (prvalue) n in the RHS is unsequenced w.r.t. the side effect of the ++ operation, which involves the same scalar object (1.9:15) and you're dead).

So the evaluation of n += f(++n) + 1 has undefined behavior, right? Wrong! Read in 5.17:1 that

With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator. — end note ]

This language is far from as precise as I would like it to be, but I don't think it is a stretch to assume that "indeterminately-sequenced" should mean "with respect to that operation of a compound assignment". The (non normative, I know) note makes it clear that the lvalue-to-rvalue conversion is part of the operation of the compound assignment. Now is the call of f indeterminately-sequenced with respect to the operation of the compound assignment of +=? I'm unsure, because the 'sequenced' relation is defined for individual value computations and side effects, not complete evaluations of operators, which may involve both. In fact the evaluation of a compound assignment operator involves three items: the lvalue-to-rvalue conversion of its left operand, the side effect (the assignment proper), and the value computation of the compound assignment (which is sequenced after the side effect, and returns the original left operand as lvalue). Note that the existence of the lvalue-to-rvalue conversion is never explicitly mentioned in the standard except in the note cited above; in particular, the standard makes no (other) statement at all regarding its sequencing relative to other evaluations. It is pretty clear that in the example the call of f is sequenced before the side effect and value computation of += (since the call occurs in the value computation of the right operand to +=), but it might be indeterminately-sequenced with respect to the lvalue-to-rvalue conversion part. I recall from my question that since the left operand of += is an lvalue (and necessarily so), one cannot construe the lvalue-to-rvalue conversion to have occurred as part of the value computation of the left operand.

However, by the principle of the excluded middle, the call to f must either be indeterminately-sequenced with respect to the operation of the compound assignment of +=, or not indeterminately-sequenced with respect to it; in the latter case it must be sequenced before it because it cannot possibly be sequenced after it (the call of f being sequenced before the side effect of +=, and the relation being anti-symmetric). So first assume it is indeterminately-sequenced with respect to the operation. Then the cited clause says that w.r.t. the call of f the evaluation of += is a single operation, and the note explains that it means the call should not intervene between the lvalue-to-rvalue conversion and the side effect associated with +=; it should either be sequenced before both, or after both. But being sequenced after the side effect is not possible, so it should be before both. This makes (by transitivity) the side effect of ++ sequenced before the lvalue-to-rvalue conversion, exit UB. Next assume the call of f is sequenced before the operation of +=. Then it is in particular sequenced before the lvalue-to-rvalue conversion, and again by transitivity so is the side effect of ++; no UB in this branch either.

Conclusion: 5.17:1 contradicts 5.17:7 if the latter is taken (CAUB) to be normative for questions of UB resulting from unsequenced evaluations by 1.9:15. As I said CAUB is self-contradictory as well (by arguments indicated in the question), but this answer is getting too long, so I'll leave it at this for now.
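The chain of sequencing that both branches of the argument arrive at can be written out as explicit statements. This is only a sketch of that one ordering for n += f(++n) + 1, not a claim about what any compiler does: the `trace` log and the function `run_single_evaluation_reading` are hypothetical names added for illustration, and f gains a logging line it does not have in the text.

```cpp
#include <string>
#include <vector>

// Hypothetical trace log, for illustration only.
std::vector<std::string> trace;

// Same identity function as in the text, plus a log entry.
int& f(int& a) {
    trace.push_back("call f");
    return a;
}

// Explicit version of n += f(++n) + 1 in which the call of f comes
// before both the fetch and the store belonging to +=.
int run_single_evaluation_reading(int n) {
    trace.clear();
    int& arg = ++n;                  // side effect of ++, before the call
    trace.push_back("increment n");
    int rhs = f(arg) + 1;            // call of f, then value computation of +
    int old = n;                     // fetch belonging to +=
    trace.push_back("fetch n");
    n = old + rhs;                   // side effect of +=
    trace.push_back("store n");
    return n;
}
```

Starting from n = 0 this ordering stores 3, and the log reads: increment n, call f, fetch n, store n. The call cannot fall between the fetch and the store, which is what the note in 5.17:1 asks for.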

Three problems, and two proposals for resolving them

Trying to understand what the standard writes about these matters, I distinguish three aspects in which the text is hard to interpret; they all are of a nature that the text is insufficiently clear about what model its statements are referring to. (I cite the texts at the end of the numbered items, since I do not know the markup to resume a numbered item after a quote)

  1. The text of 5.17:7 is of an apparent simplicity that, although the intention is easy to grasp, gives us little hold when applied to difficult situations. It makes a sweeping claim (equivalent behavior, apparently in all aspects) but one whose application is thwarted by the exception clause. What if the behavior of E1 = E1 op E2 is undefined? Well then that of E1 op = E2 should be as well. But what if the UB was due to E1 being evaluated twice in E1 = E1 op E2? Then evaluating E1 op = E2 should presumably not be UB, but if so, then defined as what? This is like saying "the youth of the second twin was exactly like that of the first, except that he did not die at childbirth." Frankly, I think this text, which has evolved little since the C version "A compound assignment of the form E1 op = E2 differs from the simple assignment expression E1 = E1 op E2 only in that the lvalue E1 is evaluated only once." might be adapted to match the changes in the standard.

    (5.17) 7 The behavior of an expression of the form E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once.[...]

  2. It is not so clear what precisely the actions (evaluations) are between which the 'sequenced' relation is defined. It is said (1.9:12) that evaluation of an expression includes value computations and initiation of side effects. Though this appears to say that an evaluation may have multiple (atomic) components, the sequenced relation is actually mostly defined (e.g. in 1.9:14,15) for individual components, so that it might be better to read this as that the notion of "evaluation" encompasses both value computations and (initiation of) side effects. However in some cases the 'sequenced' relation is defined for the (entire) execution of an expression or statement (1.9:15) or for a function call (5.17:1), even though a passage in 1.9:15 avoids the latter by referring directly to executions in the body of a called function.

    (1.9) 12 Evaluation of an expression (or a sub-expression) in general includes both value computations (...) and initiation of side effects. [...] 13 Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread [...] 14 Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated. [...] 15 When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [...] Every evaluation in the calling function (including other function calls) ... is indeterminately sequenced with respect to the execution of the called function [...] (5.2.6, 5.17) 1 ... With respect to an indeterminately-sequenced function call, ...

  3. The text should more clearly acknowledge that a compound assignment involves, in contrast to a simple assignment, the action of fetching the value previously assigned to its left operand; this action is like lvalue-to-rvalue conversion, but does not happen as part of the value computation of that left operand, since it is not a prvalue; indeed it is a problem that 1.9:12 only acknowledges such action for prvalue evaluation. In particular the text should be more clear about which 'sequenced' relations are given for that action, if any.

    (1.9) 12 Evaluation of an expression... includes... value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation)

The second point is the least directly related to our concrete question, and I think it can be solved simply by choosing a clear point of view and reformulating passages that seem to indicate a different point of view. Given that one of the main purposes of the old sequence points, and now the 'sequenced' relation, was to make clear that the side effect of postfix-increment operators is unsequenced w.r.t. actions sequenced after the value computation of that operator (thus giving e.g. i = i++ UB), the point of view must be that individual value computations and (initiation of) individual side effects are "evaluations" for which "sequenced before" may be defined. For pragmatic reasons I would also include two more kinds of (trivial) "evaluations": function entry (so that the language of 1.9:15 may be simplified to: "When calling a function..., every value computation and side effect associated with any of its argument expressions, or with the postfix expression designating the called function, is sequenced before entry of that function") and function exit (so that any action in the function body gets by transitivity sequenced before anything that requires the function value; this used to be guaranteed by a sequence point, but the C++11 standard seems to have lost such guarantee; this might make calling a function ending with return i++; potentially UB where this is not intended, and used to be safe). Then one can also be clear about the "indeterminately sequenced" relation of function calls: for every function call, and every evaluation that is not (directly or indirectly) part of evaluating that call, that evaluation shall be sequenced (either before or after) w.r.t. both entry and exit of that function call, and it shall have the same relation in both cases (so that in particular such external actions cannot be sequenced after function entry but before function exit, as is clearly desirable within a single thread).

Now to resolve points 1. and 3., I can see two paths (each affecting both points), which have different consequences for the defined or not behavior of our example:

Compound assignments with two operands, and three evaluations

Compound operations have their two usual operands, an lvalue left operand and a prvalue right operand. To settle the unclarity of 3., it is included in 1.9:12 that fetching the value previously assigned to an object also may occur in compound assignments (rather than only for prvalue evaluation). The semantics of compound assignments are defined by changing 5.17:7 to

In a compound assignment op=, the value previously assigned to the object referred to by the left operand is fetched, the operator op is applied with this value as left operand and the right operand of op= as right operand, and the resulting value replaces that of the object referred to by the left operand.

(That gives two evaluations, the fetch and the side effect; a third evaluation is the trivial value computation of the compound operator, sequenced after both other evaluations.)

For clarity, state clearly in 1.9:15 that value computations in operands are sequenced before all value computations associated with the operator (rather than just those for the result of the operator), which ensures that evaluating the lvalue left operand is sequenced before fetching its value (one can hardly imagine otherwise), and also sequences the value computation of the right operand before that fetch, thus excluding UB in our example. While at it, I see no reason not to also sequence value computations in operands before any side effects associated with the operator (as they clearly must); this would make mentioning this explicitly for (compound) assignments in 5.17:1 superfluous. On the other hand do mention there that the value fetching in a compound assignment is sequenced before its side effect.
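The first proposal can be modelled by a small helper that performs the steps in the proposed order. This is a sketch under stated assumptions: `compound_add` and `original_example` are hypothetical names, the operands are passed as callables so that their evaluation can be placed explicitly, and the left operand is evaluated before the right one only because some order has to be chosen in ordinary code (the proposal leaves the two operand computations unsequenced relative to each other).

```cpp
// Hypothetical helper modelling the first proposal for op= with op = +.
// LhsExpr yields int& (the lvalue left operand), RhsExpr yields int
// (the prvalue right operand).
template <class LhsExpr, class RhsExpr>
int& compound_add(LhsExpr lhs, RhsExpr rhs) {
    int& object = lhs();          // value computation of the left operand
    int right = rhs();            // value computation of the right operand
    int previous = object;        // fetch, sequenced after both operands
    object = previous + right;    // side effect: store the result
    return object;                // value computation of the compound assignment
}

// The original example i += ++i + 1 expressed through the helper.
int original_example(int start) {
    int i = start;
    compound_add([&]() -> int& { return i; },
                 [&]() { return ++i + 1; });
    return i;
}
```

With this model, `original_example(0)` yields 3: the increment makes i equal to 1, the right operand evaluates to 2, the fetch then reads 1, and 3 is stored. Because the fetch comes after both operand computations, the result does not depend on which operand is evaluated first.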

Compound assignments with three operands, and two evaluations

In order to obtain that the fetch in a compound assignment will be unsequenced with respect to the value computation of the right operand, making our example UB, the clearest way seems to be to give compound operators an implicit third (middle) operand, a prvalue, not represented by a separate expression, but obtained by lvalue-to-rvalue conversion from the left operand (this three-operand nature corresponds to the expanded form of compound assignments, but by obtaining the middle operand from the left operand, it is ensured that the value is fetched from the same object to which the result will be stored, a crucial guarantee that is only vaguely and implicitly given in the current formulation through the "except that E1 is evaluated only once" clause). The difference with the previous solution is that the fetch is now a genuine lvalue-to-rvalue conversion (since the middle operand is a prvalue) and is performed as part of the value computation of the operands to the compound assignment, which makes it naturally unsequenced with the value computation of the right operand. It should be stated somewhere (in a new clause that describes this implicit operand) that the value computation of the left operand is sequenced before this lvalue-to-rvalue conversion (it clearly must). Now 1.9:12 can be left as it is, and in place of 5.17:7 I propose

In a compound assignment op= with left operand a (an lvalue), and middle and right operands b respectively c (both prvalues), the operator op is applied with b as left operand and c as right operand, and the resulting value replaces that of the object referred to by a.

(That gives one evaluation, the side effect, with as second evaluation the trivial value computation of the compound operator, sequenced after it.)

The changes to 1.9:15 and 5.17:1 suggested in the previous solution could still apply, but would not give our original example defined behavior. However the modified example at the top of this answer would still have defined behavior, unless the part 5.17:1 "compound assignment is a single operation" is scrapped or modified (there is a similar passage in 5.2.6 for postfix increment/decrement). The existence of those passages would suggest that detaching the fetch and store operations within a single compound assignment or postfix increment/decrement (and by extension making our example UB) was not the intention of those who wrote the current standard, but this of course is mere guesswork.

Gynaeco answered 14/6, 2014 at 10:7 Comment(14)
I think your quote of 5.17/1 refers to 1.9/15 "Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function." So the lvalue-to-rvalue conversion of n is indeterminately sequenced wrt the execution of f's body. Does it follow that it's sequenced wrt the evaluation of the argument ++n? The Standard says that the evaluation of ++n is sequenced before the execution of the body of f.Whitmer
Another example: int i; /* global */ int& f() { return ++i; } int main() { i += f(); } I think that's one of the cases that shall be well-behaved (no UB). The result is ++i; i += i;Whitmer
@dyp: first comment: the question is whether the call of f is sequenced before the evaluation of +=. I tend to say it is (since the call is in an argument to +=), but I argue that whether or not, there is no UB either way. It is indeed curious that my argument should repose on a clause that probably does not apply here! The important thing is that the clause clearly makes the lvalue-to-rvalue conversion part of the evaluation of +=.Gynaeco
"which I repeat is not part of the value computation of the LHS, since that results in an lvalue": NO! The value computation of the LHS results in a prvalue. The evaluation of the whole compound assignment expression results in an lvalue, but that is a different evaluation.Olympus
@MarcvanLeeuwen What do you mean with "the evaluation of +="? The evaluation of the operands of any operator @ is sequenced before the side effect of that operator, is sequenced before the value computation of the expression that encloses/is formed by that operator L@R.Whitmer
@dyp, second comment: good example! Note that the "equivalent" expression i = i + f() suffers at least from unspecified behavior (the value of the LHS of + could be taken either before or after the call of f), if not undefined behavior (I'm not quite sure which). Either way it the behavior is not equivalent to that of i += f(), showiing that CAUB is untenable.Gynaeco
@dyp: I meant the operation of the compound assignment associated to +=, excluding the evaluations (value computations and side effect) of its arguments. This is clearly the meaning used in the cited "the operation of a compound assignment is a single evaluation".Gynaeco
@MWid: 5.17:1 "The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand...". No prvalue in the LHS of compound assignments, you're just mistaken. Nor could there be, because while lvalue-to-rvalue conversion is possible, the opposite (recovering an address from the value extracted from it) is impossible.Gynaeco
In i = i + f(), the execution of the body of f() (which contains ++i) is indeterminately sequenced wrt the value comp of the i in i + f() as per 5.17/1 -- so I agree this is different from i += f(), where the compound assignment guarantees that the body of f() is executed before the lvalue-to-rvalue conversion of i (since it cannot be sequenced after the side effect of +=). I would search for that in the CWG issues list and then send a mail to the Standard mailing list to ask if that's intended.Whitmer
Ad "the question is whether the call of f is sequenced before the evaluation of +=" -- 5.17/1 only gives us a guarantee about the assignment of a compound assignment. There's no guarantee about the compound part (the op of an op =). I think the guarantees here have to be drawn from the rewrite.Whitmer
@MarcvanLeeuwen No. The expression on the LHS must be a modifiable lvalue, to which the lvalue-to-rvalue conversion is applied. Hence the result of evaluating the LHS is a prvalue.Olympus
@MWid: ...and return an lvalue referring to the left operand. " Which would not be possible if what you say were true.Gynaeco
@MarcvanLeeuwen As I already said, the returned lvalue is the result of evaluating the whole compound assignment expression. "In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression." It's the result of the value computation of the assignment expression, which is different from the value computation of the left operand.Olympus
@MWid: "...return an lvalue referring to the left operand". Now I see, you are completely right. The result of the value computation of the assignment expression is different from the value computation of its left operand. How could I be so stupid to not see that?Gynaeco
W
16

About the description of i = ++i + 1

I gather that the subtle explanation is that

(1) the expression ++i returns an lvalue but + takes prvalues as operands, so a conversion from lvalue to prvalue must be performed;

Probably, see CWG active issue 1642.

this involves obtaining the current value of that lvalue (rather than one more than the old value of i) and must therefore be sequenced after the side effect from the increment (i.e., updating i)

The sequencing here is defined for the increment (indirectly, via +=, see (a)): The side effect of ++ (the modification of i) is sequenced before the value computation of the whole expression ++i. The latter refers to computing the result of ++i, not to loading the value of i.

(2) the LHS of the assignment is also an lvalue, so its value evaluation does not involve fetching the current value of i; while this value computation is unsequenced w.r.t. the value computation of the RHS, this poses no problem

I don't think that's properly defined in the Standard, but I'd agree.

(3) the value computation of the assignment itself involves updating i (again),

The value computation of i = expr is only required when you use the result, e.g. int x = (i = expr); or (i = expr) = 42;. The value computation itself does not modify i.

The modification of i in the expression i = expr that happens because of the = is called the side effect of =. This side effect is sequenced before value computation of i = expr -- or rather the value computation of i = expr is sequenced after the side effect of the assignment in i = expr.

In general, the value computation of the operands of an expression are sequenced before the side effect of that expression, of course.

but is sequenced after the value computation of its RHS, and hence after the previous update to i; no problem.

The side effect of the assignment i = expr is sequenced after the value computation of the operands i (A) and expr of the assignment.

The expr in this case is a +-expression: expr1 + 1. The value computation of this expression is sequenced after the value computations of its operands expr1 and 1.

The expr1 here is ++i. The value computation of ++i is sequenced after the side effect of ++i (the modification of i) (B)

That's why i = ++i + 1 is safe: There's a chain of sequenced before between the value computation in (A) and the side effect on the same variable in (B).
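To make the chain concrete, here is a minimal sketch that spells out the effects of i = ++i + 1 one statement at a time, in the order the sequenced-before chain above requires. The function names are made up for illustration, and the sketch only models the ordering; it is not how a compiler has to implement the expression.

```cpp
#include <cassert>

// Hypothetical helper: performs the effects of `i = ++i + 1` as separate,
// fully sequenced statements.
int assign_preincrement_plus_one(int i) {
    i = i + 1;              // (B) side effect of ++i
    const int rhs = i + 1;  // value computation of ++i + 1, after (B)
    i = rhs;                // side effect of =, after both operands' value computations
    return i;
}

// Small self-check of the model.
void check_assign_model() {
    assert(assign_preincrement_plus_one(0) == 2);
}
```

Since every step is a separate statement here, there is exactly one possible result for a given starting value.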


(a) The Standard defines ++expr in terms of expr += 1, which is defined as expr = expr + 1 with expr being evaluated only once.

For this expr = expr + 1, we therefore have only one value computation of expr. The side effect of = is sequenced before the value computation of the whole expr = expr + 1, and it's sequenced after the value computation of the operands expr (LHS) and expr + 1 (RHS).

This corresponds to my claim that for ++expr, the side effect is sequenced before the value computation of ++expr.
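The "evaluated only once" part is easy to observe when the left operand has a visible effect of its own. A small sketch, assuming a hypothetical helper first() that counts how often it is called (all names here are invented for the example):

```cpp
#include <cassert>

int lookups = 0;  // counts evaluations of the left operand

// Hypothetical helper: returns the first element as an lvalue and records the call.
int& first(int* a) {
    ++lookups;
    return a[0];
}

// E1 op= E2: the call in E1 happens once.
int lookups_compound() {
    int a[1] = {10};
    lookups = 0;
    first(a) += 5;
    assert(a[0] == 15);
    return lookups;
}

// E1 = E1 op E2 written out by hand: the call in E1 happens twice.
int lookups_expanded() {
    int a[1] = {10};
    lookups = 0;
    first(a) = first(a) + 5;  // two calls, indeterminately sequenced, same stored result
    assert(a[0] == 15);
    return lookups;
}
```

Both forms store the same value here; the only observable difference is the number of calls, which is what the exception in the rewrite rule is about.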


About i += ++i + 1

Does the value computation of i += ++i + 1 involve undefined behavior?

Since the LHS of += is still an lvalue (and its RHS still a prvalue), the same reasoning as above applies as far as (1) and (2) are concerned; as for (3) the value computation of the += operator now must both fetch the current value of i, and then (obviously sequenced after it, even if the standard does not say so explicitly, or otherwise the execution of such operators would always invoke undefined behavior) perform the addition of the RHS and store the result back into i.

I think here's the problem: The addition of i in the LHS of i += to the result of ++i + 1 requires knowing the value of i - a value computation (which can mean loading the value of i). This value computation is unsequenced with respect to the modification performed by ++i. This is essentially what you say in your alternative description, following the rewrite mandated by the Standard i += expr -> i = i + expr. Here, the value computation of i within i + expr is unsequenced with respect to the value computation of expr. That's where you get UB.

Please note that a value computation can have two results: The "address" of an object, or the value of an object. In an expression i = 42, the value computation of the lhs "produces the address" of i; that is, the compiler needs to figure out where to store the rhs (under the rules of observable behaviour of the abstract machine). In an expression i + 42, the value computation of i produces the value. In the above paragraph, I was referring to the second kind, hence [intro.execution]p15 applies:

If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.


Another approach for i += ++i + 1

the value computation of the += operator now must both fetch the current value of i, and then [...] perform the addition of the RHS

The RHS being ++i + 1. Computing the result of this expression (the value computation) is unsequenced with respect to the value computation of i from the LHS. So the word then in this sentence is misleading: Of course, it must first load i and then add the result of the RHS to it. But there's no order between the side-effect of the RHS and the value computation to get the value of the LHS. For example, you could get for the LHS either the old or the new value of i, as modified by the RHS.
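As an illustration of "either the old or the new value", here are the two orderings written out as fully sequenced code. This is only a sketch with invented function names: because the original expression has UB, neither result is guaranteed, and a compiler is not limited to these two outcomes.

```cpp
#include <cassert>

// Hypothetical model 1: the value of the LHS is read before ++i modifies i.
int lhs_read_first(int i) {
    const int lhs_value = i;      // old value of i
    i = i + 1;                    // side effect of ++i
    const int rhs_value = i + 1;  // ++i + 1
    i = lhs_value + rhs_value;    // side effect of +=
    return i;
}

// Hypothetical model 2: the RHS is fully evaluated before the LHS value is read.
int rhs_evaluated_first(int i) {
    i = i + 1;                    // side effect of ++i
    const int rhs_value = i + 1;  // ++i + 1
    const int lhs_value = i;      // new value of i
    i = lhs_value + rhs_value;    // side effect of +=
    return i;
}

// Starting from 0, the two models disagree.
void check_orderings() {
    assert(lhs_read_first(0) == 2);
    assert(rhs_evaluated_first(0) == 3);
}
```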

In general a store and a "concurrent" load is a data race, which leads to Undefined Behaviour.


Addressing the addendum

using a fictive ||| operator to designate unsequenced evaluations, one might try to define E op= F; (with int operands for simplicity) as equivalent to { int& L=E ||| int R=F; L = L + R; }, but then the example no longer has UB.

Let E be i and F be ++i (we don't need the + 1). Then, for i = ++i

int* lhs_address;
int lhs_value;
int* rhs_address;
int rhs_value;

    (         lhs_address = &i)
||| (i = i+1, rhs_address = &i, rhs_value = *rhs_address);

*lhs_address = rhs_value;

On the other hand, for i += ++i

    (         lhs_address = &i, lhs_value = *lhs_address)
||| (i = i+1, rhs_address = &i, rhs_value = *rhs_address);

int total_value = lhs_value + rhs_value;
*lhs_address = total_value;

This is intended to represent my understanding of the sequencing guarantees. Note that the , operator sequences all value computations and side effects of the LHS before those of the RHS. Parentheses do not affect sequencing. In the second case, i += ++i, we have a modification of i unsequenced wrt an lvalue-to-rvalue conversion of i => UB.

The standard does not treat compound assignments as second-class primitives for which no separate definition of semantics is necessary.

I would say that's a redundancy. The rewrite from E1 op = E2 to E1 = E1 op E2 also includes which expression types and value categories are required (on the rhs, 5.17/1 says something about the lhs), what happens to pointer types, the required conversions etc. The sad thing is that the sentence about "With respect to an.." in 5.17/1 is not in 5.17/7 as an exception of that equivalence.

In any case, I think we should compare the guarantees and requirements for compound assignment vs. simple assignment plus the operator, and see if there's any contradiction.

Once we put that "With respect to an.." also in the list of exceptions in 5.17/7, I don't think there's a contradiction.

As it turns out, as you can see in the discussion of Marc van Leeuwen's answer, this sentence leads to the following interesting observation:

int i; // global
int& f() { return ++i; }
int main() {
    i  = i + f(); // (A)
    i +=     f(); // (B)
}

It seems that (A) has two possible outcomes, since the evaluation of the body of f is indeterminately sequenced with the value computation of the i in i + f().

In (B), on the other hand, the evaluation of the body of f() is sequenced before the value computation of i, since += must be seen as a single operation, and f() certainly needs to be evaluated before the assignment of +=.
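The two outcomes of (A) can be sketched by forcing each order with separate statements. The names g and bump() below are stand-ins for the global i and f() and are purely illustrative; the code models the orderings described above rather than running (A) or (B) directly.

```cpp
#include <cassert>

int g = 0;                   // stands in for the global i
int& bump() { return ++g; }  // stands in for f()

// (A), first order: the value of i is read before the body of f runs.
int a_read_then_call() {
    g = 0;
    const int left = g;  // value computation of i in i + f()
    int& r = bump();     // body of f: g becomes 1
    g = left + r;        // 0 + 1
    return g;
}

// (A), second order: the body of f runs before the value of i is read.
// Under the reading given above, this is also the only order allowed for (B).
int a_call_then_read() {
    g = 0;
    int& r = bump();     // body of f: g becomes 1
    const int left = g;  // value computation of i sees the new value
    g = left + r;        // 1 + 1
    return g;
}

void check_call_orders() {
    assert(a_read_then_call() == 1);
    assert(a_call_then_read() == 2);
}
```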

Whitmer answered 12/6, 2014 at 21:17 Comment(22)
This broadly confirms my understanding of the matter. However you need to introduce "loading the value of i" as a separate operation, distinct from both the side effect and the value computation of the compound assignment (the latter two being clearly sequenced after the side effect of ++), and claim that it is unsequenced w.r.t. the side effect of ++. But the standard never mentions such neither-side-effect-nor-value-computation aspects of an evaluation, and an lvalue-to-rvalue conversion is hard to conjure up here. So the "unsequenced" claim is up in the air. See the additions to my Q.Gynaeco
@MarcvanLeeuwen The Standard does not define += itself as you know, so you have to rewrite the whole thing to i = i + ++i + 1, where we have the expression i + ++i to which 1.9/15 applies "Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced." (I'll later read your additions and will try to answer them in more detail - no time right now)Whitmer
As I have added in my question, there are (for me) strong arguments against using the equivalence as defining the entire semantics of compound assignment, although it is clearly the only indication we get for finding out which value is to be assigned. Notably, certain explicit statements about sequencing involved in compound assignment contradict what would be implicitly deduced from the equivalence, which makes me hesitant to attribute absolute authority to the equivalence.Gynaeco
("concurrently" == unsequenced) No, unsequenced means that a partial ordering relation can't be established. It may be concurrent but it doesn't have to be.Malefaction
@BlueMoon That's why I put it in quotation marks. It's certainly problematic if it indeed is concurrently, but UB could also mean that the optimizer does strange things.Whitmer
"The sad thing is that the sentence about "With respect to an.." in 5.17/1 is not in 5.17/7 as an exception of that equivalence." I don't think that this is a problem, since the "equivalence" explicitely states, that E1 is only evaluated once and 5.17p1 only talks about evaluations.Olympus
Thank you for addressing the addendum. I understand your analysis of "on the other hand", which arrives at UB in more or less the same way as my second solution in the recent extension to my (already ridiculously long) self-answer. The essential point is lhs_value = *lhs_address which is grouped with preparing the left operand, though not currently part of the value computation of the left operand. My fictive "middle operand" is there precisely to make this "external to the compound assignment" nature explicit.Gynaeco
Your answer is so long, and you haven't really answered : is this UB, or not? You haven't made a clear statement anywhere.Leanneleanor
@BЈовић I've hidden it ;) somewhat intentionally; IMO it's not easy to understand, and the question seemed more directed towards an explanation than towards an answer. I read the question more as "Does it really produce UB when I explain it this way, but no UB when I try to reason about it that way?" than as "Is i += ++i + 1 UB?"Whitmer
CWG 1462 may count, but the analysis is wrong. There is no such totally unsequenced "load" operation even allowed by the standard in the abstract machine semantics. Note the result is defined irrelevant to the "load operation". To acquire the value of operand can be lvalue-to-rvalue conversion on the left operand, but there is no side effect unless the left operand is volatile-qualified. And if it is only a value computation, it is sequenced before the assignment. This also means that simple rewriting of a compound assignment to corresponding simple assignment is not conforming in general.Replete
@Replete It's a bit hard for me to reason about this two years later. Which part of my answer are you referring to, especially with "There is no such totally unsequenced "load" operation"?Whitmer
@Whitmer The "load" was mentioned in your answer for the example "i += ++i + 1".Replete
@Replete My choice of words was not the best, maybe because I intended this to be more of an answer to the OP than to be a general interpretation of the Standard. I have tried to clarify that part of my answer.Whitmer
@Whitmer However, different interpretations eventually lead to different final answers to the question: whether it is undefined behavior or not? You can't make the proof out of thin air, so your answer is questionable.Replete
@Whitmer Specifically, there is no wording in the standard that allows lhs_value = *lhs_address to occur on either side of ||| in the example i += ++i. Once the identity of the left operand (as a glvalue, or lhs_address here) is determined, the value of lhs_value in your representation is determined at the same time, since the invariant *lhs_address == lhs_value always holds, before the real side effect of the assignment (*lhs_address = total_value). So even if the load should be considered like the abstract machine semantics, at least there can't be such an unsequenced one in that position.Replete
@Replete I agree that lhs_value = *lhs_address is not spelled out in the Standard, but it's implied as a value computation that uses the value of the object (there's a value computation of the LHS of i += ++i, and the value of the LHS is used). There does not need to be an actual load, as I understand this value computation using the value is semantically a "read from a memory location" in the sense of intro.races p2.Whitmer
@Whitmer [intro.races]/2 explicitly concerns with 2 expression evaluations, which is not applicable for single evaluation here.Replete
@Replete Sorry, I got confused while looking at two drafts of the Standard. What I wanted to refer to is the UB resulting from a value computation using the value and a modification of that object. To me, that is also included in intro.races, but that section is more concerned with actual concurrency (not sure about the expression evaluation, since a sub-expression is also an expression). But intro.execution p15 is I think the appropriate quote for what I wanted to refer to.Whitmer
@Whitmer This is just the thing precisely excluded from the exceptional rule of single evaluation. Otherwise, what is the intent of "evaluated only once"? And order concerned with concurrency is built on the very same base (happens before extends sequenced before relationship).Replete
@Replete The evaluation of the LHS of i += ++i (the LHS being i) happens only once, but that evaluation, as far as I understand it, is indeterminately sequenced with the modification done via ++i (prior to the new order rules in C++17). I'm sorry but I don't understand what you mean with that there's something excluded from the rule of single evaluation. The main intent of that rule AFAIK is to not modify something twice e.g. in *++p += 5.Whitmer
@Whitmer The single evaluation rules make all forms of assignment work similarly, no matter whether it is += or =. There should be no difference between i += ++i and i = ++i in this respect.Replete
@Whitmer Single evaluation rules (on +=, ++, etc.) effectively let these expressions work with volatile-qualified lvalues without UB due to accessing the same scalar more than once.Replete
O
5

The expression:

i += ++i + 1

does invoke undefined behavior. The language lawyer method requires us to go back to the defect report that results in:

i = ++i + 1 ;

becoming well defined in C++11, which is defect report 637, "Sequencing rules and example disagree". It starts out saying:

In 1.9 [intro.execution] paragraph 16, the following expression is still listed as an example of undefined behavior:

i = ++i + 1;

However, it appears that the new sequencing rules make this expression well-defined

The logic used in the report is as follows:

  1. The assignment side-effect is required to be sequenced after the value computations of both its LHS and RHS (5.17 [expr.ass] paragraph 1).

  2. The LHS (i) is an lvalue, so its value computation involves computing the address of i.

  3. In order to value-compute the RHS (++i + 1), it is necessary to first value-compute the lvalue expression ++i and then do an lvalue-to-rvalue conversion on the result. This guarantees that the incrementation side-effect is sequenced before the computation of the addition operation, which in turn is sequenced before the assignment side effect. In other words, it yields a well-defined order and final value for this expression.

So in this question our problem changes the RHS which goes from:

++i + 1

to:

i + ++i + 1

due to draft C++11 standard section 5.17 Assignment and compound assignment operators which says:

The behavior of an expression of the form E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once. [...]

So now we have a situation where the computation of i in the RHS is not sequenced relative to ++i and so we then have undefined behavior. This follows from section 1.9 paragraph 15 which says:

Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. —end note ] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

The pragmatic way to show this would be to use clang to test the code, which generates the following warning (see it live):

warning: unsequenced modification and access to 'i' [-Wunsequenced]
i += ++i + 1 ;
  ~~ ^

for this code:

int main()
{
    int i = 0 ;

    i += ++i + 1 ;
}

This is further bolstered by this explicit test example in clang's test suite for -Wunsequenced:

 a += ++a; 
Orms answered 13/6, 2014 at 1:37 Comment(11)
Glad you're here - the lvalue-to-rvalue thing is still quite nebulous to me. You might want to add to the last section that i + ++i is UB because of the general evaluations of operands are unsequenced in 1.9/15.Whitmer
I don't think clang is authoritative in its interpretation of the standard. Also I noted that with = in place of += it still gives a warning (though a slightly different one). As for the (more interesting) discussion near the end, note that UB here hinges on considering the equivalence given for compound assignments to be authoritative for all aspects (rather than just for determining the value to be assigned) of compound assignments; as I have added to my question, there are problems with such an absolute interpretation.Gynaeco
@Whitmer I thought about adding that section after I went to bed. Reading defect report 637 several times really helped my understanding of the lvalue to rvalue conversion although it would be nice if they finally clarified this in the standard.Orms
@hvd evaluated once does not force it to be sequenced after; it would be left up to the compiler since it is not specified. The LHS is not sequenced after, see bullet 1 taken from the defect report.Orms
@MarcvanLeeuwen if clang identified this as unsequenced then it is strongly indicative that this is really undefined, if I had left it at that I would be remiss but I did not. It is a bug if they have it wrong though and if none of the answers prove to you that this is undefined then this should be a bug report and we can see what the clang language lawyers think.Orms
@MarcvanLeeuwen a compiler getting undefined behavior wrong could lead to some serious issues due to modern optimization, so they have a strong impetus to get this correct. This is an area I would be surprised to find bugs in, but it is definitely possible.Orms
@hvd: The LHS of = (or of += before equivalence) is not sequenced relative to (the side effect in) ++i. But since it only involves finding the address of i, no UB results from this unsequenced relation. However, in the expanded form there is an lvalue-to-rvalue conversion that gets the contents of i; if it is also unsequenced with ++i then UB results. It is certainly unsequenced in the expanded form i = i +(++i + 1), but if it is considered part of the operation of the += operator, which is sequenced after the value computation of its operands then no UB results.Gynaeco
@ShafikYaghmour: What you say is a convincing argument for not writing anything that invokes undefined behavior or might be mistakenly interpreted as such by a compiler; while there may be some pleasure in knowing the compiler was to blame, you're still left with the pieces. But this does not answer the question.Gynaeco
@ShafikYaghmour In retrospect, I'm convinced you're right. Thanks @MarcvanLeeuwen for clearing my head. The only sensible interpretation of "evaluated once" does indeed require reading it as "evaluated once as an lvalue", and that does indeed leave enough unspecified to claim the behaviour of the complete expression is undefined.Ridiculous
@MarcvanLeeuwen I rearranged my answer so that the language lawyer answer is first to remove emphasis from the more controversial using the compiler approach.Orms
I'm not convinced by this answer. The lvalue-to-rvalue conversion, which is to get the rvalue of some expression, is unrelated to the implicit operand (r)value in a compound assignment. The latter is determined by the stored value of left operand, which is determined by the identity of the left operand of the assignment expression, in turn by the evaluation of it with no lvalue-to-rvalue conversion ever needed. To associate the stored value and the implicit operand value is even not a "computation" of any expression beyond the identification, so it is not subject to be "unsequenced".Replete
O
1

Yes, it is UB!

The evaluation of your expression

i += ++i + 1

proceeds in the following steps:

5.17p1 (C++11) states (emphases mine):

The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.

What does "value computation" mean?

1.9p12 gives the answer:

Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects.

Since your code uses a compound assignment operator, 5.17p7 tells us how this operator behaves:

The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once.

Hence the evaluation of the expression E1 ( == i) involves both determining the identity of the object designated by i and an lvalue-to-rvalue conversion to fetch the value stored in that object. But the evaluations of the two operands E1 and E2 are not sequenced with respect to each other. Thus we get undefined behavior since the evaluation of E2 ( == ++i + 1) initiates a side effect (updating i).

1.9p15:

... If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.


The following statements in your question/comments seem to be the root of your misunderstanding:

(2) the LHS of the assignment is also an lvalue, so its value evaluation does not involve fetching the current value of i

fetching a value can be part of a prvalue evaluation. But in E += F the only prvalue is F so fetching the value of E is not part of the evaluation of the (lvalue) subexpression E

If an expression is an lvalue or rvalue doesn't tell anything about how this expression is to be evaluated. Some operators require lvalues as their operands some others require rvalues.

Clause 5p8:

Whenever a glvalue expression appears as an operand of an operator that expects a prvalue for that operand, the lvalue-to-rvalue (4.1), array-to-pointer (4.2), or function-to-pointer (4.3) standard conversions are applied to convert the expression to a prvalue.

In a simple assignment the evaluation of the LHS only requires determining the identity of the object. In a compound assignment such as += the LHS must also be a modifiable lvalue, but the evaluation of the LHS in this case consists of determining the identity of the object and an lvalue-to-rvalue conversion. It is the result of this conversion (which is a prvalue) that is added to the result (also a prvalue) of the evaluation of the RHS.
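For reference, the value categories of the results of the operators discussed here can be checked at compile time. This is a minimal sketch, assuming a hypothetical variable probe that is only named inside decltype (an unevaluated context, so nothing is modified). It shows what each expression yields; it does not show which conversions are applied to the operands or how they are sequenced.

```cpp
#include <type_traits>

int probe = 0;  // hypothetical variable, only used inside decltype, never evaluated

// decltype((expr)) is T& when expr is an lvalue and T when expr is a prvalue.
template <typename T>
constexpr bool is_lvalue_result() {
    return std::is_lvalue_reference<T>::value;
}

static_assert(is_lvalue_result<decltype((probe))>(), "a variable name is an lvalue");
static_assert(is_lvalue_result<decltype((++probe))>(), "prefix ++ yields an lvalue");
static_assert(is_lvalue_result<decltype((probe = 1))>(), "simple assignment yields an lvalue");
static_assert(is_lvalue_result<decltype((probe += 1))>(), "compound assignment yields an lvalue");
static_assert(!is_lvalue_result<decltype((probe + 1))>(), "binary + yields a prvalue");
static_assert(!is_lvalue_result<decltype((probe++))>(), "postfix ++ yields a prvalue");
```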

"But in E += F the only prvalue is F so fetching the value of E is not part of the evaluation of the (lvalue) subexpression E"

That's not true as I explained above. In your example F is a prvalue expression, but F may as well be an lvalue expression. In that case, the lvalue-to-rvalue conversion is also applied to F. 5.17p7 as cited above tells us what the semantics of the compound assignment operators are. The standard states that the behavior of E += F is the same as of E = E + F but E is only evaluated once. Here, the evaluation of E includes the lvalue-to-rvalue conversion, because the binary operator + requires its operands to be rvalues.

Olympus answered 13/6, 2014 at 13:45 Comment(4)
Actually I find that the arguments you supply strengthen the "not UB" point of view. Of course if you make the "E1 = E1 op E2" clause the ultimate arbiter of semantics, you get UB; my question says as much, and also why this extreme point of view has problems. But if you take that "expressions" must be actual expressions in your program, not copies of them created by the equivalence rule, then your 1.9p12 citation says evaluation of a (sub)expression includes value computations and side effects; fetching a value can be part of a prvalue evaluation. But in E += F the only prvalue is F...Gynaeco
... so fetching the value of E is not part of the evaluation of the (lvalue) subexpression E. It can only be part of the evaluation of the whole += formula (since the fictive E + F is not an actual (sub)expression of the program), and both parts of the += evaluation (side effect and value evaluation) are sequenced after the evaluation of both operands of += (which are the only true operands around; again the rvalue left operand of + in E + F is not an actual expression of the program).Gynaeco
5.17p7 tells us that both operands are evaluated resulting in two prvalues, which are then added resulting again in a prvalue, which is stored in the object designated by the left hand side lvalue expression of +=. That is the (operational) semantic of +=.Olympus
No. 5.17p7 does not tell you that the implicit operand of compound assignment is just the result of substitution of the whole evaluation result of the left operand. The evaluation of left operand is needed, but only used to identify the lvalue itself. To get the value stored in the scalar object corresponding to the lvalue is not in any form of evaluation (of an expression) defined by the language. The related operational semantics rules are not presented in the standard at all.Replete
G
0

There is no clear case for Undefined Behavior here

Sure, an argument leading to UB can be given, as I indicated in the question, and which has been repeated in the answers given so far. However this involves a strict reading of 5.17:7 that is both self-contradictory and in contradiction with explicit statements in 5.17:1 about compound assignment. With a weaker reading of 5.17:7 the contradictions disappear, as does the argument for UB. Whence my conclusion is neither that there is UB here, nor that there is clearly defined behaviour, but that the text of the standard is inconsistent, and should be modified to make clear which reading prevails (and I suppose this means a defect report should be written). Of course one might invoke here the fall-back clause in the standard (the note in 1.3.24) that evaluations for which the standard fails to define the behavior [unambiguously and self-consistently] are Undefined Behavior, but that would make any use of compound assignments (including prefix increment/decrement operators) into UB, something that might appeal to certain implementors, but certainly not to programmers.

Instead of arguing for the given problem, let me present a slightly modified example that brings out the inconsistency more clearly. Assume one has defined

int& f (int& a) { return a; }

a function that does nothing and returns its (lvalue) argument. Now modify the example to

n += f(++n) + 1;

Note that while some extra conditions about sequencing of function calls are given in the standard, this would at first glance not seem to affect the example, since there are no side effects at all from the function call (not even locally inside the function), as the incrementation happens in the argument expression for f, whose evaluation is not subject to those extra conditions. Indeed, let us apply the Crucial Argument for Undefined Behavior (CAUB), namely 5.17:7 which says that the behavior of such a compound assignment is equivalent to that of (in this case)

n = n + f(++n) + 1;

except that n is evaluated only once (an exception that makes no difference here). The evaluation of the statement I just wrote clearly has UB (the value computation of the first (prvalue) n in the RHS is unsequenced w.r.t. the side effect of the ++ operation, which involves the same scalar object (1.9:15) and you're dead).

So the evaluation of n += f(++n) + 1 has undefined behavior, right? Wrong! Read in 5.17:1 that

With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator. — end note ]

This language is far from as precise as I would like it to be, but I don't think it is a stretch to assume that "indeterminately-sequenced" should mean "with respect to that operation of a compound assignment". The (non normative, I know) note makes it clear that the lvalue-to-rvalue conversion is part of the operation of the compound assignment. Now is the call of f indeterminately-sequenced with respect to the operation of the compound assignment of +=? I'm unsure, because the 'sequenced' relation is defined for individual value computations and side effects, not complete evaluations of operators, which may involve both. In fact the evaluation of a compound assignment operator involves three items: the lvalue-to-rvalue conversion of its left operand, the side effect (the assignment proper), and the value computation of the compound assignment (which is sequenced after the side effect, and returns the original left operand as lvalue). Note that the existence of the lvalue-to-rvalue conversion is never explicitly mentioned in the standard except in the note cited above; in particular, the standard makes no (other) statement at all regarding its sequencing relative to other evaluations. It is pretty clear that in the example the call of f is sequenced before the side effect and value computation of += (since the call occurs in the value computation of the right operand to +=), but it might be indeterminately-sequenced with respect to the lvalue-to-rvalue conversion part. I recall from my question that since the left operand of += is an lvalue (and necessarily so), one cannot construe the lvalue-to-rvalue conversion to have occurred as part of the value computation of the left operand.

However, by the principle of the excluded middle, the call to f must either be indeterminately-sequenced with respect to the operation of the compound assignment of +=, or not indeterminately-sequenced with respect to it; in the latter case it must be sequenced before it because it cannot possibly be sequenced after it (the call of f being sequenced before the side effect of +=, and the relation being anti-symmetric). So first assume it is indeterminately-sequenced with respect to the operation. Then the cited clause says that w.r.t. the call of f the evaluation of += is a single operation, and the note explains that it means the call should not intervene between the lvalue-to-rvalue conversion and the side effect associated with +=; it should either be sequenced before both, or after both. But being sequenced after the side effect is not possible, so it should be before both. This makes (by transitivity) the side effect of ++ sequenced before the lvalue-to-rvalue conversion, exit UB. Next assume the call of f is sequenced before the operation of +=. Then it is in particular sequenced before the lvalue-to-rvalue conversion, and again by transitivity so is the side effect of ++; no UB in this branch either.
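To make the two candidate orderings concrete, here is a minimal sketch that writes each one out with every step in its own full-expression, so that neither version depends on the disputed wording. The names fetch_after_rhs and fetch_before_rhs are labels invented for this illustration, not terms from the standard; f is the identity function defined above.

```cpp
// Identity function from the answer: returns its lvalue argument unchanged.
int& f(int& a) { return a; }

// Reading suggested by 5.17:1: the fetch of the old value of n belongs to the
// operation of += and therefore happens after the right operand is evaluated.
int fetch_after_rhs(int n) {
    ++n;                    // side effect of ++ completes first
    int rhs = f(n) + 1;     // value of the right operand
    int old = n;            // fetch performed by the compound assignment
    n = old + rhs;          // store performed by the compound assignment
    return n;
}

// One ordering that the expansion n = n + f(++n) + 1 would allow if it were
// well-defined: the old value of n is fetched before ++ modifies it.
int fetch_before_rhs(int n) {
    int old = n;            // fetch happens first
    ++n;                    // then the increment
    int rhs = f(n) + 1;     // value of the right operand
    n = old + rhs;          // store
    return n;
}
```

Starting from n equal to 0, the first function returns 3 and the second returns 2; that difference is what the sequencing rules have to settle.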

Conclusion: 5.17:1 contradicts 5.17:7 if the latter is taken (CAUB) to be normative for questions of UB resulting from unsequenced evaluations by 1.9:15. As I said, CAUB is self-contradictory as well (by arguments indicated in the question), but this answer is getting too long, so I'll leave it at this for now.

Three problems, and two proposals for resolving them

Trying to understand what the standard writes about these matters, I distinguish three aspects in which the text is hard to interpret; they all are of a nature that the text is insufficiently clear about what model its statements are referring to. (I cite the texts at the end of the numbered items, since I do not know the markup to resume a numbered item after a quote)

  1. The text of 5.17:7 is of an apparent simplicity that, although the intention is easy to grasp, gives us little hold when applied to difficult situations. It makes a sweeping claim (equivalent behavior, apparently in all aspects) but its application is thwarted by the exception clause. What if the behavior of E1 = E1 op E2 is undefined? Well then that of E1 op= E2 should be as well. But what if the UB was due to E1 being evaluated twice in E1 = E1 op E2? Then evaluating E1 op= E2 should presumably not be UB, but if so, then defined as what? This is like saying "the youth of the second twin was exactly like that of the first, except that he did not die at childbirth." Frankly, I think this text, which has evolved little since the C version "A compound assignment of the form E1 op= E2 differs from the simple assignment expression E1 = E1 op E2 only in that the lvalue E1 is evaluated only once." might be adapted to match the changes in the standard.

    (5.17) 7 The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once. [...]

  2. It is not so clear what precisely the actions (evaluations) are between which the 'sequenced' relation is defined. It is said (1.9:12) that evaluation of an expression includes value computations and initiation of side effects. Though this appears to say that an evaluation may have multiple (atomic) components, the sequenced relation is actually mostly defined (e.g. in 1.9:14,15) for individual components, so that it might be better to read this as saying that the notion of "evaluation" encompasses both value computations and (initiation of) side effects. However in some cases the 'sequenced' relation is defined for the (entire) execution of an expression or statement (1.9:15) or for a function call (5.17:1), even though a passage in 1.9:15 avoids the latter by referring directly to executions in the body of a called function.

    (1.9) 12 Evaluation of an expression (or a sub-expression) in general includes both value computations (...) and initiation of side effects. [...] 13 Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread [...] 14 Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated. [...] 15 When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [...] Every evaluation in the calling function (including other function calls) ... is indeterminately sequenced with respect to the execution of the called function [...] (5.2.6, 5.17) 1 ... With respect to an indeterminately-sequenced function call, ...

  3. The text should more clearly acknowledge that a compound assignment involves, in contrast to a simple assignment, the action of fetching the value previously assigned to its left operand; this action is like lvalue-to-rvalue conversion, but does not happen as part of the value computation of that left operand, since it is not a prvalue; indeed it is a problem that 1.9:12 only acknowledges such action for prvalue evaluation. In particular the text should be more clear about which 'sequenced' relations are given for that action, if any.

    (1.9) 12 Evaluation of an expression... includes... value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation)

The second point is the least directly related to our concrete question, and I think it can be solved simply by choosing a clear point of view and reformulating passages that seem to indicate a different point of view. Given that one of the main purposes of the old sequence points, and now the 'sequenced' relation, was to make clear that the side effect of postfix-increment operators is unsequenced w.r.t. actions sequenced after the value computation of that operator (thus giving e.g. i = i++ UB), the point of view must be that individual value computations and (initiation of) individual side effects are "evaluations" for which "sequenced before" may be defined. For pragmatic reasons I would also include two more kinds of (trivial) "evaluations": function entry (so that the language of 1.9:15 may be simplified to: "When calling a function..., every value computation and side effect associated with any of its argument expressions, or with the postfix expression designating the called function, is sequenced before entry of that function") and function exit (so that any action in the function body gets by transitivity sequenced before anything that requires the function value; this used to be guaranteed by a sequence point, but the C++11 standard seems to have lost such a guarantee; this might make calling a function ending with return i++; potentially UB where this is not intended, and used to be safe). Then one can also be clear about the "indeterminately sequenced" relation of function calls: for every function call, and every evaluation that is not (directly or indirectly) part of evaluating that call, that evaluation shall be sequenced (either before or after) w.r.t. both entry and exit of that function call, and it shall have the same relation in both cases (so that in particular such external actions cannot be sequenced after function entry but before function exit, as is clearly desirable within a single thread).

Now to resolve points 1. and 3., I can see two paths (each affecting both points), which have different consequences for the defined or not behavior of our example:

Compound assignments with two operands, and three evaluations

Compound operations have their two usual operands, an lvalue left operand and a prvalue right operand. To settle the unclarity of point 3, it is included in 1.9:12 that fetching the value previously assigned to an object also may occur in compound assignments (rather than only for prvalue evaluation). The semantics of compound assignments are defined by changing 5.17:7 to

In a compound assignment op=, the value previously assigned to the object referred to by the left operand is fetched, the operator op is applied with this value as left operand and the right operand of op= as right operand, and the resulting value replaces that of the object referred to by the left operand.

(That gives two evaluations, the fetch and the side effect; a third evaluation is the trivial value computation of the compound operator, sequenced after both other evaluations.)

For clarity, state clearly in 1.9:15 that value computations in operands are sequenced before all value computations associated with the operator (rather than just those for the result of the operator), which ensures that evaluating the lvalue left operand is sequenced before fetching its value (one can hardly imagine otherwise), and also sequences the value computation of the right operand before that fetch, thus excluding UB in our example. While at it, I see no reason not to also sequence value computations in operands before any side effects associated with the operator (as they clearly must); this would make mentioning this explicitly for (compound) assignments in 5.17:1 superfluous. On the other hand do mention there that the value fetching in a compound assignment is sequenced before its side effect.
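As a rough illustration of this first proposal, the operation of += can be modelled by an ordinary function taking an lvalue and a prvalue. This is only a sketch under the assumption that function-call sequencing (all argument evaluation sequenced before the body) is a fair stand-in for the proposed operand-before-operation rule; the names compound_add and run_example are hypothetical.

```cpp
// Identity function from the answer.
int& f(int& a) { return a; }

// Hypothetical model of op= under the first proposal, specialised to +=.
// Both operands are fully evaluated before the body runs, because argument
// evaluation is sequenced before execution of the called function.
int& compound_add(int& left, int right) {
    int old = left;        // fetch, part of the operation itself
    left = old + right;    // side effect, sequenced after the fetch
    return left;           // trivial value computation: the left operand
}

// Applies the model to the example n += f(++n) + 1.
// Binding the reference parameter to n does not read the value of n, so the
// side effect of ++n in the second argument has nothing to conflict with.
int run_example(int n) {
    compound_add(n, f(++n) + 1);
    return n;
}
```

With n starting at 0, the increment makes n equal to 1, the right operand evaluates to 2, and the fetch inside compound_add sees 1, so the model leaves n at 3.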

Compound assignments with three operands, and two evaluations

In order to obtain that the fetch in a compound assignment will be unsequenced with respect to the value computation of the right operand, making our example UB, the clearest way seems to be to give compound operators an implicit third (middle) operand, a prvalue, not represented by a separate expression, but obtained by lvalue-to-rvalue conversion from the left operand (this three-operand nature corresponds to the expanded form of compound assignments, but by obtaining the middle operand from the left operand, it is ensured that the value is fetched from the same object to which the result will be stored, a crucial guarantee that is only vaguely and implicitly given in the current formulation through the "except that E1 is evaluated only once" clause). The difference with the previous solution is that the fetch is now a genuine lvalue-to-rvalue conversion (since the middle operand is a prvalue) and is performed as part of the value computation of the operands to the compound assignment, which makes it naturally unsequenced with the value computation of the right operand. It should be stated somewhere (in a new clause that describes this implicit operand) that the value computation of the left operand is sequenced before this lvalue-to-rvalue conversion (it clearly must). Now 1.9:12 can be left as it is, and in place of 5.17:7 I propose

In a compound assignment op= with left operand a (an lvalue), and middle and right operands b and c respectively (both prvalues), the operator op is applied with b as left operand and c as right operand, and the resulting value replaces that of the object referred to by a.

(That gives one evaluation, the side effect, with as second evaluation the trivial value computation of the compound operator, sequenced after it.)
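The three-operand view can be sketched in the same style, again only as an illustration with hypothetical names (compound_add3, run_safe). The point of the sketch is that the fetch now happens among the operand evaluations rather than inside the operation.

```cpp
// Hypothetical model of op= under the second proposal, specialised to +=.
// The middle operand b is a prvalue obtained from the left operand by
// lvalue-to-rvalue conversion while the operands are being evaluated.
int& compound_add3(int& a, int b, int c) {
    a = b + c;     // the only side effect of the operation
    return a;      // trivial value computation: the left operand
}

// Safe use: the right operand does not modify the object that a refers to.
int run_safe(int x, int delta) {
    compound_add3(x, x, delta);
    return x;
}

// The example from the answer would correspond to
//     compound_add3(n, n, f(++n) + 1);
// where, under the C++11 rules for argument evaluation, the read of n for
// the middle operand is unsequenced relative to the side effect of ++n.
// That call is deliberately not made here.
```

Calling run_safe(4, 5) yields 9, as expected for x += 5 with x equal to 4.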

The changes to 1.9:15 and 5.17:1 suggested in the previous solution could still apply, but would not give our original example defined behavior. However the modified example at the top of this answer would still have defined behavior, unless the part of 5.17:1 "compound assignment is a single operation" is scrapped or modified (there is a similar passage in 5.2.6 for postfix increment/decrement). The existence of those passages would suggest that detaching the fetch and store operations within a single compound assignment or postfix increment/decrement was not the intention of those who wrote the current standard (and by extension making our example UB), but this of course is mere guesswork.

Gynaeco answered 14/6, 2014 at 10:7 Comment(14)
I think your quote of 5.17/1 refers to 1.9/15 "Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function." So the lvalue-to-rvalue conversion of n is indeterminately sequenced wrt the execution of f's body. Does it follow that it's sequenced wrt the evaluation of the argument ++n? The Standard says that the evaluation of ++n is sequenced before the execution of the body of f.Whitmer
Another example: int i; /* global */ int& f() { return ++i; } int main() { i += f(); } I think that's one of the cases that shall be well-behaved (no UB). The result is ++i; i += i;Whitmer
@dyp: first comment: the question is whether the call of f is sequenced before the evaluation of +=. I tend to say it is (since the call is in an argument to +=), but I argue that whether or not, there is no UB either way. It is indeed curious that my argument should repose on a clause that probably does not apply here! The important thing is that the clause clearly makes the lvalue-to-rvalue conversion part of the evaluation of +=.Gynaeco
"which I repeat is not part of the value computation of the LHS, since that results in an lvalue": NO! The value computation of the LHS results in a prvalue. The evaluation of the whole compound assignment expression results in an lvalue, but that is a different evaluation.Olympus
@MarcvanLeeuwen What do you mean with "the evaluation of +="? The evaluation of the operands of any operator @ is sequenced before the side effect of that operator, is sequenced before the value computation of the expression that encloses/is formed by that operator L@R.Whitmer
@dyp, second comment: good example! Note that the "equivalent" expression i = i + f() suffers at least from unspecified behavior (the value of the LHS of + could be taken either before or after the call of f), if not undefined behavior (I'm not quite sure which). Either way the behavior is not equivalent to that of i += f(), showing that CAUB is untenable.Gynaeco
@dyp: I meant the operation of the compound assignment associated to +=, excluding the evaluations (value computations and side effect) of its arguments. This is clearly the meaning used in the cited "the operation of a compound assignment is a single evaluation".Gynaeco
@MWid: 5.17:1 "The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand...". No prvalue in the LHS of compound assignments, you're just mistaken. Nor could there be, because while lvalue-to-rvalue conversion is possible, the opposite (recovering an address from the value extracted from it) is impossible.Gynaeco
In i = i + f(), the execution of the body of f() (which contains ++i) is indeterminately sequenced wrt the value comp of the i in i + f() as per 5.17/1 -- so I agree this is different from i += f(), where the compound assignment guarantees that the body of f() is executed before the lvalue-to-rvalue conversion of i (since it cannot be sequenced after the side effect of +=). I would search for that in the CWG issues list and then send a mail to the Standard mailing list to ask if that's intended.Whitmer
Ad "the question is whether the call of f is sequenced before the evaluation of +=" -- 5.17/1 only gives us a guarantee about the assignment of a compound assignment. There's no guarantee about the compound part (the op of an op =). I think the guarantees here have to be drawn from the rewrite.Whitmer
@MarcvanLeeuwen No. The expression on the LHS must be a modifiable lvalue, to which the lvalue-to-rvalue conversion is applied. Hence the result of evaluating the LHS is a prvalue.Olympus
@MWid: ...and return an lvalue referring to the left operand. " Which would not be possible if what you say were true.Gynaeco
@MarcvanLeeuwen As I already said, the returned lvalue is the result of evaluating the whole compound assignment expression. "In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression." It's the result of the value computation of the assignment expression, which is different from the value computation of the left operand.Olympus
@MWid: "...return an lvalue referring to the left operand". Now I see, you are completely right. The result of the value computation of the assignment expression is different from the value computation of its left operand. How could I be so stupid as to not see that?Gynaeco
M
0

From the compiler writer's perspective, they don't care about "i += ++i + 1", because whatever the compiler does, the programmer may not get the correct result, but they surely get what they deserve. And nobody writes code like that. What the compiler writer cares about is

*p += ++(*q) + 1;

The code must read *p and *q, increase *q by 1, and increase *p by some amount that is calculated. Here the compiler writer cares about restrictions on the order of read and write operations. Obviously if p and q point to different objects, the order makes no difference, but if p == q then it will make a difference. Again, p will be different from q unless the programmer writing the code is insane.
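A small sketch of the two situations, with hypothetical function names bump and bump_sequenced. The first function uses the expression as written and is uncontroversial only for distinct objects; the second spells the steps out as separate statements so that the result does not depend on the disputed sequencing even when the pointers alias.

```cpp
// Compound assignment through two pointers, as in the answer. Intended to be
// called only with p and q pointing to different objects.
void bump(int* p, int* q) {
    *p += ++(*q) + 1;
}

// Hypothetical variant with each step in its own statement, so the outcome
// is fixed even when p == q.
void bump_sequenced(int* p, int* q) {
    ++(*q);               // increment the object q points to
    int rhs = *q + 1;     // compute the amount to add
    *p += rhs;            // update the object p points to
}
```

For distinct objects holding 10 and 1, both functions leave them at 13 and 2. For a single object holding 0 passed as both arguments, bump_sequenced leaves it at 3, which corresponds to fetching the left operand after the increment.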

By making the code undefined, the language allows the compiler to produce the fastest possible code without caring for insane programmers. By making the code defined, the language forces the compiler to produce code that conforms to the standard even in insane cases, which may make it run slower. Both compiler writers and sane programmers don't like that.

So even if the behaviour is defined in C++11, it would be very dangerous to use it, because (a) a compiler might not be changed from C++03 behaviour, and (b) it might be undefined behaviour in C++14, for the reasons above.

Minuscule answered 9/7, 2014 at 10:42 Comment(0)
