Sequence points and partial order
C

A few days back there was a discussion here about whether the expression

i = ++i + 1

invokes UB (Undefined Behavior) or not.

The conclusion was that it invokes UB, because the value of 'i' is modified more than once between two sequence points.

I was involved in a discussion with Johannes Schaub in that same thread. According to him

i=(i,i++,i)+1 ------ (1) /* invokes UB as well */

I said (1) does not invoke UB because the side effects of the previous subexpressions are cleared by the comma operator ',' between i and i++ and between i++ and i.

Then he gave the following explanation:

"Yes the sequence point after i++ completes all side effects before it, but there is nothing that stops the assignment side effect overlapping with the side effect of i++. The underlying problem is that the side effect of an assignment is not specified to happen after or before the evaluation of both operands of the assignment, and so sequence points cannot do anything with regard to protecting this: Sequence points induce a partial order: Just because there is a sequence point after and before i++ doesn't mean all side effects are sequenced with regard to i.

Also, notice that merely a sequence point means nothing: The order of evaluations isn't dictated by the form of code. It's dictated by semantic rules. In this case, there is no semantic rule saying when the assignment side effect happens with regard to evaluating both of its operands or subexpressions of those operands".

The statement written in "bold" confused me. As far as I know:

"At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place."

Since comma operators also specify execution order, the side effect of i++ has already completed by the time we reach the last i. He (Johannes) would have been right had the order of evaluation been unspecified (but in the case of the comma operator it is well specified).

So I just want to know whether (1) invokes UB or not. Can someone give another valid explanation?

Thanks!

Caftan answered 13/12, 2009 at 8:29 Comment(5)
I think I'm with litb on this, but I have to ask "Why do you care?" No C or C++ programmer worth his or her salt writes code like that.Forebode
Frankly speaking I don't care (I know that these sorts of expressions should be avoided while writing C/C++ code), but this was just to clear my doubts. No offense meant. :-)Caftan
Details like this are esoteric enough that frankly I wouldn't trust the compiler writers to have gotten them right either.Hardan
I'd agree - this question hopefully has little real utility. But sometimes one is just curious about these details, and there's not necessarily any harm in that (unless someone goes off and starts writing code like the examples).Mcneese
Yeah, why not stick to writing simpler, cleaner code like int i=0,j; j=i+(i++); Stroustrup's easy-to-remember rule (not more than one write to a variable) says its ok. (Just joking)Higginson
M
11

The C standard says this about assignment operators (C90 6.3.16 or C99 6.5.16 Assignment operators):

The side effect of updating the stored value of the left operand shall occur between the previous and the next sequence point.

It seems to me that in the statement:

i=(i,i++,i)+1;

the sequence point 'previous' to the assignment operator would be the second comma operator and the 'next' sequence point would be the end of the expression. So I'd say that the expression doesn't invoke undefined behavior.

However, this expression:

*(some_ptr + i) = (i,i++,i)+1;

would have undefined behavior because the order of evaluation of the 2 operands of the assignment operator is unspecified, and in this case, instead of the problem being when the assignment operator's side effect takes place, the problem is that you don't know whether the value of i used in the left-hand operand will be evaluated before or after the right-hand side. This order of evaluation problem doesn't occur in the first example because in that expression the value of i isn't actually used in the left-hand side; all that the assignment operator is interested in is the "lvalue-ness" of i.
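To make the ambiguity concrete, here is a minimal sketch in C that spells out the two candidate orderings as separate statements. The function names store_at_old_index and store_at_new_index are hypothetical and not from the discussion. Since the original expression is argued to have undefined behavior, a compiler is not obliged to produce either result; the sketch only illustrates why the index is ambiguous and how to write each intent unambiguously.

```c
/* Candidate ordering 1: the left-hand side reads i BEFORE the increment.
 * (Hypothetical helper, for illustration only.) */
void store_at_old_index(int *some_ptr, int *i)
{
    int index = *i;            /* index taken from the old value of i */
    (*i)++;                    /* the i++ from the comma expression */
    some_ptr[index] = *i + 1;  /* value of (i, i++, i) + 1 */
}

/* Candidate ordering 2: the left-hand side reads i AFTER the increment.
 * (Hypothetical helper, for illustration only.) */
void store_at_new_index(int *some_ptr, int *i)
{
    (*i)++;                    /* the i++ from the comma expression */
    some_ptr[*i] = *i + 1;     /* index and value both use the new i */
}
```

Starting from i == 0, the first version writes 2 into some_ptr[0] and the second writes 2 into some_ptr[1]. Both leave i == 1.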

But I also think that all this is sketchy enough (and my understanding of the nuances involved is sketchy enough) that I wouldn't be surprised if someone can convince me otherwise (on either count).

Mcneese answered 13/12, 2009 at 8:52 Comment(6)
Yeah, *(some_ptr + i) = (i,i++,i)+1; is indeed UB. Its behavior is similar to a[i]=++i, which also invokes UB.Caftan
That's true, the second comma introduces a sequence point between the increment and the last read of i, which are both ordered before the assignment to i. Since there's no possibility to perform an operation involving a read after an operation involving a write without a sequence point, the expression is well-defined. BTW i=(i++,i++,i)+1 would also be ok (the first increment is ordered before the second one, and the rest is as in the original example). This matter is quite clearly discussed in <open-std.org/JTC1/SC22/wg14/www/docs/n926.htm>.Higginson
@Higginson - thanks for the pointer to N926; I'll have to give it a few days to be able to read more carefully.Mcneese
I've just noticed this thread on the screen of my colleague :) "the sequence point 'previous' to the assignment operator would be the second comma operator" -> to me, the "previous" sequence point is the sequence point prior to the assignment expression. That is, in "a; a = (b, c); d" the "previous sequence point" is the first ";" and the next sequence point is the second ";". I'm not at all sure though. I believe in C++0x it is definitely not undefined behavior though :)Ho
I've just recalled the very long thread on usenet about this :) (see Michael Foukarakis' link below). We concluded back then it's fine in C1x and C++0x but we didn't come to a conclusion about C99/C89 and C++03, and i still think they make this undefined. But in any way, since i now remember that thread was very long and we didn't come to a clear result about pre-C++0x, i don't think it makes sense to rediscuss it in here :)Ho
@litb: I definitely wouldn't claim to be certain about any of this. As alluded to in the comments to the question - this is mostly academic stuff, and to be honest, too much thinking about sequence points starts to hurt my head (a sign that I'm learning something?). I may have to start monitoring comp.lang.c, etc. on usenet, to try and keep better up to date on the evolution (and history) of the standards, but that'll have to wait for some other stuff to settle down in real-life.Mcneese
J
4

i=(i,i++,i)+1 ------ (1) /* invokes UB as well */

It does not invoke undefined behaviour. The side effect of i++ will take place before the evaluation of the next sequence point, which is denoted by the comma following it, and also before the assignment.
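Under that reading, the expression behaves like the following sequence of separate statements. This is only a sketch of the ordering described above, not a claim about what any particular compiler emits, and the function name expression_one_sequenced is made up for illustration.

```c
/* Spells out i = (i, i++, i) + 1 as separate full expressions,
 * assuming the reading in which the expression is well-defined. */
int expression_one_sequenced(int i)
{
    (void)i;          /* first comma operand: evaluated, value discarded */
    i++;              /* second comma operand: side effect completes here */
    int rhs = i + 1;  /* third comma operand read once, then + 1 */
    i = rhs;          /* the assignment side effect happens last */
    return i;
}
```

Written this way, each modification of i sits in its own full expression, so there is nothing left to argue about: starting from i == 0 the result is 2.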

Nice language sudoku, though. :-)

edit: There's a more elaborate explanation here.

Jackofalltrades answered 14/12, 2009 at 8:29 Comment(9)
That's what I tried to tell litb, but he is still not convinced.Caftan
Well, if the standard won't convince him, I'm not sure exactly what will.Jackofalltrades
BTW, is there some standards guy who is actually responsible for the [sequence point] problem wording / semantics, so that we could just email him a nice [community?:)] letter asking for a)rationale, b)reasoning algorithms, c)reading advice? That would be handy..Higginson
@mlvjir: Look at this : groups.google.co.in/group/comp.lang.c/browse_thread/thread/…Caftan
@Prasoon Thanks, I've already followed that link yesterday. But who's the guy there (I mean, the authority on C / seq. pt / standard)?Higginson
@mlvljr: I think his name is Peter Seebach(aka Seebs), you might find some of his posts in that thread. :)Caftan
Seems he's not actually into the standard things - i.e. the guy says he's unsure about some seq. pt aspects. BTW there is an official document addressing (among other things) this matter - <open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf>Higginson
Please correct me if i'm wrong. That thread did merely find out it's valid in C1x and C++0x. However they did not prove that it doesn't invoke undefined behavior in C++03 or C89/C99Ho
The rationale and proof in that thread also hold for C99. I'm not 100% certain about C89, but I would expect the same to be true for it as well.Jackofalltrades
N
3

I believe that the following expression definitely has undefined behaviour.

i + ((i, i++, i) + 1)

The reason is that the comma operator specifies sequence points between the subexpressions in parentheses but does not specify where in that sequence the evaluation of the left-hand operand of + occurs. One possibility is between the sequence points surrounding i++, and this violates 5/4 (clause 5, paragraph 4 of the C++ standard): i is written to between two sequence points but is also read between the same sequence points, and not only to determine the value to be stored but also to determine the value of the first operand of the + operator.

This also has undefined behaviour.

i += (i, i++, i) + 1;

Now, I am not so sure about this statement.

i = (i, i++, i) + 1;

Although the same principles apply, i must be "evaluated" as a modifiable lvalue, and this can be done at any time, but I'm not convinced that its value is ever read as part of this. (Or is there another restriction that the expression violates to cause UB?)

The sub-expression (i, i++, i) happens as part of determining the value to be stored and that sub-expression contains a sequence point after the storage of a value to i. I don't see any way that this wouldn't require the side effect of i++ to be complete before the determination of the value to be stored and hence the earliest possible point that the assignment side effect could occur.

After this sequence point i's value is read at most once and only to determine the value that will be stored back to i, so this last part is fine.
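As a rough illustration of why i += (i, i++, i) + 1 is problematic while the plain assignment is arguably not, the sketch below writes out two candidate orderings for the read of the left operand. The function names are hypothetical. Undefined behavior is not limited to these two outcomes; they only show that the expression has no single obvious meaning.

```c
/* Candidate ordering 1: the left operand of += is read BEFORE i++.
 * (Hypothetical helper, for illustration only.) */
int compound_left_read_first(int i)
{
    int left = i;       /* value of i used by += */
    i++;                /* the i++ inside the comma expression */
    int right = i + 1;  /* (i, i++, i) + 1 */
    i = left + right;
    return i;
}

/* Candidate ordering 2: the left operand of += is read AFTER i++.
 * (Hypothetical helper, for illustration only.) */
int compound_left_read_last(int i)
{
    i++;                /* the i++ inside the comma expression */
    int right = i + 1;  /* (i, i++, i) + 1 */
    i = i + right;      /* left operand sees the incremented value */
    return i;
}
```

Starting from i == 0, the first ordering yields 2 and the second yields 3. With plain = the left operand's value is never read, so this particular ambiguity does not arise.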

Nonjuror answered 13/12, 2009 at 10:30 Comment(8)
UB just won't guarantee that something will work as expected right? Because all of those statements compile correctly and don't crash (and I have tested them many times).Polloch
If something has UB it means that there isn't really an "as expected" behaviour. The expression might do what you want, but there's no guarantee that there isn't a conforming implementation somewhere where something completely different happens.Nonjuror
According to the semantics of seq. points and friends, assigning does not involve reading, i.e. int a, b, c; a=b=c=1; is well-defined since there are not only no seq. points but also no reads here (the c=1 expression a) performs a write to variable c and b) returns the value which was assigned, and there would be no need to actually "read" it (in any sense) ever). The algorithm to reason about sequence points and well- (or ill-) defined expressions is covered in <open-std.org/JTC1/SC22/wg14/www/docs/n926.htm>Higginson
The paper that you link to is very interesting and I think that it supports my view, however it's not normative and I have to say that I'm not completely convinced that the analysis approach described can be derived directly from the wording of the standard.Nonjuror
The paper actually deals with the C language, embodies the committee's members view of the problem and states (in the last part): "The rules listed in section 5 is an interpretation of the Standard. We want an interpretation that gives a set of well-defined expressions matching our expectations (as programmers). In this respect, the Standard is unclear in the areas of floating-point flags, volatile and signal handling. The Standard is also not adequate in describing function calls. Even though DR087 suggested function calls do not overlap, the Standard does not explicitly talk about this. "Higginson
.. [only 2 chars were left] so the [C language] standard seems just to be unclear (as probably the C++ one). At least, has anyone formally proved consistency of any of them? :)Higginson
@Charles - regarding "i must be evaluated as a modifiable lvalue": the standard says, "Except when it is ... the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue)." This implies that the left operand of the assignment operator isn't converted to the value stored, so the value isn't 'read'. The operand remains an lvalue, which is an object type (it designates the object).Mcneese
Yes, my slightly over-verbose language was, perhaps, misleading in this case, but I was trying to catch all possibilities such as the left-hand operand being a function call returning a reference or a cast operation indirectly calling a user-defined conversion to a reference type. It's the lvalue that's the result of this evaluation whose value isn't read, is what I think I was trying to say.Nonjuror
