Is (x++, y) + (y++, x) undefined or unspecified, and if unspecified, what can it compute?
Asked Answered
M

4

1

The comma sequence operator introduces a sequence point in an expression. I am wondering whether this means that the program below avoids undefined behavior.

int x, y;

int main()
{
  return (x++, y) + (y++, x);
}

If it does avoid undefined behavior, it could still be unspecified, that is, return one of several possible values. I would think that in C99, it can only compute 1, but actually, various versions of GCC compile this program into an executable that returns 2. Clang generates an executable that returns 1, apparently agreeing with my intuition.

Lastly, is this something that changed in C11?

Marilee answered 18/12, 2012 at 15:12 Comment(7)
It's still undefined because the two operands of + are evaluated in parallel, so to speak.Threecornered
@Threecornered How does the “the two operand of + are evaluated in parallel” explanation not apply to f() + g() (which the consensus is that it is not undefined #3951517 )?Marilee
I'm not sure how to interpret this. "If an attempt is made to modify the result of a comma operator or to access it after the next sequence point, the behavior is undefined."Charger
@jsn I have never been quite sure what his sentence was about, but I thought that it was concerned with expressions whose value is an allocated object, e.g. f() when f() returns a struct. For what it's worth, GCC compiles (x++, y+1) + (y++, x+1) into 4, and surely that cannot be excused by this sentence.Marilee
@PascalCuoq The consensus there is that the order is unspecified, not undefined because there's a sequence point before each function call and function calls count as side effects, i.e. function executions don't interleave with each other.Threecornered
The question is, in e1 + e2, are the evaluations of e1 and e2 unsequenced or indeterminately sequenced? If unsequenced, it's undefined behaviour, if indeterminately sequenced, merely unspecified.Obviate
@Threecornered But how are the sequence points before function calls different from the sequence points in, say, (0, x++, y+1) + (0, y++, x+1)? (which GCC still evaluates as 4)Marilee
T
6

Take the expression:

(x++, y) + (y++, x)

Evaluate left-to-right:

x++  // yield 0, schedule increment of x
,    // sequence point: x definitely incremented now
y    // yield 0
y++  // yield 0, schedule increment of y
// explode because you just read from y and wrote to y
// with no intervening sequence point

There's nothing in the standard that forbids this, so the whole thing has undefined behavior.

Contrast this pseudocode:

f() { return x++, y; }
g() { return y++, x; }
f() + g()

Acoording to C99 (5.1.2.3/2) the calls to f and g themselves count as side effects, and the function call operator contains a sequence point just before it enters a function. This means function executions can't interleave.

Under the "evaluate things in parallel" model:

f()  // arbitrarily start with f: sequence point; enter f
g()  // at the same time, start calling g: sequence point

Since the execution of f counts as a side effect itself, the sequence point in g() suspends execution until f has returned. Thus, no undefined behavior.

Threecornered answered 18/12, 2012 at 15:59 Comment(1)
I understand what you meant in the comment now. Thanks for a good answer.Marilee
I
5

The whole chapter 6.5 of the standard mentions the order of evaluation, on operator basis. The best summary of the order of evaluation I could found was the (non-normative) Annex J of the standard:

C11 Annex J

J.1 Unspecified behavior

  • The order in which subexpressions are evaluated and the order in which side effects take place, except as specified for the function-call (), &&, ||, ?:, and comma operators (6.5)

In your example, you cannot know whether the sub-expression (x++, y) or (y++, x) is evaluated first, since the order of evaluation of the operands of the + operator is unspecified.

As for undefined behavior, the comma operator solved nothing. If (x++, y) is evaluated first, then y may get evaluated immediately before y++ of the other sub-expression. Since y is accessed twice without a sequence point in between, for other purposes than to determine the value to store, the behavior is undefined. More info here.

So your program has undefined and unspecified behavior both.

(In addition, it has implementation-defined behavior since int main(), rather than int main (void) is not one of the well-defined versions of main in a hosted application.)

Imponderabilia answered 18/12, 2012 at 16:13 Comment(2)
If your remark is that y may get scheduled next to y++, then let us consider the expression (0, x++, y+1) + (0, y++, x+1). GCC still evaluates this to 4.Marilee
@PascalCuoq I don't think that expression is UB, though you still have an unspecified order of evaluation of the sub-expressions. You have 4 sub-expressions here, nested into each other. The expression is equivalent to ((0, x++), y+1) + ((0, y++), x+1) where each parenthesis is a sub-expression. I suppose GCC may evaluate it like this: 0, x++ = 1, 0, y++= 1, (1, 1+1) + (1, 1+1) = 2 + 2= 4.Imponderabilia
C
1

My understanding is that any of the following evaluation orders are legal:

  • x++, y++, y, x
  • x++, y, y++, x
  • x++, y++, x, y
  • y++, x, x++, y
  • ...

But not:

  • y, x++, x, y++
  • ...

In my understanding, the only ordering that is required is that (x++) must come before (y), and (y++) must come before (x).

Therefore, I believe that this is still undefined behavior.

Cerebritis answered 18/12, 2012 at 15:26 Comment(2)
If your explanation is that the compiler can choose one of several orders for computations, then it is an explanation of why my program is unspecified behavior. Undefined behavior on the other hand allows the compiler to do anything (nasal demons), like when the program contravenes 6.5:2 (“Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression”). Which I am not sure my program does.Marilee
The listed orders of evaluation are correct, but the 5th one is not allowed because of operator precedence, rather than the order of evaluation of sub-expressions. As part of the precedence rules of the standard, the comma operator in itself must always be evaluated from left to right. And no, neither of this proves that the code in the original post is undefined behavior.Imponderabilia
M
1

The standard explicitly declares which operators introduce sequence points, + is not among them. So your expression can well be evaluated in a way that the forbidden subexpressions are evaluated "close" to each other, without a sequence point in between them.

On a pragmatical base, there is a reason for this interdiction, namely that the evaluation of the operands of such an operation may be scheduled in parallel, e.g if the processor has several pipelines for the operation of the subexpressions in question. You can easily convince yourself that under such a model of evaluation, anything can happen, the result is not predictable.

In your toy example, the conflict is predictable, but this wouldn't necessarily be the case if you would use some pointer indirection in the subexpressions. Then you wouldn't know if one of the subexpressions would potentially alias the other. So basically not much that C can do if it wants to allow for this type of optimization.

Messalina answered 18/12, 2012 at 16:54 Comment(2)
Everyone here, included me, agrees that the intention is to allow efficient compilation. I am only arguing that the C99 standard is not doing a good job of actually stating that an expression such as (0, x++, y+1) + (0, y++, x+1) is undefined. Clauses such as 6.5:2 or 6.5:3 do not explicitly say this expression is undefined. They do not use the words “in parallel”, and they place function calls and , at the same level. This may have changed in C11, but I am not familiar with it yet.Marilee
@PascalCuoq, C11 is a bit more precise on the model for sequencing operations. But the concept of data race is only introduced for thread parallelism, and it is not completely clear to me if the thread concept is relevant, here. The evaluation of the two subexpressions could be seen as two different threads of execution, but I don't know if this is pushing things too far.Messalina

© 2022 - 2024 — McMap. All rights reserved.