Which of the following combinations of post & pre-increment operators have undefined behaviour in C?
A

7

5

I've read "Could anyone explain these undefined behaviors (i = i++ + ++i, i = i++, etc...)" and tried to understand sequence points from the comp.lang.c FAQ, after spending more than 2 hours trying to explain the following results produced by the gcc compiler.

expression(i=1;j=2)     i       j       k
k = i++ + j++;          2       3       3
k = i++ + ++j;          2       3       4
k = ++i + j++;          2       3       4
k = ++i + ++j;          2       3       5

k = i++ + i++;          3               2
k = i++ + ++i;          3               4
k = ++i + i++;          3               4
k = ++i + ++i;          3               6

i = i++ + j++;          4       3
i = i++ + ++j;          5       3
i = ++i + j++;          4       3
i = ++i + ++j;          5       3

i = i++ + i++;          4
i = i++ + ++i;          5
i = ++i + i++;          5
i = ++i + ++i;          6

Question:

  1. I want to know whether all the expressions shown (in 4 groups) in the figure above have undefined behavior. If only some of them do, which ones do and which ones don't?

  2. For the expressions with defined behaviour, could you kindly show (not explain) how the compiler evaluates them? Just to make sure I have understood pre-increment and post-increment correctly.

Background:

Today I attended a campus interview in which I was asked to explain the result of i++ + ++i for a given value of i. After compiling that expression with gcc, I realized that the answer I had given in the interview was wrong. I decided not to make such a mistake in future, so I compiled all possible combinations of pre- and post-increment operators with gcc and tried to explain the results. I struggled for more than 2 hours and couldn't find a single consistent rule for how these expressions are evaluated. So I gave up and turned to Stack Overflow. After a little reading of the archives, I found that there are such things as sequence points and undefined behaviour.

Amino answered 28/11, 2012 at 22:7 Comment(3)
I think the undefined groups are 2, 3, and 4? They all increment and use the same variable in the same expression, with no intervening sequence point.Odyssey
If you post code in a question, please post it as a code block (without line numbers), not as an image.Hewitt
Also read: https://mcmap.net/q/23924/-pre-post-increment-pointer & #4866099 & https://mcmap.net/q/23925/-post-increment-and-pre-increment-within-a-39-for-39-loop-produce-same-output-duplicateAmino
G
9

Except for the first group, all expressions in the other three groups have undefined behaviour.

How the defined behaviour is evaluated (group 1):

i=1, j=2;

k=i++ + j++; // 1 + 2 = 3
k=i++ + ++j; // 1 + 3 = 4
k=++i + ++j; // 2 + 3 = 5
k=++i + j++; // 2 + 2 = 4

It's fairly straightforward: just the difference between post-increment and pre-increment.
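As a rough check of the group 1 values, here is a minimal sketch that runs each statement from the same starting point (i = 1, j = 2, as in the question's table). The struct result type and the four function names are made up for this illustration.

```c
/* Hypothetical helper type: the values of i, j and k after one statement. */
struct result { int i, j, k; };

/* Each function starts from i = 1, j = 2 and runs one group-1 statement.
   i and j are distinct objects and each is modified only once, so the
   behaviour is defined. */
struct result post_post(void)
{
    int i = 1, j = 2, k;
    k = i++ + j++;                      /* 1 + 2 */
    return (struct result){ i, j, k };
}

struct result post_pre(void)
{
    int i = 1, j = 2, k;
    k = i++ + ++j;                      /* 1 + 3 */
    return (struct result){ i, j, k };
}

struct result pre_post(void)
{
    int i = 1, j = 2, k;
    k = ++i + j++;                      /* 2 + 2 */
    return (struct result){ i, j, k };
}

struct result pre_pre(void)
{
    int i = 1, j = 2, k;
    k = ++i + ++j;                      /* 2 + 3 */
    return (struct result){ i, j, k };
}
```

In every case i ends up as 2 and j as 3; only the value stored in k differs.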

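In group 2 and group 4, it's quite easy to see the undefined behaviour: i is modified more than once with no intervening sequence point.

Group 3 has undefined behaviour because the = operator doesn't introduce a sequence point, so the assignment to i and the increment of i both happen between the same two sequence points.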

Geodynamics answered 28/11, 2012 at 22:11 Comment(5)
@nos: Not similar enough. Each of the statements in the third group modifies i twice. That makes their behavior undefined.Hewitt
@Keith: nitpicking, but merely modifying a variable multiple times in one statement doesn't make it undefined...Odyssey
@Mehrdad: What do you mean? Modifying an object twice with no intervening sequence point does make the behavior undefined.Hewitt
@KeithThompson: Yeah, you forgot to mention sequence points in your first comment.Odyssey
@Mehrdad: Ok. It was a comment, not an answer (I mentioned them in my answer).Hewitt
H
5

There are no sequence points within any of these statements. There are sequence points between them.

If you modify the same object twice between consecutive sequence points (in this case, either via = or via prefix or postfix ++), the behavior is undefined. So the behavior of the first group of 4 statements is well defined; the behavior of the others is undefined.

If the behavior is defined, then i++ yields the previous value of i, and as a side effect modifies i by adding 1 to it. ++i modifies i by adding 1 to it, and then yields the modified value.
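If a particular result is wanted from something like i++ + ++i, one option is to split it into separate statements, so that a sequence point separates the two modifications. A small sketch follows; the function name is hypothetical, and the left-to-right order is a choice made here, not something the language prescribes for the original expression.

```c
/* Hypothetical helper: a defined, explicitly ordered stand-in for the
   undefined expression i++ + ++i. The ';' after each statement is a
   sequence point, so *i is modified only once between sequence points. */
int sum_left_to_right(int *i)
{
    int left = (*i)++;      /* yields the old value, then *i is incremented */
    int right = ++(*i);     /* increments *i, then yields the new value */
    return left + right;
}
```

Starting from i = 1, left is 1, right is 3, the function returns 4 and i is left as 3.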

Hewitt answered 28/11, 2012 at 22:17 Comment(0)
H
5

I want to know if all the expressions shown (in 4 groups) in above figure have undefined behavior?

Lines 2 through 5:

k = i++ + j++;
k = i++ + ++j;
k = ++i + ++j;
k = ++i + j++;

are all well-defined. All other expressions are undefined, because they all attempt to modify the value of an object through the evaluation of an expression more than once between sequence points (for these examples, the sequence point occurs at the ';' terminating each statement). For example, i = i++; is undefined because we're trying to modify the value of i through both an assignment and a postfix ++ without an intervening sequence point. Note that the = operator doesn't introduce a sequence point; the ||, &&, ?: and comma (,) operators do.
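To illustrate the operators that do introduce a sequence point, here is a small sketch in which the same object is incremented twice in one expression, yet the behaviour is defined. The function names are invented for this example.

```c
/* Comma operator: the left operand, including its side effect, is
   complete before the right operand is evaluated. */
int comma_twice(int *i)
{
    return ((*i)++, (*i)++);
}

/* && : sequence point after the first operand; the second operand is
   evaluated only if the first one yielded non-zero. */
int and_twice(int *i)
{
    return (*i)++ && (*i)++;
}

/* ?: : sequence point after the condition, before the chosen branch. */
int cond_twice(int *i)
{
    return (*i)++ ? (*i)++ : -1;
}
```

With i = 1 on entry, comma_twice and cond_twice both return 2 and leave i as 3, and and_twice returns 1 and leaves i as 3. With i = 0, and_twice short-circuits: it returns 0 and i becomes 1.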

For defined behaviour expressions, kindly can you show (not explain) how compiler evaluates them.

Let's start with

k = i++ + j++;

The expression a++ evaluates to the current value of a, and at some point before the next sequence point, a is incremented by 1. So, logically, the evaluation goes something like

k = 1 + 2; // i++ evaluates to 1, j++ evaluates to 2
i = i + 1; // i is incremented and becomes 2
j = j + 1; // j is incremented and becomes 3

However...

The exact order in which the expressions i++ and j++ are evaluated, and the order in which their side effects are applied, is unspecified. The following is a perfectly reasonable ordering of operations (using pseudo-assembly code):

mov j, r0        ; read the value of j into register r0
mov i, r1        ; read the value of i into register r1
add r0, r1, r2   ; add the contents of r0 to r1, store result to r2
mov r2, k        ; write result to k
inc r1           ; increment value of i
inc r0           ; increment value of j
mov r0, j        ; store result of j++
mov r1, i        ; store result of i++

DO NOT ASSUME LEFT-TO-RIGHT EVALUATION OF ARITHMETIC EXPRESSIONS. DO NOT ASSUME THAT OPERANDS OF ++ and -- ARE UPDATED IMMEDIATELY AFTER EVALUATION.

Because of this, the result of expressions like i++ + ++i will vary based on the compiler, compiler settings, and even the surrounding code. The behavior is left undefined so that the compiler isn't required to "do the right thing", whatever that may be. You will get a result, but it won't necessarily be the result you expect, and it won't be consistent across all platforms.
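The unspecified operand order can be observed without any undefined behaviour by using function calls as the operands of +. This is only a sketch; left, right, sum_traced and last_order are hypothetical names.

```c
/* Hypothetical tracing helpers: each records that it ran, then returns
   its value. */
static char order[3];   /* holds "LR" or "RL" after sum_traced() */
static int calls;

int left(void)  { order[calls++] = 'L'; return 1; }
int right(void) { order[calls++] = 'R'; return 2; }

int sum_traced(void)
{
    calls = 0;
    /* The two calls may run in either order, but they cannot overlap,
       so this is unspecified behaviour rather than undefined. */
    return left() + right();
}

/* Returns the order recorded by the most recent sum_traced() call. */
const char *last_order(void)
{
    return order;
}
```

sum_traced() returns 3 either way; printing last_order() afterwards shows which order a given compiler happened to pick, and both "LR" and "RL" are allowed.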

Looking at

k = i++ + ++j;

the logical evaluation is

k = 1 + 3; // i++ evaluates to i (1), ++j evaluates to j + 1 (2 + 1 = 3)
i = i + 1;
j = j + 1;

Again, here's one possible ordering of operations:

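mov j, r0        ; read the value of j into register r0
inc r0           ; r0 now holds j + 1, the value of ++j
mov i, r1        ; read the value of i into register r1
add r0, r1, r2   ; add the contents of r0 to r1, store result to r2
mov r2, k        ; write result to k
mov r0, j        ; store the incremented value of j
inc r1           ; increment the copy of i
mov r1, i        ; store the incremented value of i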

Or it could do something else. The compiler is free to change the order in which individual expressions are evaluated if it leads to a more efficient order of operations (which my examples almost certainly aren't).

Hell answered 28/11, 2012 at 22:46 Comment(1)
+1 Very well shown, where the undefined behavior comes from. It would be even better if there also were an undefined example similarly explained with assembly code!Dyna
M
2

The first group are all defined. They all increment the values of both i and j as a side-effect sometime before the next sequence point, so i is left as 2 and j as 3. In addition, i++ evaluates to 1, ++i evaluates to 2, j++ evaluates to 2 and ++j evaluates to 3. This means that the first assigns 1 + 2 to k, the second assigns 1 + 3 to k, the third assigns 2 + 3 to k and the fourth assigns 2 + 2 to k.

The remainder are all undefined behaviour. In the second and third groups, i is modified twice before a sequence point; in the fourth group i is modified three times before a sequence point.

Mezzorelievo answered 28/11, 2012 at 22:18 Comment(0)
M
0

In cases where a compiler can tell that two lvalue expressions identify the same object, there would be no meaningful cost to having it behave in some sensible fashion. The more interesting scenarios are those in which one or more of the operands are dereferenced pointers.

Given the code:

void test(unsigned *a, unsigned *b, unsigned *c)
{
  (*a) = (*b)++ + (*c)++;
}

there are many sensible ways in which a compiler might process that. It could load *b and *c, add them, store the result to *a, and then increment *b and *c; or it could load *b and *c, compute *b+*c, *b+1, and *c+1, and then write them in arbitrary sequence; or it could perform any of countless other sequences of operations. On some processors, some arrangements might be more efficient than others, and a compiler should have no reason to expect that programmers would regard any arrangement as more suitable than any other.

Note that even though on most hardware platforms there would be a limited number of plausible behaviors that could result from passing identical pointers as a, b, and c, the authors of the Standard make no effort to distinguish plausible and implausible outcomes. Many implementations could easily, at essentially zero cost, offer some behavioral guarantee (e.g. that code like the above would always set *a, *b, and *c to some possibly-unspecified values without any other side effects), and such a guarantee might sometimes be useful (if the pointers identify distinct objects whenever the objects' values matter, but might not do so otherwise). Even so, it is fashionable for compiler writers to regard any slight possibility of useful optimization, achievable when granted carte blanche to trigger arbitrarily destructive side effects, as worth more than the value programmers could receive from an assurance of constrained behavior.
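For comparison, a programmer who needs a predictable result even when the pointers alias can spell the order out. The sketch below is one possible ordering, not the one any particular compiler uses for the original expression, and test_explicit is a hypothetical name.

```c
/* Hypothetical variant of test() with every read and write in its own
   statement, so the behaviour is defined even if a, b and c point to
   the same object. */
void test_explicit(unsigned *a, unsigned *b, unsigned *c)
{
    unsigned old_b = *b;    /* the value (*b)++ would yield */
    *b = old_b + 1;
    unsigned old_c = *c;    /* the value (*c)++ would yield */
    *c = old_c + 1;
    *a = old_b + old_c;     /* written last, so it wins when a aliases b or c */
}
```

With three distinct objects holding 0, 1 and 2 this leaves them as 3, 2 and 3. If all three pointers refer to one object holding 1, the object ends up holding 3 under this particular ordering.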

Meander answered 18/5, 2017 at 19:23 Comment(0)
L
0

Commas are a bit tricky. The comma operator (as used, for example, with the loop variables in a for statement) evaluates its operands left to right, with a sequence point between them, however many are chained together. The commas that separate function arguments, however, are not comma operators: the order in which the arguments are evaluated is not guaranteed, and there is no sequence point between them.

So

int a=0;
function_call(++a, ++a, ++a);

has undefined behaviour: a is modified three times with no intervening sequence point, so the results are unpredictable.
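If the intent is to pass 1, 2 and 3, the increments can be moved into separate statements before the call. A minimal sketch, where function_call is a made-up stand-in that packs its arguments into one number so that their order is visible:

```c
/* Hypothetical stand-in for function_call: encodes the arguments as the
   decimal digits of the result, e.g. (1, 2, 3) -> 123. */
int function_call(int x, int y, int z)
{
    return 100 * x + 10 * y + z;
}

int call_in_order(void)
{
    int a = 0;
    /* Each declaration ends with a sequence point, so a is modified
       only once between sequence points. */
    int first = ++a;
    int second = ++a;
    int third = ++a;
    return function_call(first, second, third);
}
```

Here call_in_order() returns 123 on any conforming compiler, because the arguments no longer have side effects.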

Lucania answered 4/6, 2019 at 13:49 Comment(0)
S
-1

In most cases gcc applies the pre-increments first, uses the resulting values in the operation, and evaluates the post-increments after that.

For example, in block 2 there are no pre-increments, so 1 is used for i:

k = i++ + i++ // hence k = 1 + 1 = 2

and the two post-increments leave i = 3.

One pre-increment changes i to 2:

k = i++ + ++i // hence k = 2 + 2 = 4

and one post-increment leaves i = 3.

The same goes for k = ++i + i++.

Two pre-increments make i equal to 3:

k = ++i + ++i // hence k = 3 + 3 = 6

and i = 3.

Hope that explains a bit. But it depends purely on the compiler, since the behaviour of these expressions is undefined.

Serica answered 18/5, 2017 at 15:38 Comment(0)
