being sure about "unknown evaluation order"

Asked 29/8, 2017 at 11:22 Answered 29/8, 2017 at 11:55

Solved c++c static-analysis operator-precedence cppcheck

Since version 1.80, Cppcheck tells me that

Expression 'msg[ipos++]=checksum(&msg[1],ipos-1)' depends on order of evaluation of side effects

in this code sequence (simplified, data is a variable)

BYTE msg[MAX_MSG_SIZE];  // msg can be smaller, depending on data encoded
int ipos = 0;
msg[ipos++] = MSG_START;
ipos += encode(&msg[ipos], data);
msg[ipos++] = checksum(&msg[1], ipos-1);  // <---- Undefined Behaviour?
msg[ipos++] = MSG_END;   // increment ipos to the actual size of msg

and treats this as an error, not a portability issue.

It's C code (incorporated into a C++-dominated project), compiled with a C++98-complient compiler, and meanwhile runs as expected for decades. Cppcheck is run with C++03, C89, auto-detect language.

I confess that the code should better be rewritten. But before doing this, I try to figure out: Is it really dependent on evaluation order? As I understand it, the right operand is being evaluated first (it needs to before the call), then the assignment is taking place (to msg[ipos]) with the increment of ipos done last.

Am I wrong with this assumption, or is it just a false positive?

Blackington answered 29/8, 2017 at 11:22 Comment(8)

Help. "There is no concept of left-to-right or right-to-left evaluation in C" – Malachy 29/8, 2017 at 11:27

Can't you just write msg[ipos] = checksum(&msg[1], ipos++)? – Timbered 29/8, 2017 at 11:27

en.cppreference.com/w/c/language/eval_order – Popper 29/8, 2017 at 11:30

@ErikW rewriting comes next, there are more than this one occurrence of the pattern, and the ipos++ patter on the left tells that this is sequential writing into a buffer. – Blackington 29/8, 2017 at 11:31

@BiagioFesta, true :-) but isn't the call forcing ipos-1 being evaluated first? – Blackington 29/8, 2017 at 11:35

@ErikW That suffers from the same ordering problem. – Multiform 29/8, 2017 at 11:51

"meanwhile runs as expected for decades" – no offense, but this does sound a bit like "works for me". Unspecified and even undefined behavior is very different from e.g. a coin flip. One compiler may always do what you want, another always what you don't want, and for a third, the behavior may indeed be sporadic. – Fourlegged 29/8, 2017 at 13:47

@ArneVogel it works (not only "for me") as long as the code base is fixed to the current platform/toolchain. Be reassured: the code will change to make porting possible. Posting it here was worth the lesson it can teach. – Blackington 29/8, 2017 at 14:0

This code does indeed depend on evaluation order in a way which is not well defined:

msg[ipos++] = checksum(&msg[1], ipos-1);

Specifically, it is not specified whether ipos++ will increment before or after ipos-1 is evaluated. This is because there is no "sequence point" at the =, only at the end of the full expression (the ;).

The function call is a sequence point. But that only guarantees that ipos-1 happens before the function call. It does not guarantee that ipos++ happens after.

It appears the code should be rewritten this way:

msg[ipos] = checksum(&msg[1], ipos-1);
ipos++; // or ++ipos

Archduke answered 29/8, 2017 at 11:29 Comment(7)

@Wolf: Yes, the function call is a sequence point. But that only guarantees that ipos-1 happens before the function call. It does not guarantee that ipos++ happens after. – Archduke 29/8, 2017 at 11:34

I'd like to accept your eye-opening answer, but could you also incorporate the detail about the undefined order of call and calculation of the assignment target. My fault was to assume that having ipos-1 before the call and assignment after the call would be enough to be safe here... – Blackington 29/8, 2017 at 11:41

@Blackington the compiler is free to calculate the memory address at which the assignment will be written (and thus update ipos) before evaluating the RHS. – Volpe 29/8, 2017 at 11:48

Is it really only unspecified or actually undefined? Beware that it's pre-C++11. – Milwaukee 29/8, 2017 at 11:51

What the newer standards say is irrelevant, since the question is explicitly about C89, C++98 and C++03. – Offertory 29/8, 2017 at 11:59

@Wolf: I have added the text from my comment to the answer, as you requested. – Archduke 29/8, 2017 at 12:24

@Thanks for adding that detail :-) – Blackington 29/8, 2017 at 13:11

The order of evaluation of the operands of = is unspecified. So to begin with, the code relies on unspecified behavior.

What makes the case even worse is that ipos is used twice in the same expression with no sequence point in between, for unrelated purposes - which leads to undefined behavior.

C99 6.5

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

The very same text applies to C90, C99, C++98 and C++03. In C11 and C++11 the wording has changed but the meaning is the same. It is undefined behavior, pre-C++11.

The compiler is not required to give a diagnostic for undefined behavior. You were lucky that it did. It is not a false positive - your code contains a severe bug, all the way from the original C code.

Offertory answered 29/8, 2017 at 11:55 Comment(8)

Regarding the assignment operator specifically, C99 says "The side effect of updating the stored value of the left operand shall occur between the previous and the next sequence point.". But that's irrelevant, because the UB occurs during the value calculation itself. – Offertory 29/8, 2017 at 11:57

For what it's worth, it wasn't the compiler that gave the warning--it was a static analysis tool (cppcheck, which is free and not extremely elaborate, but useful). – Archduke 29/8, 2017 at 12:23

@JohnZwinck gcc/g++ with enough -Wall and -Wextra flags also tend to warn for this UB. (Specifically -Wsequence-point.) – Offertory 29/8, 2017 at 12:49

for unrelated purposes -- the usage is related. The value of data controls how many bytes are written and encode returns this amount (and it's stored in ipos). Maybe I should add this to the question. – Blackington 29/8, 2017 at 13:16

@Blackington No, it is not obviously related. Neither the reader or the compiler can tell what that function does. You are calling a function which does <whatever> with the variable as parameter, in the same expression as you increase that variable. You can't know if the variable will get increased before or after the function call. The variable is not "read to determine the value to be stored". This refers to cases like i = i + 1; which is well-defined. – Offertory 29/8, 2017 at 13:24

@Offertory I see your point. The relation is most probably the originally intended one, a developer will read this from the context (as I did today: it's not "my code"), of course a compiler cannot deduce it (I did not claim that the relation was obvious). – Blackington 29/8, 2017 at 13:31

@Offertory What is the change with C++14 you are referring to? Do new rules apply to evaluation order or are compilers required to raise a compiler error? – Blackington 29/8, 2017 at 14:20

@Blackington It was changed in C++11 even (but not in C11). I mixed up the new C++ standards. Answer corrected. See this: en.cppreference.com/w/cpp/language/eval_order, scroll down to "undefined behavior". – Offertory 29/8, 2017 at 14:27

Recommended topics

Hot tags