Post increment behavior

Asked 8/6, 2018 at 10:13 Answered 8/6, 2018 at 10:41

Recently, I upgraded my project from gcc 4.3 to gcc 5.5. After that, I see a change of behaviour in the post-increment operator which is causing issues in my project. I am using a global variable as a control variable. For example, consider this sample program:

int i = 0;
int main()
{
        int x[10];

        x[i++] = 5; ===> culprit

        return 0;

}

In the above snippet, the value of i should be incremented only after 5 has been assigned to x[0] which would guarantee that x[0] has a proper valid value assigned before i is incremented.

Now here comes the problem, I see that after moving to gcc 5.5, the assembly instructions have changed and the value of i gets incremented even before the assignment has happened. Assembly instruction of the above snippet:

Dump of assembler code for function main():
6       {
   0x0000000000400636 <+0>:     push   %rbp
   0x0000000000400637 <+1>:     mov    %rsp,%rbp

7               int x[10];
8
9               x[i++] = 1;
   0x000000000040063a <+4>:     mov    0x200a00(%rip),%eax        # 0x601040 <i>
   0x0000000000400640 <+10>:    lea    0x1(%rax),%edx
   0x0000000000400643 <+13>:    mov    %edx,0x2009f7(%rip)        # 0x601040 <i>  ====> i gets incremented here
   0x0000000000400649 <+19>:    cltq
   0x000000000040064b <+21>:    movl   $0x5,-0x30(%rbp,%rax,4)    =====> x[0] is assigned value here

10
11              return 0;
   0x0000000000400653 <+29>:    mov    $0x0,%eax

12
13      }
   0x0000000000400658 <+34>:    pop    %rbp
   0x0000000000400659 <+35>:    retq

Because of the above assembly another thread within the process using variable i starts reading incorrect values from the global array.

Now the same code when compiled using gcc 4.3, adheres to the behaviour which I understand, i.e., the value is first assigned and then i is incremented. Assembly instruction using gcc 4.3 for the same snippet:

Dump of assembler code for function main():
5       int main()
   0x00000000004005da <+0>:     push   %rbp
   0x00000000004005db <+1>:     mov    %rsp,%rbp

6       {
7               int x[10];
8
9               x[i++] = 1;
   0x00000000004005de <+4>:     mov    0x200a64(%rip),%edx        # 0x601048 <i>
   0x00000000004005e4 <+10>:    movslq %edx,%rax
   0x00000000004005e7 <+13>:    movl   $0x5,-0x30(%rbp,%rax,4)     ======> x[0] gets assigned here
   0x00000000004005ef <+21>:    lea    0x1(%rdx),%eax
   0x00000000004005f2 <+24>:    mov    %eax,0x200a50(%rip)        # 0x601048 <i>   ======> i gets incremented here

10
11              return 0;
   0x00000000004005f8 <+30>:    mov    $0x0,%eax

12
13      }
   0x00000000004005fd <+35>:    leaveq
   0x00000000004005fe <+36>:    retq

I want to know if this is what is the expected behaviour with the new compilers? Is there any switch using which I can switch back to the old behaviour? Or is this a bug present in the new compiler?

Any help or leads will be appreciated.

Note: I want to avoid locks during the reading of i because of performance issues. The culprit line in the above code gets executed within a lock. So, only one thread can update i at any point but because of the change in assembly instructions within the compilers, a race condition is introduced without any code change.

Edit 1: I know there are locking issues and I am also keeping that as an option but what I actually want to know is if there is any switch or flag using which I can get back to the older behaviour. The code base is very very huge and I would have to go through the entire code base to check if there exist similar problems at other locations in the code. So, getting the old behaviour back would be the life-saver.

Spore answered 8/6, 2018 at 10:13 Comment(9)

You should definitelly use locks. You were depending on race condition and now You see the consequences – Renter 8/6, 2018 at 10:17

Read about sequence points: https://mcmap.net/q/23913/-sequence-points-in-c/…. You will have no problem if doing x[i] = 0; i++;. – Dwelt 8/6, 2018 at 10:20

In the above snippet, the value of i should be incremented only after 5 has been assigned to x[0] which would guarantee that x[0] has a proper valid value assigned before i is incremented. -- any basis for this assumption? – Hemstitch 8/6, 2018 at 10:23

"I want to avoid locks during the reading of i because of performance issues" Algorithm 1st, Code stability 2nd, bugs/extensions 3rd, performance 7th (because the bulk of the performance will be from #1). How do you know locks are going to be a problem compared to the rest of the code. I think you need to train yourself to not assume. – Transfuse 8/6, 2018 at 10:36

'I am using a global variable as a control variable'.... does not fill me with any kind of confidence with your overall design:( – Offence 8/6, 2018 at 10:49

Don't use the ++ operators together with other operators in the same expression, and there will be no problems. There exists no case when you have to mix ++ with other operators. – Estimation 8/6, 2018 at 11:9

@UKMonkey: This is a very highly used code path and the locking was removed a few years back by some other member as part of performance improvements. There was a drastic improvement in performance after that. And the behaviour used to adhere our understanding till gcc 4.3. I thought this was the basic for post-increment that the value will only be incremented after the entire statement is executed or before it is used again in the same statement.. – Spore 8/6, 2018 at 14:26

@lurker: Behaviour has changed in both gcc and g++ – Spore 8/6, 2018 at 14:46

I see no problem with either assembly sequence. In the 'problem' sequence, the index into 'x' is in the register 'rax' . that address+1 is used to set variable 'i' then 'rax' is used as the address to receive the value '1' I.E. there is no problem with the posted code. – Haynor 9/6, 2018 at 4:18

You misunderstood the guarantees that post-increment offers. It guarantees that the location in which x is stored will be calculated using the old value of i. It absolutely does not guarantee that x will be stored before the updated value is stored in i.

The compiler is perfectly at liberty to convert the code to:

int temp = i;
i = temp+1;
x[temp] = 5;

Furthermore, if you have one thread modifying i and another thread reading i, you have no guarantees about which value the other thread will see. You don't even get a guarantee that it will see either the new value or the old value unless i is std::atomic.

Given you are trying to update both i and x in a coordinated way, you will have to have a lock.

The compiler could even convert your code to:

i = i + 1;
// <<<<
x[i-1] = 5;

Which is interesting if another thread jumps in and modifies i at the point marked <<<<.

Polymerism answered 8/6, 2018 at 10:25 Comment(10)

@Frankie_C You're not sure that the compiler is at liberty to do this int temp = i; i = temp+1; x[temp] = 5;? I think you should read #4176828 – Transfuse 8/6, 2018 at 11:0

@Frankie_C: Precedence is irrelevant because the order of operations only determines what the final results must be as defined by the C standard’s “observable behavior.” That observable behavior includes things like output and changes to volatile objects, but it does not include one thread looking at changes made by another thread without use of the various synchronzing features. If a progrm has a = 3; b = 4;, the compiler is free to set b first, then a. – Harmonious 8/6, 2018 at 11:13

@Frankie_C Operator precedence has nothing to do with order of evaluation or sequencing. Operator precedence just states how an expression should be parsed, not the order in which it should be executed. The only guarantee of order of execution in C is sequence points. And even when sequence points are used, the compiler may re-order the instructions if it doesn't change the observable behavior and there are no side effects . – Estimation 8/6, 2018 at 11:14

Gah, fastest gun in the west no longer. – Estimation 8/6, 2018 at 11:14

@Estimation : I have only just noticed that this question is tagged both C++ and C. C++ no longer uses "sequence points", instead it refers to side effects being "sequenced before" or "sequenced after" (or "not sequenced"). For the sample code shown, C and C++ agree on the guarantees, but I believe there are one or two odd corners where C++ defines sequencing but C does not. – Polymerism 8/6, 2018 at 11:17

@MartinBonner Nah C and C++ still use sequence points (C11 also introduced "sequenced before" terms at the same time as C++11). The meaning is the very same. "It's not Prince, it's the artist formerly known as Prince". Same guy, same behavior. – Estimation 8/6, 2018 at 11:21

@EricPostpischil @Estimation AAARGH I have misunderstood :( I was reasoning on compiler using the incremented i to compute array element address resulting in access to element X[1] instead of X[0]. I must definitely refrain from any comment/answer till after holidays! – Canner 8/6, 2018 at 15:7

Not to revive this thread, but couldn't you separate out the instructions on different lines and use a memory barrier instead of having to resort to a lock? – Lieselotteliestal 10/2, 2020 at 22:47

@Lieselotteliestal Putting the instructions in different statements x[i] = 0; i++; will definitely guarantee that the increment happens after the store in the abstract machine, but it won't help with multi-thread access. All the transforms I discuss above will still apply, because the code has no "side effects" (in the sense used by the standard). I have no idea whether memory barriers are enough - I find it hard to reason about multi-threaded code with locks, and membars are even trickier. – Polymerism 11/2, 2020 at 6:32

I may just use a compiler directive to turn off optimizations for the block of code I don't want to be reordered. It's kind of amazing that something so conceptually simple as "do as I say, in the order I say" can't be enforced without some ugly hacks, if that even guarantees anything – Lieselotteliestal 11/2, 2020 at 14:32

When reading and writing the same variable from different threads, you must use some synchronization mechanism, e.g. a mutex or an atomic variable (if you want to synchronize just a single variable). Platforms other than x86 are much less forgiving in this sense.

In addition, when more than one variable is being modified, you need to ensure happens-before memory ordering semantics, in order for the other thread to "see" the new values in the right order in time. Using a mutex automatically ensures acquire-release semantics (i.e. anything happened in one thread before releasing the mutex will be visible to the thread that locks it again). Without memory ordering you have no guarantee when the threads will see each other's changes in time (or at all).

An atomic also comes with memory ordering, but only for the variable itself.

Without any memory synchronization the compiler assumes there are no other threads running, and is free to order memory accesses any way it likes. It may move the increment part before, after, or really anywhere as long as there is no observable effect in the current thread.

If you want to know more I recommend watching Herb's excellent talk on memory ordering.

Priscella answered 8/6, 2018 at 10:41 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags