Understanding more about i++ and i=i+1

Asked 5/10, 2014 at 4:19 Answered 8/10, 2014 at 7:46

c++c assembly optimization post-increment

I was wondering if there is difference between the two forms of increment. Some of the links says i++ is faster that i=i+1;

Also as one of the person my observation is also the same for assembly code. please check the image where both the assembly code are same for i++ and i=i+1 - enter image description here

There is another link that says that it used to be true previously that increment operator was faster than addition and assignment, but now compilers optimize i++ and i=i+1 the same.

Is there any official document/paper which we can refer to confirm what is exactly right ? (I usually go with credit's and number of accepted answers of a person on stackoverflow. Could't find any such thing on the links I provided).

Squamous answered 5/10, 2014 at 4:19 Comment(8)

It really depends upon the way the compiler handles it. I don't think there is any official paper suggesting this behavior. – Pedicle 5/10, 2014 at 4:23

The reason in your link for higher performance involve two steps of ADD & Assignment. Yes, Assignment can be a time consumption but in this case it wouldn't matter because the variable to be assigned with i + 1 is already available in the register. So, the write allocate doesn't result in performance damage. – Stockholder 5/10, 2014 at 4:24

I would think most compilers would optimize them to be the same, but in some instruction sets, Increment is atomic but assignment and addition takes two calls. – Mucker 5/10, 2014 at 4:29

In both these cases they've been optimized to effectively ++i, which has decidedly different sequencing characteristics than i++, but the same expression value as i=i+1. I suspect things would be considerably different had you used them as rvalues, and the optimizer detecting they were not used as-such was given the green light to reduce them both to ++i. I would expect it in the latter, but you should only see it with the former if it is not used as an rvalue. – Fun 5/10, 2014 at 4:36

The best source there is is to check the assembly code, and you're doing that. This is debug code though, so fairly meaningless. – Moitoso 5/10, 2014 at 8:6

Bottom line: the compiler knows better, trust it. – Obstipation 28/5, 2016 at 19:18

Oh, and if you get this question in an interview, just walk out, just know that your interviewer has no clue. – Obstipation 28/5, 2016 at 19:26

There's a subtle difference between incrementing and adding a constant that happens to be one. For example if you are doing a football league, it should be points +=3 for a win and points += 1 for a draw, not points++ for a draw. – Silesia 22/5, 2017 at 12:42

Actually, if you do C++, it is much better to get used to write ++i instead. The reason is simple: i++ requires a copy.

a = ++i; // a is set to the result of i+1
a = i++; // make a copy of i, compute i+1, save the copy of i in a

Without optimizations, the assembly code would look like this:

a = ++i;                            a = i++;

MOV eax, (i)                        MOV eax, (i)
                                    PUSH eax
ADD eax, 1                          ADD eax, 1
MOV (i), eax                        MOV (i), eax
                                    POP eax
MOV (a), eax                        MOV (a), eax

Now, with optimizations, the result is the same in C where the ++ operator only applies to integers and pointers.

The ++ and -- are there because most processors had an INC and a DEC instruction at the time C was written. So if you were to use an index register, these instructions would be applied:

char a[256];
...init 'a' in some way...
int sum =0;
for(int i = 0; i < 100; ++i)
{
    sum += a[i];
}

This could be done with a simple INC as in (6502):

    LDA #00
    LDY #00
LOOP:
    CLC
    ADC ($80),Y
    INY              <-- ++i or i++
    CPY #100
    BCC LOOP

Note that in C we have another notation to increment a variable:

i += 1;

This is practical if you need to increment the register by more than 1:

i += 3;

Or to double the register each time:

i += i;   // (equivalent to  i *= 2;  or  i <<= 1;  in C++)

Question: Why is INC and DEC not used with all 80x86?

There has been time when the ADD reg, 1 and SUB reg, 1 instructions were faster than the INC reg and DEC reg. In the old days, it was faster because the instruction was smaller and we had no cache (or very little). Today, either instruction is probably about the same.

From a comment below, a reason for the "slowness" was the FLAGS register:

Intel Optimization Reference, section 3.5.1.1 Use of the INC and DEC Instructions

From another, more current comment, it looks like the slowness referenced in that Intel document has been fixed in newer processors. So the use or non-use of these instructions should depend on the target processor if known in advance.

As pointed out by phresnel in a comment, the difference between i++ and ++i was probably not clear for many people. For an integer, the optimization is really trivial and will most certainly happen even with -O0. However, in C++, that's a different story. There is a class with both increment operators clearly showing that a copy is required for i++ (even if data_ is just an integer, although in that case you could also do: return data_++ -- it still requires a well hidden copy!):

class A
{
public:
    A& operator ++ () // ++i -- no copy
    {
        ...apply the ++ operation to 'data_'...
        return *this;    // return a reference to this
    }

    A  operator ++ (int) // i++ -- needs a temporary copy
    {
        // remember that the 'int' is totally ignored in the function,
        // its only purpose is to distinguish '++i' from 'i++'

        A copy = *this;    // here we need a copy
        ++*this;
        return copy;       // and here we return said copy
    }

private:
    some_type_t   data_;
};

Note that modern C++ compilers do not make two copies in the i++ function as the returned value can be optimized out without the need of an extra copy.

The difference between both cases can be shown as clearly slower if using i++ as described in Is there a performance difference between i++ and ++i in C++? (link mentioned by phresnel)

Ancestral answered 5/10, 2014 at 4:37 Comment(16)

"in C where the ++ operator only applies to integers" -- and pointers (and real floating types). Further suggest omitting C++ syntax in int sum(0); and for(int i(0); ...). And the equivalences in the last snippet also hold for C (the second one is only equivalent for non-negative i). – Ugric 5/10, 2014 at 5:42

@mafso, I thought ++ and -- were not allowed on floating points. But indeed, pointers support them! The fact is in C++ you can overload those operators and the results are much faster when put on the left side if possible. Now... i *= 2 is the same as i += i, even if i is negative. – Ancestral 5/10, 2014 at 7:36

I was surprised, too, but see C11 (draft) 6.5.3.1 p1 (that's a constraint section, you can also just try it with a conforming compiler). I'm not sure if this should be considered good style, I just mentioned it for completeness. I got your point about C++ and why you prefer prefix increment/decrement, I'm talking about the comment in the last line of code: "equivalent to i *= 2; or i <<= 1; in C++". I'm confused about the "in C++" there (the equivalence holds for C, too). And left-shifting negative numbers (even by 0, strictly speaking) is UB, but i+i is defined for some negative i. – Ugric 5/10, 2014 at 7:58

The reason INC/DEC should be replaced with ADD/SUB is "because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore creating false dependencies on earlier instructions that set the flags." see Intel Optimization Reference, section 3.5.1.1 Use of the INC and DEC Instructions – Polygon 5/10, 2014 at 8:4

@mafso, The shift was not available in C when I was still using that language. I'm also surprised that the shift would be undefined. Most compilers will transform i * 2 in SHL eax, 1... (or whatever corresponds to your variable type.) – Ancestral 5/10, 2014 at 8:11

The UB may origin from non-2's complement (what the C standard still addresses), but now that it is UB, compilers are free to optimize e.g. n<<0 >= 0 to 1 (I could get neither Gcc nor Clang to do things like that, though), the mapping to assembly isn't the only problem with UB. FYI (I've just looked it up), C and C++ differ slightly here, C++ makes left-shifting a positive number into the sign bit (but not beyond) implementation-defined, but otherwise, they agree: Left-shifting negative signed integers is always UB. – Ugric 5/10, 2014 at 9:9

This is complete nonsense. a= i++; causes i to be incremented and copied to a. a= ++i; causes i to be copied to a and be incremented. (It may even be the case that i and a be merged into a single variable by the optimizer.) – Vernettaverneuil 5/10, 2014 at 19:48

@YvesDaoust, no, the other way around. ++i first increments i then saves the result in a. And vice versa for the other one. And yes, in C it generally does not make much of a difference, although in very complex expressions, the compiler may not be able to optimize as much. Also both expressions set a different value in a so they are not equivalent. – Ancestral 5/10, 2014 at 21:29

Yep: i++ requires no copy. These upvotes are unearned. – Vernettaverneuil 6/10, 2014 at 6:25

The argument should not be "Without optimizations, the assembly code would look like this:", because nobody who's sane in their mind disables optimizations in a release. The argument should instead involve semantics and a side-effect demonstration about when ++() is really superiour to ++(int) and vice versa. – Dedededen 28/5, 2016 at 16:42

@phresnel How do you think we can better and better optimize compilers if it is not by checking out the final optimizations? Some super critical code needs us to do that. More and more rarely, maybe, but once in a while the compiler will do it wrong or our loops are slightly too long and in order to find yet another optimization, you have to look at the final code... – Ancestral 28/5, 2016 at 18:3

@AlexisWilke: Checking out the final optimizations is good if you need to. But most code should be written with meaning in mind, which is the point of using a higher level language. The removal of an unused value and the propagation of this semantic void up the code tree is pretty basic. If you really want to give a meaningful explanation why ++i should be preferred over i++ in case the value before incrementation is not required, maybe look here (https://mcmap.net/q/23970/-is-there-a-performance-difference-between-i-and-i-in-c), which contains some examples of non O(1) copying algorithms. – Dedededen 28/5, 2016 at 18:28

@phresnel, Thank you for the link. That may help people like Yves Daoust to better understand my argument. Only, showing the result of the compiler with that Something class rather than just an int is going to take pages of assembly code and does not really add to my example. The very simple PUSH and POP represent the copy (rather trivial here, I agree), in a way I think most people can easily understand that there is something more that happens when you use i++. – Ancestral 28/5, 2016 at 18:36

@phresnel, also the sample implementation of operator(int) is good in that linked answer. It clearly shows that you have to make a temporary copy of the object before you apply the increment and then return that copy. – Ancestral 28/5, 2016 at 18:38

@AlexisWilke: Yes, that's an example for when the compiler has no insight into the implementation of operator++() / operator++(int). In case of int i; ++i; the compiler has insight. – Dedededen 28/5, 2016 at 18:41

@amdn: The suggestion to avoid inc/dec is obsolete. They only have a false dependency on the flag they don't modify (CF) on P4, not on other microarchitectures. Even Intel's own compiler emits inc in many cases instead of add reg, 1. Intel is just slow to update their optimization manual. – Schism 28/5, 2016 at 19:6

There is no official document. The c spec does not declare that i++ must be faster than i+1 so compilers/optimizers are free to do what they like (and they can make different choices based on surrounding code and optimization level).

I use i++ because it's faster for me to read, with fewer characters to mistype.

Ingunna answered 5/10, 2014 at 4:24 Comment(0)

Runthe code profiler between both of them it will be same for both i++ and i = i+1 for current version of gcc compiler. This purely depends on the compiler optimizations.

If you speak about processor specifically you can see the below chart for machine cycles,

INC - Increment

    Usage:  INC     dest
    Modifies flags: AF OF PF SF ZF

    Adds one to destination unsigned binary operand.

                             Clocks                 Size
    Operands         808x  286   386   486          Bytes

    reg8              3     2     2     1             2
    reg16             3     2     2     1             1
    reg32             3     2     2     1             1
    mem             15+EA   7     6     3            2-4  (W88=23+EA)

ADD - Arithmetic Addition

    Usage:  ADD     dest,src
    Modifies flags: AF CF OF PF SF ZF

    Adds "src" to "dest" and replacing the original contents of "dest".
    Both operands are binary.

                             Clocks                 Size
    Operands         808x  286   386   486          Bytes

    reg,reg           3     2     2     1             2
    mem,reg         16+EA   7     7     3            2-4  (W88=24+EA)
    reg,mem          9+EA   7     6     2            2-4  (W88=13+EA)
    reg,immed         4     3     2     1            3-4
    mem,immed       17+EA   7     7     3            3-6  (W88=23+EA)
    accum,immed       4     3     2     1            2-3

You can find the machine cycles taken for each instruction ADD and INC for various processors, 8086, 80286, 80386, 80486,above, you can find this documented in intel processor manuals. Lower the machine cycles faster the response.

Eurhythmy answered 5/10, 2014 at 8:45 Comment(1)

As mentione by amdn below, the INC and DEC are slower on modern processors which reorganize many instructions but are limited in that reorganization when an instruction does not change all the flags in CFLAGS. The instruction pipeline has to be drained quite a bit before the processor can go on. That being said, the speed is indeed going to be the same in most cases because the optimizer will generate code which is exactly the same in both cases. – Ancestral 5/10, 2014 at 21:35

-1

rmv=10;

rmv=rmv++;//rmv+1 both are same

printf("1.time=%d",rmv);

//output is 10 than second time increment 10+1 so,value is11

printf("2.time=%d",rmv++);//value is 11

Quechua answered 8/10, 2014 at 7:46 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags