i++ less efficient than ++i, how to show this?

Asked 12/7, 2009 at 19:40 Answered 13/7, 2009 at 9:9

Solved c++optimization prefix postfix-operator

I am trying to show by example that the prefix increment is more efficient than the postfix increment.

In theory this makes sense: i++ needs to be able to return the unincremented original value and therefore store it, whereas ++i can return the incremented value without storing the previous value.

But is there a good example to show this in practice?

I tried the following code:

int array[100];

int main()
{
  for(int i = 0; i < sizeof(array)/sizeof(*array); i++)
    array[i] = 1;
}

I compiled it using gcc 4.4.0 like this:

gcc -Wa,-adhls -O0 myfile.cpp

I did this again, with the postfix increment changed to a prefix increment:

for(int i = 0; i < sizeof(array)/sizeof(*array); ++i)

The result is identical assembly code in both cases.

This was somewhat unexpected. It seemed like that by turning off optimizations (with -O0) I should see a difference to show the concept. What am I missing? Is there a better example to show this?

Hudibrastic answered 12/7, 2009 at 19:40 Comment(5)

The compiler is smart enough to deduce that ++i and i++ would produce the same outcome in your loop-example. Try actually using the result by assigning it to a variable and computing something with it, like an array index or something. But I daresay you're going to see negligible differences. – Doddering 12/7, 2009 at 19:44

by the way: sizeof(array)/sizeof(array[0])... sizeof(*array)=sizeof(int) – Thrombin 12/7, 2009 at 19:47

You're missing that the compiler IS smarter than you. In your example code , i++ and ++i will have the same effect no matter what. That's not even a compiler optimization, it's just normal code generation. If you want different assembly code with optimization turned off, make code in which i++ and ++i is part of a larger expression. (you'll find when you turn on optimizations again that it doesn't matter). You should be testing post vs preincrement in C++ rather, where it can make small difference (e.g. on iterators) – Counteraccusation 12/7, 2009 at 20:52

With the integer you are unlikely to see a difference the compiler can do some smart optimizations around this. You may be able to dect a difference with user defined types that have the pre and post increment defined correctly (ie they behave like the built in types). – Inbreed 12/7, 2009 at 22:8

@Artyom: Yes sure, but if you would ever decide to change the type that array stores in its elements, you are likely to end up with a bug if you use sizeof(int) instead of sizeof(*array) since the chance is big that you will forget to also change on that place. Code is constantly changing, thus always aim to make it as easy as possible to make changes to it, which usually means reducing the number of places you have to change in the code in order for a change to take place, especially if those are places that will compile even if you don't change them (like in this case). – Sherillsherilyn 7/1, 2014 at 15:15

In the general case, the post increment will result in a copy where a pre-increment will not. Of course this will be optimized away in a large number of cases and in the cases where it isn't the copy operation will be negligible (ie., for built in types).

Here's a small example that show the potential inefficiency of post-increment.

#include <stdio.h>

class foo 
{

public:
    int x;

    foo() : x(0) { 
        printf( "construct foo()\n"); 
    };

    foo( foo const& other) { 
        printf( "copy foo()\n"); 
        x = other.x; 
    };

    foo& operator=( foo const& rhs) { 
        printf( "assign foo()\n"); 
        x = rhs.x;
        return *this; 
    };

    foo& operator++() { 
        printf( "preincrement foo\n"); 
        ++x; 
        return *this; 
    };

    foo operator++( int) { 
        printf( "postincrement foo\n"); 
        foo temp( *this);
        ++x;
        return temp; 
    };

};


int main()
{
    foo bar;

    printf( "\n" "preinc example: \n");
    ++bar;

    printf( "\n" "postinc example: \n");
    bar++;
}

The results from an optimized build (which actually removes a second copy operation in the post-increment case due to RVO):

construct foo()

preinc example: 
preincrement foo

postinc example: 
postincrement foo
copy foo()

In general, if you don't need the semantics of the post-increment, why take the chance that an unnecessary copy will occur?

Of course, it's good to keep in mind that a custom operator++() - either the pre or post variant - is free to return whatever it wants (or even do whatever it wants), and I'd imagine that there are quite a few that don't follow the usual rules. Occasionally I've come across implementations that return "void", which makes the usual semantic difference go away.

Checkpoint answered 12/7, 2009 at 20:28 Comment(1)

This example clearly illustrates the problem of i++ - if i is a complex type where copy is expensive (perhaps a sophisticated reference counted iterator) then the copy must be avoided. – Chimney 12/7, 2009 at 20:50

You won't see any difference with integers. You need to use iterators or something where post and prefix really do something different. And you need to turn all optimisations on, not off!

Undersell answered 12/7, 2009 at 19:45 Comment(10)

Can you elaborate on that optimization comment, please? Thanks! – Hudibrastic 12/7, 2009 at 19:54

Whenever benchmarking anything, you need to get the compiler to generate the best code it can. with low levels of optimisation, all sorts of factors, like poor register use, may come into play and distort the results. I've been bitten by this when posting benchmarks here before. – Undersell 12/7, 2009 at 20:0

I think he just wants to see that, without optimization, ++x should generate faster code than x++. – Apgar 12/7, 2009 at 20:6

@GMan My point is that without optimisations other factors may get in the way of demonstrating that, even were it true in an ideal world. – Undersell 12/7, 2009 at 20:11

+1 concise, complete answer. specially for mentioning the optimization pitfall. – Tallage 12/7, 2009 at 20:28

@Mihail No, probably not. A compiler might produce identical code for the two samples posted by the questioner, but probably won't. One will be slightly different in speed from the other, assuming all optimisations. Such is life. The question is, are such differences significant and reproduceable? – Undersell 12/7, 2009 at 20:52

@Nail: Yeah, it's reproduceable, and on my computer speed up is about 20% -- but this is in case we are not doing much things except ++, so it can show that ++i is faster. But in real life too hard to find tasks on which it would get noticable benefits - this is true. – Valdavaldas 12/7, 2009 at 21:22

@Mihail If you are seeing a 20% difference, I suggest you post your code, together with your testing protocol (how it was compiled, how the different versions were run (and how often and in what order), how it was timed etc.) as an SO question. There should not be 20% difference. – Undersell 12/7, 2009 at 21:32

@Neil: sorry for missprint in your name in previous comment. And yeah, i've already posted code in my answer. Look downwards. And I'll be glad if you can help pointing out what is the problem in my example -- i hate vagueness... – Valdavaldas 12/7, 2009 at 21:46

@Mihail I've posted my own benchmark in a second answer below. – Undersell 12/7, 2009 at 22:21

I like to follow the rule of "say what you mean".

++i simply increments. i++ increments and has a special, non-intuitive result of evaluation. I only use i++ if I explicitly want that behavior, and use ++i in all other cases. If you follow this practice, when you do see i++ in code, it's obvious that post-increment behavior really was intended.

Enchant answered 12/7, 2009 at 19:56 Comment(3)

I bet that if the language had been named ++C, people would do this by default much more often. – Kashakashden 12/7, 2009 at 20:25

Perhaps it is called C++ because it still acts like C when used a certain way. – Chimney 12/7, 2009 at 20:48

"when you do see i++ in code, it's obvious that post-increment behavior really was intended." You must be one of the special few who remember the difference ;-) – Jen 13/7, 2009 at 6:28

Several points:

First, you're unlikely to see a major performance difference in any way
Second, your benchmarking is useless if you have optimizations disabled. What we want to know is if this change gives us more or less efficient code, which means that we have to use it with the most efficient code the compiler is able to produce. We don't care whether it is faster in unoptimized builds, we need to know if it is faster in optimized ones.
For built-in datatypes like integers, the compiler is generally able to optimize the difference away. The problem mainly occurs for more complex types with overloaded increment iterators, where the compiler can't trivially see that the two operations would be equivalent in the context.
You should use the code that clearest expresses your intent. Do you want to "add one to the value", or "add one to the value, but keep working on the original value a bit longer"? Usually, the former is the case, and then a pre-increment better expresses your intent.

If you want to show the difference, the simplest option is simply to impement both operators, and point out that one requires an extra copy, the other does not.

Sluggish answered 12/7, 2009 at 20:5 Comment(0)

Try to use while or do something with returned value, e.g.:

#define SOME_BIG_CONSTANT 1000000000

int _tmain(int argc, _TCHAR* argv[])
{
    int i = 1;
    int d = 0;

    DWORD d1 = GetTickCount();
    while(i < SOME_BIG_CONSTANT + 1)
    {
        d += i++;
    }
    DWORD t1 = GetTickCount() - d1;

    printf("%d", d);
    printf("\ni++ > %d <\n", t1);

    i = 0;
    d = 0;

    d1 = GetTickCount();
    while(i < SOME_BIG_CONSTANT)
    {
        d += ++i;

    }
    t1 = GetTickCount() - d1;

    printf("%d", d);
    printf("\n++i > %d <\n", t1);

    return 0;
}

Compiled with VS 2005 using /O2 or /Ox, tried on my desktop and on laptop.

Stably get something around on laptop, on desktop numbers are a bit different (but rate is about the same):

i++ > 8xx < 
++i > 6xx <

xx means that numbers are different e.g. 813 vs 640 - still around 20% speed up.

And one more point - if you replace "d +=" with "d = " you will see nice optimization trick:

i++ > 935 <
++i > 0 <

However, it's quite specific. But after all, I don't see any reasons to change my mind and think there is no difference :)

Valdavaldas answered 12/7, 2009 at 19:44 Comment(9)

So long as i and d are integers you shouldn't see any difference here. – Backhouse 12/7, 2009 at 20:5

However, in VS2005 with /O2 option, i really see difference -- ++i is faster. But, I'm just curious why in debug version - i++ gives me better perfomance? – Valdavaldas 12/7, 2009 at 20:21

See my second aswer below for my results. – Undersell 12/7, 2009 at 22:20

-1 You calculate different things: the result d in first and second loop is different in SOME_BIG_CONSTANT value, because in first you add before incrementing and in second after. It actually results different code. You should try: "d+=i; i++;" or "d+=i; ++i;" to see the difference (and you would not find) – Thrombin 13/7, 2009 at 6:5

Also, I'd suggest to put printf outsize the GetTickCount measurements. – Thrombin 13/7, 2009 at 6:6

Thanks, I've fixed code - now i calculates same values. And printf really is better to place outside, to make test cleaner. – Valdavaldas 13/7, 2009 at 8:36

You still do different things, even you calculate same values. – Thrombin 13/7, 2009 at 9:20

Still can you give me example where this two different operators can do absolutely same thing. Question - is that difference really so impacting? – Valdavaldas 13/7, 2009 at 9:22

The idea is not to do the same work but to do the same amount of work. – Valdavaldas 13/7, 2009 at 13:35

This code and its comments should demonstrate the differences between the two.

class a {
    int index;
    some_ridiculously_big_type big;

    //etc...

};

// prefix ++a
void operator++ (a& _a) {
    ++_a.index
}

// postfix a++
void operator++ (a& _a, int b) {
    _a.index++;
}

// now the program
int main (void) {
    a my_a;

    // prefix:
    // 1. updates my_a.index
    // 2. copies my_a.index to b
    int b = (++my_a).index; 

    // postfix
    // 1. creates a copy of my_a, including the *big* member.
    // 2. updates my_a.index
    // 3. copies index out of the **copy** of my_a that was created in step 1
    int c = (my_a++).index; 
}

You can see that the postfix has an extra step (step 1) which involves creating a copy of the object. This has both implications for both memory consumption and runtime. That is why prefix is more efficient that postfix for non-basic types.

Depending on some_ridiculously_big_type and also on whatever you do with the result of the incrememt, you'll be able to see the difference either with or without optimizations.

Backhouse answered 12/7, 2009 at 20:5 Comment(0)

In response to Mihail, this is a somewhat more portable version his code:

#include <cstdio>
#include <ctime>
using namespace std;

#define SOME_BIG_CONSTANT 100000000
#define OUTER 40
int main( int argc, char * argv[] ) {

    int d = 0;
    time_t now = time(0);
    if ( argc == 1 ) {
        for ( int n = 0; n < OUTER; n++ ) {
            int i = 0;
            while(i < SOME_BIG_CONSTANT) {
                d += i++;
            }
        }
    }
    else {
        for ( int n = 0; n < OUTER; n++ ) {
            int i = 0;
            while(i < SOME_BIG_CONSTANT) {
                d += ++i;
            }
        }
    }
    int t = time(0) - now;  
    printf( "%d\n", t );
    return d % 2;
}

The outer loops are there to allow me to fiddle the timings to get something suitable on my platform.

I don't use VC++ any more, so i compiled it (on Windows) with:

g++ -O3 t.cpp

I then ran it by alternating:

a.exe

and

a.exe 1

My timing results were approximately the same for both cases. Sometimes one version would be faster by up to 20% and sometimes the other. This I would guess is due to other processes running on my system.

Undersell answered 12/7, 2009 at 22:19 Comment(3)

So, it seems to have quite different behavior when compiled with different tools and run on different platforms. – Valdavaldas 13/7, 2009 at 8:44

Please try my code with your compiler before you make that assumption. – Undersell 13/7, 2009 at 8:48

10-11 vs 13. The more complicated program is, less difference in ++ you will see. – Valdavaldas 13/7, 2009 at 9:29

Perhaps you could just show the theoretical difference by writing out both versions with x86 assembly instructions? As many people have pointed out before, compiler will always make its own decisions on how best to compile/assemble the program.

If the example is meant for students not familiar with the x86 instruction set, you might consider using the MIPS32 instruction set -- for some odd reason many people seem to find it to be easier to comprehend than x86 assembly.

Alkalosis answered 13/7, 2009 at 9:9 Comment(0)

-4

Ok, all this prefix/postfix "optimization" is just... some big misunderstanding.

The major idea that i++ returns its original copy and thus requires copying the value.

This may be correct for some unefficient implementations of iterators. However in 99% of cases even with STL iterators there is no difference because compiler knows how to optimize it and the actual iterators are just pointers that look like class. And of course there is no difference for primitive types like integers on pointers.

So... forget about it.

EDIT: Clearification

As I had mentioned, most of STL iterator classes are just pointers wrapped with classes, that have all member functions inlined allowing out-optimization of such irrelevant copy.

And yes, if you have your own iterators without inlined member functions, then it may work slower. But, you should just understand what compiler does and what does not.

As a small prove, take this code:

int sum1(vector<int> const &v)
{
    int n;
    for(auto x=v.begin();x!=v.end();x++)
            n+=*x;
    return n;
}

int sum2(vector<int> const &v)
{
    int n;
    for(auto x=v.begin();x!=v.end();++x)
            n+=*x;
    return n;
}

int sum3(set<int> const &v)
{
    int n;
    for(auto x=v.begin();x!=v.end();x++)
            n+=*x;
    return n;
}

int sum4(set<int> const &v)
{
    int n;
    for(auto x=v.begin();x!=v.end();++x)
            n+=*x;
    return n;
}

Compile it to assembly and compare sum1 and sum2, sum3 and sum4...

I just can tell you... gcc give exactly the same code with -02.

Thrombin answered 12/7, 2009 at 19:45 Comment(4)

There are plenty of cases where the iterators are not just pointers", and where the compiler has little chance of optimizing it. – Sluggish 12/7, 2009 at 20:2

It seems important enough that every other text book points it out. I am looking for a clear example to demonstrate this concept to somebody new to programming. So don't tell me to forget about it. – Hudibrastic 12/7, 2009 at 20:5

I don't get: "The major idea that i++ returns its original copy and thus requires copying the value. This may be correct for some unefficient implementations of iterators." This is correct...always. i++ is suppose to return the original value. Calling something inefficient because it follows the standard seems a bit odd. – Apgar 12/7, 2009 at 20:14

In general, given that ++i would be faster than i++, the compiler could convert 'op(i++)' into 'op(i); ++i', as you point out. Now the fact is that whenever the iterator is not a basic type (pointer) and has user provided increment operators, the compiler cannot perform that optimization as it cannot know about possible side effects (including but not limited to the postincrement operation throwing an exception and thus never executing 'op(i)'). Now performance wise the difference could be negligible and not worth revisiting code, using pre instead of post increment as style rule is fine. – Discriminatory 12/7, 2009 at 22:31

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags