Is cutting if statements by using function pointers going to be more efficient?
Asked Answered
H

8

7

So, there's this rule to try to pull if statements out of high repetition loops:

for( int i = 0 ; i < 10000 ; i++ )
{
    if( someModeSettingOn )  doThis( data[i] ) ;
    else  doThat( data[i] ) ;
}

They say, it's better to break it up, to put the if statement outside:

if( someModeSettingOn )
  for( int i = 0 ; i < 10000 ; i++ )
    doThis( data[i] ) ;
else
  for( int i = 0 ; i < 10000 ; i++ )
    doThat( data[i] ) ;      

(In case you're saying "Ho! Don't optimize that yourself! The compiler will do it!") Sure the optimizer might do this for you. But in Typical C++ Bullshit (which I don't agree with all his points, eg his attitude towards virtual functions) Mike Acton says "Why make the compiler guess at something you know? Pretty much best point of those stickies, for me.

So why not use a function pointer instead?

FunctionPointer *fp ;
if( someModeSettingOn )  fp = func1 ;
else fp = func2 ;

for( int i = 0 ; i < 10000 ; i++ )
{
    fp( data[i] ) ;
}

Is there some kind of hidden overhead to function pointers? Is it is efficient as calling a straight function?

Hodson answered 19/6, 2012 at 14:35 Comment(1)
You should profile it in your own circumstances and see. Branch prediction has come a long way to reducing the overhead of a consistent if statement.Specialist
A
7

In this example it's impossible to say which case will be faster. You need to profile this code on target platform/compiler to estimate it.

And in general, in 99% case such code need not to be optimized. It's example of evil premature optimization. Write human-readable code and optimize it only if need after profiling.

Airman answered 19/6, 2012 at 14:55 Comment(0)
J
6

Don't guess, measure.

But, if I absolutely had to guess, I'd say the third variant (function pointer) is going to be slower than the second variant (if outside loops), which I suspect might play with CPU's branch prediction better.

The first variant may or may not be equivalent to the second one, depending on how smart the compiler is, as you have already noted.

Jay answered 19/6, 2012 at 15:7 Comment(0)
P
6

Why make the compiler guess at something you know?

Because you may complicate the code for future maintainers without providing any tangible benefit to the users of your code. This change smells strongly of premature optimization and only after profiling would I consider anything other than the obvious (if inside loop) implementation.

Given that profiling shows it to be a problem then as a guess I believe pulling the if out of the loop would be faster than the function pointer because the pointer may add a level of indirection that the compiler can't optimize away. It will also decrease the likelihood that the compiler can inline any calls.

However I would also consider an alternate design using an abstract interface instead of an if within the loop. Then each data object already knows what to do automatically.

Poplar answered 19/6, 2012 at 15:31 Comment(1)
Not only will the indirection of the pointer hurt, but the function call itself might prevent some optimizations.Specialist
G
2

My bet would be on the second version to be the fastest with the if/else outside the loop provided that I get a refund when we tie and test this across the widest range of compilers. :-D I make this bet with quite a number of years with VTune in hand.

That said, I would actually be happy if I lost the bet. I think it's very feasible that many compilers nowadays could optimize the first version to rival the second, detecting that you're repeatedly checking a variable which doesn't change inside the loop and therefore effectively hoisting the branching to occur outside the loop.

However, I haven't encountered a case yet where I've seen an optimizer do the analogical equivalent of inlining an indirect function call... though if there was a case where an optimizer could do this, yours would definitely be the easiest since it assigns the addresses to the functions to call in the same function in which it calls those functions through the function pointers. I'd be really pleasantly surprised if optimizers can do that now, especially because I like your third version best from a maintainability standpoint (easiest one to change if we want to add new conditions which lead to different functions to call, e.g.).

Still, if it fails to inline, then the function pointer solution will have a tendency to be the most costly, not only because of the long jump and potentially the additional stack spills and so forth, but also because the optimizer will lack information -- there's an optimizer barrier when it doesn't know what function is going to be called through a pointer. At that point it can no longer coalesce all this information in IR and do the best job of instruction selection, register allocation, etc. This compiler design aspect of indirect function calls isn't discussed quite as often, but is potentially the most expensive part of calling a function indirectly.

Gunboat answered 6/1, 2018 at 12:47 Comment(0)
M
1

Not sure if it qualifies as "hidden", but of course using a function pointer requires one more level of indirection.

The compiler has to generate code to dereference the pointer, and then jump to the resulting address, as opposed to code that just directly jumps to a constant address, for a normal function call.

Misti answered 19/6, 2012 at 14:57 Comment(2)
"code that just directly jumps to a constant address, for a normal function call" - and that's the worst case of a normal function call. It might be inlined.Brit
Mind you, there isn't necessarily "one more level of indirection" in using a function pointer. It's some kind of call <register> instruction, compared to some kind of call <constant> function (with the constant fixed up by the dynamic linker), or conceivably set <register> <constant> then call <register>. I don't think it would generally be said that x + y involves an extra level of indirection compared with x + 12345. If stuff gets spilled to stack, then maybe it's extra work to use the variable, but the real reasons for function pointers generally being slower are not this.Brit
E
1

You have three cases:

If inside the loop, function pointer de-ref inside the loop, if outside the loop.

Of the three, WITH NO COMPILER OPTIMIZATION, the third is going to be the best. The first does a conditional and the second does a pointer de-reference on top of the code you want to run, while the third just runs what you want it to.

If you want to optimize yourself do NOT do the function pointer version! If you don't trust the compiler to optimize, then the extra indirection might end up costing you, and it's a lot easier to break accidentally in the future (in my opinion).

Eurystheus answered 19/6, 2012 at 15:9 Comment(3)
Your answer would have been stronger if you had ordered the three cases the same as the question. And I disagree that if outside the loop is guaranteed to be faster even with no optimization. I do agree that it's unlikely to be slower.Specialist
@MarkRansom - Interesting. What's your thought on when it wouldn't be slower? The only thing I can come up with is if the CPU is doing branch prediction, and you never lose the CPU in the course of the loop.Eurystheus
Not just branch prediction - many loops are memory bandwidth constrained so optimizing every last cycle doesn't really help. Oh how I miss the days when you could look at an instruction and know exactly how many clocks it would take! OK, maybe not.Specialist
S
1

You have to measure which is faster - but I very much doubt the function pointer answer will be faster. Checking a flag probalby has zero latency on modern processors with deep multiple pipelines. Whereas a function pointer will make it likely that the compiler will be forced to do an actual function call, pushing registers etc.

"Why make the compiler guess at something you know?"

Both you and the compiler know some things at compile time - but the processor knows even more things at run time - like if there are empty pipelines in that inner loop. The days of doing this kind of optimization are gone outside of embedded systems and graphics shaders.

Scholiast answered 19/6, 2012 at 16:43 Comment(0)
V
1

The others all raise very valid points, most notably that you have to measure. I want to add three things:

  1. One important aspect is that using function pointers often prevents inlining, which can kill the performance of your code. But it definitely depends. Try to play around with the godbolt compiler explorer and have a look at the assembly generated:

    https://godbolt.org/g/85ZzpK

    Note than when doThis and doThat are not defined, e.g. as could happen across DSO boundaries, there won't be much of a difference.

  2. The second point is related to the branch prediction. Have a look at https://danluu.com/branch-prediction/. It should make it clear that the code you have here is actually an ideal case for the branch predictor and thus you probably don't have to bother. Again, a good profiler like perf or VTune will tell you whether you are suffering from branch mispredictions or not.

  3. Finally, there was at least one scenario I've seen where hoisting out the conditionals form a loop made a huge difference, despite the above reasoning. This was in a tight mathematical loop, which was not getting auto-vectorized due to the conditionals. GCC and Clang can both output reports about what loop gets vectorized, or why that wasn't done. In my case, a conditional was indeed the issue for the autovectorizer. This was with GCC 4.8 though, so things may have changed since then. With Godbolt, it's pretty easy to check whether this is an issue for you. Again, always measure on your target machine and check whether you are affected or not.

Vested answered 6/1, 2018 at 13:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.