Cost of a virtual function in a tight loop
Asked Answered
A

4

6

I am in a situation where I have game objects that have a virtual function Update(). There are a lot of game objects (currently a little over 7000) and the loop calls update for all of them (amongst other things). My colleague suggested that we should remove the virtual function altogether. As you can imagine, this will take a lot of refactoring.

I have seen this answer but in my case, profiling means I have to change a lot of code. So before I even think of starting I thought I'd ask here for opinion on whether refactoring is worth it in this case.

Note that I have profiled other parts of the loop and have been trying to optimize the parts that are taking the longest. I suspect that the virtual function calls in this case is something I should not worry about, but I cannot be sure until I profile and I cannot profile until I change the code (which is a lot). Also note that some update functions are very small while others are larger more complex.

EDIT: There are multiple answers that give great insight, so anybody who stumbles onto this question in the future, have a look at all the answers and not just the selected one.

Avraham answered 6/7, 2011 at 15:37 Comment(3)
From the information you provided, it seems refactoring it could be difficult or impossible. The reason is that there are several different kinds of Update() functions. All you're going to get by the refactoring is that virtual function calls will be replaced by switch-case or if-statements, which are not any better in performance.Karissakarita
How dare somebody ask anything about optimization! :P ... as usual, -1 on these types of questions without explaining why. Anyway, thanks to everyone who replied. All I needed for another opinion, which I got. Not sure why some people are hell bent on burying everything and anything that even has the word optimizeAvraham
Same here. I'm working on my own game engine, which currently runs at about 10fps on a top of the line machine. Not sure where to look but I'm afraid to ask. In my engine I'm using two lists, one for all objects (which way outnumber the live objects in my system - collisions don't make an object active) and one for live objects.Ambrotype
C
6

If you can't profile, have a look at the assembler code to get an idea how expensive the lookup really is. It might be a simple indirect jump which costs almost nothing.

If you need to refactor, here is a suggestion: Create lots of "UpdateXxx" classes which know how to call the new non-virtual update() method. Collect those in an array and then call update() on them.

But my guess is that you won't save much, especially not with only 7K objects.

Note on profiling: If you can't use a profiler (makes me wonder why not), time the calls to update() and log calls which take longer than, say, 100ms. The timing isn't expensive and it allows you to quickly figure out which calls are most expensive.

Cr answered 6/7, 2011 at 15:43 Comment(1)
That is a very good suggestion. Considering yours and dascandy's answer, I will forget about this for now but in my spare time will give your suggestion a shot and see what happens (+1)Avraham
A
12

A virtual function call is not going to add much more than a single indirection and a hard-to-predict jump. That means that usually you're down one pipeline flush or about 20 cycles per virtual function. 7000 of them is about 140000 cycles, which should be negligible compared to your average update function. If it isn't, say that most of your update functions are just empty, you can consider putting the update-able objects in a separate list for this purpose.

Removing the virtual function is just going to lead to one of you replacing it with an identical but self-implemented system. This is the exact kind of place where a virtual function makes sense.

Per reference, 140000 cycles is about 50 microseconds. That's assuming a P4 with a huge pipeline and always a full pipeline flush (which you don't usually get).

Ambrotype answered 6/7, 2011 at 15:42 Comment(1)
Excellent, that will definitely help in my decision, thanks (+1)Avraham
K
9

Although it's not the same code and may not be the same compiler as you're using, here's a bit of reference data from a rather old benchmark (bench++ by Joe Orost):

Test Name:   F000005                         Class Name:  Style
CPU Time:        7.70  nanoseconds           plus or minus      0.385
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way if/else if statement
 compare this test with F000006


Test Name:   F000006                         Class Name:  Style
CPU Time:        2.00  nanoseconds           plus or minus     0.0999
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way switch statement
 compare this test with F000005


Test Name:   F000007                         Class Name:  Style
CPU Time:        3.41  nanoseconds           plus or minus      0.171
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way sparse switch statement
 compare this test with F000005 and F000006


Test Name:   F000008                         Class Name:  Style
CPU Time:        2.20  nanoseconds           plus or minus      0.110
Wall/CPU:        1.00  ratio.                Iteration Count:  1677721600
Test Description:
 Time to test a global using a 10-way virtual function class
 compare this test with F000006

This particular result is from compiling with the 64-bit edition of VC++ 9.0 (VS 2008), but it's reasonably similar to what I've seen from other recent compilers. The bottom line is that the virtual function is faster than most of the obvious alternatives, and very close to the same speed as the only one that beats it (in fact, the two being equal is within the measured margin of error). That, however, depends on the values involved being dense -- as you can see in F00007, if the values are sparse, the switch statement produces code that's slower than the virtual function call.

Bottom line: The virtual function call is probably the wrong place to look. Refactored code might easily work out slower, and even at best it probably won't gain enough to notice or care about.

Kolosick answered 6/7, 2011 at 16:38 Comment(6)
Wow, excellent, thanks for taking the time to do this. It solidifies my decision to not worry about it (+1)Avraham
+1 As an afterthought: C++ has been around for a couple of years and it's not surprising that hardware developers have started to build CPUs which can run it efficiently. :-)Cr
@Jerry: Could you provide a link to the source by any chance? I cannot seem to get search results for bench++Avraham
@Samaursa: I wish I could, but doing a bit of looking, I can't seem to find it either. I suppose I could upload it somewhere. Have any suggestions on good places for that?Kolosick
@Jerry: You can post it at githug or bitbucket?Avraham
Awesome thanks for adding actual times. I never can substantiate these with fact-based data (only with reasoned data) which, while still true carries less conviction. These numbers help immensely. You should be upvoted more.Ambrotype
C
6

If you can't profile, have a look at the assembler code to get an idea how expensive the lookup really is. It might be a simple indirect jump which costs almost nothing.

If you need to refactor, here is a suggestion: Create lots of "UpdateXxx" classes which know how to call the new non-virtual update() method. Collect those in an array and then call update() on them.

But my guess is that you won't save much, especially not with only 7K objects.

Note on profiling: If you can't use a profiler (makes me wonder why not), time the calls to update() and log calls which take longer than, say, 100ms. The timing isn't expensive and it allows you to quickly figure out which calls are most expensive.

Cr answered 6/7, 2011 at 15:43 Comment(1)
That is a very good suggestion. Considering yours and dascandy's answer, I will forget about this for now but in my spare time will give your suggestion a shot and see what happens (+1)Avraham
S
2

another test with virtual, inline and direct calls you may find here [enter link description here][1] Virtual functions and performance - C++

Schramke answered 6/7, 2011 at 15:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.