branch prediction on a function pointer
Asked Answered
T

3

8

I have a loop that is running over and over again. The logic inside that loop is dependent on the mode that the program is in. To improve performance I was thinking that an array of function pointers can be initialized, functionPtr[], so that would just call functionPtrmode that runs the right logic. The loop will stay in the same mode for many cycles (the number is unknown upfront but many thousands). The program runs on an intel x64 machine only and needs no portability.

I was hoping that the CPU would utilize branch prediction but since my branch isn't conditional (on the assembly level) but the location of the branch does depend on a variable, (functionPtr+mode). Will the CPU attempt to calculate functionPtr+mode and start pulling those instructions in while in the pipeline?

Toole answered 7/10, 2014 at 15:29 Comment(9)
It's nothing to do with branch prediction, but it should be fine - there should be no pipeline stall even for an indirect function call.Ubiety
I see you are using c++. Can't you make two classes which inherit from the same interface then instantiate the correct one depending on the mode ?Schizont
If there are only a few cases, you could make the function in the loop a template parameter. Select at runtime (eg switch statement), a templated function containing the loop. Be aware this will increase code size.Cricoid
@Vincent, I think that has the same issue, if you have foo[mode].run() then you still have a pointer to foo[mode] to calculate and a function to call. Either way in my case The info needed in the run() call modifies variables that will be used in the next run() call, even for a different mode, so keeping it in one class makes sense.Toole
@Neil, that doesn't sound like it will work when the mode will switch throughout the loop unpredictably.Toole
@Toole Sorry I missed the part where it will change mid-loop!Cricoid
To improve performance, you should focus on reducing the number of branches (such as using Loop Unrolling), which will have a higher impact that branch prediction. Proper data cache usage will also have more significant impact than branch prediction. At least these 2 techniques can be applied to any platform, whereas branch prediction is platform specific.Thorley
@Toole OK I didn't understand well your requirements. I know this could be an extreme solution but... Maybe you can compile the code on the fly ? bellard.org/tccSchizont
Also a Switch statement is a solution significantly faster than a function pointer.Schizont
T
5

Yes, reasonably recent processors can do (at least something like) branch prediction for indirect jumps.

From the Pentium (Intel's first to do branch prediction) through the first Pentium IV's, all that was used for indirect branches was the Branch Target Buffer (BTB). This meant that they "predicted" such branches correctly when (and only when) the target was exactly identical to the previous target--which sounds like it's adequate for your case.

Starting with the Pentium M/Prescott (the last Pentium IV) Intel improved branch prediction for indirect jumps to use a two-level adaptive predictor. If I'm understanding your question correctly (i.e., your loop will execute with the same target for many consecutive iterations, and those are what you care about) even just the BTB would be sufficient for your purposes. The two level predictor would become more useful if (for example) you were branching on the least significant bit of consecutive numbers, so you had a predictable pattern of jumping to one target in one iteration, and the other in the next iteration. With a pattern like this, the BTB alone would always predict the branch incorrectly, but the two-level predictor in a current processor would predict correctly (after the first couple of iterations, so the pattern could be detected).

Transvestite answered 7/10, 2014 at 15:54 Comment(0)
T
6

From The microarchitecture of Intel, AMD and VIA CPUs An optimization guide for assembly programmers and compiler makers

http://www.agner.org/optimize/microarchitecture.pdf

section 3.7 (for Sandy Bridge, other processors are in other sections) Pattern recognition for indirect jumps and calls Indirect jumps and indirect calls (but not returns) are predicted using the same two-level predictor as branch instructions.

A pointer to a function is an indirect call.

Toole answered 7/10, 2014 at 16:7 Comment(0)
T
5

Yes, reasonably recent processors can do (at least something like) branch prediction for indirect jumps.

From the Pentium (Intel's first to do branch prediction) through the first Pentium IV's, all that was used for indirect branches was the Branch Target Buffer (BTB). This meant that they "predicted" such branches correctly when (and only when) the target was exactly identical to the previous target--which sounds like it's adequate for your case.

Starting with the Pentium M/Prescott (the last Pentium IV) Intel improved branch prediction for indirect jumps to use a two-level adaptive predictor. If I'm understanding your question correctly (i.e., your loop will execute with the same target for many consecutive iterations, and those are what you care about) even just the BTB would be sufficient for your purposes. The two level predictor would become more useful if (for example) you were branching on the least significant bit of consecutive numbers, so you had a predictable pattern of jumping to one target in one iteration, and the other in the next iteration. With a pattern like this, the BTB alone would always predict the branch incorrectly, but the two-level predictor in a current processor would predict correctly (after the first couple of iterations, so the pattern could be detected).

Transvestite answered 7/10, 2014 at 15:54 Comment(0)
P
1

Branch Prediction is for actual branches where we do not know until evaluation of branch, which tells which of the instruction is to be executed next. But since in your code next instruction is known depending on mode we are in, there is no need of any prediction neither will their be any wait in pipeline.

Given that there is enough time between mode change and instruction options, pipeline will successfully fetch the right instruction every time without any extra effort.

Pannikin answered 7/10, 2014 at 15:37 Comment(1)
Indirect branches still need prediction. It's an easy prediction when it's the same every time for a given branch, but it still consumes entries in the BTB (Branch Target Buffer). Data from memory / registers isn't available at decode time, let alone at fetch time before decode, so prediction is still needed even if the data hasn't changed recently. Unless you use self-modifying machine code to rewrite relative branch instructions, the target isn't easily available in the part of the CPU that needs it.Selfexpression

© 2022 - 2024 — McMap. All rights reserved.