Does the x86-64 pipeline stall on an indirect jump like JMP RAX?

In x86-64, if you use the following assembly code:

MOV RAX, (memory address)
JMP RAX

Does the pipeline stall before executing the branch (to wait for MOV to finish with RAX), or will it flush the pipeline like a conditional branch?

Bourg answered 22/5, 2016 at 3:1 Comment(2)
Even mispredicted conditional branches don't have to fully flush the pipeline in modern designs. The pipeline can keep all the correct work it did on instructions before the mispredicted branch. This applies to Intel SnB-family for sure, and maybe Core2; I forget, but Agner Fog's microarch guide might say. - Quoin
Re: fast recovery on branch mispredicts: What exactly happens when a Skylake CPU mispredicts a branch? - Quoin

Most modern 80x86 CPUs have both static prediction (used when there's no history to make a better prediction) and dynamic prediction (used when there's history from previous executions).

With static prediction, the CPU predicts that execution will continue at the instruction immediately after the JMP RAX. I'm not entirely sure which CPUs use dynamic prediction for JMP RAX (rather than only for the Jcc branches), but on those that do, it overrides the static prediction.

Once the CPU has a predicted target address, it speculatively executes until it finds out if it predicted right/wrong. If it predicted right it keeps all the work it did and the JMP RAX would have little or no cost.

If the CPU predicted wrong, that's no different from any other branch misprediction: it discards all the work that was speculatively executed and goes back to fetch/decode at the correct RIP.

Note that if your JMP RAX is unpredictable, or the instruction after it is unlikely to be the target of the jump, Intel recommends putting a PAUSE or UD2 immediately after the jump to prevent unnecessary speculative execution. In this case the CPU would stall (do nothing until it finds out the correct jump target).

Also note that you'd want to move the MOV RAX, .. so that it executes as early as possible. The sooner the jump target is known, the less time is spent stalled or speculatively executing the wrong thing.

Motorman answered 22/5, 2016 at 5:24 Comment(3)
Statically predicting not-taken doesn't make any sense for an unconditional indirect branch. I don't know what happens when no branch-target-buffer prediction is available, but I doubt that any CPUs speculatively execute the following instructions on the assumption that the branch target is the next instruction. Although maybe I'm wrong about that if pause or ud2 is recommended. I do remember reading about using ud2 to block speculative execution in some case, but I'd forgotten where. - Quoin
All modern x86 CPUs, even low-power designs like Atom, have at least some capacity to predict branch targets for indirect branches. Larger designs like SnB-family can even recognize short patterns for indirect branch target addresses. - Quoin
@PeterCordes: In most versions of Intel's Optimisation Guide it'll say something like (taken almost verbatim from "E.1 Rule 13" from April 2012 version): "When indirect branches are present, try to put the most likely target of an indirect branch immediately following the indirect branch. Alternatively, if indirect branches are common but can't be predicted then follow the indirect branch with a UD2 instruction, which will stop the processor decoding down the fall-through path". Of course there's more detail provided throughout the guide (and I condensed to fit the comment box character limit). - Motorman
