difference between conditional instructions (cmov) and jump instructions [duplicate]

I'm confused about when to use cmov instructions and when to use jump instructions in assembly.

From a performance point of view:

  • What is the difference in both of them?
  • Which one is better?

If possible, please explain their difference with an example.

Christiniachristis answered 2/10, 2014 at 4:28 Comment(1)
what processor/family are you asking about, add the relevant tag please.Shelburne

movcc is a so-called predicated instruction. That's fancy-speak for "this instruction executes under a condition (predicate)".

Many processors, including the x86, set the condition code bits after an arithmetic operation (especially a compare instruction) to indicate the status of the result of the operation.

A conditional jump instruction checks the condition code bits for a status, and if true, jumps to a designated target.

Because the jump is conditional, and the processor typically has a deep pipeline, the condition code bits may literally not be ready when the CPU encounters the conditional jump. The chip designers could simply wait for the pipeline to drain (often many clock cycles), and then execute the jump, but that would make the processor slow.

Instead, most of them choose to have a branch prediction algorithm, which predicts which way a conditional jump will go. The processor can then fetch, decode, and execute down the predicted path and continue fast execution, with the proviso that if the condition code bits that finally arrive show the prediction was wrong (a branch mispredict), the processor undoes all the work it did after the branch and re-executes the program going down the other path.

Conditional jumps are harder for pipelined execution than normal data dependencies, because they can change which instruction should be next in the stream of instructions flowing through the pipeline. This is called a control dependency, as opposed to a data dependency (like an add where both inputs are outputs of other recent instructions).

The branch predictors turn out to be very good, because most branches are biased toward one direction. (The branch at the end of most loops typically jumps back to the top.) So most of the time the processor doesn't have to back out of wrongly predicted work.

If the direction of the branch is highly unpredictable, then the processor will guess wrong about 50% of the time, and thus has to back out that work. That's expensive.
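To put rough numbers on the bias argument, here is a toy model in C of a 2-bit saturating counter, a classic textbook predictor. This is only a sketch: real predictors are far more elaborate, and the function name `count_mispredicts` and the outcome arrays are made up for illustration.

```c
#include <stddef.h>

/* Toy 2-bit saturating-counter predictor (hypothetical model, not any real CPU).
 * States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
 * Returns how many of the n recorded branch outcomes were mispredicted. */
size_t count_mispredicts(const int *taken, size_t n)
{
    unsigned state = 2;          /* assumption: start in "weakly taken" */
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        int predicted_taken = (state >= 2);
        int actually_taken = (taken[i] != 0);

        if (predicted_taken != actually_taken)
            misses++;

        /* Move the counter toward the actual outcome, saturating at 0 and 3. */
        if (actually_taken) {
            if (state < 3)
                state++;
        } else {
            if (state > 0)
                state--;
        }
    }
    return misses;
}
```

In this model, a loop-like pattern of nine taken outcomes followed by one not-taken outcome costs a single miss, while a strictly alternating taken/not-taken pattern costs a miss on every other branch.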

OK, now, one often finds code like this:

  cmp   ...
  jcc   skip          ; condition true: jump over the mov
  mov   register1, register2
skip: ; continue here
  ...
  ; use register1

If the branch predictor guesses right, this code is fast, no matter which way the branch goes. If it guesses wrong a lot... ouch.
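For readers who think in C, the pattern above is roughly what a plain `if` around an assignment looks like at the source level. This is a sketch only: the function name `select_branchy` and the `a < b` condition are invented for illustration, and an optimizing compiler is free to emit something other than a compare and a conditional jump here.

```c
/* Branchy form of "conditionally overwrite register1 with register2".
 * A compiler may (but need not) translate the if into cmp + jcc + mov. */
int select_branchy(int a, int b, int register1, int register2)
{
    if (a < b)                   /* the compare feeding the conditional jump */
        register1 = register2;   /* the mov that the jump skips when a >= b */
    return register1;            /* "use register1" */
}
```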

Thus the conditional move instruction. This is a move that conditionally moves data, based on the condition code bits. We can rewrite the above:

  cmp   ...
  movcc  register1, register2
  ; continue here
  ...
  ; use register1

Now we have no branch instructions, and thus no mispredicts that make the processor undo all the work. Since there is no control dependency, the following instructions need to be fetched and decoded regardless of whether the movcc acts like a mov or nop. The pipeline can stay full without predicting the condition and speculatively executing instructions that use register1. (You could build a CPU that way, but it would defeat the purpose of movcc.)
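The same idea can be spelled out in C without any branch in the source. The sketch below emulates with a bit mask what movcc does in a single instruction; the name `select_branchless` is hypothetical, and whether a given compiler turns this (or a plain `cond ? src : dst`) into a conditional move is up to the compiler and target.

```c
#include <stdint.h>

/* Branch-free select: returns src when cond is nonzero, otherwise dst.
 * Both values are always computed; the condition only picks the result. */
uint32_t select_branchless(int cond, uint32_t dst, uint32_t src)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (src & mask) | (dst & ~mask);
}
```

Note that the result depends on `cond`, `dst`, and `src` as ordinary data inputs; nothing here changes which instruction runs next.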

movcc converts a control dependency into a data dependency. The CPU treats it exactly like a 3-input math instruction, with the inputs being EFLAGS and its two "regular" inputs (dest register and source register-or-memory). On x86, adc is identical to cmovae (mov if CF==0) as far as how out-of-order execution tracks the dependencies: inputs are CF, and both GP registers. Output is the destination register.
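One way to picture the "3-input math instruction" view is to write both instructions as pure functions of the same three inputs. The C models below are illustrative assumptions, not a description of any hardware: the names `model_cmovae` and `model_adc` are made up, and the models cover only the value written to the destination register (the flags that adc itself updates are ignored).

```c
#include <stdint.h>

/* Model of cmovae: destination gets src when CF == 0, otherwise keeps dst. */
uint32_t model_cmovae(uint32_t dst, uint32_t src, int cf)
{
    return (cf == 0) ? src : dst;
}

/* Model of adc: destination gets dst + src + CF (wrapping at 32 bits). */
uint32_t model_adc(uint32_t dst, uint32_t src, int cf)
{
    return dst + src + (uint32_t)(cf != 0);
}
```

Both functions read the old destination, the source, and the carry flag, which is the sense in which the two instructions look alike to dependency tracking.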

For the x86, there are cmovcc, jcc, and setcc instructions for every condition combination cc. (setcc sets the destination to 0 or 1, according to the condition. So it has a data dependency on the flags, and no other input dependencies.)
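The 0-or-1 result of setcc lines up with how C's comparison operators behave, since `a < b` evaluates to the int 0 or 1. A minimal sketch of using that result as data instead of branching on it follows; the function name `count_below` is hypothetical, and a compiler may implement the loop with setcc, with a branch, or with vector instructions.

```c
#include <stddef.h>

/* Counts the elements smaller than limit without an if in the loop body.
 * The comparison result (0 or 1) is added directly, which is the kind of
 * value a setcc instruction produces. */
size_t count_below(const int *values, size_t n, int limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(values[i] < limit);
    return count;
}
```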

Ammonate answered 2/10, 2014 at 4:49 Comment(10)
Also good to note that a predicted jump is basically free, while a movcc has a small fixed cost. A predicted jump will always be faster. (on current x86 CPUs)Cisalpine
@Cory: I don't see any fundamental reason the movcc can't simply be ignored until the cc bits are valid, or the downstream register is used. Current CPUs don't do that? Instruction decode time is essentially zero due to the fact that it is happening ahead of execution.Ammonate
I guess I should have said "equal or faster" -- a predicted jump is always free, a movcc may or may not be. movcc is not predicted, it is treated just like an add basically -- it introduces a dependency, and can be reordered and can have its latency "hidden" by ILP.Cisalpine
This is the reason compilers don't go nuts with conditional moves when you have super simple branches. They need to be scheduled just right to be perfect, and if it can't be perfect it's difficult for a compiler to predict a jump beforehand to know if the cost is worth it.Cisalpine
twitter.com/FioraAeterna/status/567724845866029058Petry
Interesting discussion on performance from Linus yarchive.net/comp/linux/cmov.htmlCalliecalligraphy
Any theory has to be tested experimentally. What happens when you actually test your code with both variants (branch+mov vs cmov)?.. Apparently, someone already did: github.com/xiadz/cmov and the graphs seem to point out that the branch has to be predicted correctly 99% of the time, else cmov clearly wins.Raynell
Folks: Dmitry's link to github shows a really nice analysis and result. cmov clearly wins, and by a lot.Ammonate
The answer is on the right track, but confuses things and unfortunately is quite wrong: the delay in condition code bits has nothing to do with the problem (it affects both cmov and branches and does not cause a problem). The problem is that by the time the CPU executes the branch, it likely has already fetched, decoded, and partially executed many following instructions. When the jump is mispredicted, these have to be undone and thrown away. This overhead can be as costly as dozens of instructions missed. Conditional move does not suffer from this problem, but may require extra calculations.Monopetalous
I fixed the problems pointed out by @MarcLehmann's comment with an edit earlier this month. We should probably delete both these comments to avoid future confusion. (Delete yours and flag mine as obsolete if you agree, Marc).Unpredictable
