Purpose of cmove instruction in x86 assembly?

Asked 10/5, 2015 at 10:25 Answered 21/5, 2023 at 20:12

assembly x86 cpu-architecture instruction-set conditional-move

When disassembling an executable I encountered the cmove instruction. I've already searched on the Internet but I've only found that it's a conditional move, and if the source and destination are equal a mov occurs. What I don't understand yet is why I need it, since it doesn't change the operands. What is its purpose?

Cadmann answered 10/5, 2015 at 10:25 Comment(0)

The CMOVcc instructions don't compare the source and destination. They test the flags left by a previous comparison (or by any other instruction that sets the flags), and those flags determine whether the move is performed. (Intel manual)

Example: this copies edx to ecx if eax and ebx are equal:

cmp eax, ebx
cmove ecx, edx

This does the same as:

cmp eax, ebx
jne skip
  mov ecx, edx
skip:
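The equivalence of the two snippets above can be sketched in C. This only models the semantics and says nothing about how a CPU implements either form; the function names `cmove_model` and `branch_model` are hypothetical, and the parameters simply stand in for the registers.

```c
#include <stdint.h>

/* Models `cmp eax, ebx` followed by `cmove ecx, edx`.
   Returns the value ecx holds afterwards. */
uint32_t cmove_model(uint32_t eax, uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    int zf = (eax == ebx);   /* cmp sets ZF when the operands are equal */
    return zf ? edx : ecx;   /* cmove copies edx only if ZF is set */
}

/* Models the cmp / jne skip / mov ecx, edx version. */
uint32_t branch_model(uint32_t eax, uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    if (eax != ebx)          /* jne skip */
        return ecx;          /* ecx is left untouched */
    return edx;              /* mov ecx, edx */
}
```

Both functions return the same value for every input; the difference between the two assembly forms is in how the CPU executes them, not in the result.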

Amatruda answered 10/5, 2015 at 10:34 Comment(3)

@Amatruda indeed it does the same thing, but the first one does not care about branch prediction failures. – Heartbreaking 3/5, 2019 at 11:43

Important point: cmov with a memory source operand like cmoveq eax, [edx] is an unconditional load that feeds an ALU select operation. It will fault on a bad address even if the condition is false. And the point of cmov is to have a data dependency instead of a control dependency (branch misprediction is possible). – Ingratiating 26/5, 2019 at 3:52

(typo in previous comment: it's cmove not cmoveq. Perhaps I was thinking of ARM's condition names that are always 2 letters long, like eq for equal, unlike x86's e for equal) – Ingratiating 30/4, 2021 at 11:29

The purpose of cmov is to allow software (in some cases) to avoid a branch.

For example, if you have this code:

    cmp eax,ebx
    jne .l1
    mov eax,edx
.l1:

...then when a modern CPU sees the jne branch, it guesses whether the branch will be taken or not and starts speculatively executing instructions based on that guess. If the guess is wrong there's a performance penalty, because the CPU has to discard any speculatively executed work and then start fetching and executing the correct path.

For a conditional move (e.g. cmove eax,edx) the CPU doesn't need to guess which code will be executed and the cost of a mispredicted branch is avoided. However, the CPU can't know if the value in eax will change or not, which means that later instructions that depend on the results of the conditional move have to wait until the conditional move completes (instead of being speculatively executed with an assumed value and not stalling).

This means that if the branch can be easily predicted, a branch can be faster; and if the branch can't be easily predicted, the conditional move can be faster.
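As a rough illustration of that trade-off, here is a minimal C sketch of the same loop written in a branchy style and in a select style. The function names are hypothetical, and whether a compiler actually emits a conditional jump or a cmov for either version depends on the compiler, the target, and the optimization flags, so treat the mapping as an assumption rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy style: the update is skipped by a conditional jump
   (if the compiler keeps the branch). */
int32_t max_branchy(const int32_t *v, size_t n)
{
    int32_t best = INT32_MIN;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > best)
            best = v[i];
    }
    return best;
}

/* Select style: both candidate values are available and one is chosen.
   Compilers often (not always) turn this into cmp + cmov. */
int32_t max_select(const int32_t *v, size_t n)
{
    int32_t best = INT32_MIN;
    for (size_t i = 0; i < n; i++)
        best = (v[i] > best) ? v[i] : best;
    return best;
}
```

In the select form every iteration depends on the previous value of `best` through the select operation, which is the data dependency described above. On sorted input the branchy form predicts well; on random input the select form avoids the mispredictions.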

Note that a conditional move is never strictly needed (it can always be done with a branch instead) - it's more like an optional optimization.

Apotheosize answered 26/5, 2019 at 8:51 Comment(0)

I've already searched on the Internet but I've only found that it's a conditional move, and if the source and destination are equal a mov occurs.

Your conclusion is incorrect.

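The move happens if the CPU flags (CF, ZF, OF, SF, PF), as left by the most recent instruction that modified them, satisfy the condition specified by the CMOVcc instruction encoding. That instruction is usually a cmp or test, but it doesn't have to be the one immediately before the CMOVcc.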

The cc suffix in the instruction name is a placeholder for a condition code, which can be any of A, AE, B, BE, C, E, G, GE, L, LE, NA, NAE, NB, NBE, NC, NE, NG, NGE, NL, NLE, NO, NP, NS, NZ, O, P, PE, PO, S, and Z.

In your case, CMOVE moves the source to the destination if the zero flag (ZF) is set.
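To make the link between condition codes and flags concrete, here is a small C model of the flags that `cmp a, b` leaves for 32-bit operands, plus a handful of the conditions. The struct and function names are hypothetical, and only E, NE, B, A, L, and G are shown; the remaining codes are combinations or negations of the same flags (or use PF, which is not modeled here).

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, zf, sf, of; };

/* Flags as `cmp a, b` leaves them: it computes a - b and discards the result. */
struct flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct flags f;
    f.cf = a < b;                            /* unsigned borrow */
    f.zf = (r == 0);                         /* result is zero */
    f.sf = (r >> 31) & 1;                    /* sign bit of the result */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1;  /* signed overflow of a - b */
    return f;
}

bool cond_e(struct flags f)  { return f.zf; }                   /* equal */
bool cond_ne(struct flags f) { return !f.zf; }                  /* not equal */
bool cond_b(struct flags f)  { return f.cf; }                   /* below (unsigned) */
bool cond_a(struct flags f)  { return !f.cf && !f.zf; }         /* above (unsigned) */
bool cond_l(struct flags f)  { return f.sf != f.of; }           /* less (signed) */
bool cond_g(struct flags f)  { return !f.zf && f.sf == f.of; }  /* greater (signed) */
```

For example, comparing 0xFFFFFFFF with 1 satisfies both A (it is above 1 as an unsigned value) and L (it is -1, so less than 1, as a signed value), which is why the choice between cmova and cmovl matters.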

What I don't understand yet is why I need it, since it doesn't change the operands.

As I explained above, it does change the destination if the condition is met.

What is its purpose?

The purpose is to help avoid conditional branches.

The reason you want to avoid some conditional branches is that the CPU can't always predict whether they will be taken, especially when the condition depends on data with no regular pattern.

Modern x86 CPUs are highly speculative and out-of-order: to improve performance, they start fetching, decoding, and executing code along the predicted path well before the branch condition is actually known.

If it later turns out that the branch was predicted incorrectly, the CPU will have to flush the execution pipeline and start executing the other code path from scratch.

If this happens in a loop or other critical code path it can degrade the code performance significantly, hence you have CMOVcc so you can avoid conditional branches where possible.

Jennine answered 21/5, 2023 at 20:12 Comment(3)

A good example of CMOV vs. branchy is gcc optimization flag -O3 makes code slower than -O2 where poor use of cmov (in a way that makes the critical path latency of a loop-carried dependency chain longer than it needs to be) leads to worse performance on sorted data where a branch predicts near-perfectly. But still much better performance on unpredictable data. – Ingratiating 21/5, 2023 at 22:59

@PeterCordes Edge cases are possible, but on the other hand Intel CPUs newer than Skylake (6th gen) have better performance using CMOV vs. branching in almost all cases. – Jennine 21/5, 2023 at 23:13

Yeah, that was exactly my point. On unsorted data, branchless code using cmov is way better, and my answer discusses the details and potential pitfalls of running as fast as possible with branchless code. BTW, it was Broadwell that made cmov a single uop on Intel CPUs (except for cmovbe / cmova that need two separate FLAGS inputs, from CF and the SPAZO group, so are still 2 uops total. uops.info) – Ingratiating 21/5, 2023 at 23:31

The CMOVcc instructions check the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and perform a move operation if the flags are in a specified state (or condition). A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, a move is not performed and execution continues with the instruction following the CMOVcc instruction.

These instructions can move a 16- or 32-bit value (or a 64-bit value in 64-bit mode) from memory to a general-purpose register or from one general-purpose register to another. Conditional moves of 8-bit register operands are not supported.

The terms "less" and "greater" are used for comparisons of signed integers and the terms "above" and "below" are used for unsigned integers.

https://wiki.cheatengine.org/index.php?title=Assembler:Commands:CMOVE

Klapp answered 4/8, 2020 at 2:38 Comment(0)
