When disassembling an executable I encountered the cmove instruction. I've already searched on the Internet but I've only found that it's a conditional move, and if the source and destination are equal a mov occurs. What I don't understand yet is why I need it, since it doesn't change the operands. What is its purpose?
The CMOVcc instructions don't compare the source and destination. They use the flags left by a previous comparison (or any other operation that sets the flags), and those flags determine whether the move is done or not. (Intel manual)
Example: this copies edx to ecx if eax and ebx are equal:
cmp eax, ebx
cmove ecx, edx
This does the same as:
cmp eax, ebx
jne skip
mov ecx, edx
skip:
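For readers more comfortable with C, the cmp/cmove pair above can be modelled roughly as follows. This is only a sketch: cmove_model is a hypothetical helper name, the parameters stand in for the registers, and a compiler is free to implement the ternary with either a branch or a cmov.

```c
#include <stdint.h>

/* Hypothetical model of:
 *     cmp   eax, ebx
 *     cmove ecx, edx
 * Returns the value ecx holds afterwards. */
uint32_t cmove_model(uint32_t eax, uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    int zf = (eax == ebx);   /* cmp sets ZF when the operands are equal */
    return zf ? edx : ecx;   /* cmove copies the source only if ZF is set */
}
```

Note that the destination keeps its old value when the condition is false, which is exactly what the branch version does by jumping over the mov.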
cmov with a memory source operand, like cmove eax, [edx], is an unconditional load that feeds an ALU select operation. It will fault on a bad address even if the condition is false. And the point of cmov is to have a data dependency instead of a control dependency (where branch misprediction is possible). – Ingratiating
Note: the mnemonic is cmove, not cmoveq. x86's condition name for equal is e, unlike ARM's condition names, which are always 2 letters long (like eq for equal). – Ingratiating
The purpose of cmov is to allow software (in some cases) to avoid a branch.
For example, if you have this code:
cmp eax,ebx
jne .l1
mov eax,edx
.l1:
...then when a modern CPU sees the jne branch it will take a guess about whether the branch will be taken or not taken, and then start speculatively executing instructions based on the guess. If the guess is wrong there's a performance penalty, because the CPU has to discard any speculatively executed work and then start fetching and executing the correct path.
For a conditional move (e.g. cmove eax,edx) the CPU doesn't need to guess which code will be executed, and the cost of a mispredicted branch is avoided. However, the CPU can't know if the value in eax will change or not, which means that later instructions that depend on the result of the conditional move have to wait until the conditional move completes (instead of being speculatively executed with an assumed value and not stalling).
This means that if the branch can be easily predicted, a branch can be faster; and if the branch can't be easily predicted, the conditional move can be faster.
Note that a conditional move is never strictly needed (it can always be done with a branch instead) - it's more like an optional optimization.
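To make the trade-off concrete, here is a minimal C sketch of the same loop written both ways. The function names and the threshold parameter are made up for illustration, and whether a compiler actually emits a branch for the first version and a cmov for the second depends on the compiler, its options and the target CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-style: the add is skipped when the test fails, so the CPU has to
 * predict the outcome for every element (assuming the compiler keeps the
 * branch). */
int64_t sum_at_least_branchy(const int32_t *data, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= threshold)
            sum += data[i];
    }
    return sum;
}

/* Select-style: always add, but choose between data[i] and 0 first.
 * The ternary is a natural candidate for a cmov, which turns the control
 * dependency into a data dependency. */
int64_t sum_at_least_select(const int32_t *data, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t addend = (data[i] >= threshold) ? data[i] : 0;
        sum += addend;
    }
    return sum;
}
```

Both functions return the same result. On sorted or otherwise predictable input the first form can win, and on random input the second form usually avoids the misprediction cost, which matches the rule of thumb above.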
"I've already searched on the Internet but I've only found that it's a conditional move, and if the source and destination are equal a mov occurs."
Your conclusion is incorrect.
The move happens if the CPU flags (CF, ZF, SF, OF, PF), as left by an earlier flag-setting instruction, satisfy the condition specified by the CMOVcc instruction encoding.
The cc suffix in the instruction name is a placeholder for a condition code, which can be any of A, AE, B, BE, C, E, G, GE, L, LE, NA, NAE, NB, NBE, NC, NE, NG, NGE, NL, NLE, NO, NP, NS, NZ, O, P, PE, PO, S, and Z.
In your case, CMOVE will perform the source-to-destination move if the zero flag (ZF) is set.
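As an illustration of how the flags relate to a few of these condition codes, here is a small C model of cmp on 32-bit operands. The struct and function names are hypothetical, PF and AF are left out, and only six of the conditions are shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the flags left by `cmp a, b`
 * (cmp computes a - b and throws the result away). */
struct cmp_result {
    bool zf;  /* result was zero */
    bool cf;  /* borrow: a is below b when both are treated as unsigned */
    bool sf;  /* sign bit of the result */
    bool of;  /* signed overflow occurred */
};

struct cmp_result cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;   /* wraps modulo 2^32, like the hardware subtract */
    struct cmp_result f;
    f.zf = (r == 0);
    f.cf = (a < b);
    f.sf = (r >> 31) & 1u;
    /* Overflow: the operands differ in sign and the result's sign
     * differs from the sign of a. */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1u;
    return f;
}

/* A few of the cc conditions, expressed on the modelled flags. */
bool cond_e(struct cmp_result f)  { return f.zf; }                   /* E, Z     */
bool cond_ne(struct cmp_result f) { return !f.zf; }                  /* NE, NZ   */
bool cond_b(struct cmp_result f)  { return f.cf; }                   /* B, C, NAE */
bool cond_a(struct cmp_result f)  { return !f.cf && !f.zf; }         /* A, NBE   */
bool cond_l(struct cmp_result f)  { return f.sf != f.of; }           /* L, NGE   */
bool cond_g(struct cmp_result f)  { return !f.zf && f.sf == f.of; }  /* G, NLE   */
```

For example, cmp_flags(0xFFFFFFFF, 1) satisfies both "above" (unsigned 4294967295 is above 1) and "less" (signed -1 is less than 1), so cmova and cmovl would both perform their move after that comparison.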
"What I don't understand yet is why I need it, since it doesn't change the operands."
As I explained above, it does change the destination if the condition is met.
"What is its purpose?"
The purpose is to help avoid conditional branches.
The reason why you want to avoid conditional branches is that it can be hard for the CPU to predict whether those branches will be taken or not taken.
Modern x86 CPUs are highly speculative and out-of-order: to improve performance, they start fetching and executing code along the predicted path well before the branch outcome is actually known.
If it later turns out that the branch was predicted incorrectly, the CPU will have to flush the execution pipeline and start executing the other code path from scratch.
If this happens in a loop or other critical code path it can degrade the code performance significantly, hence you have CMOVcc so you can avoid conditional branches where possible.
cmov (in a way that makes the critical path latency of a loop-carried dependency chain longer than it needs to be) leads to worse performance on sorted data, where a branch predicts near-perfectly, but still much better performance on unpredictable data. – Ingratiating
cmov is way better, and my answer discusses the details and potential pitfalls of running as fast as possible with branchless code. BTW, it was Broadwell that made cmov a single uop on Intel CPUs (except for cmovbe / cmova, which need two separate FLAGS inputs, from CF and the SPAZO group, and so are still 2 uops total; see uops.info). – Ingratiating
The CMOVcc instructions check the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and perform a move operation if the flags are in a specified state (or condition). A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, a move is not performed and execution continues with the instruction following the CMOVcc instruction.
These instructions can move a 16- or 32-bit value from memory to a general-purpose register or from one general-purpose register to another. Conditional moves of 8-bit register operands are not supported.
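One practical consequence of the memory-source form: the load itself is not conditional, so it can fault on a bad address even when the condition is false. The C sketch below (function names are hypothetical) shows one common way to stay branch-free anyway, which is to select the address rather than the loaded value. Whether a compiler really emits a cmov for the ternary is an assumption, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-style: the load only happens when p is non-null. */
int32_t load_or_default_branchy(const int32_t *p, int32_t fallback)
{
    if (p != NULL)
        return *p;
    return fallback;
}

/* Select-style: pick a pointer that is always valid, then load once.
 * The select operates on two addresses held in registers, so the load
 * that follows never touches a bad address. */
int32_t load_or_default_select(const int32_t *p, int32_t fallback)
{
    const int32_t *src = (p != NULL) ? p : &fallback;
    return *src;
}
```

Both functions return the pointed-to value for a valid pointer and the fallback for NULL.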
The terms "less" and "greater" are used for comparisons of signed integers and the terms "above" and "below" are used for unsigned integers.
https://wiki.cheatengine.org/index.php?title=Assembler:Commands:CMOVE
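The signed/unsigned naming can be illustrated with a minimal C sketch. The two helper names are made up; the idea is that the same bit pattern, 0xFFFFFFFF, is the largest value under an unsigned ("above") comparison and -1 under a signed ("greater") comparison. A compiler would typically pick a condition such as cmova or cmovb for the first function and cmovg or cmovl for the second, though that choice is up to the compiler.

```c
#include <stdint.h>

/* Unsigned comparison: corresponds to the "above"/"below" conditions. */
uint32_t max_unsigned(uint32_t a, uint32_t b)
{
    return (a > b) ? a : b;
}

/* Signed comparison: corresponds to the "greater"/"less" conditions. */
int32_t max_signed(int32_t a, int32_t b)
{
    return (a > b) ? a : b;
}
```

With these definitions, max_unsigned(0xFFFFFFFF, 1) yields 0xFFFFFFFF, while max_signed(-1, 1) yields 1, even though -1 and 0xFFFFFFFF share the same 32-bit representation.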