Update: see sh1's answer for the current situation: there's a conditional-zero instruction, like `cmov` from `x0`. The full `cmov` was dropped from the plans before extension B made it to v1.0 (and extension B was split into some separate parts). An article has some details and links on the situation as of mid 2023. Current compilers no longer support `b` as a single-letter extension name either.
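For readers who haven't seen the conditional-zero instructions, here is a minimal C model. It assumes the ratified Zicond extension's `czero.eqz`/`czero.nez` semantics as I understand them: `czero.eqz rd, rs1, rs2` gives 0 when `rs2` is zero, else `rs1`. That is a `cmov` whose false-side source is hard-wired to `x0`. A full select then takes two of them plus an `or`. The function names are illustrative, not a real API, and the asm in the comments is a sketch, so check the Zicond spec before relying on exact operand order.

```c
#include <assert.h>
#include <stdint.h>

/* C models of the Zicond instructions (assumed semantics):
 *   czero.eqz rd, rs1, rs2 : rd = (rs2 == 0) ? 0 : rs1
 *   czero.nez rd, rs1, rs2 : rd = (rs2 != 0) ? 0 : rs1
 */
static uint32_t czero_eqz(uint32_t rs1, uint32_t rs2) { return rs2 == 0 ? 0 : rs1; }
static uint32_t czero_nez(uint32_t rs1, uint32_t rs2) { return rs2 != 0 ? 0 : rs1; }

/* Full select c ? x : y from two conditional-zero ops plus an OR:
 *   czero.eqz t0, x, c   # t0 = c ? x : 0
 *   czero.nez t1, y, c   # t1 = c ? 0 : y
 *   or        rd, t0, t1
 * At least one of t0/t1 is always zero, so OR merges them. */
uint32_t select_zicond(uint32_t c, uint32_t x, uint32_t y) {
    return czero_eqz(x, c) | czero_nez(y, c);
}

/* Hypothetical usage check */
void demo_select_zicond(void) {
    assert(select_zicond(1, 10, 20) == 10);
    assert(select_zicond(0, 10, 20) == 20);
}
```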
The proposed RISC-V extension B includes `cmov` (with 4 operands: 3 inputs and a separate destination!). (Version 0.93 was current when the rest of this answer was written.)
I think David Patterson (one of the lead architects behind Berkeley RISC and RISC-V) really dislikes `cmov` (along with short-vector SIMD like SSE/AVX) and thinks CPUs should specially handle "hammock" branches (ones that jump forward over a single instruction like a move) if they want to get that benefit. Something like that. So this seems to be a case of philosophical purity getting in the way of including useful instructions. (AArch64 is a much more pragmatic design, while still being RISC in the ways that matter for a high-performance implementation.)
And/or perhaps a desire to limit instructions to at most 2 inputs, if there aren't any other 3-input instructions. That means a scalar pipeline only needs 2 register read ports, not 3, if it strictly follows this restriction. (That also means no add-with-carry, making extended-precision math quite a pain for numbers wider than 2 registers, when you have to deal with carry-in and carry-out to the same add operation.)
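To make the add-with-carry point concrete, here is a sketch of multi-word addition written the way a RISC-V compiler has to do it, with each carry recovered by an unsigned compare (`sltu`). The function name and the little-endian limb layout are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* dst = a + b over n little-endian 32-bit limbs; returns the final carry.
 * Without add-with-carry, each limb needs two adds, two unsigned
 * compares (sltu on RISC-V) and an OR to combine the carries. */
uint32_t add_multiword(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s  = a[i] + b[i];  /* add                              */
        uint32_t c1 = s < a[i];     /* sltu: carry-out of the first add  */
        uint32_t s2 = s + carry;    /* add the incoming carry            */
        uint32_t c2 = s2 < s;       /* sltu: carry-out of the second add */
        dst[i] = s2;
        carry = c1 | c2;            /* or: c1 and c2 can't both be 1     */
    }
    return carry;
}

/* Hypothetical usage check */
void demo_add_multiword(void) {
    uint32_t a[3] = {0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu};
    uint32_t b[3] = {1, 0, 0};
    uint32_t d[3];
    assert(add_multiword(d, a, b, 3) == 1);  /* wraps to zero, carry out */
    assert(d[0] == 0 && d[1] == 0 && d[2] == 0);
}
```

The per-limb comments are approximate (a compiler may schedule things differently), but the carry-in plus carry-out pair on the same limb is what makes this noticeably more work than one `adc` per limb on ISAs that have it.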
You can emulate `cmov` as you say with a mask for AND/ANDnot/OR, but that would take quite a few instructions and is usually not worth it except possibly on wide and deep out-of-order machines, where the amount of work discarded by a branch miss is a lot bigger. (`mask = (c == 0) - 1;`, which you can do with `sltiu` / `addi reg, reg, -1` to turn 0 into -1 and 1 into 0.)
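Here's a C sketch of that mask-based select, with the intended baseline RV32I instruction for each step in the comments. One swap from the AND/ANDnot/OR form described above: baseline RISC-V has no `andn` (it's in the bitmanip extension), so this uses the equivalent `y ^ ((x ^ y) & mask)`, which avoids a separate NOT. The function name is made up for illustration, and a compiler isn't guaranteed to emit this exact sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless c ? x : y using only baseline-style integer ops. */
uint32_t select_mask(uint32_t c, uint32_t x, uint32_t y) {
    /* sltiu t0, c, 1    -> 1 if c == 0, else 0
     * addi  t0, t0, -1  -> 0 if c == 0, else all-ones */
    uint32_t mask = (uint32_t)(c == 0) - 1;
    /* xor / and / xor: picks x when mask is all-ones, y when it's 0 */
    return y ^ ((x ^ y) & mask);
}

/* Hypothetical usage check */
void demo_select_mask(void) {
    assert(select_mask(7, 10, 20) == 10);
    assert(select_mask(0, 10, 20) == 20);
}
```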
You kind of have it backwards in terms of which kind of microarchitecture benefits more from CMOV, although there are potential benefits either way. An in-order machine already kind of has to wait at a conditional branch for the condition to resolve, whereas an out-of-order machine treats control dependencies very differently from data dependencies: it predicts branches and speculates past them, but it can't start an instruction until its data inputs (like a `cmov`'s condition) are ready. As discussed in "gcc optimization flag -O3 makes code slower than -O2", data dependencies through `cmov` can create a loop-carried dependency chain that's a bigger bottleneck than highly predictable branches.
There are some out-of-order execution RISC-V designs, including open-source ones. For example, Erik Eidt linked The Berkeley Out-of-Order Machine (BOOM).
Extension B: where they put all the fun instructions they left out
The RISC-V extension B proposal has a conditional move, along with scalar min/max, popcount, leading/trailing zero count, bitfield insert/extract, two-register shifts, and a bunch of more esoteric stuff. https://five-embeddev.com/riscv-bitmanip/draft/bext.html#conditional-move-cmov
Looking at the list of proposed instructions, it's amazing what got left out of baseline RISC-V, like sign-extension of narrow integers (currently requires `slli`/`srai`) if it's not already guaranteed by the calling convention or a load instruction, and standard stuff like popcount and leading/trailing zero count that most ISAs have.
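As a rough illustration of what baseline code has to do instead, here is the `slli`/`srai` sign-extension idiom in C, plus a common portable popcount fallback (the SWAR bit-counting trick) for when there's no hardware popcount like Zbb's `cpop`. Both function names are hypothetical. The shift version assumes `>>` on a negative `int32_t` is an arithmetic shift and that converting an out-of-range `uint32_t` to `int32_t` wraps; GCC and Clang guarantee both, but ISO C leaves them implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 8 bits, mirroring baseline RISC-V's shift pair. */
int32_t sext_b_shifts(int32_t x) {
    int32_t shifted = (int32_t)((uint32_t)x << 24);  /* slli a0, a0, 24 */
    return shifted >> 24;                            /* srai a0, a0, 24 */
}

/* Portable SWAR popcount: sum bits in 2-, 4-, then 8-bit fields,
 * then add the four byte counts with a multiply. */
uint32_t popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;
    return (v * 0x01010101u) >> 24;
}

/* Hypothetical usage check */
void demo_bit_helpers(void) {
    assert(sext_b_shifts(0x80) == -128);
    assert(sext_b_shifts(0x7F) == 127);
    assert(popcount32(0xF0u) == 4);
}
```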
Godbolt shows clang 12.0 using `cmov`, `min`, and `sext.b`. In that clang version, `-O3 -Wall -menable-experimental-extensions -march=rv32gcb0p93` was the magic incantation to do that. Extension B 0.93 is enabled by the `b0p93` part of the string. (Extension B wasn't finalized at the time, and I don't know what version clang 14.0 was looking for; its error message wasn't helpful, and just plain `-march=rv32gcb` didn't get the compiler to actually use `cmov`.)
```c
// -march=rv32gcb0p93 includes extension B 0.93 (0p93)
int sel(int x, int y, int c){
    return c ? x : y;
}
```

```asm
# extension B clang
    cmov a0, a2, a0, a1
    ret

# baseline gcc 11.3 (clang and GCC 12 waste several mv instructions)
    bne  a2,zero,.L2
    mv   a0,a1
.L2:
    ret
```

```c
int min(int x, int y, int c){
    return (x<y) ? x : y;
}
```

```asm
# extension B clang
    min a0, a0, a1
    ret

# baseline gcc
    ble  a0,a1,.L5
    mv   a0,a1
.L5:
    ret
```

```c
int sext(int c){
    return (signed char)c;
}
```

```asm
# extension B clang
    sext.b a0, a0
    ret

# baseline gcc
    slli a0,a0,24
    srai a0,a0,24
    ret
```
`sub`/`and` on the opposite condition. Yes, for bithacks and branchless stuff, 0 / -1 would be more useful. But since C implementations typically use a `bool` whose object representation must be 0 / 1 to allow cheaper conversion to `int`, that's what MIPS and RISC-V did for their compare-into-register instructions. (And/or possibly other reasons.) – Swagman

`mask = (c==0)-1`. Like x86 `test`/`setcc`/`dec`. It might or might not actually be worth doing on RISC-V, depending on the microarchitecture and how unpredictable the branch is. – Swagman