If you know anything about the starting register state, there are some tricks that are faster than the 3-byte push imm8 / pop rax solution.
For speed, mov eax, 1 has many advantages, because it doesn't have any input dependencies and it's only one instruction. Out-of-order execution can get started on it (and anything that depends on it) without waiting for other stuff. (See Agner Fog's guides and the x86 tag wiki.)
Many of these take advantage of the fact that writing a 32-bit register zeros the upper half of the full 64-bit register, which avoids the unnecessary REX prefix of the OP's code. (Also note that xor rax,rax is not special-cased as a zeroing idiom on Silvermont. It only recognizes xor-zeroing of 32-bit registers, like eax or r10d, not rax or r10.)
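To make the REX-prefix cost concrete, here is a minimal sketch of how the zeroing forms encode in 64-bit mode (NASM syntax). The byte values in the comments are hand-assembled for illustration, so check them against your assembler's listing output.

```nasm
bits 64
xor eax, eax      ; 31 C0       2 bytes; the 32-bit write also zeros bits 63:32 of rax
xor rax, rax      ; 48 31 C0    3 bytes; the REX.W prefix adds a byte and gains nothing
xor r10d, r10d    ; 45 31 D2    3 bytes; REX is needed here only to reach r10
```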
If you have a small known constant in any register to start with, you can use
lea eax, [rcx+1] ; 3 bytes: opcode + ModRM + disp8
disp8 can encode displacements from -128 to +127, so this works when the known value is anywhere from -126 to 129.
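As a sketch of how the displacement is chosen, pick it so that the known value plus disp8 equals 1. The starting values of rcx below are assumptions made up for illustration, and the encodings in the comments are hand-assembled.

```nasm
bits 64
; Assumption: the value noted on each line is already known to be in rcx.
lea eax, [rcx+1]     ; 8D 41 01    rcx = 0
lea eax, [rcx-4]     ; 8D 41 FC    rcx = 5
lea eax, [rcx-99]    ; 8D 41 9D    rcx = 100
```

The 3-byte size holds for most of the legacy base registers. A base of rsp or r12 needs an extra SIB byte, and r8 through r15 need a REX prefix, so those cases cost 4 bytes or more.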
If eax is known to hold an odd number, and eax, 1 is also 3 bytes.
In 32-bit code, inc eax only takes one byte, but those one-byte inc/dec opcodes were repurposed as REX prefixes for AMD64. So xor eax,eax / inc eax is 4 bytes in x86-64 code, but only 3 in 32-bit code. Still, if saving 1 byte over a mov eax,1 is sufficient, and LEA or AND won't work, this is more efficient than push/pop.
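For a side-by-side view of the sizes discussed so far, here is a short sketch in NASM syntax for 64-bit mode. The encodings in the comments are hand-assembled for illustration rather than taken from assembler output.

```nasm
bits 64
mov  eax, 1      ; B8 01 00 00 00    5 bytes, one instruction, no input dependency

push 1           ; 6A 01
pop  rax         ; 58                3 bytes total

xor  eax, eax    ; 31 C0
inc  eax         ; FF C0             4 bytes total in 64-bit mode
                 ; (in 32-bit code inc eax is the single byte 40, giving 3 bytes)
```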
mov eax, 1 is only 5 bytes. (And yes, it clears the upper 32 bits, just like every other write to a 32-bit register.) If you're going for size over speed, the push imm8 / pop answer has a lot of merit. – Banda