Shortest Intel x86-64 opcode for rax=1?

Asked 20/11, 2015 at 11:25 Answered 14/9, 2016 at 17:50

Solved assembly x86-64 micro-optimization code-size

What would be the shortest Intel x86-64 opcode for setting rax to 1?

I tried xor rax,rax followed by inc al (in NASM syntax), which assembles to the 5-byte sequence 48 31 c0 fe c0. Would it be possible to achieve the same result in 4 bytes?

You can modify or read any other registers, but you cannot assume that any of them holds a specific value left over from previous instructions.

Piglet answered 20/11, 2015 at 11:25 Comment(1)

Generally not worth using 2 instructions when mov eax, 1 is only 5 bytes. (And yes, it clears the upper 32 bits, just like every other write to a 32-bit register.) If you're going for size over speed, the push imm8 / pop answer has a lot of merit. – Banda 20/11, 2015 at 21:32

Since there is a byte-immediate encoding for push and a one-byte pop for registers, this can be done in three bytes: 6a 01 58, or push $1 / pop %rax in AT&T syntax.

Corded answered 20/11, 2015 at 14:42 Comment(0)

If any pre-conditions are known, there are some tricks that are more efficient (in terms of speed) than the 3-byte push imm8 / pop rax solution.

For speed, mov eax, 1 has many advantages: it has no input dependencies and it's only one instruction. Out-of-order execution can get started on it (and on anything that depends on it) without waiting for other stuff. (See Agner Fog's guides and the x86 tag wiki.)

Obviously many of these take advantage of the fact that writing a 32-bit register zeros the upper half, to avoid the unnecessary REX prefix of the OP's code. (Also note that xor rax,rax is not special-cased as a zeroing idiom on Silvermont. It only recognizes xor-zeroing of 32-bit registers, like eax or r10d, not rax or r10.)

If you have a small known constant in any register to start with, you can use

lea   eax, [rcx+1]    ; 3 bytes: opcode + ModRM + disp8

disp8 can encode displacements from -128 to +127.
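To make the disp8 limit concrete, here is a minimal Python sketch that builds the 3 bytes of lea eax, [rcx+disp8] for a known constant in rcx. The helper name lea_eax_rcx_disp8 is hypothetical, and the ModRM value 0x41 is hard-coded for the eax/rcx pair only; other registers (especially rsp or r8-r15) would need a SIB byte or a REX prefix and are not handled.

```python
def lea_eax_rcx_disp8(known_rcx: int, target: int = 1) -> bytes:
    """Encode `lea eax, [rcx + disp8]` so eax == target, assuming rcx == known_rcx.

    Hypothetical helper for illustration; only the eax/rcx register pair is handled.
    """
    disp = target - known_rcx
    if not -128 <= disp <= 127:
        raise ValueError(f"displacement {disp} does not fit in a signed 8-bit disp8")
    opcode = 0x8D  # LEA r32, m
    modrm = 0x41   # mod=01 (disp8 follows), reg=000 (eax), rm=001 (rcx)
    return bytes([opcode, modrm, disp & 0xFF])


print(lea_eax_rcx_disp8(0).hex(" "))  # rcx known to be 0: disp8 = +1
print(lea_eax_rcx_disp8(5).hex(" "))  # rcx known to be 5: disp8 = -4
```

With target = 1, the known value in rcx has to lie between -126 and 129 for the displacement to fit; outside that range LEA would need a 4-byte disp32 and the size advantage is gone.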

If eax is already known to hold an odd number, then and eax, 1 is also 3 bytes (opcode + ModRM + imm8).

In 32-bit code, inc eax only takes one byte, but those one-byte inc/dec opcodes were repurposed as REX prefixes for AMD64. So xor eax,eax / inc eax is 4 bytes in x86-64 code, but only 3 in 32-bit code. Still, if saving 1 byte over mov eax,1 is sufficient, and LEA or AND won't work, xor/inc is more efficient than push/pop.
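As a rough summary of the options discussed in this thread, the following Python sketch tabulates the 64-bit-mode encodings and counts their bytes. Only 48 31 c0 fe c0 and 6a 01 58 are quoted from the posts above; the other byte strings were written out by hand and should be treated as assumptions to confirm with an assembler such as NASM. The names SEQUENCES and encoded_size are made up for this example.

```python
# (name, machine code in 64-bit mode, precondition on register contents)
SEQUENCES = [
    ("xor rax,rax / inc al",  "48 31 c0 fe c0", "none"),
    ("mov eax, 1",            "b8 01 00 00 00", "none"),
    ("xor eax,eax / inc eax", "31 c0 ff c0",    "none"),
    ("push 1 / pop rax",      "6a 01 58",       "none"),
    ("lea eax, [rcx+1]",      "8d 41 01",       "rcx == 0"),
    ("and eax, 1",            "83 e0 01",       "eax is odd"),
]


def encoded_size(hex_bytes: str) -> int:
    """Number of bytes in a space-separated hex string."""
    return len(bytes.fromhex(hex_bytes))


def sizes() -> dict:
    """Map each sequence name to its encoded size in bytes."""
    return {name: encoded_size(code) for name, code, _ in SEQUENCES}


# Print the table, shortest encodings first.
for name, code, precondition in sorted(SEQUENCES, key=lambda s: encoded_size(s[1])):
    print(f"{encoded_size(code)} bytes  {code:<15} {name:<22} needs: {precondition}")
```

Size is only one axis: the table says nothing about speed, where the single-instruction mov eax, 1 is still the safe default.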

Banda answered 14/9, 2016 at 17:50 Comment(0)
