Intel REX Encoding of PUSH
Asked Answered
H

1

6

GAS gives the following encodings for the following instructions:

push rbp    # 0x55
push rbx    # 0x53
push r12    # 0x41 0x54
push r13    # 0x41 0x55

From the AMD64 spec (Page 313):

PUSH reg64 50 +rq  Push the contexts of a 64-bit register onto the stack.

Since the offsets for rbp and rbx are 5 and 3, respectively, the first two encoding make sense. I don't understand what's going on with the last two encodings, though.

I understand that 0x40-0x4f is a REX prefix and 0x41 has the REX.B bit set (which is either an extension to the MSB of MODRM.rm or SIB.base, according to this external reference). The spec mentions that to access all of the 16 GPRs you need to use REX, but it's unclear where the cutoff is.

From consulting the docs for MODRM and SIB, I don't think SIB is used, because its purpose is indexing using a base+offset register (although to be honest, I can't really tell how you differentiate between MODRM and SIB given just the encoding).

So, I suspect MODRM is being used here. Considering just the push r12 (0x41 0x54) for the moment (and noting that r12 has offset 12), we have:

+----------------+--------------------+
| 0x41           | 0x54               |
+----------------+--------------------+
| REX            | MODRM              |
+--------+-------+-----+--------+-----+
| Prefix | WRXB  | mod | reg    | rm  |
| 0100   | 0001  | 01  | 01   0 | 100 |
+--------+-------+-----+--------+-----+

REX.B + MODRM.rm = 0b1100 = 12 so this would indicate that that is the source register (r12 = offset 12). If you ignore all of the tables in the external (unofficial) reference, REX.R + MODRM.mod + MODRM.reg = 0b00101 = 5, which is the first nibble of the push instruction base 0x50.

So, I think I have worked this backwards, but I don't understand how I would arrive at an encoding like 0x41 0x54. From the AMD reference, Figure 1-10 (Page 54) has a footnote that if MODRM.mod = 01 or 10, then the byte "includes an offset specified by the instruction displacement field." This would perhaps hint at why we have the instruction offset REX.R + MODRM.mod + MODRM.reg = 0b00101 = 5. But, why is the MODRM.mod part of the instruction offset? If it must be included than instructions that take this offset form are limited to prefixes 0b01 or 0x10. That can't be right, right?

tl;dr

  • How does the REX encoding actually work for instructions like push?
  • What is the instruction offset cutoff for needing a REX prefix? (is it documented that I can't do 0x50 + 12 for push r12 like I could for push rbp or push rbx?)
  • Why is the MODRM.mod included in the prefix of the instruction base? (Or is this correct at all?)
  • Is this consistent for similar instructions like pop? (And how do I know which instructions support this? Does it work for all instructions that have opcodes of the form XX +xx?)
  • Where is this documented in the official manual?
  • How can I differentiate between whether a REX prefix is followed by a MODRM or SIB byte?
  • Is there better documentation that perhaps lays these processes out in steps instead of making you jump between several pages from table to table?
Heartworm answered 4/2, 2019 at 15:35 Comment(6)
You can't do + 12 because the rm field is only 3 bits, so its maximum value is 7. The B bit in the REX is the fourth bit. You can think of it as meaning "add 8 to the rm".Condyloid
@RaymondChen Right, I understand why for MODRM you need the extra B bit from the REX. What I don't understand is why you can't just do 0x50 + 12 (push + r12 offset with no REX) since the second nibble is enough to store all of the register offsets. Additionally, why is the instruction offset REX.R + MODRM.mod + MODRM.reg when the spec just talks about how MODRM.reg "is used to extend the operation encoding" (Page 54, "ModRM.reg (Bits[5:3]).")?Heartworm
All your questions can be answered with a careful read of the manual (try Intel's if AMD's are too vague) or by reading Raymond's comment. ModRM is too small for 16 registers, REX.B extends it. Some questions seems to suggest you skimmed through the manuals too fast, I feel you, it's a awkward to jump between pages; try taking notes in a text editor or printing the tables (if you can) :)Underpin
@BaileyParker 50+r is only one-byte long, with 5 bits of opcode and 3 bits for the register number, so you can't encode r12 without REX. The encoding has been invented from the 16 and 32-bit x86 era, which have ony 8 registers, not like they give you another bit when transitioning into x86-64Austrasia
@MargaretBloom Thanks! I'm giving the Intel manual a read now. It's already looking like it's much more clear: "REX prefixes are used to generate 64-bit operand sizes or to reference registers R8-R15." :DHeartworm
@Austrasia Ah! I had the incorrect understanding that it was 4 for opcode and 4 for register number. This explains why all register below R8 don't need REX. Thank you!Heartworm
W
4

There's clearly no ModRM byte here because the entire instruction is one byte. You can't have a ModRM without an opcode byte.

The push reg/pop reg short forms embed a 3-bit register code into the opcode byte. That's what 50 + rq means. (Unlike the FF /6 push r/m64 encoding which does use ModRM; you can encode a register operand with that to make the instruction longer, but normally you'd only ever use that for push qword [rdi] or something).

It's the same format as 16 / 32-bit, which is why x86-64 needs an extra bit (from a REX prefix) to encode one of the "new" / upper registers that have 4-bit codes with the leading bit set.

OSdev omitted this case, and only mentioned ModRM.rm and SIB.base.


Intel's vol.2 manual PDF documents the encoding:

3.1.1.1 Opcode Column in the Instruction Summary Table (Instructions without VEX Prefix)

  • ...

  • +rb, +rw, +rd, +ro — Indicated the lower 3 bits of the opcode byte is used to encode the register operand without a modR/M byte. The instruction lists the corresponding hexadecimal value of the opcode byte with low 3 bits as 000b. In non-64-bit mode, a register code, from 0 through 7, is added to the hexadecimal value of the opcode byte. In 64-bit mode, indicates the four bit field of REX.b and opcode[2:0] field encodes the register operand of the instruction. “+ro” is applicable only in 64-bit mode. See Table 3-1 for the codes.

Table 3-1 uses the same coding scheme as register numbers in ModRM and SIB, unsurprisingly, but Intel goes all out and has a full table of all integer registers for all operand-sizes. Including AH/BH/CH/DH, because mov ah, 1 can use the 2-byte short form.

I've excerpted the relevant rows from the "quadword register (64-Bit Mode only)" column:

From Intel's Table 3-1. Register Codes Associated With +rb, +rw, +rd, +ro
        reg    REX.B  Reg Field
        RBX    None    3

        RBP    None    5

        R12    Yes     4
        R13    Yes     5

Fun fact: in Intel's manual, they actually use 50 + rd instead of 50 + ro for PUSH r64, same as for push r32 in 32-bit mode. https://www.felixcloutier.com/x86/push.


Is this consistent for similar instructions like pop? (And how do I know which instructions support this? Does it work for all instructions that have opcodes of the form XX +xx?)

Yes. push/pop reg, mov reg,imm, and xchg eax, r32 / xchg rax, r64 all use the same encoding with 3 opcode bits to encode a register.

It would be nice if we could have those 8 xchg opcodes back for something more useful (like more compact VEX or EVEX prefixes in 64-bit mode), but that ship sailed when AMD played it conservative with AMD64, mostly keeping machine code as similar as possible to 32-bit mode. They did reclaim the 0x4? inc/dec reg opcodes for use as REX prefixes, though.

Wilson answered 5/2, 2019 at 0:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.