GAS gives the following encodings for the following instructions:
push rbp # 0x55
push rbx # 0x53
push r12 # 0x41 0x54
push r13 # 0x41 0x55
From the AMD64 spec (Page 313):
PUSH reg64 50 +rq
Push the contents of a 64-bit register onto the stack.
Since the offsets for rbp and rbx are 5 and 3, respectively, the first two encodings make sense. I don't understand what's going on with the last two encodings, though.
I understand that 0x40-0x4f is a REX prefix and 0x41 has the REX.B bit set (which is either an extension to the MSB of MODRM.rm or SIB.base, according to this external reference). The spec mentions that to access all of the 16 GPRs you need to use REX, but it's unclear where the cutoff is.
From consulting the docs for MODRM and SIB, I don't think SIB is used, because its purpose is indexing using a base+offset register (although to be honest, I can't really tell how you differentiate between MODRM and SIB given just the encoding).
So, I suspect MODRM is being used here. Considering just the push r12 (0x41 0x54) for the moment (and noting that r12 has offset 12), we have:
+----------------+--------------------+
| 0x41 | 0x54 |
+----------------+--------------------+
| REX | MODRM |
+--------+-------+-----+--------+-----+
| Prefix | WRXB | mod | reg | rm |
| 0100 | 0001 | 01 | 01 0 | 100 |
+--------+-------+-----+--------+-----+
REX.B + MODRM.rm = 0b1100 = 12
so this would indicate that this is the source register (r12 = offset 12). If you ignore all of the tables in the external (unofficial) reference, REX.R + MODRM.mod + MODRM.reg = 0b00101 = 5, which is the first nibble of the push instruction base 0x50.
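The field split in the table above can be reproduced with a few shifts and masks. This is a minimal Python sketch that treats the second byte as a ModRM byte only because that is the hypothesis being tested here; the helper name split_fields and the dictionary keys are made up for illustration.

```python
def split_fields(rex_byte, second_byte):
    """Split a REX prefix and the byte after it into the fields drawn above.

    Treating the second byte as ModRM is the hypothesis under test,
    not something the manual confirms for push.
    """
    return {
        "prefix": rex_byte >> 4,             # 0b0100 for any REX prefix
        "W": (rex_byte >> 3) & 1,
        "R": (rex_byte >> 2) & 1,
        "X": (rex_byte >> 1) & 1,
        "B": rex_byte & 1,
        "mod": second_byte >> 6,             # bits 7:6
        "reg": (second_byte >> 3) & 0b111,   # bits 5:3
        "rm": second_byte & 0b111,           # bits 2:0
    }

f = split_fields(0x41, 0x54)
register = (f["B"] << 3) | f["rm"]   # REX.B used as a fourth bit above rm
high_nibble = 0x54 >> 4              # mod followed by the top two bits of reg
print(f)
print(register, high_nibble)
```

For 0x41 0x54 this gives mod = 0b01, reg = 0b010 and rm = 0b100, so the 5 above is mod followed by only the top two bits of reg, not the whole reg field.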
So, I think I have worked this backwards, but I don't understand how I would arrive at an encoding like 0x41 0x54. In the AMD reference, Figure 1-10 (Page 54) has a footnote that if MODRM.mod = 01 or 10, then the byte "includes an offset specified by the instruction displacement field." This would perhaps hint at why we have the instruction offset REX.R + MODRM.mod + MODRM.reg = 0b00101 = 5. But why is the MODRM.mod part of the instruction offset? If it must be included, then instructions that take this offset form are limited to prefixes 0b01 or 0b10. That can't be right, right?
tl;dr
- How does the REX encoding actually work for instructions like push?
- What is the instruction offset cutoff for needing a REX prefix? (Is it documented that I can't do 0x50 + 12 for push r12 like I could for push rbp or push rbx?)
- Why is the MODRM.mod included in the prefix of the instruction base? (Or is this correct at all?)
- Is this consistent for similar instructions like pop? (And how do I know which instructions support this? Does it work for all instructions that have opcodes of the form XX +xx?)
- Where is this documented in the official manual?
- How can I differentiate between whether a REX prefix is followed by a MODRM or SIB byte?
- Is there better documentation that perhaps lays these processes out in steps instead of making you jump between several pages from table to table?
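One reading that reproduces all four GAS encodings at the top is that the low three bits of the register number are added to the 0x50 base and the fourth bit is carried in REX.B. Here is a minimal Python sketch under that assumption; encode_push and the REGS table are hypothetical names for illustration, not part of GAS or the AMD manual, and only the four registers listed above are covered.

```python
# Register numbers ("offsets") for the four registers in the listing above.
REGS = {"rbx": 3, "rbp": 5, "r12": 12, "r13": 13}

def encode_push(reg):
    """Encode `push reg64`, assuming low 3 bits go in the opcode and bit 3 in REX.B."""
    num = REGS[reg]
    opcode = 0x50 + (num & 0b111)            # only three bits fit next to the 0x50 base
    if num >= 8:
        return bytes([0x40 | 0x01, opcode])  # REX prefix with only the B bit set
    return bytes([opcode])

for name in ("rbp", "rbx", "r12", "r13"):
    print(name, encode_push(name).hex(" "))
```

Note that 0x50 + 12 is 0x5c, which falls in the 0x58-0x5f range used by pop, so under this reading the full register number cannot simply be added to the base without a prefix.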
+ 12 because the rm field is only 3 bits, so its maximum value is 7. The B bit in the REX is the fourth bit. You can think of it as meaning "add 8 to the rm". – Condyloid
0x50 + 12 (push + r12 offset with no REX) since the second nibble is enough to store all of the register offsets. Additionally, why is the instruction offset REX.R + MODRM.mod + MODRM.reg when the spec just talks about how MODRM.reg "is used to extend the operation encoding" (Page 54, "ModRM.reg (Bits[5:3]).")? – Heartworm