Understanding Instruction Encoding?

Asked 31/7, 2021 at 19:23 Answered 31/7, 2021 at 19:27

Solved assembly x86-64 att machine-code instruction-encoding

I used a website to encode this:

movw $8, 4(%r8d,%esi,4)

and got:

encoding (hex): 67 66 41 C7 44 B0 04 08 00

Thanks to you I nearly understand everything except 2 small points:

Here we are moving a 2-byte immediate to a 4-byte address. They used the C7 opcode, which according to the table I have means one of the following:

mov imm16 to r/m16
mov imm32 to r/m32
mov imm32 (sign extended) to r/m64

Why is there no match?

Why is the immediate 2 bytes? According to what?

Weltanschauung answered 31/7, 2021 at 19:23 Comment(0)

There is a match. It's the first one "mov imm16 to r/m16", because of the w in the mnemonic movw. r/m16 means that 16 bits (two bytes) of memory are being read/written. It so happens that you are using a 32-bit effective address to identify which two bytes of memory are to be written, but that's not part of the r/m16 notation.

The immediate is two bytes because two bytes are to be written. There would be no point in having more. There are some cases, though, like the third one, where the immediate is shorter than the operand size and is zero- or sign-extended.
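To make the answer above concrete, here is a minimal Python sketch that splits the nine bytes from the question into their fields. The function name `decode_c7` and the keys it returns are made up for this illustration, and it deliberately handles only this one instruction shape (66/67 prefixes, optional REX, opcode C7 /0, ModRM with a SIB byte and disp8); it is not a general decoder.

```python
# Hypothetical hand decoder: handles only 66/67 prefixes, an optional REX,
# opcode C7 /0, and a ModRM byte with mod=01 (disp8) plus a SIB byte.
def decode_c7(code):
    i = 0
    addr_bits, op_bits = 64, 32           # defaults in 64-bit mode
    while code[i] in (0x66, 0x67):        # legacy prefixes
        if code[i] == 0x67:
            addr_bits = 32                # address-size override
        else:
            op_bits = 16                  # operand-size override
        i += 1
    rex = 0
    if code[i] & 0xF0 == 0x40:            # REX prefix is 0100WRXB
        rex = code[i]
        i += 1
    if rex & 0x08:                        # REX.W selects 64-bit operand size
        op_bits = 64
    opcode, modrm, sib = code[i], code[i + 1], code[i + 2]
    i += 3
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if (opcode, mod, reg, rm) != (0xC7, 1, 0, 4):
        raise ValueError("outside the scope of this sketch")
    scale = 1 << (sib >> 6)
    index = ((sib >> 3) & 7) | ((rex & 0x02) << 2)   # REX.X extends index
    base = (sib & 7) | ((rex & 0x01) << 3)           # REX.B extends base
    disp = int.from_bytes(code[i:i + 1], "little", signed=True)
    i += 1
    imm_len = 2 if op_bits == 16 else 4   # imm16 for r/m16, otherwise imm32
    imm = int.from_bytes(code[i:i + imm_len], "little", signed=True)
    i += imm_len
    return {"addr_bits": addr_bits, "op_bits": op_bits, "base": base,
            "index": index, "scale": scale, "disp": disp, "imm": imm,
            "length": i}

fields = decode_c7(bytes.fromhex("67 66 41 C7 44 B0 04 08 00"))
print(fields)
```

In the result, register number 8 is r8 and number 6 is rsi/esi. The address size (32, from the 67 prefix) and the operand size (16, from the 66 prefix) come out as two separate fields, and the immediate length follows the operand size only.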

Cronus answered 31/7, 2021 at 19:27 Comment(8)

But I don't get it: we are summing 32-bit addresses, so we get a 32-bit address... w means write to a 16-bit address, so which 16 bits do we take, the lower ones or the higher ones? – Weltanschauung 31/7, 2021 at 19:29

I think terminology like "32 bit address" is confusing you. The address is 32 bits, but we are using it to identify 16 bits' worth of memory. For instance, suppose r8d + (esi * 4) + 4 comes out to equal 0x12345678. Then your movw instruction will write 08 to the byte at address 0x12345678, and write 00 to the byte at address 0x12345679. Writing two bytes = 16 bits. If you used movb, only the byte at 0x12345678 would be written. If you used movl, the four bytes at 0x12345678..0x1234567b would be written (with the values 08 00 00 00 respectively). – Cronus 31/7, 2021 at 19:33
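The example in this comment can be modelled with a small Python sketch. Memory is represented as a plain dict from address to byte, the helper names `store` and `written` are invented for the illustration, and the register values are hypothetical ones chosen so that the effective address comes out to 0x12345678.

```python
def store(mem, addr, value, size):
    """Write `value` as `size` little-endian bytes starting at `addr`."""
    for offset, byte in enumerate(value.to_bytes(size, "little")):
        mem[addr + offset] = byte

def written(addr, value, size):
    """Return only the bytes that a store of `size` bytes would touch."""
    mem = {}
    store(mem, addr, value, size)
    return mem

# Hypothetical register contents (not from the thread).
r8d, esi = 0x12345670, 1
# With a 32-bit address size the effective address wraps at 2**32.
addr = (r8d + esi * 4 + 4) & 0xFFFFFFFF

movb = written(addr, 8, 1)   # movb $8, 4(%r8d,%esi,4)
movw = written(addr, 8, 2)   # movw $8, 4(%r8d,%esi,4)
movl = written(addr, 8, 4)   # movl $8, 4(%r8d,%esi,4)

for name, mem in (("movb", movb), ("movw", movw), ("movl", movl)):
    print(name, {hex(a): f"{b:02x}" for a, b in mem.items()})
```

The address is the same full 32-bit value in all three cases; only the number of bytes written from that starting point changes with the suffix.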

@coolmo: Literally writing with a 16-bit address would mean writing the bytes starting at 0x5678 (it would always be the low bits). There is no encoding for this in 64-bit long mode, though there is in 32-bit protected mode (sort of, you become more limited in addressing modes). It is pretty much useless either way. – Cronus 31/7, 2021 at 19:36

Now it's clear. One last thing: there is an opcode for mov imm32 (sign extended) to r/m64 and another one for mov imm64 to r/m64. How do I know which to use (how do I know whether the instruction does sign extension or not)? – Weltanschauung 31/7, 2021 at 19:41

@coolmo: At the level of assembly you don't really care about the sign extension: you specify the actual value you want written. If the assembler can represent it as a sign-extended value, it will assemble it that way; otherwise it will pick a different encoding or complain. As for mov imm64 to r/m64, I think you are mistaken: no such encoding exists. There is an instruction to mov imm64 to r64 (register only), REX.W + B8. The GNU assembler will automatically pick this encoding if you specify an immediate that does not fit in 32 bits sign-extended, or you can force it with the movabsq mnemonic. – Cronus 31/7, 2021 at 19:47

@coolmo: For example movq $0xfffffffffedcba98, %rax will give you the "mov imm32 sign extended to r/m64" encoding, REX.W + C7. So will movq $0xfffffffffedcba98, (%rax, %rsi, 8). However movq $0xfedcba9876543210, %rax will give you REX.W + B8, and movq $0xfedcba9876543210, (%rax, %rsi, 8) will fail to assemble. – Cronus 31/7, 2021 at 19:51
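The rule behind these four examples can be written down as a short Python sketch. The names `fits_simm32` and `pick_encoding` are invented for the illustration, and the function models only the choice described in these comments; a real assembler considers more encodings and options than this.

```python
def fits_simm32(value):
    """True if the 64-bit pattern equals some imm32 sign-extended to 64 bits."""
    value &= 0xFFFFFFFFFFFFFFFF
    # Reinterpret the 64-bit pattern as a signed two's-complement number.
    signed = value - (1 << 64) if value >> 63 else value
    return -(1 << 31) <= signed < (1 << 31)

def pick_encoding(value, dest_is_register):
    """Return a label for the 64-bit mov-immediate form, or None if none fits."""
    if fits_simm32(value):
        return "REX.W + C7 (imm32, sign-extended)"
    if dest_is_register:
        return "REX.W + B8 (imm64)"
    return None   # no mov stores a full imm64 directly to memory

for value in (0xFFFFFFFFFEDCBA98, 0xFEDCBA9876543210):
    for dest_is_register in (True, False):
        dest = "%rax" if dest_is_register else "(%rax,%rsi,8)"
        print(f"movq ${value:#x}, {dest}:", pick_encoding(value, dest_is_register))
```

The `None` case corresponds to the instruction that fails to assemble: to store such a constant to memory you would first load it into a register with the imm64 form and then store the register.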

@coolmo: The official x86 terms for these concepts are "address size" vs. "operand size". Those two attributes of an instruction are totally separate and orthogonal, and are controlled by different prefixes. – Allynallys 1/8, 2021 at 1:55

@coolmo: Unfortunately x86 doesn't have any mov-immediate that sign-extends a byte immediate (which would allow a 3-byte mov to reg for small numbers). The only sign-extending form is x86-64's REX.W version of mov $imm32, r/m32, which sign-extends the imm32 to 64 bits. The REX.W version of mov $imm32, reg (no ModRM) is special and takes a 64-bit immediate. Re: assemblers choosing automatically: if you use a symbolic address like mov $func, %rdi, GAS will default to movq. Only if the value is a compile-time (not link-time) constant can it choose movabs if needed. See: Difference between movq and movabsq in x86-64 – Allynallys 1/8, 2021 at 1:58
