When should I use size directives in x86?
When to use size directives in x86 seems a bit ambiguous. This x86 assembly guide says the following:

In general, the intended size of the data item at a given memory address can be inferred from the assembly code instruction in which it is referenced. For example, in all of the above instructions, the size of the memory regions could be inferred from the size of the register operand. When we were loading a 32-bit register, the assembler could infer that the region of memory we were referring to was 4 bytes wide. When we were storing the value of a one byte register to memory, the assembler could infer that we wanted the address to refer to a single byte in memory.

The examples they give are pretty trivial, such as mov'ing an immediate value into a register.
But what about more complex situations, such as the following:

mov    QWORD PTR [rip+0x21b520], 0x1

In this case, isn't the QWORD PTR size directive redundant since, according to the above guide, it can be assumed that we want to move 8 bytes into the destination register due to the fact that RIP is 8 bytes? What are the definitive rules for size directives on the x86 architecture? I couldn't find an answer for this anywhere, thanks.

Update: As Ross pointed out, the destination in the above example isn't a register. Here's a more relevant example:

mov    esi, DWORD PTR [rax*4+0x419260] 

In this case, can't it be assumed that we want to move 4 bytes because ESI is 4 bytes, making the DWORD PTR directive redundant?

Manatarms answered 15/6, 2017 at 21:14 Comment(8)
RIP isn't the destination register. The destination isn't a register at all, it's a memory location. The instruction stores the value 1 in memory at the address RIP + 0x21b520. - Opal
@RossRidge Ah, I never looked at it that way. Thanks, that clarifies it a bit for me, but I've seen other situations where the destination is, in fact, a register. I'll update the question. - Manatarms
dword ptr isn't needed in your second example. - Malchus
@Malchus Yeah, I actually found this instruction while debugging with gdb. Any idea why the assembler would include the redundant directive? Not that it matters but I'm a bit curious. I thought they were supposed to be smarter than this... - Manatarms
Perhaps it could be gdb that is adding the redundant directive. - Manatarms
Sure, if the destination register is explicit, then the PTR size does not matter. Most assemblers don't need it; old assemblers did, a long time ago. But you are actually using a disassembler, and they are wordy by design, since they can't guess whether you use an old assembler. - Duchy
Which assembler are you using, and if you remove dword ptr from the mov esi instruction, does that result in an error? - Boatright
Related: Determining when NASM can infer the size of the mov operation (NASM uses dword instead of dword ptr). - Fourinhand
P
6

You're right; it is rather ambiguous. Assuming we're talking about Intel syntax, it is true that you can often get away with not using size directives. Any time the assembler can figure it out automatically, they are optional. For example, in the instruction

mov    esi, DWORD PTR [rax*4+0x419260] 

the DWORD PTR specifier is optional for exactly the reason you suppose: the assembler can figure out that it is to move a DWORD-sized value, since the value is being moved into a DWORD-sized register.

Similarly, in

mov    rsi, QWORD PTR [rax*4+0x419260] 

the QWORD PTR specifier is optional for the exact same reason.

But it is not always optional. Consider your first example:

mov    QWORD PTR [rip+0x21b520], 0x1

Here, the QWORD PTR specifier is not optional. Without it, the assembler has no idea what size value you want to store starting at the address rip+0x21b520. Should 0x1 be stored as a BYTE? Extended to a WORD? A DWORD? A QWORD? Some assemblers might guess, but you can't be assured of the correct result without explicitly specifying what you want.

In other words, when one of the operands is a register, the size specifier is optional because the assembler can infer the operand size from the size of the register. When no register operand fixes the size, as when you store an immediate value to a memory operand, the size specifier is required to ensure you get the results you want.
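As a rough sketch of that rule, here are a few cases side by side. This assumes GNU as in Intel-syntax mode on x86-64; the label size_examples is hypothetical, and rdi is simply assumed to hold a valid pointer. Other assemblers spell the specifier differently (NASM, for instance, writes dword rather than DWORD PTR).

```asm
# GNU as, x86-64, Intel syntax. Assumption: rdi holds a valid pointer.
.intel_syntax noprefix
.text
size_examples:                       # hypothetical label
    mov   esi, [rdi]                 # register operand: 4-byte load inferred from esi
    mov   [rdi], ax                  # register operand: 2-byte store inferred from ax
    mov   QWORD PTR [rdi], 0x1       # immediate to memory: no register, size must be stated
    inc   DWORD PTR [rdi]            # lone memory operand: size must be stated
    movzx eax, BYTE PTR [rdi]        # source is narrower than eax, so eax alone does not fix its size
    ret
```

Dropping the specifier from the first two lines changes nothing. Dropping it from any of the last three typically makes the assembler reject the line as ambiguous.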

Personally, I prefer to always include the size when I write code. It's a couple of characters more typing, but it forces me to think about it and state explicitly what I want. If I screw up and code a mismatch, then the assembler will scream loudly at me, which has caught bugs more than once. I also think having it there enhances readability. So here I agree with old_timer, even though his perspective appears to be somewhat unpopular.

Disassemblers also tend to be verbose in their outputs, including the size specifiers even when they are optional. Hans Passant theorized in the comments this was to preserve backwards-compatibility with old-school assemblers that always needed these, but I'm not sure that's true. It might be part of it, but in my experience, disassemblers tend to be wordy in lots of different ways, and I think this is just to make it easier to analyze code with which you are unfamiliar.

Note that AT&T syntax uses a slightly different tack. Rather than writing the size as a prefix to the operand, it adds a suffix to the instruction mnemonic: b for byte, w for word, l for dword, and q for qword. So, the three previous examples become:

movl    0x419260(,%rax,4), %esi
movq    0x419260(,%rax,4), %rsi
movq    $0x1, 0x21b520(%rip)

Again, on the first two instructions, the l and q suffixes are optional, because the assembler can deduce the appropriate size from the register operand. On the last instruction, just like in Intel syntax, the suffix is not optional. So AT&T syntax follows the same rules as Intel syntax, just with a different format for the size specifiers.

Pursuit answered 16/6, 2017 at 10:12 Comment(5)
Some assemblers, like MASM or TASM or Delphi's built-in assembler, can declare variables or struct members as having a certain size. Using these variables as target or source operand also makes the use of an explicit size directive unnecessary, in these assemblers. I wouldn't call this ambiguous, just, well, sometimes harder to read. - Acidimetry
Yes, high-level-language-style assemblers make it even more complicated. I thought about mentioning that, but decided the answer didn't need to be any longer. :-) It follows the same general principle that if the assembler can deduce the size, you don't need to explicitly specify it. But it is only type-aware assemblers that can make these deductions, and these are not used terribly frequently in my experience. - Pursuit
The one I use most is a real (built-in) assembler, but the structs and variables are defined in the language (Object Pascal). That makes things quite easy, and much nicer to use than an external assembler (which I have done too, just for fun). And indeed, if the assembler can deduce the size, there is no need to use a directive. I generally don't use any, and only add them if the assembler complains. - Acidimetry
I think MASM, which is pretty type/size aware, is used rather often, at least on Windows. NASM and most others aren't, indeed. TASM was, but that is pretty old. - Acidimetry
I only use explicit sizes when it's not implied by a register, but I know I always think about the operand size anyway. Byte and dword don't need any prefixes, but word and qword need a 66 or REX prefix, so I'm always trying to get away with using dword operand size for efficiency. (In 32-bit code, I guess that'd be easier...). If that's not your thought process when writing asm, I guess explicit size suffixes / Intel-syntax overrides are one way to force yourself into it. - Fourinhand
D
3

RIP, or any other register in the address, is only relevant to the addressing mode, not the width of the data transferred. The memory reference [rip+0x21b520] could be used with a 1, 2, 4, or 8-byte access, and the constant value 0x01 could also be 1 to 8 bytes (0x01 is the same as 0x00000001 etc.). So in this case, the operand size has to be explicitly mentioned.
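To make the ambiguity concrete, here is a small sketch of the four stores that the same address and the same constant could mean. It assumes GNU as in Intel-syntax mode on x86-64, uses rdi as a stand-in pointer instead of the RIP-relative address, and the label four_widths is hypothetical. The comments show the machine code for each line and how many bytes it writes.

```asm
# GNU as, x86-64, Intel syntax. Assumption: rdi points to at least 8 writable bytes.
.intel_syntax noprefix
.text
four_widths:                         # hypothetical label
    mov BYTE PTR  [rdi], 0x1         # c6 07 01                writes 1 byte:  01
    mov WORD PTR  [rdi], 0x1         # 66 c7 07 01 00          writes 2 bytes: 01 00
    mov DWORD PTR [rdi], 0x1         # c7 07 01 00 00 00       writes 4 bytes: 01 00 00 00
    mov QWORD PTR [rdi], 0x1         # 48 c7 07 01 00 00 00    writes 8 bytes: 01 00 00 00 00 00 00 00
    ret
```

Each line is a different instruction encoding with a different effect on memory, which is why the assembler cannot pick one for you. Note that the 8-byte form still carries a 4-byte immediate, which the CPU sign-extends to 64 bits.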

With a register as the source or destination, the operand size would be implicit: if, say, EAX is used, the data is 32 bits or 4 bytes:

mov    [rip+0x21b520],eax

And of course, in the awfully beautiful AT&T syntax, the operand size is marked as a suffix to the instruction mnemonic (the l here):

movl $1, 0x21b520(%rip) 
Dorcy answered 15/6, 2017 at 21:42 Comment(2)
Thanks! I updated the question with a better example that I found while debugging a binary with gdb. It seems that the size directive is redundant in this case, no? - Manatarms
@ilkkachu: Awful, yes, but beautiful? Naaaah. - Acidimetry
M
-1

It gets worse than that. An assembly language is defined by the assembler, the program that reads, interprets, and parses it. For x86 in particular, but also as a general rule, there is no technical reason for any two assemblers for the same target to accept the same assembly language. They tend to be similar, but they don't have to be.

You have fallen into a couple of traps: first, the specific syntax your assembler uses for the size directive; second, whether there is a default size. My recommendation is to ALWAYS use the size directive (or a size-specific instruction mnemonic, if there is one); then you never have to worry about it, right?

Mariannamarianne answered 15/6, 2017 at 21:24 Comment(2)
I usually do the opposite: I initially never use any, and only add them if the assembler complains. And every assembler will complain if it can't find out. Especially in the assembler I use most (Delphi's BASM), where variables and struct members have a designated size, it is seldom necessary. Makes writing code a lot easier, IMO. I have used NASM, FASM, YASM, BASM, TASM, MASM, and they all complain, if necessary. - Acidimetry
I think I've heard of some nasty x86 assembler having a default operand size. Maybe the one built into emu8086? - Fourinhand
