Why must an operand have an explicit size on one line but not the other in x86 assembly?

Looking at the picture, on line 34 I had to write word ptr for this to work, while on line 44 I didn't.
Why is that?

Can't the assembler know that 0020h is a word, just like 0FF20h is a word?
Adding a leading 0 to 0020h, making it 00020h, or anything like that doesn't work either.

I am using MASM for 80x86 in emu8086; I also tried it in DOSBox v0.74.

Hideaway answered 18/4, 2018 at 22:43 Comment(7)
0020H could also be just a byte, but 0FF20H doesn't fit in a single byte, I guess. Not sure though, just a hunch. - Tobacco
Note that you had to write 0FF20H with a leading zero too, so if the assembler really relied on the length of the literal, it could have thought that was a dword ... similarly for 0FFH. It would be a dangerous game. Note sensible assemblers don't even allow your second form without explicit size. That's just a bug waiting to happen. - Williawilliam
Yes, it is possible to write an assembler in which the number of digits in the literal indicates the size (0000h is 16 bits, 00h is 8 bits), but that is generally not how they work. Your confusion is valid: the tool should really require the size every time for consistency, but clearly doesn't. I assume it doesn't throw an error if you add WORD PTR to line 44? - Outfall
You should use word ptr on line 44 as well, to show your intent to update 16 bits of memory. The fact that it happens to assemble as expected is irrelevant; especially in something as fragile as assembly, you should be completely explicit and accurate, for the sake of review and debugging. (For example, when you ask something on SO and post your source, you can bet the majority of readers won't know the "default" behaviour of your assembler, so being explicit in every ambiguous case helps a lot with reviews.) - Amulet
And BTW, I think either Peter Cordes or Michael Petch told me their personal preference is to put the size specifier on the memory destination (in NASM you can also write mov [si], word 0x20, but I don't use it). The reasoning was very solid: by writing mov word ptr [si],20h you are saying that you want to modify 16 bits of memory, but you don't mind the constant being encoded as 8 bits if such an opcode (mov word ptr [r],sign-extended-imm8) exists. That gives the assembler more accurate information about what you really want, and leaves it relaxed constraints for optimizing the constant. - Amulet
@Ped7g: Yes, that's my reasoning for putting the size override on the memory operand. (But note that mov doesn't have encodings with narrow immediates, except for mov r64, sign_extended_imm32. ALU instructions like add word [mem], imm8 exist, though. It would be a nice code-size saving for x86-64 to use one of the opcode bytes it freed up, like SALC or POP ES, as the opcode for a mov r/m64/32/16, sign-extended-imm8, giving you mov eax,1 in 3 bytes, and the very common mov qword [mem], 0 in 4 bytes + extra for the addressing mode, saving 3 bytes vs. imm32 for a memory destination.) - Gobo
@user7387595: Are you sure you tried it with emu8086? Last I heard, its crappy assembler would accept ambiguous instructions like that and pick a default operand size (but I forget which size is the default). - Gobo

The difference arises because your assembler strangely and dangerously accepts 0FF20h as implying word operand-size. But even for your assembler, leading zeros don't imply operand-size; only the actual value does. Presumably it checks the position of the most significant set bit.

This is not the case for an assembler with a well-designed and consistent syntax like NASM. If I try to assemble this in 16-bit mode with nasm -fbin foo.asm:

mov [es: si], 2
mov [es: si], 0ff20H

I get these errors:

foo.asm:1: error: operation size not specified
foo.asm:2: error: operation size not specified

Only a register can imply an operand-size for the whole instruction, not the width of a constant. (mov [si], ax is not ambiguous: there is no form of mov where the destination has a different width than the source, and ax is definitely word sized.)
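To make that rule concrete, here is a minimal NASM sketch (16-bit mode, assembled the same way as above). The choice of [si] as the address is only for illustration; the last line is left commented out because NASM rejects it.

```nasm
bits 16

        mov     [si], ax            ; word store: AX fixes the operand size
        mov     [si], al            ; byte store: AL fixes the operand size
        mov     word [si], 2        ; word store: size given explicitly
        mov     byte [si], 2        ; byte store: size given explicitly
        ; mov   [si], 2             ; rejected: "operation size not specified"
```

In the first two lines the register decides the size; in the next two the explicit keyword does. The immediate 2 never decides anything.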

The same applies to GAS (the GNU assembler), in both AT&T and Intel syntax modes. (Its Intel-syntax mode is very similar to MASM.)

There's no mov r/m16, sign_extended_imm8 encoding, but there is one for add and most ALU operations, so there's no reason for an assembler to assume that xyz [mem], 0 means byte operand size. More likely the programmer forgot to specify the size, so a sensible assembler treats it as an error instead of silently accepting something ambiguous.

mov word [mem], 0 is a totally normal way to zero a word in memory.
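As a rough illustration of why the operand size and the width of the encoded immediate are separate things, here is a NASM sketch with the machine code I would expect in 16-bit mode. The byte values in the comments assume NASM's default optimization settings and [si] as the address; a listing file (nasm -fbin foo.asm -l foo.lst) is the way to confirm them on your setup.

```nasm
bits 16

        mov     word [si], 1        ; C7 04 01 00   imm16: mov has no short imm8 form
        add     word [si], 1        ; 83 04 01      imm8, sign-extended to 16 bits
        add     word [si], 1234h    ; 81 04 34 12   imm16: value doesn't fit in imm8
        add     byte [si], 1        ; 80 04 01      byte operand size, imm8
```

The second and fourth lines both carry a one-byte immediate, yet one modifies 16 bits of memory and the other 8. That is why a small constant can't be taken as a request for byte operand size.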


Besides all that, x86 supports 32-bit operand size in 16-bit code, using a 66h operand-size prefix. This is independent from the address-size.

mov dword ptr es:[si], 0FF20h is also encodable, and mov es:[si], 0FF20h is completely ambiguous between it and mov word ptr es:[si], 0FF20h if you leave out the size ptr specifier.
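Spelled out in NASM syntax, the candidates look like this. It's a sketch assuming 16-bit mode and, for the dword form, a 386 or later; the comments show the bytes stored to memory at es:si (little-endian), not the instruction encoding.

```nasm
bits 16

        mov     byte  [es:si], 20h      ; stores 1 byte:  20
        mov     word  [es:si], 0FF20h   ; stores 2 bytes: 20 FF
        mov     dword [es:si], 0FF20h   ; stores 4 bytes: 20 FF 00 00 (uses a 66h prefix)
```

The same constant 0FF20h is valid for both the word and the dword form, so the value alone can't tell the assembler how many bytes you meant to overwrite.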

As Jester commented, if leading zeros counted as part of the width of the constant, 0FF20h could easily be taken as implying dword.

Note that you had to write 0FF20H with a leading zero too so if the assembler really relied on the length of the literal, it could have thought that was a dword ... similarly for 0FFH. It would be a dangerous game. Note sensible assemblers don't even allow your second form without explicit size. That's just a bug waiting to happen.

(Sensible assemblers include NASM and GAS, like I showed above).

If I were you, I'd be unhappy that my assembler accepted mov es:[si], 0FF20h without complaint. I thought emu8086 was even worse than MASM, and usually accepted stuff like mov [si], 2 with some default operand size instead of warning even then.

I'm not a big fan of how MASM magically infers operand-size from symbol db 1, 2, 3 either, but that's not ambiguous, it just means you have to look at how a symbol was declared to know what operand-size it will imply.
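For contrast, a small NASM sketch: NASM doesn't attach a size to a label, so the size has to appear at the point of use. The label name table is made up for this example.

```nasm
bits 16

table:  db      1, 2, 3

        mov     byte [table], 5     ; size must be spelled out at the use site
        ; mov   [table], 5          ; rejected: "operation size not specified"
```

The trade-off is more typing in exchange for being able to read the operand size straight off each instruction.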

Gobo answered 19/4, 2018 at 13:0 Comment(1)
