Why do we need to disambiguate when adding an immediate value to a value at a memory address

Unless we specify a size operator (such as byte or dword) when adding an immediate value to a value stored at a memory address, NASM reports an error.

section .data           ; Section containing initialized data

    memory_address: db "PIPPACHIP"

section .text           ; Section containing code

global  _start          ; Linker needs this to find the entry point!

_start:

23            mov ebx, memory_address
24            add [ebx], 32

........................................................

24:  error: operation size not specified. 

Fair’s fair.

I’m curious as to why this is so, however, given that the following two snippets yield the same result:

add byte [ebx], 32

or

add dword [ebx], 32

So what difference does it make? (Other than not making much sense as to why you would use dword in this instance). Is it simply because “NASM says so”? Or is there some logic here that I am missing?

If the assembler can decipher the operand size from a register name (for example, add [ebx], eax would work), why not do the same for an immediate value, i.e. just go ahead and calculate the size of the immediate value upfront?

What is the requirement that means a size operator needs to be specified when adding an immediate value to a value at a memory address?

NASM version 2.11.08 Architecture x86

Finkelstein answered 22/11, 2017 at 23:23 Comment(4)
With your example indeed you won't see a difference, but try adding 32 to, say, 240. - Celtuce
Try add byte [ebx],250 vs add dword [ebx],250 ... they should be the same according to your logic. But result will be different. - Arva
@Celtuce Thank you. That makes sense. The size operator is instructing the CPU on how many bits to switch in memory, regardless of the resulting sum, and the carry flag is set if the result of the arithmetic “carries out” a bit, i.e. the CPU is not physically able to ‘record’ the result in the number of bits specified by the size operator. - Finkelstein
Related: What's the difference between 0 and dword 0? for basics of what operand-size overrides actually do. - Trinidad

It does matter what operand-size you use for several reasons, and it would be weird and unintuitive / non-obvious to have the size implied by the integer value. It's a much better design to have NASM error when there's ambiguity because neither operand is a register.


As the two following segments of code will yield the same result:

add byte [ebx], 32
add dword [ebx], 32

They only yield the same result because 'P' + 32 doesn't carry into the next byte.
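To make that concrete, here is a rough Python model of what the two instructions do to the bytes in memory. It is only a sketch: add_to_memory and run are made-up helper names, and the model assumes little-endian memory and ignores flags other than the carry-out.

```python
def add_to_memory(mem, offset, imm, size):
    """Model `add <size> [mem+offset], imm` on little-endian memory.

    size is the operand size in bytes (1 for byte, 4 for dword).
    mem is modified in place; the return value is the carry-out (0 or 1).
    """
    mask = (1 << (8 * size)) - 1
    old = int.from_bytes(mem[offset:offset + size], "little")
    total = old + (imm & mask)
    mem[offset:offset + size] = (total & mask).to_bytes(size, "little")
    return total >> (8 * size)


def run(imm, size):
    """Apply the add to a fresh copy of the string from the question."""
    mem = bytearray(b"PIPPACHIP")
    add_to_memory(mem, 0, imm, size)
    return bytes(mem)


print(run(32, 1))    # b'pIPPACHIP'
print(run(32, 4))    # b'pIPPACHIP'  (same: no carry out of the first byte)
print(run(250, 1))   # b'JIPPACHIP'  (carry is dropped from memory, goes to CF)
print(run(250, 4))   # b'JJPPACHIP'  (carry propagates into the second byte)
```

With 250 instead of 32, the byte version leaves 'I' alone while the dword version turns it into 'J'.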

Flags are set according to the result. If the 4th byte had its high bit set, then SF would be set for the dword version.

re: comments about how CF works:

Carry-out from an add is always 0 or 1. i.e. the sum of two N-bit integers will always fit in an (N+1)-bit integer, where the extra bit is CF. Think of the add eax, ebx as producing the result in CF:EAX, where each bit can be 0 or 1 depending on the input operands.
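A small Python sketch of that idea, assuming unsigned N-bit inputs (add_with_carry_out is a hypothetical helper, not anything the CPU or NASM exposes): the carry-out is simply bit N of the full sum, so it can never be anything other than 0 or 1.

```python
def add_with_carry_out(a, b, bits):
    """Add two unsigned `bits`-wide integers the way x86 `add` does.

    Returns (result, cf): the low `bits` bits of the sum, and the
    carry-out, which is bit number `bits` of the full sum.
    """
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total >> bits


# Even the largest possible inputs produce a carry of exactly 1.
print(add_with_carry_out(0xFF, 0xFF, 8))   # (254, 1)
print(add_with_carry_out(0x50, 0x20, 8))   # (112, 0)   'P' + 32 = 'p'
print(add_with_carry_out(0x50, 250, 8))    # (74, 1)    'P' + 250 wraps to 'J'
```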


Also, if ebx was pointing at the last byte in a page, then dword [ebx] could segfault (if the next page was unmapped), but byte [ebx] wouldn't.

This also has performance implications: read-modify-write of a byte can't store-forward to a dword load, and a dword read-modify-write accesses all 4 bytes. (It affects correctness too: if another thread modifies one of the other 3 bytes between this thread's load and its store, the dword store writes the stale value back over it.)


For these and various other reasons, it matters whether the opcode for the instruction that NASM assembles into the output file is the opcode for add r/m32, imm8 or add r/m8, imm8.
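As an illustration of how little the two encodings differ, here is a Python sketch that builds the machine-code bytes by hand. The assumptions are mine: 32-bit mode, [ebx] as the only addressing mode, and an immediate that fits in a signed 8-bit field. The helper names are made up for this example.

```python
EBX = 0b011      # r/m field value that selects [ebx] when mod = 00
ADD_EXT = 0      # ADD is the /0 opcode extension for opcodes 0x80 and 0x83


def modrm(mod, reg, rm):
    """Pack the three ModRM fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def encode_add_ebx_imm8(operand_bits, imm):
    """Encode `add byte [ebx], imm` (8) or `add dword [ebx], imm` (32).

    0x80 /0 ib is add r/m8, imm8; 0x83 /0 ib is add r/m32, imm8
    (the imm8 is sign-extended to 32 bits by the CPU).
    """
    if not -128 <= imm <= 127:
        raise ValueError("this sketch only handles signed 8-bit immediates")
    opcode = {8: 0x80, 32: 0x83}[operand_bits]
    return bytes([opcode, modrm(0b00, ADD_EXT, EBX), imm & 0xFF])


def hexdump(code):
    return " ".join(f"{b:02x}" for b in code)


print(hexdump(encode_add_ebx_imm8(8, 32)))    # 80 03 20
print(hexdump(encode_add_ebx_imm8(32, 32)))   # 83 03 20
```

Only the opcode byte changes, and nothing in the immediate itself tells the assembler which one to pick.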

It's a Good Thing that it forces you to be explicit about which one you mean instead of having some kind of default. Basing it on the size of the immediate would be confusing, too, especially when using an ASCII_casebit equ 0x20 constant. You don't want the operand-size of your instructions to change when you change a constant.

Trinidad answered 22/11, 2017 at 23:35 Comment(6)
Thank you. I see. E.g. if I specify the size operator word, and the result of my addition increases the original data beyond the size of one byte (one memory address), I will be changing the data stored in the ‘next’ byte in memory as well, even if this is not my intention. Whereas, if I specify byte, I will only be changing the data in the ‘first’ byte, regardless of the resulting sum; and a CF / OF will be set accordingly. What happens to the data that is carried? Is it lost? i.e. A carry is flagged, but there is no direct information as to what has been carried? Carry is not always 1. - Finkelstein
@case_2501: carry-out from an add is always 0 or 1. i.e. the sum of two N-bit integers will always fit in an (N+1)-bit integer, where the +1 is CF. - Trinidad
@case_2501: there is no memory-destination mul or imul, and the one-operand form gives you the full-multiply result in (e)dx:(e)ax, or ah:al. e.g. mul dword [ebx] does edx:eax = eax * dword [ebx]. If you don't care about the upper half of the multiply (e.g. C semantics for c = a*123), you can do imul r32, r/m32, imm8/32 (where the middle operand can be register or memory), or you can do imul r32, r/m32. - Trinidad
“the sum of two N-bit integers will always fit in an (N+1)-bit integer”, this makes perfect sense. However, I am finding it difficult to understand how the carry is always 1 when, for example, we add an 8-bit integer to a 16-bit integer. If, in my original example, I add 4,095, instead of 32, add byte [ebx], 4095 the carry flag is set accordingly. However, the flag is not indicating to me here that the next highest bit is now a 1, like it would be for the sum of two N-bit integers, rather just that there has been a carry of X value, how do you know X? - Finkelstein
@case_2501: The carry-out isn't always 1, it's set to the high bit of the N+1 bit result. Think of the add eax,ebx as producing the result in CF:EAX, where each bit can be 0 or 1 depending on the input operands. - Trinidad
4095 doesn't fit in an 8-bit immediate, so add byte [ebx], 4095 is not encodable. Many assemblers will truncate it to add byte [ebx], 0xFF. - Trinidad
