Why in NASM do we have to use square brackets ([ ]) to MOV to memory location?

For example if I have a variable named test declared like:

test db 0x01      ;suppose the address is 0x00000052

If I do something like:

mov rax, test     ;rax = 0x00000052
mov rax, [test]   ;rax = 0x01

But when I try to store to it, following the same pattern, I would expect:

mov test, 0x01    ;address 0x00000052 = 0x01
mov [test], 0x01  ;address 0x01 = 0x01

But it actually is:

mov [test], 0x01  ;address 0x00000052 = 0x01

So why do the square brackets behave differently depending on whether they are the first or the second operand?

Scotopia answered 28/3, 2018 at 12:33 Comment(8)
mov test, 0x01 would mean 0x00000052 = 0x01, i.e. number = other_number, which doesn't make sense. Your comment ";address 0x00000052 = 0x01" somehow assumes the value 0x52 is a memory address, but there's no reason to assume that. BTW test is not a variable, it is a symbolic label for a certain memory address, 0x52. You can create a label just by writing test:, you don't need to follow it with a db directive to reserve any space (although you should, if you want to overwrite the bytes following that label). My quarrel is about how you think about it: there are no variables in asm. – Cottonmouth
And mov [test], 0x01 ;address 0x01 = 0x01 has a weird comment too... it's mov [0x52],1 = store value 1 into memory at address 0x52, and it's ambiguous, as the assembler can't tell from that source whether you want to store an 8/16/32/64-bit value 1. NASM should either fail or at least emit a warning on that line. In an ambiguous case you should specify the size explicitly, like mov byte [test],1 to write only a single byte into memory. (BTW "why": because Intel syntax marks memory access with square brackets and NASM's creators decided to follow that rigorously.) – Cottonmouth
Because NASM Requires Square Brackets For Memory References – Purvis
mov rax, test ;rax = 0x00000052 shows you're probably looking at disassembly of a .o you haven't linked. It's 0x52 bytes from the start of the file or something. mov rax, test is a mov r64, sign_extended_imm32 of the address. – Mottle
Thanks for the insightful answers! About the "variables in assembly", I've already programmed plenty of assembly on HCS12, but it's a microcontroller with only A and B registers, and referencing memory is only "$", that's why I was so confused why mov rax, [test] is different from mov [test], rax. – Scotopia
In C, given int *a; compare x=a vs. y=*a: the latter is written with brackets in this asm syntax and the former without. – Convection
In x86 asm, the destination is always the first operand. mov rax, [test] is a load, the other order is a store (different opcode but same mnemonic). On load/store architectures with separate mnemonics like lw and sw, it's typical for them not to follow the pattern of which operand is the destination for ALU instructions. e.g. MIPS lw $t0, ($a0) and sw $t0, ($a0), not sw ($a0), $t0. But on x86, almost all instructions can have a memory source or a memory destination, so they always respect the operand ordering. – Mottle
@PedroPalhari I see... x86 is a lot more versatile, so you can write both mov eax,0x52 and mov eax,[0x52]. The first one will load the value 0x52 itself into eax, the second will use 0x52 as a memory address and load a 32-bit value from memory (the size is deduced from the target register: eax = 32 bits). When you flip the arguments, source and destination are flipped, which makes sense with mov [0x52],eax (storing the 32-bit value of eax into memory), but not with mov 0x52,eax (an immediate constant is not something you can write into). NASM is consistent in the style "[] = memory access". – Cottonmouth

In most assemblers, square brackets dereference a memory location: the value inside the brackets is treated as a memory address.

For example:

mov ax, [0x1000]

This will get the value stored at address 0x1000 and put it into AX. If you remove the square brackets, you only move the number 0x1000 itself.

If the bracketed address is the destination instead, the value is stored into memory at that address.

If you are a C developer, here's an example problem.

Don't let this example annoy you if you've been bullied into learning C by others, calling you a 'troll'.

You can skip this if you want, but if you know C you have probably come across scanf().

int a = 10;
scanf("%d", a);

Now, this is a very common mistake: we are not passing the memory address of the variable. Instead, its value is used as the address. The scanf() function requires you to give it the address.

If we did this,

scanf("%d", &a);

we would have the address of the variable a.

Yerxa answered 28/3, 2018 at 14:15 Comment(1)
The point is that MASM is the weird / inconsistent one, by making mov eax, symbol a load even though it doesn't have brackets. To figure out if it's a load or a mov-immediate, you have to go look at whether it's defined as an equ or = constant or as a label. NASM forces you to use syntax that matches how you define names, so you can always tell what kind of instruction it is. – Mottle

Steve Woods' post gave me the impression he thinks & is a dereference operator. & is C's address-of (reference) operator; * is C's dereference operator. The OP has a valid concern: [] can seem to function as both, depending on the context. It is neither a dereference nor a reference operator. It is the "This is a memory address!!!" operator.

https://nasm.us/doc/nasmdoc3.html#section-3.3

An effective address is any operand to an instruction which references memory. Effective addresses, in NASM, have a very simple syntax: they consist of an expression evaluating to the desired address, enclosed in square brackets.

; assume wordvar was used as a label, and the linker gave it address 6291668
; or mostly equivalently, you used   wordvar equ 6291668

mov eax,wordvar         ; eax = 6291668. Move value 6291668 to eax.
mov eax,[wordvar]       ; eax =  12. Move contents of address 6291668 to eax.

mov eax,13
; mov wordvar,eax       ; Move eax to value 6291668. syntax error.
mov [wordvar],eax       ; mem(6291668) = 13. Move eax to address 6291668.

When an operand is a memory address, it has to be enclosed in square brackets to tell nasm that is the case. It's not dereferencing it, it's just letting nasm know what's up. If it was equivalent to the dereference operator,

mov [wordvar], eax

would set memory location 12 to 13.

It's not the dereference operator. It's the "this is a memory address" operator. It appears to be both dereferencing and referencing in different cases because x86 and x86_64 instructions behave differently based on whether their operands are memory locations or values. I am teaching myself assembly and I had to explain this to figure it out myself.

Coloration answered 9/2, 2019 at 16:46 Comment(4)
Your first paragraph isn't quite right. C assumes that all variables have an address, and using their bare names gives you the contents of that memory, not the address. So if you want the address in C, you need to add a &, vs. in NASM you leave out the []. In NASM, bare names are like numeric constants, or pointer constants, like C static const *a = some_static_address; where you do need *a to reference memory. Or for EQU constants, static const *a = 12345; – Mottle
So anyway, [] always dereferences the symbol value to access memory at that. (Technically symbol values are their address, not the pointed to memory. Putting a label somewhere is very similar to foo equ 0x401000, as far as what happens when you use that token later inside or outside of []). And since we know that x86 doesn't have memory-indirect addressing, [foo] couldn't have been syntax for loading the address from memory and then dereferencing it. Unlike C, there's no compiler that can turn expressions into multiple instructions if they're not encodable as one. – Mottle
And BTW, in x86 terminology, a "word" is 16 bits. EAX is a dword register, so you might want to adjust your variable name. – Mottle
Update to my first comment: extern char foo[]; is a better C analogy for a symbol defined by a label, and it's what you'd actually use if you want to declare a C var for something where you don't want to access the bytes there, just use the address, like end_data (end of the .data section). There is no pointer object to get optimized away, just the name attached to an address. – Mottle
