What do the brackets mean in NASM syntax for x86 asm?

Given the following code:

L1     db    "word", 0

       mov   al, [L1]
       mov   eax, L1

What do the brackets in [L1] represent?


This question is specifically about NASM. The other major flavour of Intel-syntax assembly is MASM style, where brackets work differently when there's no register involved:
See Confusing brackets in MASM32

Lanna asked 8/1, 2010 at 20:4
Note that brackets are weird and less simple in MASM: see Confusing brackets in MASM32. Usually they mean dereference, but sometimes they're ignored. (And sometimes MASM dereferences even without brackets.) - Advanced

[L1] means the memory contents at address L1. After running mov al, [L1] here, the al register will receive the byte at address L1 (the letter 'w').
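To make that concrete, here is a toy model in C. It is a sketch, not how NASM or the CPU are actually implemented. Memory is a byte array, and the label L1 is just a number: an address, here an arbitrary value (16) chosen for illustration. The brackets are what turn that number into a memory access.

```c
#include <stdint.h>
#include <string.h>

/* Toy flat memory: an address is just an index into this array. */
static uint8_t memory[64];

/* Hypothetical address assigned to the label L1 (the value is arbitrary). */
enum { L1 = 16 };

/* Lay out memory as if `L1 db "word", 0` had been assembled at L1. */
static void init_memory(void) {
    memcpy(&memory[L1], "word", 5); /* 4 letters plus the 0 terminator */
}

/* mov al, [L1]: brackets, so load the byte stored at address L1. */
static uint8_t mov_al_bracket_L1(void) {
    return memory[L1];
}

/* mov eax, L1: no brackets, so eax just gets the address itself. */
static uint32_t mov_eax_L1(void) {
    return (uint32_t)L1;
}
```

After `init_memory()`, `mov_al_bracket_L1()` returns `'w'` and `mov_eax_L1()` returns 16, the address itself.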

Peculation answered 8/1, 2010 at 20:6
Thanks for your reply, I am starting to learn asm. If I understand this correctly, "mov al, [L1]" would move 'w' into al, and "mov eax, L1" would move the address of L1 into eax. Is that correct? - Lanna
Yes. And if you did mov ebx, L1 then mov al, [ebx], al would be 'w' in that case too. - Humerus
The exception to this is LEA. - Gonzales
@interjay, so why aren't the brackets needed in the second line, mov eax, L1? - Vinni
@Vinni It depends on the assembler you're using, but usually without the brackets it will get the memory address, not the contents. - Peculation
@Pacerier: NASM/FASM assemble mov eax, L1 to mov eax, imm32 with the address. MASM / GAS (.intel_syntax noprefix) assemble that to a load, exactly the same as mov eax, [L1]. See Confusing brackets in MASM32. Some MASM users prefer to always use [] around memory references even when not required, but to get mov reg, imm you need mov eax, OFFSET L1 in MASM/GAS-Intel syntax. - Advanced

Operands of this type, such as [ebp], are called memory operands.

All the answers here are good, but none of them mentions the caveat of following this as a rigid rule (if brackets, then dereference): it doesn't hold for the lea instruction.

lea is an exception to the above rule. Say we have

mov eax, [ebp - 4]

4 is subtracted from the value of ebp, and the brackets indicate that the resulting value is taken as an address; the value residing at that address is loaded into eax. However, in lea's case, the brackets don't mean that:

lea eax, [ebp - 4]

4 is subtracted from the value of ebp and the result itself is stored in eax. This instruction just calculates the address and stores the calculated value in the destination register, without accessing memory. See What is the difference between MOV and LEA? for further details.
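The difference can be sketched with a toy C model. It is hypothetical (the array and function names are made up) and not real CPU code. Both functions compute ebp - 4, but only the mov version reads memory at that address.

```c
#include <stdint.h>
#include <string.h>

/* Toy flat memory indexed by address; no bounds checking, illustration only. */
static uint8_t memory[64];

/* mov eax, [ebp - 4]: compute the address, then load 4 bytes from it. */
static uint32_t mov_eax_ebp_minus_4(uint32_t ebp) {
    uint32_t value;
    memcpy(&value, &memory[ebp - 4], sizeof value);
    return value;
}

/* lea eax, [ebp - 4]: compute the same address, but never touch memory. */
static uint32_t lea_eax_ebp_minus_4(uint32_t ebp) {
    return ebp - 4;
}
```

With ebp = 32, the lea version returns 28 whatever memory holds. The mov version returns the dword stored at addresses 28 through 31.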

Armandoarmature answered 13/9, 2014 at 14:20
The first link is dead. Here is a snapshot: web.archive.org/web/20180331051340/http://www.imada.sdu.dk/… - Bartonbartosch
@Bartonbartosch Thanks for flagging the broken link! Fixed with a better link :) - Armandoarmature

It simply means to get the memory contents at the address marked by the label L1.

If you like C, then think of it like this: [L1] is the same as *L1.

Humerus answered 8/1, 2010 at 20:11
@user2485710 No, *p means dereference the char pointed to by p. Strings have nothing to do with this. - Boob
*L1 only works if you think of asm labels as equivalent to C static/global arrays, like static char L1[] = "word"; in this question. Then in C, L1 has type char* and is the address, and *L1 has type char and is the first byte. mov eax, [L1] in asm is like a memcpy into a uint32_t, or a dereference of an unaligned / strict-aliasing-safe uint32_t*. - Advanced
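The C analogy from these comments can be written out as a small sketch. As the comment does, it assumes that the label L1 behaves like `static char L1[] = "word"`. The function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* The asm label L1 viewed as a C static array, as suggested above. */
static char L1[] = "word";

/* *L1 is the first byte, like mov al, [L1] in NASM. */
static char deref_L1(void) {
    return *L1;
}

/* mov eax, [L1] copies 4 bytes; memcpy does that without unaligned-access
   or strict-aliasing problems. */
static uint32_t load_dword_L1(void) {
    uint32_t value;
    memcpy(&value, L1, sizeof value);
    return value;
}
```

On a little-endian machine such as x86, `load_dword_L1()` returns 0x64726F77. The first byte, 'w' (0x77), ends up in the low byte of the result.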

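The brackets mean to dereference an address. For example,

mov eax, [1234]

means: move the contents of address 1234 into EAX. So if memory looks like this:

address  contents
1234     00001

then EAX will contain 00001. (Strictly, mov eax, [1234] loads the 4 bytes at addresses 1234 through 1237.)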
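Here is a sketch of that load in a toy C model. The 2048-byte array and the `load` helper are hypothetical. The size of the destination register decides how many bytes are read, and x86 stores the least significant byte first.

```c
#include <stdint.h>

/* Toy memory large enough to contain address 1234. Illustration only. */
static uint8_t memory[2048];

/* Read `size` bytes (1, 2 or 4) starting at addr, least significant byte
   first, like x86 loads into al, ax or eax. */
static uint32_t load(uint32_t addr, unsigned size) {
    uint32_t value = 0;
    for (unsigned i = 0; i < size; i++)
        value |= (uint32_t)memory[addr + i] << (8 * i);
    return value;
}
```

Here `load(1234, 4)` corresponds to mov eax, [1234], and `load(1234, 1)` corresponds to mov al, [1234].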

Actionable answered 8/1, 2010 at 20:6

Direct memory addressing: al will be loaded with the byte located at memory address L1.

Sixpence answered 8/1, 2010 at 20:8

As in many assembly languages, the brackets mean indirection. In other words, the first mov loads al with the contents of L1 (the byte 'w'), not the address.

Your second mov loads eax with the address L1, and you can later dereference that to get or set its contents.

In both those cases, L1 is conceptually considered to be the address.
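In C terms, here is a sketch of putting the address in a register and then getting and setting through it. The store is a hypothetical extra step; in NASM it would be written mov byte [ebx], 'W'.

```c
/* L1 as a C array, standing in for `L1 db "word", 0`. */
static char L1[] = "word";

/* Mirrors:  mov ebx, L1 / mov al, [ebx] / mov byte [ebx], 'W' */
static char get_then_set(void) {
    char *ebx = L1;  /* mov ebx, L1: the address, no memory access */
    char al = *ebx;  /* mov al, [ebx]: read the byte it points to */
    *ebx = 'W';      /* mov byte [ebx], 'W': write through the pointer */
    return al;
}
```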

Ibbie answered 8/1, 2010 at 20:7

They mean that instead of moving the value of the register or the numeric value L1 into al, the CPU treats that register value or numeric value as a pointer into memory, fetches the contents of that memory address, and moves those contents into al.

In this instance, L1 is a memory location, but the same logic would apply if a register name was in the brackets:

mov al, [ebx]

Also known as a load.

Cartagena answered 8/1, 2010 at 20:7

In MASM, brackets work as they do in NASM when used with registers, and in that case they are not optional. (Things are different for addressing modes that don't involve a register; see Confusing brackets in MASM32.)

The brackets indicate that the register contains a pointer, and that the instruction should use the memory that pointer refers to rather than the register's own value. (Pointers are byte addresses: a pointer value x refers to the xth byte of memory. A byte is 8 binary digits and one hexadecimal digit is 4 binary digits, so a byte is 2 hexadecimal digits.) If the bracketed operand is the source of the instruction, the value is read from that memory.

  • If the destination has the brackets, memory at that address is an operand for the instruction (memory in the byte-addressed sense described above). For mov it is only written; for instructions like add [edi], eax it is both read and written.
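The three roles a bracketed operand can play can be sketched with a toy C model. To keep it short, memory is indexed in whole dwords rather than bytes, and the function names are hypothetical.

```c
#include <stdint.h>

/* Toy memory indexed in dwords, not bytes, to keep the sketch short. */
static uint32_t memory[16];

/* mov eax, [addr]: memory is a source and is only read. */
static uint32_t mov_load(uint32_t addr) { return memory[addr]; }

/* mov [addr], eax: memory is a write-only destination. */
static void mov_store(uint32_t addr, uint32_t eax) { memory[addr] = eax; }

/* add [addr], eax: memory is read, modified, then written back. */
static void add_rmw(uint32_t addr, uint32_t eax) { memory[addr] += eax; }
```

Storing 40 at address 3, then adding 2 to that memory operand, leaves 42 there.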

In binary machine code (for example, hexadecimal digits typed into notepad.exe and then converted into raw \x-escaped bytes with Python), whether an operand is a register or the memory that a register points to is encoded in the ModR/M byte of the instruction, plus optional SIB and displacement bytes. (I'm finishing my MASM experience first. After that, I'm going to dig up information about what to type into notepad.exe by reading about the Windows kernel and malware analysis. I'll come back to this post and write up an example.)
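Here is a sketch of that encoding. The ModR/M layout is standard x86, and the C helpers are only illustrative. The brackets select mod = 00 instead of mod = 11. With opcode 8B (mov r32, r/m32), mov eax, ebx can be encoded as 8B C3, and mov eax, [ebx] as 8B 03. Not every register fits this simple pattern: [esp] needs an extra SIB byte, and [ebp] needs a displacement.

```c
#include <stdint.h>

/* Register numbers used in the ModR/M reg and r/m fields. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* ModR/M byte: mod (2 bits) | reg (3 bits) | r/m (3 bits). */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm) {
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* mod = 3 (binary 11): the r/m operand is the register itself. */
static uint8_t modrm_register(unsigned reg, unsigned rm) {
    return modrm(3, reg, rm);
}

/* mod = 0: the r/m operand is memory at [register] (not valid as-is
   for ESP or EBP, which need a SIB byte or a displacement). */
static uint8_t modrm_memory(unsigned reg, unsigned rm) {
    return modrm(0, reg, rm);
}
```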

.686
.model flat, c
option casemap :none

include C:\masm32\include\kernel32.inc
includelib C:\masm32\lib\kernel32.lib

.data
    message db "Hello world!", 0
.code

main proc
    call testfunc                ; testfunc exits the process itself
    COMMENT @
    push 0FFFFh
    push testfunc
    pop ax
    @
    invoke ExitProcess, 404      ; never reached
main ENDP

testfunc proc
    sub esp, 1                   ; reserve 1 byte below the return address
    mov al, 0FFh
    mov [esp], al                ; brackets: store AL to the memory ESP points to
    COMMENT @
    push 0FFFFh
    push 05EFFB880h
    push 0773BFF5Ch
    push 0FB038Fh
    mov al, [esp+8]
    @
    invoke ExitProcess, [esp]    ; brackets: pass the dword at ESP (low byte FF,
                                 ; the other 3 bytes are leftover stack contents)
testfunc ENDP

END main

Windows:
To build and run it, type the following and compare the result:

C:\masm32\bin\ml /c /Zd /coff script_name.asm
C:\masm32\bin\Link /SUBSYSTEM:CONSOLE script_name.obj
script_name.exe
echo %ERRORLEVEL%

The program's exit status (printed with echo) is the dword at [esp] passed as the argument to ExitProcess. Its low byte is the FF stored with mov [esp], al, so in hex the number ends in FF. (%ERRORLEVEL% converts the number to a string of decimal digits, not hex, but it's the same number.)
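The sketch below shows why only the last hex digits are predictable. The upper bytes (0x123456) are a made-up stand-in for whatever was already on the stack. Storing FF into the lowest byte of a dword and printing the result in decimal gives the same number, just written differently.

```c
#include <stdint.h>
#include <stdio.h>

/* Replace the lowest byte of a dword with 0xFF, as mov [esp], al does
   when AL = 0FFh (x86 is little-endian, so [esp] is the low byte). */
static uint32_t set_low_byte_ff(uint32_t dword) {
    return (dword & 0xFFFFFF00u) | 0xFFu;
}

/* Format the value in decimal, the way %ERRORLEVEL% shows it. */
static void to_decimal(uint32_t value, char *buf, size_t len) {
    snprintf(buf, len, "%lu", (unsigned long)value);
}
```

For example, `set_low_byte_ff(0x12345678u)` gives 0x123456FF, and `to_decimal` writes it as "305420031".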

However, without the brackets, esp is just a value: the pointer itself, not the memory it points to. In the version below, mov eax, esp copies ESP's value into EAX with no memory access, and [eax] is then used to store the byte through that pointer. Because AL is the low byte of EAX, which now holds the pointer, the byte value goes in BL instead.

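testfunc proc
    mov eax, esp                 ; no brackets: copy the pointer value itself
    mov bl, 0FFh                 ; use BL, since AL is part of EAX (the pointer)
    mov [eax], bl                ; brackets: store BL to the memory EAX points to
    COMMENT @
    push 0FFFFh
    push 05EFFB880h
    push 0773BFF5Ch
    push 0FB038Fh
    mov al, [esp+8]
    @
    invoke ExitProcess, [esp]    ; the dword at ESP now has FF as its low byte
testfunc ENDP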

Tag: optional brackets

The above code shows that, with a register inside them, the brackets ALWAYS work the same way: they use the register's value as a pointer and access the memory at that address. Knowing this helps when reading machine code as bytes rather than as readable assembly, and when learning how the Windows kernel executes an .exe file. (For example, you could reverse engineer the kernel to build your own .exe files from scratch in notepad. There isn't much material on that, but malware-analysis resources cover it better.)

(If you want to test the code, replace testfunc in the first program with this version, then build and run it the same way.) In this case, eax is set to esp's value, a pointer into the stack. (The stack matters here because it has its own instructions: PUSH pushes a 32-bit immediate, register, or memory operand, and POP pops into a register or memory operand.) So when you execute it, the bare esp operand is the value of the ESP register, a pointer, not memory contents on the stack.


I'll come back and edit this post once in a while (if I actually get really good at assembly), so this can become an ultimate guide to assembly. I just got started in assembly and am writing a quick script in assembly that finds the most significant bit within a specific range.

Resources that have helped me write this script so far:
A 5-hour tutorial covering all of C++:

  • https://www.youtube.com/watch?v=vLnPwxZdW4Y&ab_channel=freeCodeCamp.org

    After this, I recommend a scavenger hunt of learning HTML/CSS/JS and making a calculator website (just drag and drop the HTML file into Microsoft Edge), another scavenger hunt of coding a video game like Undertale (also an HTML file dropped into Microsoft Edge), and then learning Python 3 just for fun.

This helps me find out what things like DWORDs are (unsigned long):
https://www.bing.com

  • Please read the Intel Software Developer's Manual. It tells you things like how writing to a particular memory location, the interrupt command register of the Advanced Programmable Interrupt Controller (APIC), can make another core (which is a CPU) execute code. You don't have to memorize it; I recommend copying everything into .txt files and then writing a script that searches for a word each time you create a .txt for a new section. I didn't memorize anything from the book; I just picked up some things that became common sense to me, and I hope you, the reader, will learn more.

I read halfway through Volume 3 and then skimmed the rest:
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

  • I watched some of the https://www.youtube.com/c/WhatsACreel videos. While working through a chapter, I took 30-day breaks between readings so I could understand it better. I recommend doing that too, but I can't tell you exactly when to stop, question your thinking, and watch a video; I'm sorry.

Davy Wybiral's assembly language tutorial, to watch after all of that: https://www.youtube.com/watch?v=wLXIWKUWpSs&ab_channel=DavyWybiral
The Intel Software Developer's Manual's section called 'Operation Section':

  • "a register name enclosed in brackets implies the contents of the location whose address is contained in that register."

How to Start Coding Assembly on Windows (MASM)
https://www.youtube.com/watch?v=lCjbwLeLNfs&ab_channel=CharlesClayton

Again, I'll come back here (to this post, as well as my future posts) and try to educate everyone, so that everyone reading knows as much as I do.

Substitute answered 22/2, 2022 at 22:37
[] definitely aren't "functions, which return." In the context of assembly language, a function is something you call with a call instruction. [] in MASM is part of the addressing-mode syntax, within a single instruction. No function and no return are involved. I think that's just bad wording which should be fixed, but it's separate from the later points you're trying to make, which have separate problems: - Advanced
You're only looking at the case of a register name inside []. With numeric literals like mov eax, [1234] or a label like mov eax, L1, MASM does ignore the brackets. See Confusing brackets in MASM32: apparently mov eax, 1234 really is equivalent if you don't use dword ptr or ds:. This question is tagged NASM, though, where brackets are always meaningful and never optional. - Advanced
I edited this question's title to make it clearer that it's specifically about NASM syntax (since there are other answers here which say things that are only true for NASM syntax, not MASM). This answer was already kind of off-topic since the question was tagged NASM, and this answer only looked at cases with a register inside the [], so it's not correct for MASM. - Advanced
It's great that you want to contribute and share links to tutorials that you found helpful, and a question like this is likely to be one that beginners look at. (Although the x86 tag wiki stackoverflow.com/tags/x86/info is where collections of links are supposed to go; I may take a look at some of those tutorial links and add them if they look good. I don't go looking for new x86 tutorials myself since I already know how it works.) Welcome to Stack Overflow. - Advanced
But it is important to actually answer the question you're posting under correctly, and without misleading statements about "functions" and "returning", or about "This would return a decimal number when in hex the number ends with the hexadecimal FF." The numbers in registers are in binary; hex and decimal are just different ways to represent them in source code and debuggers. No actual conversion takes place when mov al, [esp] itself executes, only during assembly and when later code at run-time prints the number as a string. - Advanced
Also, push 0FFFFh is a 32-bit operand-size push, not 16-bit. See How many bytes does the push instruction push onto the stack when I don't specify the operand size? It will assemble to the push imm32 form (felixcloutier.com/x86/push). It's non-standard terminology to say that's "pushing memory"; normally that would mean pushing a memory source operand like push dword ptr [1234], not an immediate like push 123, where the value pushed is a constant that's part of the machine code, not loaded as data. (It's only stored as data by push.) - Advanced
I came from CharlesProxy and Python 3/JS (which explains "function"). I just wanted to say that the brackets really do just get the value at the byte address held in a register, or at the address as written. I'll write about the confusing parts of writing machine code so that people can slide into knowing machine code confidently, as long as they know what the word 'syntax' means. Again, thank you for posting comments that answer my big post. Also, I would like to state for readers that the [] in hex machine code corresponds to the ModR/M byte getting defined. I'll come back to edit a lot of my answers in the future. - Substitute
There you go: it would be much better to start off your answer by saying that the presence and contents of [] tell the assembler what memory addressing mode you want it to encode in the ModRM byte (and optional SIB and/or disp8/disp32 bytes). The correspondence to machine code is a useful angle that other answers on this question haven't already explained. And it would give you something to replace the nonsense of calling it a "function" or saying it "returns". In mov [eax], ecx, for example, the memory operand is a write-only destination. - Advanced
Thank you, I wrote a new beginning as an update. I was going to go in depth like that in a future edit of this post, but now seemed like a good time as well. That is just a fraction of what the post will cover in the future. - Substitute
Since you still didn't fix any of the problems I pointed out in your answer, I did it for you so I could remove my downvote. - Advanced
I'm grateful for the help; thank you! - Substitute
Your last edit about src and dst seems to only make sense for mov or other write-only instructions. Instructions like add [rdi], eax both read and write the memory-destination operand. I think the clearest way to explain it is that "memory at that address is an operand for the instruction", whether that's a source, destination, or RMW destination. - Advanced
If you plan this ("So, this can be an ultimate guide to assembly"), the answer to a specific question is not the best place. You can create other questions that you answer yourself as a FAQ, or create community wikis. - Gratian

It indicates that the register should be used as a pointer for the actual location, instead of acting upon the register itself.

Chug answered 8/1, 2010 at 20:7
