Labels are a symbolic way to write memory addresses, nothing more, nothing less. A label itself takes no space, and is just a handy way to let you refer to that spot in memory later.
(Well, they can also turn into symbols in an object file to allow numeric addresses to be calculated at link time, instead of at assemble time. But for labels defined and referenced in the same file, this extra complexity is mostly invisible; see below about addresses being link-time constants, not assemble-time.)
e.g.
    ; NASM syntax, but the concepts apply exactly to MASM as well.
    ; For MASM, you may need BYTE PTR or whatever size overrides in loads.
    section .rodata    ; or section .data if you want to be able to store here, too.
    COUNT:
        db 0x12
    FOO:
        db 0
    BAR:
        dw 0x80FF      ; same as db 0xff, 0x80
A 4-byte load like mov eax, [COUNT] will get 0x80FF0012 (since x86 is little-endian). A 2-byte load from FOO like mov cx, [FOO] will get 0xFF00.
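To check those two results by hand, here is a minimal C sketch that models the four assembled bytes as an array and the labels as plain offsets into it. The names image and load_le are made up for illustration, and the loader builds the value byte by byte, so it gives the little-endian answer regardless of the host machine.

```c
#include <stddef.h>
#include <stdint.h>

/* The bytes the example assembles to: 12 00 FF 80. */
static const uint8_t image[4] = { 0x12, 0x00, 0xFF, 0x80 };

/* In this model a label is nothing but an offset; it carries no size. */
enum { COUNT = 0, FOO = 1, BAR = 2 };

/* Little-endian load of `width` bytes (1..4) starting at `offset`.
   The width comes from the caller, just as it comes from the
   instruction's operand size in assembly, not from the label. */
uint32_t load_le(const uint8_t *mem, size_t offset, unsigned width)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < width; i++)
        value |= (uint32_t)mem[offset + i] << (8 * i);
    return value;
}
```

load_le(image, COUNT, 4) corresponds to mov eax, [COUNT], and load_le(image, FOO, 2) to mov cx, [FOO]. The same label works with any width.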
You might actually use overlapping loads from a constant this way, e.g. with strings where some are substrings of others. For null-terminated strings, only common suffixes can be combined into the same storage space this way.
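The suffix-sharing idea looks like this as a C sketch (the strings and the names unhappy and happy are just an illustration, not from any real program). Only one copy of the bytes exists; the second "string" is the same storage entered two bytes later, so both end at the same terminator.

```c
#include <string.h>

/* One stored string: 'u','n','h','a','p','p','y','\0'. */
static const char unhappy[] = "unhappy";

/* A second name for the tail of the same storage. It takes no extra
   space for characters, much like a second label placed partway
   through the data. */
static const char *const happy = unhappy + 2;
```

A common prefix can't be shared this way, because the shorter string would need its own terminator in the middle of the longer one.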
Now does this mean that COUNT is a 4-byte variable or a 1-byte variable? No, neither. Assembly language doesn't really have "variables".
Variables are a higher-level concept that you can implement in assembly language with a label and an assembler directive that reserves some static space. Notice that the labels are separate from the db directives in the example above.
But a variable doesn't need to have any static storage space: e.g. your loop counter variable can (and often should) exist only in a register.
A variable doesn't even need to have a single fixed location. It can be spilled to the stack in part of a function where it's not used, but live in registers in another part of a function. In compiler-generated code, variables often move between registers for no reason because compilers don't even try to use the same register for the same variable.
Note that MASM does implicitly associate a label with an operand-size based on the directive that follows it. So you might have to write mov eax, dword ptr [count] if mov eax, [count] gives an operand-size mismatch error.
Some people consider this a feature, but others think this magic operand-size stuff is totally weird. NASM syntax doesn't have any of this magic. You can tell how a line will assemble without having to go and find where the labels are defined. add [count], 1 is an error in NASM, because nothing implies an operand-size; you have to write the size yourself, e.g. add dword [count], 1.
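To see why the size has to come from somewhere, here is a small C model of add [mem], imm at different widths. It's a sketch only: add_le is a made-up helper, and the width argument stands in for the byte / word / dword keyword. The same address and the same addend give different memory contents depending on the width.

```c
#include <stddef.h>
#include <stdint.h>

/* Read `width` bytes (1..4) little-endian at `offset`, add `addend`,
   and write the low `width` bytes of the sum back. Bytes beyond
   `width` are never touched, so a carry out of the top byte is lost. */
void add_le(uint8_t *mem, size_t offset, uint32_t addend, unsigned width)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < width; i++)
        value |= (uint32_t)mem[offset + i] << (8 * i);

    value += addend;

    for (unsigned i = 0; i < width; i++)
        mem[offset + i] = (uint8_t)(value >> (8 * i));
}
```

Starting from the bytes FF 00, a byte-sized add of 1 leaves 00 00 (the carry is dropped), while a word-sized add leaves 00 01. An assembler looking at a bare address has no basis for picking one over the other.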
Don't get stuck thinking that everything you'd use a variable for in C must have static storage with a label in your assembly language programs. But if you do want to use the term "variable" for static data-storage + a label like Kip Irvine does, then go ahead.
Also note that data labels are not special or different from code labels. Nothing stops you from writing jmp COUNT. Decoding 12 00 FF 80 as a (sequence of) x86 instruction(s) is left as an exercise for the reader, but (if it's in a page with execute permission), it will be fetched and decoded by the CPU.
Similarly, nothing stops you from loading data from code labels as a memory operand. It's not usually a good idea for performance reasons to mix code and data (all modern x86 CPUs use split L1 data and L1 instruction caches), but that works too. In a typical OS (like Linux), the text segment of an executable has traditionally contained both the code and the read-only data sections, mapped with read and execute permission; newer toolchains often give read-only data its own non-executable mapping. (Either way there is no write permission, so trying to store will fault unless you modified the permissions.)
A JIT-compiler writes machine code to a buffer and then jumps there. It could be a static buffer with a label, but more usually it would be a dynamically-allocated buffer whose address is a variable.
Static addresses are usually link-time constants, but often not assemble-time constants. (Unless you're writing a bootloader, or something else that is definitely loaded at a known address; then an org directive, such as org 0x7C00 for a legacy BIOS boot sector or org 0x100 for a DOS .COM program, might be useful.) This means you can do mov al, [COUNT+2], but not mov al, [COUNT*2]. (Object-file formats support integer displacements, i.e. a symbol plus a constant, but not other math operators.)
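C has the same restriction for the same reason, which makes for a quick way to try it out. In this sketch (the names bytes, count_plus_2 and count_times_2 are hypothetical), a static pointer may be initialized with "address plus constant", because the object file can record that as symbol + addend for the linker. The multiplied version is disabled with #if 0 because typical compilers reject it as a non-constant initializer.

```c
#include <stdint.h>

static const uint8_t bytes[4] = { 0x12, 0x00, 0xFF, 0x80 };

/* Accepted: an address constant plus an integer offset, which the
   toolchain resolves once the final address of `bytes` is known. */
static const uint8_t *const count_plus_2 = bytes + 2;

#if 0  /* Not accepted as a static initializer: not symbol + constant. */
static const uint8_t *const count_times_2 =
    (const uint8_t *)((uintptr_t)bytes * 2);
#endif
```

At run time you can of course compute any expression on an address in a register; the limit is only on what can be left for the linker to fill in.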
In PIC code, label addresses are not even link-time constants, but at least in 64-bit PIC code the offset from code to a data label is a link-time constant, so RIP-relative addressing can be used without an extra level of indirection (through the Global Offset Table).
count DWORD 100 creates a label that will have an offset that will eventually be known when the program is run. count is the label. It will eventually have an address. At that address there is a 32-bit value (DWORD) equal to 100 – Holmquist
[...] count data label in my code, i'm actually using the value contained in that memory location. what if I want to know the actually memory location of count? is it possible to get the actual value of memory location? – Macneil
[...] offset keyword to get the address of count. If you have a 32-bit program, mov eax, offset count would move the 32-bit address of count into eax. mov eax, [count] would move the 32-bit value at the address associated with count in EAX. You can also get the address of a label with LEA using something like lea eax, [count]. With LEA (load effective address) you don't use the offset keyword. – Holmquist
[...] count db 10 ; reserve+define 1 byte and then overwrite more memory mov [count],ebx ; writes 4 bytes. The MASM is one of rare x86 assemblers trying to actually track the "type" of label a bit, but it rarely helps, and other assemblers don't do it. So don't rely on it, treat labels in mind rather low level. – Toothpaste
[...] mov eax,[count] doesn't fetch some count label variable first, but has the correct memory address encoded directly in the instruction opcode, i.e. mov eax,[<some 32bit number as address>]. – Toothpaste
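For readers coming from C, the address-versus-value distinction discussed in these comments maps roughly onto &count versus count. A minimal sketch, with made-up function names:

```c
#include <stdint.h>

/* Static storage with a name, loosely like count DWORD 100. */
static uint32_t count = 100;

/* Rough analogue of mov eax, offset count (or lea eax, [count]):
   returns the address itself, not what is stored there. */
uintptr_t address_of_count(void)
{
    return (uintptr_t)&count;
}

/* Rough analogue of mov eax, [count]: returns the 32-bit value
   stored at that address. */
uint32_t value_at_count(void)
{
    return count;
}
```

The analogy is loose in one respect: C attaches a type to count, whereas a NASM label is only the address.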