What is the function of a "data label" in an x86 assembler?

I'm currently learning assembly programming by following Kip Irvine's "assembly language x86 programming" book.

In the book, the author tries to explain the concept of a data label:

A data label identifies the location of a variable, providing a convenient way to reference the variable in code. The following, for example, defines a variable named count:

count DWORD 100

The assembler assigns a numeric address to each label.

So my understanding of what a data label does is this: the data label count is a variable that contains a numeric value, and that numeric value is a location in memory. When I use count in my code, I'm actually using the value stored at that location in memory, in this instance 100.

Is my understanding of data labels correct? If not, could someone please point out the mistake?

Macneil answered 25/6, 2017 at 3:14 Comment(7)
A data label is a reference (alias) to a memory address that holds data. count DWORD 100 creates a label that will have an offset that will eventually be known when the program is run. count is the label. It will eventually have an address. At that address there is a 32-bit value (DWORD) equal to 100. - Holmquist
@MichaelPetch So when I'm using the count data label in my code, I'm actually using the value contained in that memory location. What if I want to know the actual memory location of count? Is it possible to get the address itself? - Macneil
In MASM you can use the offset keyword to get the address of count. In a 32-bit program, mov eax, offset count moves the 32-bit address of count into EAX, while mov eax, [count] moves the 32-bit value stored at the address associated with count into EAX. You can also get the address of a label with LEA, using something like lea eax, [count]. With LEA (load effective address) you don't use the offset keyword. - Holmquist
@MichaelPetch Thank you very much for helping out! Man, you sure are a pro at assembly language; hopefully one day I can be like you. - Macneil
@Michael Since our good Captain here found your comments helpful, you should consider promoting them to an answer. I'm not sure what else I would add if I were to post my own. Honestly, Irvine's explanation seems pretty good; I'm not sure how I would clear it up were I editing his book. Maybe calling this a "variable" is confusing for someone who already knows other higher-level programming languages, and it would be better just to avoid this term altogether in this context? - Os
OP: "At that address there is a 32-bit value (DWORD) equal to 100" ... the label actually points at the first byte of that dword. You can use it to access any number of bytes. For example, a common mistake of new asm programmers is to allocate the wrong amount of memory for a variable, count db 10 ; reserve+define 1 byte, and then overwrite more memory with mov [count],ebx ; writes 4 bytes. MASM is one of the rare x86 assemblers that tries to track the "type" of a label a bit, but it rarely helps, and other assemblers don't do it. So don't rely on it; think of labels in rather low-level terms. - Toothpaste
Also, to get a better idea of why those subtle differences (label vs. variable) matter, turn on the assembler's listing option when assembling a source file and check the emitted machine code, to better understand what memory content forms the code for the computer. You will then see that labels are just assembler symbols, valid during assembly and linking but not part of the target machine code; i.e. mov eax,[count] doesn't fetch some count label variable first, but has the correct memory address encoded directly in the instruction, i.e. mov eax,[<some 32bit number as address>]. - Toothpaste

Labels are a symbolic way to write memory addresses, nothing more, nothing less. A label itself takes no space, and is just a handy way to let you refer to that spot in memory later.

(Well, they can also turn into symbols in an object file to allow numeric addresses to be calculated at link time, instead of at assemble time. But for labels defined and referenced in the same file, this extra complexity is mostly invisible; see below about addresses being link-time constants, not assemble-time.)

e.g.

; NASM syntax, but the concepts apply exactly to MASM as well
; For MASM, you may need  BYTE PTR or whatever size overrides in loads.
section .rodata     ; or section .data  if you want to be able to store here, too.
COUNT:
   db 0x12
FOO:
   db 0
BAR:
   dw 0x80FF      ; same as   db 0xff, 0x80

A 4-byte load like mov eax, [COUNT] will get 0x80FF0012 (since x86 is little-endian). A 2-byte load from FOO like mov cx, [FOO] will get 0xFF00.
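The two loads described above can be checked with a small standalone program. This is only a sketch under assumptions that are not part of the original answer: a 64-bit Linux target, the build commands in the first comment, and the convention of reporting the result through the exit status.

```nasm
; Assumed build: nasm -f elf64 overlap.asm && ld -o overlap overlap.o
; (file name and exit-status convention are illustrative choices)
default rel                  ; use RIP-relative addressing for [label]

section .rodata
COUNT:
    db 0x12
FOO:
    db 0
BAR:
    dw 0x80FF                ; stored as bytes FF 80

section .text
global _start
_start:
    mov   edi, 1             ; exit status 1 unless both checks pass
    mov   eax, [COUNT]       ; 4-byte load of the bytes 12 00 FF 80
    cmp   eax, 0x80FF0012
    jne   .exit
    mov   cx, [FOO]          ; 2-byte load of the bytes 00 FF
    cmp   cx, 0xFF00
    jne   .exit
    xor   edi, edi           ; status 0: both loads saw the expected bytes
.exit:
    mov   eax, 60            ; sys_exit on x86-64 Linux
    syscall
```

The width of each load comes from the register operand (EAX or CX), not from anything attached to COUNT or FOO.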

You might actually use overlapping loads from a constant this way, e.g. with strings where some are substrings of others. For null-terminated strings, only common suffixes can be combined into the same storage space this way.
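A minimal sketch of suffix sharing, again assuming 64-bit Linux and raw system calls; the label names HELLO_WORLD, WORLD, STR_END and the print helper are made up for this illustration. The label WORLD points into the middle of the longer string, so both strings share one terminator and no bytes are duplicated.

```nasm
; Assumed build: nasm -f elf64 suffix.asm && ld -o suffix suffix.o
default rel

section .rodata
HELLO_WORLD:                 ; "hello world", newline, 0 starts here
    db "hello "
WORLD:                       ; "world", newline, 0 is the shared suffix
    db "world", 10, 0
STR_END:

section .text
global _start
_start:
    lea   rsi, [HELLO_WORLD]
    mov   edx, STR_END - HELLO_WORLD - 1   ; length without the terminator
    call  print
    lea   rsi, [WORLD]
    mov   edx, STR_END - WORLD - 1
    call  print
    xor   edi, edi
    mov   eax, 60            ; sys_exit
    syscall

print:                       ; rsi = buffer, edx = byte count
    mov   eax, 1             ; sys_write
    mov   edi, 1             ; file descriptor 1 (stdout)
    syscall
    ret
```

The lengths are differences between two labels in the same section, which the assembler can compute itself even though the absolute addresses are not yet known.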


Now does this mean that COUNT is a 4-byte variable or a 1-byte variable? No, neither. Assembly language doesn't really have "variables".

Variables are a higher-level concept that you can implement in assembly language with a label and an assembler directive that reserves some static space. Notice that the labels are separate from the db directives in the example above.

But a variable doesn't need to have any static storage space: e.g. your loop counter variable can (and often should) exist only in a register.

A variable doesn't even need to have a single fixed location. It can be spilled to the stack in part of a function where it's not used, but live in registers in another part of a function. In compiler-generated code, variables often move between registers for no obvious reason, because compilers don't necessarily try to use the same register for the same variable.


Note that MASM does implicitly associate a label with an operand-size based on the directive that follows it. So you might have to write mov eax, dword ptr [count] if mov eax, [count] gives an operand-size mismatch error.

Some people consider this a feature, but others think this magic operand-size stuff is totally weird. NASM syntax doesn't have any of this magic. You can tell how a line will assemble without having to go and find where the labels are defined. add [count], 1 is an error in NASM, because nothing implies an operand-size.
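As a sketch of what this looks like in NASM (assuming a 64-bit Linux target; the exit-status check is only there to make the example observable), the size has to come either from a register operand or from an explicit size keyword on the memory operand:

```nasm
; Assumed build: nasm -f elf64 opsize.asm && ld -o opsize opsize.o
default rel

section .data
count:
    dd 100

section .text
global _start
_start:
    add   dword [count], 1   ; size keyword needed: neither operand implies one
    ; add [count], 1         ; NASM rejects this line: no operand size is implied
    mov   eax, [count]       ; fine: EAX implies a 4-byte load
    xor   edi, edi
    cmp   eax, 101
    setne dil                ; exit status 0 if count is now 101, else 1
    mov   eax, 60            ; sys_exit on x86-64 Linux
    syscall
```

Writing add byte [count], 1 would assemble just as happily; the dd directive after the label places no constraint on how the memory is accessed later.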

Don't get stuck thinking that everything you'd use a variable for in C must have static storage with a label in your assembly language programs. But if you do want to use the term "variable" for static data-storage + a label like Kip Irvine does, then go ahead.


Also note that data labels are not special or different from code labels. Nothing stops you from writing jmp COUNT. Decoding 12 00 FF 80 as a (sequence of) x86 instruction(s) is left as an exercise for the reader, but (if it's in a page with execute permission), it will be fetched and decoded by the CPU.

Similarly, nothing stops you from loading data from code labels as a memory operand. It's not usually a good idea for performance reasons to mix code and data (modern x86 CPUs use split L1D and L1I caches), but that works too. In a typical OS (like Linux), the text segment of an executable contains the code and read-only data sections, and is mapped with read and execute permission. (But not write permission, so trying to store will fault unless you modified the permissions.)

A JIT-compiler writes machine code to a buffer and then jumps there. It could be a static buffer with a label, but more usually it would be a dynamically-allocated buffer whose address is a variable.


Static addresses are usually link-time constants, but often not assemble-time constants. (The exception is code that is definitely loaded at a known address, where an org directive tells the assembler the load address: org 0x7C00 for a legacy BIOS boot sector, or org 0x100 for a DOS .COM program.) This means you can do mov al, [COUNT+2], but not mov al, [COUNT*2]. (Object-file formats support relocations of the form symbol plus integer addend, but not other math operators.)
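A short sketch of the label-plus-constant case, reusing the data layout from the earlier example and assuming a 64-bit Linux target; passing the loaded byte back as the exit status is just an illustrative way to observe it.

```nasm
; Assumed build: nasm -f elf64 addend.asm && ld -o addend addend.o
default rel

section .rodata
COUNT:
    db 0x12
FOO:
    db 0
BAR:
    dw 0x80FF                ; stored as bytes FF 80

section .text
global _start
_start:
    movzx edi, byte [COUNT+2]   ; label + constant: the low byte of BAR (0xFF)
    ; mov al, [COUNT*2]         ; rejected: a relocatable address can't be scaled
    mov   eax, 60               ; sys_exit, using the loaded byte as the status
    syscall
```

Running it and then checking the shell's exit status should show 255, since COUNT+2 is the same address as BAR and the byte stored there is 0xFF.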

In PIC code, label addresses are not even link-time constants, but at least in 64-bit PIC code the offset from code to a data label is a link-time constant, so RIP-relative addressing can be used without an extra level of indirection (through the Global Offset Table).

Wedge answered 29/6, 2017 at 7:10 Comment(0)
