What's the difference between equ and db in NASM?

len:  equ  2
len:  db   2

Are they the same, producing a label that can be used instead of 2? If not, then what is the advantage or disadvantage of each declaration form? Can they be used interchangeably?

Beutler answered 4/11, 2011 at 8:41 Comment(0)

The first is equate, similar to C's:

#define len 2

in that it doesn't actually allocate any space in the final code, it simply sets the len symbol to be equal to 2. Then, when you use len later on in your source code, it's the same as if you're using the constant 2.

The second is define byte, similar to C's:

char len = 2;

It does actually allocate space, one byte in memory, stores a 2 there, and sets len to be the address of that byte.

Here's some pseudo-assembler code that shows the distinction:

line   addr   code       label   instruction
----   ----   --------   -----   -----------
   1   0000                      org    1234h
   2   1234              elen    equ    2
   3   1234   02         dlen    db     2
   4   1235   44 02 00           mov    ax,     elen
   5   1238   44 34 12           mov    ax,     dlen

Line 1 simply sets the assembly address to be 1234h, to make it easier to explain what's happening.

In line 2, no code is generated; the assembler simply loads elen into the symbol table with the value 2. Since no code has been generated, the address does not change.

Then, when you use it on line 4, it loads that value into the register.

Line 3 shows that db is different: it actually allocates some space (one byte) and stores the value in that space. It then loads dlen into the symbol table but gives it the value of that address, 1234h, rather than the constant value 2.

When you later use dlen on line 5, you get the address, which you would have to dereference to get the actual value 2.
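
To make the bookkeeping concrete, here is a minimal Python sketch of the location counter and symbol table described above. It is a toy model, not NASM: the assemble function and its tuple format are made up for illustration, and it only understands the three directives used in the listing.

```python
# Toy model of an assembler's location counter and symbol table.
# Hypothetical helper, not part of NASM: handles only org, equ and db.

def assemble(lines):
    """Return (symbols, image, origin) for (label, directive, operand) tuples."""
    symbols = {}          # label -> value (constant for equ, address for db)
    image = bytearray()   # bytes actually emitted into the output
    origin = 0            # load address set by org
    for label, directive, operand in lines:
        here = origin + len(image)        # current assembly address
        if directive == "org":
            origin = operand              # assumes org comes before any output
        elif directive == "equ":
            symbols[label] = operand      # symbol table only, no bytes emitted
        elif directive == "db":
            symbols[label] = here         # label gets the address of the byte
            image.append(operand & 0xFF)  # exactly one byte is emitted
        else:
            raise ValueError(f"unknown directive: {directive}")
    return symbols, bytes(image), origin


symbols, image, origin = assemble([
    (None,   "org", 0x1234),
    ("elen", "equ", 2),
    ("dlen", "db",  2),
])

print(hex(symbols["elen"]))   # the constant itself
print(hex(symbols["dlen"]))   # the address of the stored byte
print(image.hex())            # only the db line produced output
```

Reading image[symbols["dlen"] - origin] plays the role of the dereference: that is where the 2 actually lives for dlen, whereas for elen the 2 is the symbol's value.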

Beutler answered 4/11, 2011 at 8:44 Comment(8)
NASM 2.10.09 ELF output nitpicks: 1) "no data is generated": true for the executable after linking and RAM space, but the object file that NASM generates does contain the symbol data. 2) "similar to C's #define": in a sense, but equ does generate a symbol, which could be used by other object files with extern and without including the macro in those files. More details: https://mcmap.net/q/383128/-what-39-s-the-difference-between-equ-and-db-in-nasm - Neoma
Good points, @Ciro, I thought the meaning was obvious from context but, to be certain, I've changed data to code to ensure clarity. As to the #define, similarity isn't necessarily equality but I'll try to clarify that as well :-) - Beutler
So does db indeed generate global variables? Alternatively, is there an easier method to store a string in a stack frame (other than mov dword [rsp], 'foo', because storing longer strings becomes difficult)? - Omnifarious
len: db 2 is more like char len = 2, not int. For int you'd use dd. (Or dw if you're targeting a 16-bit ABI where int is int16_t.) - Levins
@PeterCordes wouldn't unsigned char len be more appropriate? - Fullblooded
So equ would result in immediate addressing while db would result in direct addressing? - Fullblooded
@FirstUser: It would be equally appropriate, so would signed char len = 2;. In asm, signedness is about what instructions you use on the value; db doesn't imply signed or unsigned. (If you'd used db -2, then in C terms it would definitely be a signed char. You might later reuse the same storage for an unsigned value, but that would be like a C union.) C char is either signed or unsigned, depending on the implementation; mainstream x86 ABIs have signed char, although compilers will still treat char vs. signed char mismatches as worth warning about for portability. - Levins
@FirstUser: re: addressing: there is no addressing until you use them. If I did mov eax, [rdi + len] with the EQU version, that would be a displacement to go with a base register. Or with DB, that would use the symbol address as the displacement, RDI indexing a byte array. If I did mov eax, [rbx + rdi*len], that would only work with the assemble-time constant EQU, becoming the scale factor. In NASM syntax, mov edi, len is an immediate operand either way, either the value 2 or the symbol address. To get 2 into a register, though, yes, mov eax, len vs. movzx eax, byte [len]. - Levins

Summary

NASM 2.10.09 ELF output:

  • db does not have any magic effects: it simply outputs bytes directly to the output object file.

    If a label precedes those bytes, its symbol will point to them when the program starts.

    If you are in the text section, those bytes can get executed.

    Whether you use db, dw, etc. does not specify the size of the symbol: the st_size field of the symbol table entry is not affected.

  • equ makes the symbol in the current line have st_shndx == SHN_ABS magic value in its symbol table entry.

    Instead of outputting a byte to the current object file location, it stores the value in the st_value field of the symbol table entry.

All else follows from this.

To understand what that really means, you should first understand the basics of the ELF standard and relocation.

SHN_ABS theory

SHN_ABS tells the linker that:

  • relocation is not to be done on this symbol
  • the st_value field of the symbol entry is to be used as a value directly

Contrast this with "regular" symbols, in which the value of the symbol is a memory address instead, and must therefore go through relocation.

Since they do not point to memory, SHN_ABS symbols can effectively be removed from the executable by the linker, which inlines their values.

But they are still regular symbols in object files, where they do take up space, and they can be shared amongst multiple files if global.

Sample usage

section .data
    x: equ 1
    y: db 2
section .text
global _start
_start:
    mov al, x
    ; al == 1
    mov al, [y]
    ; al == 2

Note that since the symbol x contains a literal value, it is used directly, without the [] dereference that y needs.

If we wanted to use x from a C program, we'd need something like:

extern char x;
printf("%d", (int)(uintptr_t)&x);  /* uintptr_t comes from <stdint.h> */

and declare in the asm:

global x

Empirical observation of generated output

We can observe what we've said before with:

nasm -felf32 -o equ.o equ.asm
ld -melf_i386 -o equ equ.o

Now:

readelf -s equ.o

contains:

Num:    Value  Size Type    Bind   Vis      Ndx Name
  4: 00000001     0 NOTYPE  LOCAL  DEFAULT  ABS x
  5: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 y

Ndx is st_shndx, so we see that x is SHN_ABS while y is not.

Also see that Size is 0 for y: db in no way told y that it was a single byte wide. We could simply add two db directives to allocate 2 bytes there.

And then:

objdump -dr equ

gives:

08048080 <_start>:
 8048080:       b0 01                   mov    $0x1,%al
 8048082:       a0 88 90 04 08          mov    0x8049088,%al

So we see that 0x1 was inlined into the instruction, while y was replaced by its relocated address, 0x8049088.
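
As a rough cross-check, the same two encodings can be rebuilt in a few lines of Python. This is only a sketch of how the symbol values end up inside the instructions, not of how ld really works: the symtab and section_base tables and the resolve helper are hypothetical, and the base address for section 1 is simply chosen to match the objdump output above.

```python
import struct

SHN_ABS = 0xFFF1  # reserved section index defined by the ELF specification

# (st_shndx, st_value) pairs copied from the readelf output above.
symtab = {
    "x": (SHN_ABS, 0x00000001),
    "y": (1,       0x00000000),
}

# Assumption: final address given to section 1 (.data), picked so that
# it agrees with the objdump output above.
section_base = {1: 0x08049088}


def resolve(name):
    """Return the final value of a symbol in this simplified model."""
    shndx, value = symtab[name]
    if shndx == SHN_ABS:
        return value                      # absolute: used as-is, no relocation
    return section_base[shndx] + value    # section-relative: base + offset


# mov al, imm8      -> opcode B0 followed by the 8-bit immediate
mov_al_x = bytes([0xB0, resolve("x")])
# mov al, [moffs32] -> opcode A0 followed by a little-endian 32-bit address
mov_al_y = bytes([0xA0]) + struct.pack("<I", resolve("y"))

print(mov_al_x.hex(" "))
print(mov_al_y.hex(" "))
```

The two printed byte strings should match the b0 01 and a0 88 90 04 08 columns of the disassembly.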

Tested on Ubuntu 14.04 AMD64.

Docs

http://www.nasm.us/doc/nasmdoc3.html#section-3.2.4:

EQU defines a symbol to a given constant value: when EQU is used, the source line must contain a label. The action of EQU is to define the given label name to the value of its (only) operand. This definition is absolute, and cannot change later. So, for example,

message         db      'hello, world' 
msglen          equ     $-message

defines msglen to be the constant 12. msglen may not then be redefined later. This is not a preprocessor definition either: the value of msglen is evaluated once, using the value of $ (see section 3.5 for an explanation of $) at the point of definition, rather than being evaluated wherever it is referenced and using the value of $ at the point of reference.
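
The "evaluated once" behaviour of $ can be mimicked with a toy location counter in Python. This is a sketch under the assumption that $ is just the number of bytes emitted so far in the section; the section and symbols names are made up for the illustration.

```python
# Toy location counter: "$" is the offset of the current line in the section.
section = bytearray()
symbols = {}

symbols["message"] = len(section)   # label: offset where the string starts
section += b"hello, world"          # db 'hello, world'

dollar = len(section)               # value of $ on the equ line
symbols["msglen"] = dollar - symbols["message"]   # evaluated once, right here

section += b"more data"             # later output does not change msglen
print(symbols["msglen"])
```

Because msglen is computed at the point of definition, the bytes appended afterwards have no effect on it.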

See also

Analogous question for GAS: Difference between .equ and .word in ARM Assembly? .equiv seems to be the closest GAS equivalent.

Neoma answered 15/10, 2015 at 12:12 Comment(1)
As the manual alludes to, you can use $ in equates which can result in a symbol much like putting a label. That is, label: and label equ $ are almost exactly the same. (Equates are ignored for the local label mechanism however.) The example with $-message is the difference of two symbols though so it is evaluated as a scalar number. - Cuneal

equ: evaluated at assembly time and emits no bytes. It is loosely analogous to #define, but most assemblers lack an #undef, and the right-hand side can't be anything but an atomic constant of a fixed number of bytes, so floats, doubles, and lists are not supported by most assemblers' equ directive. (In NASM, equ is handled by the assembler proper, not by the preprocessor.)

db: also processed at assembly time, but it emits bytes. The value given to db is written into the binary output by the assembler at a specific offset. equ lets you define constants that would otherwise need to be either hardcoded or fetched with a mov operation. db lets you have data available in memory before the program even starts.

Here's a NASM listing demonstrating db:

; I am a 16 byte object at offset 0.
    db '----------------'

; I am a 14 byte object at offset 16
; the label foo makes the assembler remember the current 'tell' of the 
; binary being written.
foo:
    db 'Hello, World!', 0

; I am a 2 byte filler at offset 30 to help readability in hex editor.
    db ' .'

; I am a 4 byte object at offset 32 that holds the offset of foo, which is 16 (0x10).
    dd foo
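
For anyone who wants to double-check the offsets without a hex editor, the same image can be rebuilt in Python. This is a sketch that assumes flat binary output with origin 0 and a little-endian dd; the variable names are made up for the illustration.

```python
import struct

image = bytearray()

image += b"-" * 16                 # db '----------------'  (offsets 0..15)
foo = len(image)                   # foo: remembers the current offset
image += b"Hello, World!\x00"      # db 'Hello, World!', 0  (14 bytes)
image += b" ."                     # db ' .'                (2 bytes of filler)
dd_offset = len(image)             # where the dd lands
image += struct.pack("<I", foo)    # dd foo: 4-byte little-endian offset of foo

print(foo, dd_offset, len(image))
print(image[dd_offset:].hex(" "))
```

The first line printed gives the offset of foo, the offset of the dd, and the total image size; the second shows the four bytes stored by dd foo.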


An equ can only define a constant up to the largest size the assembler supports.

Here is an example of equ, along with a few of its common limitations:

; OK
ZERO equ 0

; OK (some assemblers don't recognize \r, so you have to look up its value in an ASCII table).
CR equ 0xD
; OK (some assemblers don't recognize \n, so you have to look up its value in an ASCII table).
LF equ 0xA

; error: bar.asm:2: warning: numeric constant 102919291299129192919293122 -
; does not fit in 64 bits
; LARGE_INTEGER equ 102919291299129192919293122

; bar.asm:5: error: expression syntax error
; assemblers often don't support float constants, despite fitting in
; reasonable number of bytes. This is one of the many things
; we take for granted in C, ability to precompile floats at compile time
; without the need to create your own assembly preprocessor/assembler.
; PI equ 3.1415926 

; bar.asm:14: error: bad syntax for EQU
; assemblers often don't support list constants, this is something C
; does support using define, allowing you to define a macro that
; can be passed as a single argument to a function that takes multiple.
; eg
; #define RED 0xff, 0x00, 0x00, 0x00
; glVertex4f(RED);
; #undef RED
;RED equ 0xff, 0x00, 0x00, 0x00

The resulting binary has no bytes at all because equ does not pollute the image; all references to an equ get replaced by the right-hand side of that equ.

Nonjuror answered 26/1, 2018 at 21:19 Comment(1)
Equates may be similar to defines but NASM does have %define (and %xdefine and %assign) also. - Cuneal
