How to know if an assembly code has particular syntax (emu8086, NASM, TASM, ...)?

I want to know how, by looking through sample source code, to recognise whether the syntax used is emu8086, TASM or NASM. I am new to assembly. I would also like to know more about emu8086, please.

Dirtcheap answered 30/6, 2017 at 19:27 Comment(4)
Most valid emu8086 programs will be valid TASM programs, so it can be hard to tell those two apart. – Duplessis
So you mean that if I write source code using emu8086, it will be compatible with TASM? – Dirtcheap
And TASM is largely compatible with MASM; when configured so, it can probably compile almost all of it. (It also has its own "TASM ideal mode", a somewhat cleaner version of MASM syntax that is closer to Intel syntax but would not compile under MASM.) NASM has a TASM compatibility command-line switch, which IIRC works only for the most basic sources, so don't bother. YASM should compile almost anything written for NASM (maybe except some advanced macros). Emu8086 is "gimmicky" MASM/TASM; basic examples will work. Emu8086 is also the least strict: weird source often compiles to something, which means bugs. – Magnetostriction
Telling them apart by looking at the source may be close to impossible in some cases, if the source doesn't use anything specific to a particular assembler. You can usually recognize 16- vs 32- vs 64-bit code (especially if it is long enough), but 32-bit instructions may be used even in 16-bit real mode, so that can get tricky too. The easiest way to be sure is of course to check the author's documentation: any decent source usually documents how it was built and which tools and versions were used to develop it, plus the license of the source itself. And each OS API is different, of course. – Magnetostriction

NASM/YASM is easy to distinguish from MASM/TASM/emu8086. YASM uses NASM syntax, with a few minor differences in what it accepts for constants and directives.

I don't know how to distinguish MASM from TASM, or TASM from emu8086, or FASM, so I'll leave that for another answer to address.


In NASM, explicit sizes on things like memory operands use dword or byte. In TASM/MASM style, you have to write dword ptr or byte ptr.

In MASM (and I think TASM/emu8086), a bare symbol name refers to the contents. You have to use offset foo to get the address of foo. In NASM, you have to use [foo] to create a memory operand, and foo is the address.

There are probably other differences in syntax, too (e.g. in segment overrides), but these should be enough to tell by looking whether something is NASM-style or MASM-style.
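As a rough illustration of the two tells above (size overrides and offset), here is a minimal Python sketch that guesses the style of a single source line. The function and pattern names are hypothetical, made up for this example, and the comment stripping naively assumes that ; never appears inside a string literal. It is a heuristic, not a parser.

```python
import re

SIZES = r"(?:byte|word|dword|qword)"

# "dword ptr [...]" style size override: MASM/TASM family.
MASM_PTR = re.compile(rf"\b{SIZES}\s+ptr\b", re.IGNORECASE)
# "dword [...]" with no "ptr": NASM/YASM family.
NASM_SIZE = re.compile(rf"\b{SIZES}\s*\[", re.IGNORECASE)
# "offset foo" to take an address: MASM/TASM family.
MASM_OFFSET = re.compile(r"\boffset\s+\w+", re.IGNORECASE)


def strip_comment(line):
    """Drop everything after the first ';' (naive: ignores string literals)."""
    return line.split(";", 1)[0]


def guess_line_style(line):
    """Return 'masm', 'nasm', or None when the line gives no hint."""
    code = strip_comment(line)
    if MASM_PTR.search(code) or MASM_OFFSET.search(code):
        return "masm"
    if NASM_SIZE.search(code):
        return "nasm"
    return None


if __name__ == "__main__":
    for sample in (
        "add    dword ptr [ecx], 2",
        "add    dword [ecx], 2",
        "mov    esi, OFFSET static_var",
        "mov    eax, [static_var]",
    ):
        print(f"{sample!r:40} -> {guess_line_style(sample)}")
```

A line like mov eax, [static_var] returns None because it is valid in both families, which is exactly why a single line is often not enough to decide.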

NASM:

global foo
foo:         ; a function called foo()
    add    dword [ecx], 2
    add    dword [counter], 1   ; Error without "dword", because neither operand implies an operand-size for the instruction.  And the [] is required.
    mov    eax, [static_var]
    mov    eax, [static_array + ecx*4] ; Everything *must* be inside the []

    mov    esi, static_var      ; mov esi,imm32 with the address of the static_var
    ret

section .data
 static_var: dd 0xdeadbeef     ; NASM can use 0x... constant.  MASM only allows 0DEADBEEFh style

section .bss
 counter: resd 1    ; reserve space for one dword (initialized to zero)
 buf:     resb 256  ; reserve 256 bytes

Note the : after label names here, even for data. This is recommended but not required: any unknown token at the start of a line is assumed to be a label so counter resd 1 will assemble. But loop resd 1 won't because loop is a valid instruction mnemonic.

MASM/TASM (I may have some of this wrong, I don't use MASM or TASM):

GNU GAS with .intel_syntax noprefix is mostly the same as MASM, but without the magic operand-size association for labels. GAS directives / pseudo-instructions are totally different, though, like .byte 0x12 vs. db 12h.

.CODE
foo PROC      ; PROC/ENDP definitely means not NASM
    add    dword ptr [ecx], 2
    add    counter, 1            ; operand-size magically implied by the dd after the counter label.  [] is optional
    mov    eax, static_var       ; mov  eax, [static_var] is the same, and recommended by some for clarity
    mov    eax, static_array[ecx*4] ; [ static_array + ecx*4 ] is also allowed, but not required.

    mov    esi, OFFSET static_var   ; mov esi,imm32 with the address.
    ret
foo ENDP      ; MASM wants the procedure name repeated before ENDP

.data       ; no SECTION directive, just .data directly

  static_var dd 0deadbeefH
;;; With a : after the name, it would be just a label, not a "variable" with a size associated.

.data?      ; MASM/TASM spelling for the uninitialized-data (BSS) segment; there is no .bss directive
  ; (In most OSes, the BSS is initialized to zero.)

 counter dd ?         ; reserve space for one dword.  No : after the name, so the dword size stays associated with it
 buf   db 256 dup(?)  ; reserve 256 bytes (uninitialized).

Except where I commented otherwise, any of these differences is a guaranteed sign that the source is either NASM/YASM or MASM/TASM/emu8086.

e.g. if you ever see a bare symbol as the destination operand (e.g. mov foo, eax), it's definitely not NASM, because mov imm32, r32 makes no sense. Unless the symbol is actually a macro definition for a register, e.g. %define result eax would allow mov result, 5. (Good catch, @MichaelPetch). If the source is full of macros, then look for the defs. %define means NASM, while MACRO means MASM/TASM.
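The bare-symbol check can be sketched as a small Python heuristic. All names here are hypothetical, the register list covers only 16/32-bit names (extend it for 64-bit code), and only mov lines without a leading label are examined. It treats any %define'd name as a possible register alias, per the caveat above.

```python
import re

# 8/16/32-bit and segment register names only; extend for 64-bit code.
REGISTERS = {
    "al", "ah", "bl", "bh", "cl", "ch", "dl", "dh",
    "ax", "bx", "cx", "dx", "si", "di", "bp", "sp",
    "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp",
    "cs", "ds", "es", "fs", "gs", "ss",
}

MOV = re.compile(r"^\s*mov\s+([^,]+?)\s*,", re.IGNORECASE)
DEFINE = re.compile(r"^\s*%define\s+(\w+)", re.IGNORECASE)
IDENT = re.compile(r"[A-Za-z_]\w*")


def bare_symbol_destinations(source):
    """Return symbols used as a bare mov destination (no brackets) that are
    neither registers nor names introduced earlier by %define."""
    defined = set()
    hits = []
    for line in source.splitlines():
        code = line.split(";", 1)[0]      # naive comment stripping
        m = DEFINE.match(code)
        if m:
            defined.add(m.group(1))       # may be a register alias
            continue
        m = MOV.match(code)
        if not m:
            continue
        dest = m.group(1)
        if (IDENT.fullmatch(dest)
                and dest.lower() not in REGISTERS
                and dest not in defined):
            hits.append(dest)
    return hits


if __name__ == "__main__":
    print(bare_symbol_destinations("mov foo, eax"))
    print(bare_symbol_destinations("%define result eax\nmov result, 5"))
```

Any hit is a strong hint of MASM-style source; an empty list proves nothing either way.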

MASM/TASM doesn't have resb / resd directives. Instead, they have count DUP(value), where value can be ?.

NASM has times 30 db 0x10 to repeat the byte 0x10 30 times. You can use it on anything, even instructions. It also has %rep directives to repeat a block.

MASM and NASM have significant macro capabilities, but they use different syntax.
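To apply the directive-level tells to a whole file, one option is to tally marker lines per family. The Python sketch below uses only markers mentioned in this answer (plus resw/resq alongside resb/resd); the names and the scoring scheme are assumptions for illustration, and comment stripping is again naive.

```python
import re

FLAGS = re.IGNORECASE

MARKERS = {
    "nasm": [
        re.compile(r"^\s*%define\b", FLAGS),
        re.compile(r"^\s*%rep\b", FLAGS),
        re.compile(r"\btimes\s+\S", FLAGS),
        re.compile(r"\bres[bwdq]\b", FLAGS),
        re.compile(r"^\s*section\s+\.", FLAGS),
    ],
    "masm": [
        re.compile(r"\bproc\b", FLAGS),
        re.compile(r"\bendp\b", FLAGS),
        # The lookbehind keeps NASM's %macro from counting as MACRO.
        re.compile(r"(?<!%)\bmacro\b", FLAGS),
        re.compile(r"\bdup\s*\(", FLAGS),
    ],
}


def tally_markers(source):
    """Count lines containing at least one marker for each family."""
    counts = {"nasm": 0, "masm": 0}
    for line in source.splitlines():
        code = line.split(";", 1)[0]      # naive comment stripping
        for style, patterns in MARKERS.items():
            if any(p.search(code) for p in patterns):
                counts[style] += 1
    return counts


NASM_SAMPLE = """\
section .bss
 counter: resd 1
 buf:     resb 256
times 30 db 0x10
%define result eax
"""

MASM_SAMPLE = """\
foo PROC      ; PROC/ENDP definitely means not NASM
    ret
foo ENDP
buf db 256 dup(?)
"""

if __name__ == "__main__":
    print(tally_markers(NASM_SAMPLE))
    print(tally_markers(MASM_SAMPLE))
```

A lopsided tally is a reasonable hint; a near-tie probably means the source uses nothing specific to either family, or that these patterns are too crude for it.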

The tag wiki has links to assembler manuals and much more.


Other random things when assembling code with the wrong assembler:

In MASM, dword by itself (not dword ptr) evaluates as the number 4, because that's the width of a dword in bytes. So mov dword [foo], 123 will disastrously assemble as mov 4[foo], 123, which is the same as [foo+4]. And the operand-size will be whatever size is implied by how you declared foo, e.g. foo db 1,2,3,4 is an array of bytes, so mov dword [foo], 123 assembled by MASM is actually mov byte ptr [foo+4], 123.

See also Confusing brackets in MASM32 for the disaster of syntax design that is MASM: mov eax, [const] is a mov-immediate if const was declared as an assemble-time constant, like const = 0B8000h.

Astrogeology answered 1/7, 2017 at 0:48 Comment(6)
With %define foo eax, mov foo, eax works. Also, I think you mean times 30 db 0x10 instead of time 30 db 0x10. – Chemise
@MichaelPetch: Thanks, great point about heavily-macroed NASM code looking different. Then I think you'd just want to look for %define vs. MACRO to differentiate. – Astrogeology
@PeterCordes Thank you. Could you also give me an example of emu8086 syntax, like you did for NASM and MASM/TASM? – Dirtcheap
@ASG: no, sorry, I can't. I don't know emu8086 and have no interest in it. For me, it's just another one of those weird MASM-style assemblers that I can read but have to look up examples for when writing directives. (Edits welcome from anyone who does know emu8086, if they don't want to post a whole answer of their own.) I think the MASM/TASM example is also valid for emu8086, but it might not be. I cobbled it together from looking at code in questions/answers with the tasm tag. Try looking at the emu8086 tag. – Astrogeology
If you want to mess around with 16-bit code, I think BOCHS + NASM is a good choice. BOCHS has a built-in debugger. But I wouldn't recommend learning 16-bit x86 at all until after you learn 32/64-bit. It's pretty much only useful for bootloaders and the like. 32/64-bit is actually simpler. – Astrogeology
@PeterCordes I surely will, and thanks for all the information. – Dirtcheap

First: all of them are x86 assemblers with Intel syntax, so the basic syntax of the instructions is the same, and they should also use the same mnemonics.

You should know the syntax of all of them.

related: Assembly difference between TASM and MASM

The directives can differ. These are some examples of NASM directives:

  • BITS 32
  • BITS 16
  • ORG
  • segment

Here you should find all the NASM keywords (not every entry in that list is a NASM keyword): http://www.nasm.us/doc/nasmdoci.html

I think emu8086 uses FASM

Mush answered 30/6, 2017 at 19:56 Comment(1)
NASM and TASM use the same mnemonics, but the operand syntax differs for memory operands (dword vs. dword ptr). You can't just change the directives to port something from TASM to NASM. – Astrogeology
