How do I disassemble raw 16-bit x86 machine code?

I'd like to disassemble the MBR (first 512 bytes) of a bootable x86 disk that I have. I have copied the MBR to a file using

dd if=/dev/my-device of=mbr bs=512 count=1

Any suggestions for a Linux utility that can disassemble the file mbr?

Squeteague answered 15/11, 2009 at 9:36 Comment(0)
120

You can use objdump. According to this article the syntax is:

objdump -D -b binary -mi386 -Maddr16,data16 mbr
Gumwood answered 15/11, 2009 at 9:42 Comment(2)
Can you explain what the options you specify do? - Furtive
You can also write --target instead of -b. -D means "disassemble the contents of all sections"; -b bfdname (or --target=bfdname) forces the file to be read as the specified object-code format (raw binary in our case, not ELF); -m machine specifies the architecture to use (our file has no header with architecture information). -M passes options to the disassembler; addr16,data16 "specify the default address size and operand size", i.e. they make the generic x86 disassembly engine treat the code as 16-bit (i8086) code. - Gasbag
37

The GNU tool is called objdump, for example:

objdump -D -b binary -m i8086 <file>
Bidwell answered 15/11, 2009 at 9:45 Comment(2)
You can also set different options for the architecture and the syntax. For example, -m i386 or -Mintel,x86-64. i8086 is an old architecture and using it for modern code may yield unexpected results. Furthermore, specifying x86-64 to -M might be a good idea nowadays since many machines are 64-bit. Passing intel to -M changes the syntax to Intel-style instead of the default AT&T style, which you may or may not want. - Ingmar
@GDP2: This question is specifically about disassembling a 16-bit mode raw flat binary with no metadata, like a DOS .com executable or a legacy BIOS MBR, thus i8086 is appropriate. objdump -D -b binary -mi8086 -Mintel foo correctly disassembles a 3-byte file as 0f 58 07 addps xmm0, [bx], so -m i8086 is just setting the mode (16-bit mode), not telling it that there can't be new instructions that 8086 didn't have. If you had a different executable, like a 32 or 64-bit ELF binary, yes you'd want objdump -drwC -Mintel. - Godspeed
27

I like ndisasm for this purpose. It comes with the NASM assembler, which is free and open source and included in the package repositories of most Linux distros.

Father answered 15/11, 2009 at 9:42 Comment(2)
I like this answer better. Easier to use, and I could install nasm on OS X - objdump wasn't there, and I don't want to build it from source. - Getty
macOS will typically have llvm-objdump which probably has similar options, but yes, ndisasm is a good tool for the no-metadata case (otherwise quite inconvenient because it doesn't know about metadata). It doesn't have any possibility of AT&T syntax bugs like swapping some x87 mnemonics, which objdump -d had at one point (but not anymore). - Godspeed
23
ndisasm -b16 -o7c00h -a -s7c3eh mbr

Explanation - from ndisasm manpage

  • -b = Specifies 16-, 32- or 64-bit mode. The default is 16-bit mode.
  • -o = Specifies the notional load address for the file. This option causes ndisasm to get the addresses it lists down the left hand margin, and the target addresses of PC-relative jumps and calls, right.
  • -a = Enables automatic (or intelligent) sync mode, in which ndisasm will attempt to guess where synchronisation should be performed, by means of examining the target addresses of the relative jumps and calls it disassembles.
  • -s = Manually specifies a synchronisation address, such that ndisasm will not output any machine instruction which encompasses bytes on both sides of the address. Hence the instruction which starts at that address will be correctly disassembled.
  • mbr = The file to be disassembled.
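
The sync address 7c3eh in the command above is consistent with a boot sector that starts with a short jump over its data area, for example the bytes EB 3C 90. Those bytes are an assumption here, since the file's contents are not shown. A minimal Python sketch of how such an address could be derived for -s, using a hypothetical helper named first_jump_target:

```python
def first_jump_target(code: bytes, load_addr: int = 0x7C00):
    """Return the target of a leading JMP rel8/rel16, or None if there is none.

    load_addr is the notional load address (what ndisasm's -o sets);
    0x7C00 is where a legacy BIOS loads a boot sector.
    """
    if len(code) >= 2 and code[0] == 0xEB:            # jmp short rel8
        rel = int.from_bytes(code[1:2], "little", signed=True)
        return (load_addr + 2 + rel) & 0xFFFF         # relative to the next instruction
    if len(code) >= 3 and code[0] == 0xE9:            # jmp near rel16
        rel = int.from_bytes(code[1:3], "little", signed=True)
        return (load_addr + 3 + rel) & 0xFFFF
    return None


# Hypothetical first three bytes of a boot sector: jmp short +0x3c; nop
print(hex(first_jump_target(bytes([0xEB, 0x3C, 0x90]))))  # 0x7c3e
```

With a real file, calling first_jump_target(open("mbr", "rb").read()) should give a candidate value to pass to -s, provided the sector really does begin with such a jump.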
Cozy answered 8/7, 2011 at 6:44 Comment(3)
What does this do as opposed to simple ndisasm? Can you explain the options? - Furtive
Could you explain what those options mean and do? Understanding an answer is better than just getting one. - Particia
-b specifies 16-, 32- or 64-bit mode. The default is 16-bit mode. -o is the notional load address for the file. This option causes ndisasm to get the addresses it lists down the left hand margin, and the target addresses of PC-relative jumps and calls, right. -s specifies a synchronisation address, such that ndisasm will not output any machine instruction which encompasses bytes on both sides of the address. Hence the instruction which starts at that address will be correctly disassembled. - Discontinuity
18

starblue and hlovdal both have parts of the canonical answer. If you want to disassemble raw i8086 code, you usually want Intel syntax, not AT&T syntax, too, so use:

objdump -D -Mintel,i8086 -b binary -m i386 mbr.bin
objdump -D -Mintel,i386 -b binary -m i386 foo.bin    # for 32-bit code
objdump -D -Mintel,x86-64 -b binary -m i386 foo.bin  # for 64-bit code

If your code is ELF (or a.out (or (E)COFF)), you can use the short form:

objdump -D -Mintel,i8086 a.out  # disassembles the entire file
objdump -d -Mintel,i8086 a.out  # disassembles only code sections

For 32-bit or 64-bit code, omit the ,i8086; the ELF header already includes this information.

ndisasm, as suggested by jameslin, is also a good choice, but objdump usually comes with the OS and can deal with all architectures supported by GNU binutils (superset of those supported by GCC), and its output can usually be fed into GNU as (ndisasm’s can usually be fed into nasm though, of course).

Peter Cordes suggests that “Agner Fog's objconv is very nice. It puts labels on branch targets, making it a lot easier to figure out what the code does. It can disassemble into NASM, YASM, MASM, or AT&T (GNU) syntax.”

Multimedia Mike already found out about --adjust-vma; the ndisasm equivalent is the -o option.

To disassemble, say, sh4 code (I used one binary from Debian to test), use this with GNU binutils (almost all other disassemblers are limited to one platform, such as x86 with ndisasm and objconv):

objdump -D -b binary -m sh -EL x

The -m is the machine, and -EL means Little Endian (for sh4eb use -EB instead), which is relevant for architectures that exist in either endianness.

Alinealinna answered 22/12, 2015 at 20:44 Comment(5)
Agner Fog's objconv is very nice. It puts labels on branch targets, making it a lot easier to figure out what the code does. It can disassemble into NASM, YASM, MASM, or AT&T (GNU) syntax. - Godspeed
It built fine right out of the box on GNU/Linux, for me. But yes, it's x86 / x86-64 only, unlike GNU binutils. However, it has a lot of nice x86-specific hints that it adds as comments, like when an operand-size prefix can cause an LCP-stall in the decoders of an Intel CPU. By all means, mention it in your answer. One of the major purposes of comments is to help the poster improve their answer, not just as something that later viewers need to read, too. - Godspeed
@PeterCordes Yes well I have MirBSD as main OS ;) - Alinealinna
@PeterCordes but it seems it can't disassemble raw binaries, can it? I had to create minimal ELF files just to be able to feed a bunch of instructions into it, but maybe I just missed some option? - Reitareiter
@Ruslan: IDK, interesting question. I usually just use objdump, or if I want branch labels, gcc -O3 -masm=intel -fverbose-asm -S -o- | less, since I'm usually trying to tweak C source into compiling to good asm. - Godspeed
9

Try this command:

sudo dd if=/dev/sda bs=512 count=1 | ndisasm -b16 -o7c00h -
Sacchariferous answered 23/11, 2009 at 19:10 Comment(0)
0

If you're just looking to use a disassembler, then objdump is one choice. The disassembler that comes with the nasm assembler is ndisasm. You can also run "debug.exe" in DOSBox on Linux, provided you can get hold of a copy of the program. It also does disassembly, as well as controlled execution, i.e. simulation of the CPU itself, which is also important even when doing disassembly, for reasons I'm about to describe.

Fake86 has a cpu emulator. You may be able to hack it into doing disassembly by (a) having it show the instruction instead of simulating it, (b) having it not take conditional jumps or invoke calls, but (instead) stacking the address as a new entry point to do disassembly from (i.e., in effect, taking both branches and encapsulating subroutines), (c) having it stop the current disassembly at an unconditional jump or return, (d) having it accept one, two or more entry points to start with and ideally (e) having it also accept base addresses for data segments, and (f) getting it to do a hex dump of all the areas unprocessed as data or code segments (as these are usually where indirect jumps or calls or indirectly-accessed data segments land into.)
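
Steps (b), (c), (d) and (f) amount to a worklist-driven (recursive-traversal) disassembly. The Python sketch below illustrates only that control flow. Its toy_decode function is a made-up decoder for five opcode forms (nop, ret, jmp rel8, jcc rel8, call rel16), not code from Fake86 or any real tool, and all offsets are relative to the start of the buffer.

```python
from collections import deque


def toy_decode(code: bytes, off: int):
    """Decode one instruction from a tiny illustrative opcode subset.

    Returns (length, kind, target); target is a buffer offset or None.
    """
    op = code[off]
    if op == 0x90:                       # nop
        length, kind = 1, "next"
    elif op == 0xC3:                     # ret
        length, kind = 1, "stop"
    elif op == 0xEB:                     # jmp rel8 (unconditional)
        length, kind = 2, "jump"
    elif 0x70 <= op <= 0x7F:             # jcc rel8 (conditional)
        length, kind = 2, "branch"
    elif op == 0xE8:                     # call rel16
        length, kind = 3, "branch"
    else:
        return 1, "unknown", None        # outside the toy subset
    if off + length > len(code):
        return 1, "unknown", None        # truncated instruction
    target = None
    if length > 1:
        rel = int.from_bytes(code[off + 1:off + length], "little", signed=True)
        target = (off + length + rel) & 0xFFFF   # 16-bit wraparound
    return length, kind, target


def trace(code: bytes, entry_points):
    """Follow control flow from each entry point; return (starts, leftover)."""
    work = deque(entry_points)           # (d) one or more entry points
    starts, covered = set(), set()
    while work:
        off = work.popleft()
        while 0 <= off < len(code) and off not in starts:
            length, kind, target = toy_decode(code, off)
            if kind == "unknown":
                break
            starts.add(off)
            covered.update(range(off, off + length))
            if target is not None:
                work.append(target)      # (b) queue the target instead of taking it
            if kind in ("jump", "stop"):
                break                    # (c) stop at unconditional jump or return
            off += length
    # (f) bytes never reached as code, candidates for a hex dump
    leftover = [i for i in range(len(code)) if i not in covered]
    return sorted(starts), leftover


# jmp +2; two data bytes; jz +1; ret; nop; ret
sample = bytes([0xEB, 0x02, 0xFF, 0xFF, 0x74, 0x01, 0xC3, 0x90, 0xC3])
print(trace(sample, [0]))   # ([0, 4, 6, 7, 8], [2, 3])
```

A real implementation would replace toy_decode with a full instruction decoder and would still miss targets of indirect jumps and calls, which is why the leftover bytes are worth dumping.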

This gets to the other sense of your query: "I want to make a disassembler". The source for ndisasm is available, and it handles many of the descendants of 8086, not just 8086, itself (which seriously clutters it, if all you want is an 8086 or even 80386 disassembler), but it is not self-contained and has a heavy dependency on the rest of the distribution.

Its main talking point is that it uses octal digits for the opcodes - which better fits the 80x86 - as I pointed out on the USENET in 1995 in comp.lang.asm ... and (in fact) nasm's creation was a direct response to that. So, it's potentially more transparent and you may want to keep the source handy as a check and comparison, if you're making your own disassembler.

You can also run the debug.exe program on itself.

You could also try running ndisasm on debug.exe. First extract the entry point address CS:IP and the stack pointer address SS:SP from the .EXE file header (80x86 stacks grow down, so the stack segment is nominally SS:0 to SS:(SP-1)), then strip out the 0x200-byte header to make the file a raw binary. The EXE for debug.exe has no relocations, so you're okay treating the code as raw binary.
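
As a rough illustration of that header handling, here is a Python sketch that reads the standard MZ header fields (relocation count at offset 0x06, header size in 16-byte paragraphs at 0x08, initial SS and SP at 0x0E and 0x10, initial IP and CS at 0x14 and 0x16) and strips the header. The function names and the synthetic header are hypothetical, and the sketch ignores the image-size fields, so any data following the load image is kept.

```python
import struct


def parse_mz(data: bytes) -> dict:
    """Extract the MZ (.EXE) header fields needed before raw disassembly."""
    if len(data) < 28 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # Thirteen little-endian 16-bit words follow the 2-byte signature.
    (_last_page, _pages, relocs, header_paras, _min_alloc, _max_alloc,
     ss, sp, _checksum, ip, cs, _reloc_off, _overlay) = struct.unpack_from(
        "<13H", data, 2)
    return {
        "header_bytes": header_paras * 16,   # header size is stored in paragraphs
        "relocations": relocs,
        "cs_ip": (cs, ip),                   # entry point, relative to the load segment
        "ss_sp": (ss, sp),                   # initial stack, relative to the load segment
    }


def strip_mz_header(data: bytes) -> bytes:
    """Return everything after the header, i.e. the raw load image."""
    return data[parse_mz(data)["header_bytes"]:]


# Synthetic example: a 0x200-byte header followed by two code bytes (nop; ret).
header = struct.pack("<2s13H", b"MZ", 0, 2, 0, 0x20, 0, 0xFFFF,
                     0x0123, 0x0100, 0, 0x0010, 0x0002, 0x1C, 0)
image = header.ljust(0x200, b"\0") + b"\x90\xC3"
info = parse_mz(image)
print("header bytes: %#x, relocations: %d" % (info["header_bytes"], info["relocations"]))
print("CS:IP=%04X:%04X SS:SP=%04X:%04X" % (info["cs_ip"] + info["ss_sp"]))
```

For a real file, the bytes returned by strip_mz_header(open("debug.exe", "rb").read()) could be written to a new file and passed to ndisasm -b16. A non-zero relocation count would mean the raw-binary shortcut is not safe.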

But you won't get anything that's clearly recognizable, since the program is self-modifying - more precisely: self-extracting. You'll get a (barely) compressed code image (about 5/6 compression ratio) followed by a loader routine.

You have to run emulation on it, e.g. by running debug.exe on debug.exe to emulate its unpacking routine, to get it to extract itself, and then you dump the unpacked program image and disassemble that. There is a "relocation table" at the end of the loader routine, so it does actually have relocations in it - it's just that they're applied when the program unpacks itself, rather than by the OS when the EXE file is loaded.

And then you've just disassembled a disassembler that also happens to do CPU emulation, like Fake86 does - but only for the 8086. You'll have to make the absolute addresses relative (using the original relocation table as a guide), to make it re-assemblable. Once you do that, you can work on the source. The opcode table is in clear view (if you display it as text) - both when seen in the packed and unpacked versions of debug.exe.

There's also DosDebug up on GitHub. It handles everything up to "80586" (or "Pentium") and "80686": it flags a generation "6" for some instructions; e.g. the conditional "cmov" operations are handled by it, as well as their "fcmov" floating point versions. DosDebug is written in 8086 assembly and is best suited to assembling with jwasm. You might be able to run nasm on it; I don't know, I never tried.

I might port the DAS disassembler to the x86, since items (a)-(f) are already incorporated into DAS's design. I've only ever ported it to the 8051, 6800, 6809 and 8080/8085 (and Z80) up to now, but the transition from 8085 to 8086 is relatively small. To that end, I might hack something out of Fake86. That's mostly abandonware now, since the author replaced it with XTulator, as Fake86 was written when the programmer was relatively new to C. You might also be able to hack something directly out of DosDebug's opcode tables (their "instr.*" files).

Vidette answered 11/9, 2022 at 0:18 Comment(2)
DOS Debug, which also used to be (sometimes still is) called FreeDOS Debug, also has several forks. There's Enhanced Debug, which is not free software. And there's my lDebug which is free software and still assembles with NASM (forked from an older FreeDOS Debug revision that also had NASM source). - Carding
To address your comment on assemblers for FreeDOS Debug: Current versions only assemble with JWASM or MASM or possibly TASM, not with NASM. Version 1.13 was the last NASM source revision. (NASM/YASM only, in fact. Nontrivial programs usually don't assemble under different dialects.) - Carding