What would happen if a system executes a part of the file that is zero-padded?

The decoding of 0 bytes completely depends on the CPU architecture. On many architectures, instruction are fixed length (for example 32-bit), so the relevant thing would be 00 00 00 00 (using hexdump notation).

On most Linux distros, clang/llvm comes with support for multiple target architectures built-in (clang -target and llvm-objdump), unlike gcc / gas / binutils, so I was able to use that to check for some architectures I didn't have cross-gcc / binutils installed for. Use llvm-objdump --version to see the supported list. (But I didn't figure out how to get it to disassemble a raw binary like binutils objdump -b binary, and my clang won't create SPARC binaries on its own.)

On x86, 00 00 (2 bytes) decodes (http://ref.x86asm.net/coder32.html) as an 8-bit add with a memory destination. The first byte is the opcode, the 2nd byte is the ModR/M that specifies the operands.

This usually segfaults right away (if eax/rax isn't a valid pointer), or segfaults once execution falls off the end of the zero-padded part into an unmapped page. (This happens in real life because of bugs like falling off the end of _start without making an exit system call), although in those cases the following bytes aren't always all zero. e.g. data, or ELF metadata.)

x86 64-bit mode: ndisasm -b64 /dev/zero | head:

address   machine code      disassembly
00000000  0000              add [rax],al

x86 32-bit mode (-b32):

00000000  0000              add [eax],al

x86 16-bit mode: (-b16):

00000000  0000              add [bx+si],al

AArch32 ARM mode: cd /tmp && dd if=/dev/zero of=zero bs=16 count=1 && arm-none-eabi-objdump -z -D -b binary -marm zero. (Without -z, objdump skips over large blocks of all-zero and shows ...)

addr   machine code   disassembly
0:   00000000        andeq   r0, r0, r0

ARM Thumb/Thumb2: arm-none-eabi-objdump -z -D -b binary -marm --disassembler-options=force-thumb zero

0:   0000            movs    r0, r0
2:   0000            movs    r0, r0

AArch64: aarch64-linux-gnu-objdump -z -D -b binary -maarch64 zero

 0:   00000000        .inst   0x00000000 ; undefined

MIPS32: echo .long 0 > zero.S && clang -c -target mips zero.S && llvm-objdump -d zero.o

zero.o: file format ELF32-mips
Disassembly of section .text:
   0:       00 00 00 00     nop

PowerPC 32 and 64-bit: -target powerpc and -target powerpc64. IDK if any extensions to PowerPC use the 00 00 00 00 instruction encoding for anything, or if it's still an illegal instruction on modern IBM POWER chips.

zero.o: file format ELF32-ppc   (or ELF64-ppc64)
Disassembly of section .text:
   0:       00 00 00 00  <unknown>

IBM S390: clang -c -target systemz zero.S

zero.o: file format ELF64-s390
Disassembly of section .text:
   0:       00 00  <unknown>
   2:       00 00  <unknown>

Recommended topics

Hot tags