What would happen if a system executes a part of the file that is zero-padded?

I've seen in some posts and videos that some executable files are zero-padded to look bigger than they are, or to match the "same file size" criteria some file-system utilities have for moving files. Mostly they are either prank programs or malware.

But I've often wondered: what would happen if the file were corrupted and the CPU "loaded" the next set of "instructions" from the big zero-padded space at the end of the file?

Would anything happen? What instruction does 0x0 decode to?

Ariose asked 3/5, 2018 at 8:57

Comments:
That depends on what architecture the program has been written for. - Duque
On x86-64, 00 00 is add [rax],al (add [eax],al in 32-bit mode). On Z80, 00 is nop, etc. (each CPU may have a different instruction set and opcodes). - Indication
Executable files are often padded with NOPs, which are not zeros on x86. And in debug builds they may be padded with other values. - Ecclesiolatry
@prl: The OP is talking about posts and videos they've seen which said that "fake" executable files are sometimes padded. - Gadson
One video even went as far as opening one of those files in a hex editor and showing the masses of 0s. - Ariose

The decoding of 0 bytes depends entirely on the CPU architecture. On many architectures, instructions are fixed-length (for example 32-bit), so the relevant thing would be 00 00 00 00 (in hexdump notation).

On most Linux distros, clang/LLVM comes with support for multiple target architectures built in (clang -target and llvm-objdump), unlike gcc / gas / binutils, which are normally built for a single target. So I was able to use it to check some architectures I didn't have cross-gcc / binutils installed for. Use llvm-objdump --version to see the list of supported targets. (But I didn't figure out how to get it to disassemble a raw binary like binutils objdump -b binary, and my clang won't create SPARC binaries on its own.)


On x86, 00 00 (2 bytes) decodes (http://ref.x86asm.net/coder32.html) as an 8-bit add with a memory destination. The first byte is the opcode; the 2nd byte is the ModR/M byte that specifies the operands.

This usually segfaults right away (if eax/rax isn't a valid pointer to writable memory), or segfaults once execution falls off the end of the zero-padded part into an unmapped page. (This happens in real life because of bugs like falling off the end of _start without making an exit system call, although in those cases the following bytes aren't always all zero, e.g. data or ELF metadata.)
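To make those two failure modes concrete, here is a toy Python model. It is a sketch, not an emulator: the names run_zero_padding and Fault are hypothetical, it only understands the 00 00 encoding, and it assumes the padding lives in a read-only (r-x) code mapping, so the only writable bytes are the ones listed in data.

```python
class Fault(Exception):
    """Stands in for SIGSEGV in this toy model."""


def run_zero_padding(pad_len: int, rax: int, al: int, data: dict[int, int]) -> int:
    """Model x86 execution sliding through `pad_len` zero bytes.

    Each complete 2-byte chunk 00 00 is treated as `add [rax], al`; a trailing
    odd byte would combine with whatever follows, so it is not counted.
    `data` maps addresses of writable bytes to their values. Returns how many
    instructions ran before execution fell off the end of the padding.
    Raises Fault on the first write to an address not present in `data`.
    """
    executed = 0
    for _ in range(pad_len // 2):
        if rax not in data:
            raise Fault(f"add [rax],al: write to unmapped address {rax:#x}")
        # The add updates only the byte at [rax] (and the flags); RAX itself,
        # and therefore the target address, is the same on every iteration.
        data[rax] = (data[rax] + al) & 0xFF
        executed += 1
    return executed


mem = {0x601000: 5}
print(run_zero_padding(16, rax=0x601000, al=3, data=mem))  # 8
print(mem[0x601000])                                       # 29 (5 + 8 * 3)
```

If rax points at writable memory, the CPU just keeps adding al to the same byte and slides to the end of the padding; if it doesn't, the very first instruction faults.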


x86 64-bit mode: ndisasm -b64 /dev/zero | head:

address   machine code      disassembly
00000000  0000              add [rax],al

x86 32-bit mode (-b32):

00000000  0000              add [eax],al

x86 16-bit mode (-b16):

00000000  0000              add [bx+si],al
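The three listings differ only in how the ModR/M byte's r/m field is interpreted in each mode. A minimal Python sketch of just this case (opcode 0x00, mod = 00), with a hypothetical function name decode_add_rm8_r8 and deliberately ignoring prefixes, SIB bytes and displacements:

```python
# 8-bit registers selected by the ModR/M reg field (no REX prefix).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

# Memory operand selected by r/m when mod == 0b00, per operating mode.
# None marks special cases (SIB byte, bare displacement, RIP-relative)
# that this sketch does not decode.
RM_MOD00 = {
    16: ["[bx+si]", "[bx+di]", "[bp+si]", "[bp+di]", "[si]", "[di]", None, "[bx]"],
    32: ["[eax]", "[ecx]", "[edx]", "[ebx]", None, None, "[esi]", "[edi]"],
    64: ["[rax]", "[rcx]", "[rdx]", "[rbx]", None, None, "[rsi]", "[rdi]"],
}


def decode_add_rm8_r8(modrm: int, mode: int) -> str:
    """Decode opcode 0x00 (ADD r/m8, r8) followed by `modrm`, mod == 00 only."""
    mod = modrm >> 6             # bits 7-6: addressing form
    reg = (modrm >> 3) & 0b111   # bits 5-3: source register
    rm = modrm & 0b111           # bits 2-0: memory operand
    mem = RM_MOD00[mode][rm] if mod == 0 else None
    if mem is None:
        raise NotImplementedError("only simple mod=00 memory operands are handled")
    return f"add {mem},{REG8[reg]}"


for mode in (64, 32, 16):
    print(mode, decode_add_rm8_r8(0x00, mode))
```

With a ModR/M byte of 0x00, all three fields are zero, so the source is always al and the destination is the first entry of the mode's r/m table.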

AArch32 ARM mode: cd /tmp && dd if=/dev/zero of=zero bs=16 count=1 && arm-none-eabi-objdump -z -D -b binary -marm zero. (Without -z, objdump skips over large blocks of zeros and shows ... instead.)

addr   machine code   disassembly
0:   00000000        andeq   r0, r0, r0
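The all-zero ARM word lands in the data-processing group with condition field 0000 (EQ) and opcode 0000 (AND). A Python sketch that decodes only a narrow subset of A32 data-processing instructions (register operand, no shift, S bit clear); the function name is hypothetical and registers are always printed as rN:

```python
COND = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "", "nv"]  # 0b1110 (always) has no suffix
DP_OP = ["and", "eor", "sub", "rsb", "add", "adc", "sbc", "rsc",
         "tst", "teq", "cmp", "cmn", "orr", "mov", "bic", "mvn"]


def decode_arm_dp(word: int) -> str:
    """Decode A32 data-processing (register, no shift, S=0), excluding TST..CMN."""
    cond = word >> 28            # bits 31-28: condition
    op = (word >> 21) & 0xF      # bits 24-21: data-processing opcode
    s = (word >> 20) & 1         # bit 20: set flags
    rn = (word >> 16) & 0xF      # bits 19-16: first operand
    rd = (word >> 12) & 0xF      # bits 15-12: destination
    rm = word & 0xF              # bits 3-0: second operand register
    if (cond == 0xF or ((word >> 25) & 0b111) != 0 or s
            or 8 <= op <= 11 or ((word >> 4) & 0xFF) != 0):
        raise NotImplementedError("outside the subset this sketch handles")
    name = DP_OP[op] + COND[cond]
    if op in (13, 15):  # MOV and MVN take no first operand register
        return f"{name} r{rd}, r{rm}"
    return f"{name} r{rd}, r{rn}, r{rm}"


print(decode_arm_dp(0x00000000))  # andeq r0, r0, r0
print(decode_arm_dp(0xE1A00000))  # mov r0, r0
```

Since AND r0, r0, r0 leaves r0 unchanged and doesn't set flags (S = 0), it has no effect whether or not its EQ condition passes, so an ARM-mode CPU slides through zeros much like a NOP sled. (0xE1A00000, mov r0, r0, was the traditional ARM NOP.)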

ARM Thumb/Thumb2: arm-none-eabi-objdump -z -D -b binary -marm --disassembler-options=force-thumb zero

0:   0000            movs    r0, r0
2:   0000            movs    r0, r0
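In 16-bit Thumb, a halfword whose top five bits are 00000 is LSL (immediate) with a 5-bit shift amount, and a shift by 0 is written MOVS in unified syntax. A Python sketch of that one encoding (hypothetical function name), applied to 4 zero bytes read as little-endian halfwords:

```python
def decode_thumb_lsl_imm(hw: int) -> str:
    """Decode the 16-bit Thumb LSL (immediate) encoding: 00000 imm5 Rm Rd."""
    if hw >> 11 != 0:
        raise NotImplementedError("not the LSL (immediate) encoding")
    imm5 = (hw >> 6) & 0x1F   # bits 10-6: shift amount
    rm = (hw >> 3) & 0b111    # bits 5-3: source register
    rd = hw & 0b111           # bits 2-0: destination register
    if imm5 == 0:
        # A shift by 0 is a plain copy; UAL spells it MOVS (it still sets N and Z).
        return f"movs r{rd}, r{rm}"
    return f"lsls r{rd}, r{rm}, #{imm5}"


code = bytes(4)  # two zero halfwords
halfwords = [int.from_bytes(code[i:i + 2], "little") for i in range(0, len(code), 2)]
print([decode_thumb_lsl_imm(h) for h in halfwords])  # ['movs r0, r0', 'movs r0, r0']
```

Because Thumb instructions here are 2 bytes, the same 4 zero bytes that form one ARM-mode andeq yield two movs r0, r0.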

AArch64: aarch64-linux-gnu-objdump -z -D -b binary -maarch64 zero

 0:   00000000        .inst   0x00000000 ; undefined
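More recent editions of the Arm Architecture Reference Manual name this encoding UDF #imm16 (permanently undefined: bits 31-16 all zero, a 16-bit immediate in bits 15-0), so newer disassemblers may print udf #0 instead of .inst. Either way, executing it raises an Undefined Instruction exception (SIGILL for a Linux process) instead of sliding forward. A sketch of the check, with a hypothetical function name:

```python
def decode_a64_udf(word: int) -> str:
    """Recognise A64 UDF: bits 31-16 must be zero; bits 15-0 are imm16."""
    if word >> 16 != 0:
        raise NotImplementedError("not in the UDF encoding space")
    return f"udf #{word & 0xFFFF}"


print(decode_a64_udf(0x00000000))  # udf #0
```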

MIPS32: echo .long 0 > zero.S && clang -c -target mips zero.S && llvm-objdump -d zero.o

zero.o: file format ELF32-mips
Disassembly of section .text:
   0:       00 00 00 00     nop
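MIPS has no separate NOP opcode: the all-zero word is the R-type instruction sll $zero, $zero, 0, and since $zero is hardwired to 0 the write is discarded. A Python sketch of the field split (hypothetical function name, numeric register names):

```python
def decode_mips_sll(word: int) -> str:
    """Decode MIPS32 SLL: R-type (opcode 0) with funct 0."""
    opcode = word >> 26          # bits 31-26: 0 = SPECIAL (R-type)
    rs = (word >> 21) & 0x1F     # bits 25-21: unused by SLL, must be 0
    rt = (word >> 16) & 0x1F     # bits 20-16: source register
    rd = (word >> 11) & 0x1F     # bits 15-11: destination register
    shamt = (word >> 6) & 0x1F   # bits 10-6: shift amount
    funct = word & 0x3F          # bits 5-0: 0 = SLL
    if opcode != 0 or funct != 0 or rs != 0:
        raise NotImplementedError("not an SLL instruction")
    if word == 0:
        # sll $0, $0, 0: the result goes to $zero and is discarded.
        return "nop"
    return f"sll ${rd}, ${rt}, {shamt}"


print(decode_mips_sll(0x00000000))  # nop
print(decode_mips_sll(0x00031100))  # sll $2, $3, 4
```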

PowerPC 32 and 64-bit: -target powerpc and -target powerpc64. IDK if any extensions to PowerPC use the 00 00 00 00 instruction encoding for anything, or if it's still an illegal instruction on modern IBM POWER chips.

zero.o: file format ELF32-ppc   (or ELF64-ppc64)
Disassembly of section .text:
   0:       00 00 00 00  <unknown>

IBM S390: clang -c -target systemz zero.S

zero.o: file format ELF64-s390
Disassembly of section .text:
   0:       00 00  <unknown>
   2:       00 00  <unknown>
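On S/390 and z/Architecture, an instruction's length is encoded in the two most significant bits of its first byte (00 → 2 bytes, 01 or 10 → 4 bytes, 11 → 6 bytes), which is why the disassembler walks the zeros 2 bytes at a time. A Python sketch (function names hypothetical):

```python
def s390_insn_length(first_byte: int) -> int:
    """Instruction length in bytes from the two high-order bits of the opcode."""
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[first_byte >> 6]


def split_s390(code: bytes) -> list[bytes]:
    """Split bytes into instruction-sized chunks; a trailing partial chunk is kept as-is."""
    chunks, i = [], 0
    while i < len(code):
        n = s390_insn_length(code[i])
        chunks.append(code[i:i + n])
        i += n
    return chunks


print(split_s390(bytes(4)))  # [b'\x00\x00', b'\x00\x00']
```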
Gadson answered 3/5, 2018 at 9:11
