Smallest executable program (x86-64 Linux)

Asked 19/11, 2018 at 21:3 Answered 30/1 at 1:40

I recently came across this post describing the smallest possible ELF executable for Linux, however the post was written for 32 bit and I was unable to get the final version to compile on my machine.

This brings me to the question: what's the smallest x86-64 ELF executable it's possible to write that runs without error?

Abusing or violating the ELF specification is ok as long as current Linux kernels in practice will run it, as with the last couple versions of the MuppetLabs 32-bit teensy ELF executable article.

Akeyla answered 19/11, 2018 at 21:3 Comment(5)

What machine do you have? Windows subsystem for Linux (which doesn't support 32-bit executables at all)? Or a proper Linux kernel built without IA-32 compat? What do you mean you couldn't get the final version to even compile? Surely you got a binary file, but couldn't run it? (Anyway, I know your question isn't about that, but if you couldn't even compile the 32-bit version, you probably won't be able to use NASM's flat-binary output to create a 64-bit executable with code packed into the ELF headers either.) – Panicstricken 19/11, 2018 at 21:10

Can you use 32-bit int 0x80 system calls in your 64-bit executable? If so, you probably don't need to change much. I know there's some overlap of ELF header fields being interpreted as part of the machine code, so some change might be needed for ELF64. – Panicstricken 19/11, 2018 at 21:13

For 64 bit mode, you basically need to recreate the entire program, as both the machine code and the layout of the ELF header are quite different. While this is a nice exercise for an experienced programmer, I'm not sure if you are going to get an answer to your question within the scope of this site. – Nativity 19/11, 2018 at 21:33

I'm voting to close this question as off-topic because code golf questions are off-topic on StackOverflow. – Refrigeration 19/11, 2018 at 22:32

This is not just a "code golf" question IMO; it has practical value as well. I came here because I was interested in writing a tiny assembly program by hand, and was looking for a starting point. – Elihu 23/10, 2022 at 1:5

Starting from an answer of mine about the "real" entrypoint of an ELF executable on Linux and "raw" syscalls, we can strip it down to

bits 64
global _start
_start:
   mov di,42        ; only the low byte of the exit code is kept,
                    ; so we can use di instead of the full edi/rdi
   xor eax,eax
   mov al,60        ; shorter than mov eax,60
   syscall          ; perform the syscall

I don't think you can get it to be any smaller without going out of specs - in particular, the psABI doesn't guarantee anything about the state of eax. This gets assembled to precisely 10 bytes (as opposed to the 7 bytes of the 32 bit payload):

66 bf 2a 00 31 c0 b0 3c 0f 05
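As a quick sanity check on that count, here is a minimal Python sketch that splits the hex dump above into its four instructions and adds up their lengths. The variable names are only illustrative, and the split assumes the standard x86-64 encodings (0x66 operand-size prefix for the 16-bit mov, 0F 05 for syscall).

```python
# Machine-code bytes for each instruction of the payload, copied from the
# hex dump above. The grouping per instruction is my own annotation.
payload = [
    ("mov di, 42",   bytes.fromhex("66bf2a00")),  # 0x66 prefix + opcode + imm16
    ("xor eax, eax", bytes.fromhex("31c0")),
    ("mov al, 60",   bytes.fromhex("b03c")),      # 60 is the exit syscall number
    ("syscall",      bytes.fromhex("0f05")),
]

for mnemonic, encoding in payload:
    print(f"{mnemonic:<14} {encoding.hex(' '):<12} {len(encoding)} bytes")

code = b"".join(encoding for _, encoding in payload)
print(len(code), "bytes total")
```

The per-instruction lengths come out as 4 + 2 + 2 + 2.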

The straightforward way (assemble with nasm, link with ld) gives me a 352-byte executable.

The first "real" transformation the author of that article does is building the ELF "by hand"; doing this (with some modifications, as the ELF header for x86_64 is a bit bigger)

bits 64
            org 0x08048000

ehdr:                                           ; Elf64_Ehdr
            db  0x7F, "ELF", 2, 1, 1, 0         ;   e_ident
    times 8 db  0
            dw  2                               ;   e_type
            dw  62                              ;   e_machine
            dd  1                               ;   e_version
            dq  _start                          ;   e_entry
            dq  phdr - $$                       ;   e_phoff
            dq  0                               ;   e_shoff
            dd  0                               ;   e_flags
            dw  ehdrsize                        ;   e_ehsize
            dw  phdrsize                        ;   e_phentsize
            dw  1                               ;   e_phnum
            dw  0                               ;   e_shentsize
            dw  0                               ;   e_shnum
            dw  0                               ;   e_shstrndx

ehdrsize    equ $ - ehdr

phdr:                                           ; Elf64_Phdr
            dd  1                               ;   p_type
            dd  5                               ;   p_flags
            dq  0                               ;   p_offset
            dq  $$                              ;   p_vaddr
            dq  $$                              ;   p_paddr
            dq  filesize                        ;   p_filesz
            dq  filesize                        ;   p_memsz
            dq  0x1000                          ;   p_align

phdrsize    equ     $ - phdr

_start:
   mov di,42        ; only the low byte of the exit code is kept,
                    ; so we can use di instead of the full edi/rdi
   xor eax,eax
   mov al,60        ; shorter than mov eax,60
   syscall          ; perform the syscall

filesize      equ     $ - $$

we get down to 130 bytes. This is a tad bigger than the 91-byte executable from the 32-bit article, but that comes from the fact that several fields become 64 bits wide instead of 32.
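The 130-byte figure can be reproduced from the field widths alone. The following Python sketch is not part of the original answer; it describes the two headers as struct format strings (the names ELF64_EHDR and ELF64_PHDR are mine) and adds the 10-byte payload.

```python
import struct

# Field layouts of the 64-bit ELF header and program header, little-endian
# with no padding, in the same order as the NASM listing above.
ELF64_EHDR = "<16sHHIQQQIHHHHHH"   # e_ident ... e_shstrndx
ELF64_PHDR = "<IIQQQQQQ"           # p_type, p_flags, p_offset ... p_align

ehdr_size = struct.calcsize(ELF64_EHDR)
phdr_size = struct.calcsize(ELF64_PHDR)
code_size = 10                      # the payload shown earlier
base = 0x08048000                   # the `org` address used in the listing

print(ehdr_size, phdr_size, ehdr_size + phdr_size + code_size)
print(hex(base + ehdr_size + phdr_size))   # address that _start resolves to
```

Because p_offset is 0 and p_vaddr is the org address, the single segment maps the file from its very first byte, headers included, so _start ends up 120 bytes past the base address.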

We can then apply some tricks similar to his; the partial overlap of phdr and ehdr can be done, although the order of fields in phdr is different, and we have to overlap p_flags with e_shnum (which however should be ignored due to e_shentsize being 0).

Moving the code inside the header is slightly more difficult, as it's 3 bytes larger, but that part of header is just as big as in the 32 bit case. We overcome this by starting 2 bytes earlier, overwriting the padding byte (ok) and the ABI version field (not ok, but still works).

So, we reach:

bits 64
            org 0x08048000

ehdr:                                           ; Elf64_Ehdr
            db  0x7F, "ELF", 2, 1,              ;   e_ident
_start:
            mov di,42        ; only the low byte of the exit code is kept,
                            ; so we can use di instead of the full edi/rdi
            xor eax,eax
            mov al,60        ; shorter than mov eax,60
            syscall          ; perform the syscall
            dw  2                               ;   e_type
            dw  62                              ;   e_machine
            dd  1                               ;   e_version
            dq  _start                          ;   e_entry
            dq  phdr - $$                       ;   e_phoff
            dq  0                               ;   e_shoff
            dd  0                               ;   e_flags
            dw  ehdrsize                        ;   e_ehsize
            dw  phdrsize                        ;   e_phentsize
phdr:                                           ; Elf64_Phdr
            dw  1                               ;   e_phnum         p_type
            dw  0                               ;   e_shentsize
            dw  5                               ;   e_shnum         p_flags
            dw  0                               ;   e_shstrndx
ehdrsize    equ $ - ehdr
            dq  0                               ;   p_offset
            dq  $$                              ;   p_vaddr
            dq  $$                              ;   p_paddr
            dq  filesize                        ;   p_filesz
            dq  filesize                        ;   p_memsz
            dq  0x1000                          ;   p_align

phdrsize    equ     $ - phdr
filesize    equ     $ - $$

which is 112 bytes long.
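To make the overlap easier to follow, here is a rough Python sketch that rebuilds the same byte layout with struct.pack. It only constructs the bytes and checks the length; it is an illustration of the layout under the assumption that it mirrors the NASM listing field for field, not a replacement for assembling with nasm -f bin.

```python
import struct

BASE = 0x08048000
code = bytes.fromhex("66bf2a0031c0b03c0f05")   # the 10-byte payload

# First 6 bytes of e_ident, then the code fills the remaining 10.
image = b"\x7fELF" + bytes([2, 1]) + code
entry = BASE + 6                                # _start sits at file offset 6
phoff = 56                                      # phdr overlaps the ehdr tail

image += struct.pack("<HHIQQQIHH",
                     2, 62, 1,          # e_type, e_machine, e_version
                     entry, phoff, 0,   # e_entry, e_phoff, e_shoff
                     0, 64, 56)         # e_flags, e_ehsize, e_phentsize
assert len(image) == phoff

# Shared bytes: e_phnum + e_shentsize double as p_type,
# e_shnum + e_shstrndx double as p_flags.
image += struct.pack("<HHHH", 1, 0, 5, 0)
image += struct.pack("<QQQQQQ", 0, BASE, BASE, 112, 112, 0x1000)

print(len(image))
```

Reading the 8 shared bytes back as two 32-bit values gives p_type = 1 (PT_LOAD) and p_flags = 5 (read + execute), which is why the overlap is tolerated.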

Here I stop for the moment, as I don't have much time for this right now. You now have the basic layout with the relevant modifications for 64 bit, so you just have to experiment with more audacious overlaps.

Zoltai answered 19/11, 2018 at 22:25 Comment(8)

If you're golfing for code-size and you still want to _exit(42) instead of xor edi,edi like a normal person, you'd use push 42/pop rdi (3 bytes) instead of a 4-byte 66 mov-di imm16. And then a 3-byte lea eax, [rdi - 42 + 60] or another push/pop. Tips for golfing in x86/x64 machine code. Of course in practice Linux does zero all the registers before process startup. Depending on your golfing rules, you might take advantage. (codegolf.SE only requires that code work on at least one implementation, not necessarily all.) – Panicstricken 19/11, 2018 at 22:50

To set only the low byte, another option is mov al,42 (2 bytes) /xchg eax,edi (1 byte). – Panicstricken 19/11, 2018 at 22:54

@PeterCordes: argh the usual push/pop trick, I keep forgetting it... probably it's because I usually golf in 16 bit x86, where they aren't as useful (except for segment registers). _exit(42) is there to match the original, otherwise I would have just made it exit with whatever happened to be in rdi :-D. Unfortunately, as this is not a "regular" code-golf, there aren't really well-defined rules... – Zoltai 19/11, 2018 at 23:11

I am at 9 Bytes with use64; xor edi, edi; mov al, 42; xchg eax, edi; mov al, 60; syscall? – Bedsore 20/11, 2018 at 23:13

@sivizius: you can get to 8 (2+1+2+1+2) using the tricks from @PeterCordes (push 42; pop rdi; push 60; pop rax; syscall) – Zoltai 20/11, 2018 at 23:46
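For reference, a small Python sketch of how the push/pop variant adds up. The opcode bytes are the standard x86-64 encodings as I understand them (6A ib for push imm8, 58+r for pop r64, 0F 05 for syscall); the list name is only illustrative.

```python
# Encodings for the push/pop variant of the exit(42) payload.
variant = [
    ("push 42", bytes.fromhex("6a2a")),   # push imm8
    ("pop rdi", bytes.fromhex("5f")),     # 0x58 + 7
    ("push 60", bytes.fromhex("6a3c")),
    ("pop rax", bytes.fromhex("58")),     # 0x58 + 0
    ("syscall", bytes.fromhex("0f05")),
]

sizes = [len(encoding) for _, encoding in variant]
print(" + ".join(map(str, sizes)), "=", sum(sizes))
```

Each push/pop pair costs 3 bytes, against 4 bytes for the 16-bit mov and 4 bytes for xor eax,eax plus mov al,60.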

I'm curious: perhaps you know why the program SECTION .text; global _start; _start: mov eax, 1; mov ebx, 0; int 80H is 492 bytes long if all it does is exit immediately? – Seedman 18/7, 2020 at 23:4

@mercury0114: the code itself is 12 bytes, the rest is various headers, the symbol table, the definition of other standard executable sections and stuff like that. Assembling your code with nasm -felf and linking it with ld -m elf_i386 I get 484 bytes, doing strip -s over the resulting binary gets down to 248 (you can get an idea of the content before/after using objdump -x -D). – Zoltai 19/7, 2020 at 10:52

In the first example, why is p_offset 0? Shouldn't it be 120? – Bimolecular 30/1, 2021 at 23:10

Updated Answer

After seeing the tricks used in @Matteo Italia's answer, I found it's possible to reach 112 bytes, since we can hide not only the string but also the code in the ELF header.

Explanation: the key idea is hiding everything in the header, including the string "Hello World!\n" and the code that prints it. We should first test which parts of the header are modifiable (i.e. we can change the value and the program still executes). Then we hide our data and code in the header, as the following code shows (assemble with nasm -f bin ./x.asm):

This source code is based on @Matteo Italia's answer but completes the part he didn't show, of printing Hello World as well as exiting. There doesn't seem to be a way to make it any shorter; the kernel requires the file to be big enough to contain the ELF headers.
This version has some nop instructions in other space that's available for use inside / between the ELF headers which we can't avoid. We still have space to waste in p_paddr and p_align.

bits 64
            org 0x08048000

ehdr:                                           ;   Elf64_Ehdr
            db  0x7F, "ELF",                    ;   e_ident
_start:
            mov dl, 13
            mov esi,STR
            pop rax
            syscall
            jmp _S0
            dw  2                               ;   e_type
            dw  62                              ;   e_machine
            dd  0xff                            ;   e_version
            dq  _start                          ;   e_entry
            dq  phdr - $$                       ;   e_phoff
STR:
            db "Hello Wo"                       ;   e_shoff
            db "rld!"                           ;   e_flags
            dw  0x0a                            ;   e_ehsize, where we hide the newline character
            dw  phdrsize                        ;   e_phentsize
phdr:                                           ;   Elf64_Phdr
            dw  1                               ;   e_phnum         p_type
            dw  0                               ;   e_shentsize
            dw  5                               ;   e_shnum         p_flags
            dw  0                               ;   e_shstrndx
ehdrsize    equ $ - ehdr
            dq  0                               ;   p_offset
            dq  $$                              ;   p_vaddr
_S0:
            nop                  ; unused space for more code
            nop
            nop
            nop
            nop                                 
            nop                                 
            jmp _S1                             ;   p_paddr, These 8 bytes belong to p_paddr, I nop them to show we can add some asm code here
            dq  filesize                        ;   p_filesz
            dq  filesize                        ;   p_memsz
_S1:
            mov eax,60 ; p_align[0:5]
            syscall    ; p_align[5:7]
            nop        ; p_align[7:8]

phdrsize    equ     $ - phdr
filesize    equ     $ - $$
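To see which header fields the listing above reuses, the following Python sketch derives the Elf64_Ehdr field offsets from their sizes and checks where the 13-byte string and the 12 bytes of code in e_ident land. The instruction lengths are my own reading of the standard encodings and are an assumption, not something the answer states.

```python
# Elf64_Ehdr fields as (name, size in bytes), in file order.
fields = [("e_ident", 16), ("e_type", 2), ("e_machine", 2), ("e_version", 4),
          ("e_entry", 8), ("e_phoff", 8), ("e_shoff", 8), ("e_flags", 4),
          ("e_ehsize", 2), ("e_phentsize", 2), ("e_phnum", 2),
          ("e_shentsize", 2), ("e_shnum", 2), ("e_shstrndx", 2)]

offset, offsets = 0, {}
for name, size in fields:
    offsets[name] = offset
    offset += size

# The string starts where e_shoff would be and spills into later fields.
message = b"Hello World!\n"
start = offsets["e_shoff"]
end = start + len(message)
covered = [n for n, s in fields if offsets[n] < end and offsets[n] + s > start]
print(hex(start), covered)

# Code placed after the 4 magic bytes of e_ident:
# mov dl,13 / mov esi,imm32 / pop rax / syscall / jmp short
code_sizes = [2, 5, 1, 2, 2]
print(sum(code_sizes), "bytes of code,", 16 - 4, "bytes available")
```

The string covers e_shoff, e_flags and the first byte of e_ehsize, and the code exactly fills the rest of e_ident.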

Original Post:

I have a 129-byte x64 "Hello World!".

Step1. Compile the following asm code with nasm -f bin hw.asm

; hw.asm
  BITS 64
  org 0x400000

  ehdr:           ; Elf64_Ehdr
    db 0x7f, "ELF", 2, 1, 1, 0 ; e_ident
    times 8 db 0
    dw  2         ; e_type
    dw  0x3e      ; e_machine
    dd  1         ; e_version
    dq  _start    ; e_entry
    dq  phdr - $$ ; e_phoff
    dq  0         ; e_shoff
    dd  0         ; e_flags
    dw  ehdrsize  ; e_ehsize
    dw  phdrsize  ; e_phentsize
  phdr:           ; Elf64_Phdr
    dd  1         ; e_phnum      ; p_type
                  ; e_shentsize
    dd  5         ; e_shnum      ; p_flags
                  ; e_shstrndx
  ehdrsize  equ  $ - ehdr
    dq  0         ; p_offset
    dq  $$        ; p_vaddr
    dq  $$        ; p_paddr
    dq  filesize  ; p_filesz
    dq  filesize  ; p_memsz
    dq  0x1000    ; p_align
  phdrsize  equ  $ - phdr
  
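  _start:
    ; write(0, hello, 60), then exit(0); fd 0 is used because rdi is 0 at startup
    pop rax        ; rax = argc = 1 = __NR_write
    mov dl, 60     ; length 60, so write returns 60 = __NR_exit
    mov esi, hello
    syscall        ; write; rax becomes 60
    syscall        ; exit(0)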

  hello: db "Hello World!", 10 ; 10 is the ASCII code for newline

  filesize  equ  $ - $$

Step2. Modify it with following python script

# Drop the 8-byte p_align field (file offsets 0x68-0x6f) and patch the two
# address bytes that move as a result. No third-party modules are needed.
with open('./hw', 'rb') as f:
    pro = bytearray(f.read())
print(len(pro))
cut = 0x68                        # file offset where p_align starts
pro[0x18] = cut                   # low byte of e_entry: _start moves from 0x70 to 0x68
pro[0x74] = 0x7c - (0x70 - cut)   # low byte of the hello address in mov esi, hello
pro = pro[:cut] + pro[0x70:]      # remove p_align
print(list(pro))
with open("X", 'wb') as f:
    f.write(pro)

You should get a 129-byte "Hello World".

[18:19:02] n132 :: xps  ➜  /tmp » strace ./X
execve("./X", ["./X"], 0x7fffba3db670 /* 72 vars */) = 0
write(0, "Hello World!\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 60Hello World!
) = 60
exit(0)                                 = ?
+++ exited with 0 +++
[18:19:04] n132 :: xps  ➜  /tmp » ./X
Hello World!
[18:19:11] n132 :: xps  ➜  /tmp » ls -la ./X
-rwxrwxr-x 1 n132 n132 129 Jan 29 18:18 ./X
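The strace output above shows why two bare syscall instructions are enough: the return value of the first call becomes the syscall number of the second. The toy Python model below only illustrates that register flow. It assumes the program is run with no arguments (so argc is 1), that rdi is 0 at process start, and that the write completes in full; the function and variable names are made up for this sketch.

```python
# Toy model of the register state; only the two syscalls used here are modelled.
NR_WRITE, NR_EXIT = 1, 60

def syscall(rax, rdi, rdx):
    """Return (name, new_rax); new_rax is None once the process has exited."""
    if rax == NR_WRITE:
        return "write", rdx          # a complete write returns the byte count
    if rax == NR_EXIT:
        return "exit", None
    raise ValueError(f"syscall {rax} not modelled")

argc = 1                             # ./X with no arguments; pop rax loads argc
rax, rdi, rdx = argc, 0, 60          # rdi = 0 (fd 0 / exit status), dl = 60

trace = []
while rax is not None:
    name, rax = syscall(rax, rdi, rdx)
    trace.append(name)
print(trace)
```

With any extra command-line argument argc is no longer 1, so the first syscall would no longer be write.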

Pyridine answered 30/1 at 1:40 Comment(15)

What changes does the Python code make, and why do that with Python instead of NASM macros + directives? Clever trick, though to write with a length of 60 = __NR_exit including trailing garbage so the return value is the call number for the next syscall. And to use rax = argc as __NR_write. This also depends on stdin (fd 0) being a read-write FD that's open on the terminal, since you write(0, hello, 60). – Panicstricken 30/1 at 1:53

This program doesn't respect ./hello > /dev/null, and it breaks if you close or redirect stdin. Which is fine, it still works in a normal terminal, but it's worth at least a mention in the comments to document that it's intentionally writing to stdin to save bytes (because Linux initializes register values to 0 in a freshly-execed process.) – Panicstricken 30/1 at 1:56

You are right. I used stdin to save bytes, and also truncated the ELF header. I don't know how to use nasm to do that, so I just used Python and found that at most we can drop the last 8 bytes of the header. Also, we can hide the string "Hello World!\n" in the ELF header. I got a 118-byte Hello World using this trick. (For this case I have to set RDX for SYS_write and RAX for SYS_exit correctly, since there are non-zero bytes after "Hello World\n".) It's still not hard to make it smaller than 118 bytes. – Pyridine 31/1 at 3:44

You can't truncate or overwrite bytes you've already emitted with NASM, so just comment out the dq 0x1000 ; p_align line to not emit those 8 bytes in the first place. (Leaving it there commented out is a good way to document what you're doing, along with other comments. Unlike your Python code full of magic numbers with no comments.) – Panicstricken 31/1 at 3:54

since there are non-zero bytes after "Hello World\n" - can you put the code inside the ELF header instead of the string? The string is 13 bytes, the machine code is 12. Or do the bytes need to have certain values, and re-ordering your asm instructions can't achieve that? – Panicstricken 31/1 at 3:59

@PeterCordes I posted a new answer (112 bytes) which is based on Matteo Italia's answer. 112 bytes should be the limit if we don't bypass the checks of the ELF header. – Pyridine 31/1 at 5:42

I'll make it more clear @PeterCordes – Pyridine 31/1 at 5:47

Thanks for cleaning up your other posts and editing this one. Why jmp _S0? That could jump directly to _S1, unless that would be more than 127 bytes away and need a 5-byte jmp rel32? Is that what the nop sequence and jmp _S1 is for, to keep the first jump as jmp rel8? But no, the whole executable is only 112 bytes, so jmp _S1 is in range. If there is a reason for _S0: to exist at all (instead of just dq 0 or times 7 db 0 or something), there should be comments explaining it, because it's not obvious. – Panicstricken 31/1 at 5:50

If the STR: address had only a single bit set, you could materialize it in a register with bts edi, 24 (4 bytes) instead of mov edi, STR (5 bytes). But I don't think that's possible; the virt address and file offset of a segment mapping must have the same low 12 bits, i.e. the same alignment relative to a page boundary. STR can't be at the start of the file, and we don't want the file to be 4096 bytes long. (But if it was, we could have a mapping starting at (1<<24) - 4096.) – Panicstricken 31/1 at 5:59

I cleaned it up. For jmp _S0, I want to show people we still have enough space to do more things. Answer to the question for _S1: For jmp .+x, if x is at most 0x81, the instruction is 2 bytes long. For jmp .-x, if x is at most 0x7e, the instruction is 2 bytes long. By the way, I wrote the verbose code because I want to show people we still have space to waste in p_align and p_paddr. – Pyridine 31/1 at 6:5
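The short-jump range discussed here can be checked with a few lines of Python. This is a sketch under the assumption that jmp short encodes a signed 8-bit displacement relative to the end of the 2-byte instruction; the helper name fits_rel8 is hypothetical, and the file offsets are my own count from the 112-byte listing.

```python
def fits_rel8(jmp_offset, target_offset):
    """True if a 2-byte `jmp short` at jmp_offset can reach target_offset."""
    displacement = target_offset - (jmp_offset + 2)   # relative to next instruction
    return -128 <= displacement <= 127

# Offsets counted from the listing: jmp _S0 at 14, _S0 at 80,
# jmp _S1 at 86, _S1 at 104.
print(fits_rel8(14, 80), fits_rel8(86, 104), fits_rel8(14, 104))

# Limits measured from the start of the jmp instruction.
print(fits_rel8(0, 0x81), fits_rel8(0, 0x82))      # forward
print(fits_rel8(0x7e, 0), fits_rel8(0x7f, 0))      # backward
```

In a 112-byte file every target is within reach of a 2-byte jump, so the first jmp could also go straight to _S1.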

Thanks for your new comment. I did write verbose code. I did that because the threshold is not the length of the asm code. – Pyridine 31/1 at 6:8

Ah, that makes sense. Comments would be a good way to tell future readers about the point of those NOPs. – Panicstricken 31/1 at 6:8

Thanks I'll add that to answer so people can catch that. – Pyridine 31/1 at 6:10

I made an edit to add an actual comment to the asm like I was suggesting, as well as tidy up some of the phrasing. Feel free to edit again to put things into your own words, but I think my phrasing is an improvement. – Panicstricken 31/1 at 6:17

Thanks so much, it looks better. English is not my first language. Thanks for your advice, I learned a lot from the new version. – Pyridine 31/1 at 6:20
