Minimal executable size now 10x larger after linking than 2 years ago, for tiny programs?
Asked Answered
M

1

14

For a university course, I like to compare code-sizes of functionally similar programs if written and compiled using gcc/clang versus assembly. In the process of re-evaluating how to further shrink the size of some executables, I couldn't trust my eyes when the very same assembly code I assembled/linked 2 years ago now has grown >10x in size after building it again (which true for multiple programs, not only helloworld):

$ make
as -32 -o helloworld-asm-2020.o helloworld-asm-2020.s
ld -melf_i386 -o helloworld-asm-2020 helloworld-asm-2020.o

$ ls -l
-rwxr-xr-x 1 xxx users  708 Jul 18  2018 helloworld-asm-2018*
-rwxr-xr-x 1 xxx users 8704 Nov 25 15:00 helloworld-asm-2020*
-rwxr-xr-x 1 xxx users 4724 Nov 25 15:00 helloworld-asm-2020-n*
-rwxr-xr-x 1 xxx users 4228 Nov 25 15:00 helloworld-asm-2020-n-sstripped*
-rwxr-xr-x 1 xxx users  604 Nov 25 15:00 helloworld-asm-2020.o*
-rw-r--r-- 1 xxx users  498 Nov 25 14:44 helloworld-asm-2020.s

The assembly code is:

.code32
.section .data
msg: .ascii "Hello, world!\n"
         len = . - msg

.section .text
.globl _start

_start:
        movl $len, %edx   # EDX = message length
        movl $msg, %ecx   # ECX = address of message
        movl $1, %ebx     # EBX = file descriptor (1 = stdout)
        movl $4, %eax     # EAX = syscall number (4 = write)
        int $0x80         # call kernel by interrupt

        # and exit
        movl $0, %ebx     # return code is zero
        movl $1, %eax     # exit syscall number (1 = exit)
        int $0x80         # call kernel again

The same hello world program, compiled using GNU as and GNU ld (always using 32-bit assembly) was 708 bytes then, and has grown to 8.5K now. Even when telling the linker to turn off page alignment (ld -n), it still has almost 4.2K. stripping/sstripping doesn't pay off either.

readelf tells me that the start of section headers is much later in the code (byte 468 vs 8464), but I have no idea why. It's running on the same arch system as in 2018, the Makefile is the same and I'm not linking against any libraries (especially not libc). I guess something regarding ld has changed due to the fact that the object file is still quite small, but what and why?

Disclaimer: I'm building 32-bit executables on an x86-64 machine.

Edit: I'm using GNU binutils (as & ld) version 2.35.1 Here is a base64-encoded archive which includes the source and both executables (small old one, large new one) :

cat << EOF | base64 -d | tar xj
QlpoOTFBWSZTWVaGrEQABBp////xebj/7//Xf+a8RP/v3/rAAEVARARAeEADBAAAoCAI0AQ+NAam
ytMpCGmpDVPU0aNpGmh6Rpo9QAAeoBoADQaNAADQ09IAACSSGUwaJpTNQGE9QZGhoADQPUAA0AAA
AA0aA4AAAABoAAAAA0GgAAAAZAGgAHAAAAANAAAAAGg0AAAADIA0AASJCBIyE8hHpqPVPUPU/VAa
fqn6o0ep6BB6TQaNGj0j1ABobU00yeU9JYiuVVZKYE+dKNa3wls6x81yBpGAN71NoylDUvNryWiW
E4ER8XkfpaJcPb6ND12ULEqkQX3eaBHP70Apa5uFhWNDy+U3Ekj+OLx5MtDHxQHQLfMcgCHrGayE
Dc76F4ZC4rcRkvTW4S2EbJAsbBGbQxSbx5o48zkyk5iPBBhJowtCSwDBsQBc0koYRSO6SgJNL0Bg
EmCoxCDAs5QkEmTGmQUgqZNIoxsmwDmDQe0NIDI0KjQ64leOr1fVk6AaVhjOAJjLrEYkYy4cDbyS
iXSuILWohNh+PA9Izk0YUM4TQQGEYNgn4oEjGmAByO+kzmDIxEC3Txni6E1WdswBJLKYiANdiQ2K
00jU/zpMzuIhjTbgiBqE24dZWBcNBBAAioiEhCQEIfAR8Vir4zNQZFgvKZa67Jckh6EHZWAWuf6Q
kGy1lOtA2h9fsyD/uPPI2kjvoYL+w54IUKBEEYFBIWRNCNpuyY86v3pNiHEB7XyCX5wDjZUSF2tO
w0PVlY2FQNcLQcbZjmMhZdlCGkVHojuICHMMMB5kQQSZRwNJkYTKz6stT/MTWmozDCcj+UjtB9Cf
CUqAqqRlgJdREtMtSO4S4GpJE2I/P8vuO9ckqCM2+iSJCLRWx2Gi8VSR8BIkVX6stqIDmtG8xSVU
kk7BnC5caZXTIynyI0doXiFY1+/Csw2RUQJroC0lCNiIqVVUkTqTRMYqKNVGtCJ5yfo7e3ZpgECk
PYUEihPU0QVgfQ76JA8Eb16KCbSzP3WYiVApqmfDhUk0aVc+jyBJH13uKztUuva8F4YdbpmzomjG
kSJmP+vCFdKkHU384LdRoO0LdN7VJlywJ2xJdM+TMQ0KhMaicvRqfC5pHSu+gVDVjfiss+S00ikI
DeMgatVKKtcjsVDX09XU3SzowLWXXunnFZp/fP3eN9Rj1ubiLc0utMl3CUUkcYsmwbKKrWhaZiLO
u67kMSsW20jVBcZ5tZUKgdRtu0UleWOs1HK2QdMpyKMxTRHWhhHwMnVEsWIUEjIfFEbWhRTRMJXn
oIBSEa2Q0llTBfJV0LEYEQTBTFsDKIxhgqNwZB2dovl/kiW4TLp6aGXxmoIpVeWTEXqg1PnyKwux
caORGyBhTEPV2G7/O3y+KeAL9mUM4Zjl1DsDKyTZy8vgn31EDY08rY+64Z/LO5tcRJHttMYsz0Fh
CRN8LTYJL/I/4u5IpwoSCtDViIA=
EOF

Update: When using ld.gold instead of ld.bfd (to which /usr/bin/ld is symlinked to by default), the executable size becomes as small as expected:

$ cat Makefile 
TARGET=helloworld
all:
    as -32 -o ${TARGET}-asm.o ${TARGET}-asm.s
    ld.bfd -melf_i386 -o ${TARGET}-asm-bfd ${TARGET}-asm.o
    ld.gold -melf_i386 -o ${TARGET}-asm-gold ${TARGET}-asm.o
    rm ${TARGET}-asm.o

$ make -q
$ ls -l
total 68
-rw-r--r-- 1 eso eso   200 Dec  1 13:57 Makefile
-rwxrwxr-x 1 eso eso  8700 Dec  1 13:57 helloworld-asm-bfd
-rwxrwxr-x 1 eso eso   732 Dec  1 13:57 helloworld-asm-gold
-rw-r--r-- 1 eso eso   498 Dec  1 13:44 helloworld-asm.s

Maybe I just used gold previously without being aware.

Monochasium answered 27/11, 2020 at 13:8 Comment(9)
Given that the files are quite small, could you compress and then base64 encode them? Then you could put them directly into the question, making it self-contained (questions depending on external resources are frowned upon on this site). Also mention the exact assembler and linker versions if possible.Electroshock
Depends on your linker version, yes. It's 652 bytes for me with debian ld 2.28. Nowadays .data and .text should not overlap due to no-execute, so that gives you some padding. Since your string is a constant you can put it into .text and that should give you a small size again.Fetch
That padding of course does not result in 10 times growth in general, it's just shocking for such small programs.Fetch
It's not 10x in general, it's page-alignment of a couple sections as Jester says, per changes to ld's default linker script for security reasons. So it has a worst-case upper bound of about 2x 4k pages, and tiny executables are close to that worst case. gcc -Wl,--nmagic will turn off page-alignment of sections if you want that for some reason. (see the ld(1) man page)Schneider
@PeterCordes OP mentioned using -n and that it didn't help quite as much as he'd hoped it would.Electroshock
The question really is why were you expecting the same result with newer/different tools?Lillalillard
I had a chance to look at this. What if you add the option -no-pie to the link options? I noticed the reason it is larger is because there is a global offset table in one which suggests by default one of the environments decided to assume a position independent executable and the other not.Crier
@MichaelPetch: (update: nevermind, they aren't linking libc). I think ld.bfd defaults to -no-pie, but it will convert library calls like call printf to call printf@plt for you, and generate a PLT entry and a corresponding GOT entry. This being 32-bit code, perhaps ld.gold just leaves a runtime fixup for the actual libc printf symbol? I didn't look at the binary, and the OP hasn't provided source. I was assuming they'd use a write system call, not printf. Oh, but since they don't use -lc, that seems unlikely.Schneider
@PeterCordes : if you decode the base64 in the question and unbzip2 and untar the result you can see the source code and he has versions of both ELF executables. An objdump makes the issue pretty apparent. The makefile is in the question.Crier
S
14

It's not 10x in general, it's page-alignment of a couple sections as Jester says, per changes to ld's default linker script for security reasons:

  • First change: Making sure data from .data isn't present in any of the mapping of .text, so none of that static data is available for ROP / Spectre gadgets in an executable page. (In older ld, that meant the program-headers mapped the same disk-block twice, also into a RW-without-exec segment for the actual .data section. The executable mapping was still read-only.)

  • More recent change: Separate .rodata from .text into separate segments, again so static data isn't mapped into an executable page. Previously, const char code[]= {...} could be cast to a function pointer and called, without needing mprotect or gcc -z execstack or other tricks, if you wanted to test shellcode that way. (A separate Linux kernel change made -z execstack only apply to the actual stack, not READ_IMPLIES_EXEC.)

  • See Why an ELF executable could have 4 LOAD segments? for this history, including the strange fact that .rodata is in a separate segment from the read-only mapping for access to the ELF metadata.

That extra space is just 00 padding and will compress well in a .tar.gz or whatever.

So it has a worst-case upper bound of about 2x 4k extra pages of padding, and tiny executables are close to that worst case.

gcc -Wl,--nmagic will turn off page-alignment of sections if you want that for some reason. (see the ld(1) man page) I don't know why that doesn't pack everything down to the old size. Perhaps checking the default linker script would shed some light, but it's pretty long. Run ld --verbose to see it.

stripping won't help for padding that's part of a section; I think it can only remove whole sections.

ld -z noseparate-code uses the old layout, only 2 total segments to cover the .text and .rodata sections, and the .data and .bss sections. (And the ELF metadata that dynamic linking wants access to.)


Related:

Linking with gcc instead of ld

This question is about ld, but note that if you're using gcc -nostdlib, that used to also default to making a static executable. But modern Linux distros config GCC with -pie as the default, and GCC won't make a static-pie by default even if there aren't any shared libraries being linked. Unlike with -no-pie mode where it will simply make a static executable in that case. (A static-pie still needs startup code to apply relocations for any absolute addresses.)

So the equivalent of ld directly is gcc -nostdlib -static (which implies -no-pie). Or gcc -nostdlib -no-pie should let it default to -static when there are no shared libs being linked. You can combine this with -Wl,--nmagic and/or -Wl,-z -Wl,noseparate-code.

Also:

Schneider answered 27/11, 2020 at 13:34 Comment(3)
Thanks Peter, I've bindiff'ed the binaries and you're right that its largely 0x00 padding. I also tried linking on an Ubuntu 20.04 using ld 2.34 with similar results - I'll still wait with marking it solved, in case any concrete solution comes along. Unfortunately, -n/--nmagic doesn't fully shrink it back.Monochasium
I had never bothered to investigate this until a little while ago. One big difference is that the executables that the OP encoded into the archive in the question show that one of the versions (ld.bfd) generated an extra section with a global offset table in it. I suggested in a comment that they attempt to link with with the -no-pie option and see what happens.Crier
Also related: GCC is generating binaries filled with zeroes, and another example case (with C and asm linked together): gcc "hello world" 10x larger binary than expectedSchneider

© 2022 - 2024 — McMap. All rights reserved.