GCC is generating binaries filled with zeroes
I'm trying to figure out why the binaries generated by GCC are so large.

Consider this empty program:

int main() {
    return 0;
}

Now I build it with GCC 9.2.1 20190827 (Red Hat 9.2.1-1) and glibc 2.29 without any additional parameters:

gcc -o test test.c

The resulting binary is 21984 bytes (~22 KB). Looking at the generated file with xxd, there are long runs of null-bytes in multiple places:

00000370: 006c 6962 632e 736f 2e36 005f 5f6c 6962  .libc.so.6.__lib
00000380: 635f 7374 6172 745f 6d61 696e 0047 4c49  c_start_main.GLI
00000390: 4243 5f32 2e32 2e35 005f 5f67 6d6f 6e5f  BC_2.2.5.__gmon_
000003a0: 7374 6172 745f 5f00 0000 0200 0000 0000  start__.........
000003b0: 0100 0100 0100 0000 1000 0000 0000 0000  ................
000003c0: 751a 6909 0000 0200 1d00 0000 0000 0000  u.i.............
000003d0: f03f 4000 0000 0000 0600 0000 0100 0000  .?@.............
000003e0: 0000 0000 0000 0000 f83f 4000 0000 0000  .........?@.....
000003f0: 0600 0000 0200 0000 0000 0000 0000 0000  ................
00000400: 0000 0000 0000 0000 0000 0000 0000 0000  ................
<3040 bytes of zeroes>
00000ff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001000: f30f 1efa 4883 ec08 488b 05e9 2f00 0048  ....H...H.../..H
<not zeroes>
00001190: f30f 1efa c300 0000 f30f 1efa 4883 ec08  ............H...
000011a0: 4883 c408 c300 0000 0000 0000 0000 0000  H...............
000011b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
<3632 bytes of zeros>
00001ff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00002000: 0100 0200 0000 0000 0000 0000 0000 0000  ................
00002010: 011b 033b 3400 0000 0500 0000 10f0 ffff  ...;4...........
<not zeroes>
000020e0: 410e 2842 0e20 420e 1842 0e10 420e 0800  A.(B. B..B..B...
000020f0: 1000 0000 ac00 0000 98f0 ffff 0500 0000  ................
00002100: 0000 0000 0000 0000 0000 0000 0000 0000  ................
<3376 bytes of zeroes>
00002e40: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00002e50: 0011 4000 0000 0000 d010 4000 0000 0000  ..@.......@.....
...

So the resulting binary has around 10 KB, or almost half, of nothing in it.

Looking with size -A, the size is more like what one would expect from a program doing nothing else than returning an exit code:

test  :
section                 size      addr
.interp                   28   4194984
.note.ABI-tag             32   4195012
.note.gnu.build-id        36   4195044
.gnu.hash                 28   4195080
.dynsym                   72   4195112
.dynstr                   56   4195184
.gnu.version               6   4195240
.gnu.version_r            32   4195248
.rela.dyn                 48   4195280
.init                     27   4198400
.text                    373   4198432
.fini                     13   4198808
.rodata                   16   4202496
.eh_frame_hdr             52   4202512
.eh_frame                192   4202568
.init_array                8   4210256
.fini_array                8   4210264
.dynamic                 400   4210272
.got                      16   4210672
.got.plt                  24   4210688
.data                      4   4210712
.bss                       4   4210716
.comment                  44         0
.gnu.build.attributes   4472   4218912
Total                   5991

When cross-compiling for PowerPC using GCC 9.2.0 and musl 1.1.23 it's even worse. Size of the binary grows to 67872 bytes (~67 KB), and looking with xxd, there is a continuous run of 64074 bytes of only zeroes.

Still, size -A reports even smaller sizes for this version:

test  :
section              size        addr
.interp                26   268435796
.note.gnu.build-id     36   268435824
.hash                  36   268435860
.dynsym                64   268435896
.dynstr                39   268435960
.rela.plt              12   268436000
.init                  28   268436012
.text                 496   268436048
.fini                  28   268436544
.eh_frame_hdr          28   268436572
.eh_frame              80   268436600
.init_array             4   268566284
.fini_array             4   268566288
.dynamic              216   268566292
.branch_lt              8   268566508
.got                   12   268566516
.plt                    4   268566528
.data                   4   268566532
.bss                   28   268566536
.comment               17           0
Total                1170

I also tried to compile the program with an old version of GCC which I happened to have handy: GCC 4.7.2 with uClibc 1.0.12. With this combination, the resulting binary is only 4769 bytes (~4 KB), and has no apparent runs of null-bytes in it.

Just to make sure that this doesn't only happen with tiny programs that do nothing, I looked at some real programs that I have cross-compiled with GCC 9.2.0 and musl 1.1.23. For example, the tcpdump binary, compiled with -Os and stripped, contains a continuous run of 32628 null bytes. So, why are zeroes trying to consume all of my disk space?

Slosberg answered 25/11, 2019 at 6:19 Comment(3)
What did your research with objdump and the linker's cross-reference output reveal? Did you try stripping the debug sections? - Mantilla
Try compiling with -Os -flto -ffunction-sections -fdata-sections and linking with -flto -Wl,--gc-sections. That should get rid of some unneeded things. - Winwaloe
Although not the actual reason for the zeros, --gc-sections did snip a bit off the binary size. I might enable these to build the system. - Slosberg
S
3

The answer from Florian Weimer pointed me in the right direction. The culprit was not -z separate-code, but -z relro.

By adding -Wl,-z,norelro to the PowerPC GCC options, the file size for an empty program dropped from 67872 bytes to 3772 bytes! On x64 the impact was smaller: from 21984 to 18584 bytes. For a small but actually functional program, the PowerPC binary was around 50 % smaller, and with tcpdump, which I looked at before, the saving is almost 32 KB.

The relro linker option apparently creates a new segment (PT_GNU_RELRO) that covers the global offset table and related data, which the dynamic loader marks as read-only once relocations have been applied. This protects the program against attacks that overwrite GOT entries, rather than against stack overflows as such. This explanation may still be imprecise; I didn't understand much of what I read while trying to figure it out.
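To check which segments a given binary actually contains, the program headers can be read directly. The following is a minimal sketch, not a replacement for readelf -l. The name count_program_headers is a hypothetical helper, and the code assumes a 64-bit ELF file in the host's byte order (so, as written, it would not read the 32-bit big-endian PowerPC binary) and a system that provides <elf.h>.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Count (and print) the program headers of a given type, for example
 * PT_LOAD or PT_GNU_RELRO, in a 64-bit ELF file.
 * Returns -1 if the file cannot be read or is not a 64-bit ELF file.
 * Hypothetical helper; assumes the file uses the host's byte order. */
static int count_program_headers(const char *path, unsigned type)
{
    assert(path != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr eh;
    int count = -1;

    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr)) {
        count = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long pos = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

            if (fseek(f, pos, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                count = -1;   /* truncated or unreadable header table */
                break;
            }
            if (ph.p_type == type) {
                printf("type 0x%x: offset 0x%llx, filesz 0x%llx, align 0x%llx\n",
                       (unsigned)ph.p_type,
                       (unsigned long long)ph.p_offset,
                       (unsigned long long)ph.p_filesz,
                       (unsigned long long)ph.p_align);
                count++;
            }
        }
    }
    fclose(f);
    return count;
}
```

Calling count_program_headers("./test", PT_GNU_RELRO) on a binary linked with -z relro should report such a header, and none after linking with -Wl,-z,norelro. Comparing the printed offsets of the PT_LOAD entries shows where the gaps in the file come from.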

The size difference on PPC is 64100 bytes (about 62.6 KiB). Why such a large area is created, I have no idea.
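The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch using the file sizes reported in this answer; bytes_saved and percent_saved are illustrative names.

```c
#include <assert.h>

/* Bytes saved when a file shrinks from `before` to `after`. */
static long bytes_saved(long before, long after)
{
    return before - after;
}

/* The saving expressed as a percentage of the original size. */
static double percent_saved(long before, long after)
{
    assert(before > 0);
    return 100.0 * (double)(before - after) / (double)before;
}
```

With the sizes from this answer, bytes_saved(67872, 3772) gives 64100 bytes for PowerPC (roughly 94 % of the file), and bytes_saved(21984, 18584) gives 3400 bytes for x64 (roughly 15 %).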

Although it would be good to keep the setting enabled as a hardening measure, my target board has only 11 MB of flash available, and I'm trying to fit a Linux-based system on it. Every byte counts, so I will disable the setting to keep the binary sizes down.

Slosberg answered 28/11, 2019 at 8:17 Comment(1)
This reminds me of another issue with gcc. If your program has, say, static int g[1000000] = { [999999] = 1 }; then the compiler will literally emit a 4 MB chunk in the executable, filled with 3.9999 MB of zeroes and the 1 at the end. Whereas the icc compiler actually compresses the initial data chunk with lzma and unpacks it on program startup. - Guarino
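The effect described in this comment comes down to which section the array lands in. A minimal sketch, assuming a typical GCC/ELF toolchain and a 4-byte int; the array and function names are made up for illustration:

```c
#include <assert.h>
#include <string.h>

#define N 1000000

/* One non-zero initializer: the whole array, zeros included, is stored
 * in the file's .data section (about 4 MB with a 4-byte int). */
static int sparse_data[N] = { [N - 1] = 1 };

/* No initializer: the array goes to .bss, which occupies no space in
 * the file and is zero-filled when the program is loaded. */
static int zero_bss[N];

/* Doing the single non-zero store at run time keeps the array in .bss. */
static void init_at_runtime(void)
{
    zero_bss[N - 1] = 1;
}

/* Sanity check: after run-time init both arrays hold the same values. */
static void check_equivalent(void)
{
    init_at_runtime();
    assert(memcmp(sparse_data, zero_bss, sizeof sparse_data) == 0);
}
```

Where startup cost allows it, initializing the few non-zero elements at run time is one way to avoid shipping the zeros in the file.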
P
7

Recent binutils versions default to -z separate-code, which adds extra PT_LOAD segments to the program, and each of them needs its own alignment.

You can override the default like this:

gcc -Wl,-z,noseparate-code -o test test.c

Due to alignment requirements, some zeros will still remain with this change.
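The padding can be estimated from the offsets in the question's xxd dump. In that dump, .rela.dyn ends at file offset 0x400 and .fini ends at 0x11a5, while the next non-zero data starts at 0x1000 and 0x2000 respectively. The sketch below assumes a 4 KiB (0x1000) alignment, which is what those offsets suggest; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Round `offset` up to the next multiple of `align` (a power of two). */
static uint64_t align_up(uint64_t offset, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Filler bytes needed before a segment that must start on an `align`
 * boundary, given that the previous contents end at `end_offset`. */
static uint64_t padding_before_next_segment(uint64_t end_offset, uint64_t align)
{
    return align_up(end_offset, align) - end_offset;
}
```

For the two gaps above this gives 3072 and 3675 bytes, which matches the runs of zeroes visible in the dump.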

Propagation answered 25/11, 2019 at 9:5 Comment(1)
This was actually already the case on PowerPC (that is, separate-code was disabled), but your answer pushed me to look in the right direction. See the accepted answer. - Slosberg
S
1

So, why are zeroes trying to consume all of my disk space?

Because on most modern systems 22K extra bytes on disk are immaterial.

Some of the costs you observe are due to dynamic linking, some are due to padding, and some are there to help you with debugging (e.g. .comment, .note.gnu.build-id, .eh_frame*).

I can get the binary down to 624 bytes by not using libc and linking statically and stripping:

cat t.c
void _start(void)
{
  /* exit(0) on x86-64 Linux: syscall number 60 in rax, status 0 in rdi */
  __asm__("movq $60,%rax; xorq %rdi,%rdi; syscall");
}

gcc -O3 t.c -static -nostdlib -Wl,-z,noseparate-code,--build-id=none &&
strip --strip-all a.out &&
./a.out && ls -l a.out
-rwxr-x--- 1 me mygroup 624 Nov 25 19:34 a.out

There are still .comment and .eh_frame sections, which could be removed as well.

Shively answered 26/11, 2019 at 3:21 Comment(0)
