Why is my C++ binary built with LTO so large?

I'm compiling some binaries on Mac, but the compiled size has become huge with a more recent compiler (up to ~20MB, from ~5MB before). I think it's related to LTO (link-time optimization), which was not enabled before. I do not observe this bloat on Linux.

After playing around with strip (practically no reduction in size; I tried the Xcode strip with flags -S -x and with no flags, and the GNU strip provided by the Homebrew binutils formula with flag -s, all with the same effect), I found this tool: https://github.com/google/bloaty (Bloaty McBloatface). When run on my binary, it produces this output:

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  53.9%  9.72Mi  53.8%  9.72Mi    __GNU_LTO,__wrapper_sects
  32.5%  5.86Mi  32.4%  5.86Mi    __GNU_DWARF_LTO,__debug_info
   6.2%  1.11Mi   6.2%  1.11Mi    __TEXT,__text
   2.2%   403Ki   2.2%   403Ki    __TEXT,__eh_frame
   1.6%   298Ki   1.6%   298Ki    __GNU_LTO,__wrapper_names
   1.0%   177Ki   1.0%   177Ki    Export Info
   0.7%   131Ki   0.7%   131Ki    Weak Binding Info
   0.4%  77.0Ki   0.4%  77.0Ki    __GNU_DWARF_LTO,__debug_str
   0.4%  75.8Ki   0.4%  75.8Ki    __DATA,__gcc_except_tab
   0.2%  44.6Ki   0.2%  44.6Ki    __GNU_LTO,__wrapper_index
   0.2%  39.4Ki   0.2%  39.4Ki    __DATA_CONST,__const
   0.2%  33.1Ki   0.2%  33.1Ki    __GNU_DWARF_LTO,__debug_abbrev
   0.1%  26.4Ki   0.1%  26.4Ki    __GNU_DWARF_LTO,__debug_line
   0.1%  21.7Ki   0.1%  23.6Ki    [20 Others]
   0.1%  19.0Ki   0.1%  19.0Ki    __TEXT,__text_cold
   0.1%  18.1Ki   0.1%  18.1Ki    __TEXT,__const
   0.0%  8.82Ki   0.0%  8.82Ki    __TEXT,__text_startup
   0.0%  8.60Ki   0.0%  8.60Ki    __TEXT,__cstring
   0.0%       0   0.0%  7.18Ki    __DATA,__pu_bss5
   0.0%       0   0.0%  6.88Ki    __DATA,__bss5
   0.0%  5.87Ki   0.0%  5.87Ki    __DATA,__la_symbol_ptr
 100.0%  18.1Mi 100.0%  18.1Mi    TOTAL
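To put a number on how much of the file the *_LTO sections account for, here is a minimal Python sketch that sums them from the report. It embeds only the LTO rows, a few others, and the TOTAL row copied from the output above; the helper names (`parse_rows`, `lto_share`) and the rule "a section is LTO-related if its segment name ends in `_LTO`" are assumptions made for illustration, not part of bloaty.

```python
import re

# Rows copied from the bloaty report above (LTO rows, a few others, TOTAL).
REPORT = """\
  53.9%  9.72Mi  53.8%  9.72Mi    __GNU_LTO,__wrapper_sects
  32.5%  5.86Mi  32.4%  5.86Mi    __GNU_DWARF_LTO,__debug_info
   6.2%  1.11Mi   6.2%  1.11Mi    __TEXT,__text
   2.2%   403Ki   2.2%   403Ki    __TEXT,__eh_frame
   1.6%   298Ki   1.6%   298Ki    __GNU_LTO,__wrapper_names
   0.4%  77.0Ki   0.4%  77.0Ki    __GNU_DWARF_LTO,__debug_str
   0.2%  44.6Ki   0.2%  44.6Ki    __GNU_LTO,__wrapper_index
   0.2%  33.1Ki   0.2%  33.1Ki    __GNU_DWARF_LTO,__debug_abbrev
   0.1%  26.4Ki   0.1%  26.4Ki    __GNU_DWARF_LTO,__debug_line
 100.0%  18.1Mi 100.0%  18.1Mi    TOTAL
"""

UNITS = {"": 1, "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}

# Columns: file %, file size, VM %, VM size, section name.
ROW = re.compile(
    r"^\s*[\d.]+%\s+([\d.]+)(Ki|Mi|Gi)?\s+[\d.]+%\s+[\d.]+(?:Ki|Mi|Gi)?\s+(.+?)\s*$"
)


def parse_rows(report):
    """Map each section name to its file size in bytes."""
    rows = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            value, unit, name = match.groups()
            rows[name] = float(value) * UNITS[unit or ""]
    return rows


def lto_share(report):
    """Return (bytes in *_LTO segments, total bytes) for a report."""
    rows = parse_rows(report)
    total = rows.pop("TOTAL")
    # Assumption: the segment part of an LTO-related name ends in "_LTO".
    lto = sum(size for name, size in rows.items() if "_LTO," in name)
    return lto, total


if __name__ == "__main__":
    lto, total = lto_share(REPORT)
    print(f"LTO sections: {lto / 2**20:.2f} MiB of {total / 2**20:.2f} MiB "
          f"({100 * lto / total:.1f}%)")
```

By this count the *_LTO sections make up roughly 16 MiB of the 18.1 MiB file, so almost nothing that strip normally removes is contributing to the size.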

So can anyone tell me what these huge *_LTO sections are for, and how I can get rid of them, either by post-processing or by adding compilation flags to my build chain?

The OS is macOS and I'm using g++ 10; a full trace is here: https://github.com/yanntm/testGithbuActions/runs/1778387086?check_suite_focus=true

I'm trying to link statically as much as possible for better portability. The binary, however, is still dynamically linked to /usr/lib/libSystem.B.dylib (apparently I can't link this one statically with libtool).

I don't want any debug symbols as this is a production binary meant for end-users.

Dastard answered 27/1, 2021 at 18:3 Comment(4)
What strip options have you tried? - Sodamide
@AlanBirtles I edited to add what I tried, both the Xcode strip and the GNU version (since they have different flags). The first strip call does go from 19MB to 18MB, so it does do something. But the binary size is not in symbols, as the dump from bloaty shows. - Dastard
Which version of Xcode? What are your compile flags for your source? What are your link flags for your linker? What does g++ --version output? And curious, why not disable LTO? - Bumbling
@Bumbling So it's the standard containers offered by GitHub Actions. Xcode_12.2, g++ from brew 10.2.0_2. Flags are -DNDEBUG -O3 at compile and -O3 -all-static -static-libgcc -static-libstdc++ at link. I tried adding "-flto" and "-fno-lto" to both compile and link. In both cases the binary is still 19MB with these LTO sections; I feel like it's ignoring these flags due to some environment configuration. - Dastard

You will find the answer in gcc's documentation:

Link time optimization is implemented as a GCC front end for a bytecode representation of GIMPLE that is emitted in special sections of .o files.

[ ... ]

Since GIMPLE bytecode is saved alongside final object code, object files generated with LTO support are larger than regular object files.

[ ... ]

The current implementation only produces “fat” objects, effectively doubling compilation time and increasing file sizes up to 5x the original size.

But wait, there's more. If you compile with -ffat-lto-objects in addition to -flto, then, as gcc's info page explains:

'-ffat-lto-objects'

Fat LTO objects are object files that contain both the intermediate language and the object code. This makes them usable for both LTO linking and normal linking. This option is effective only when compiling with '-flto' and is ignored at link time.

Attempts to use strip will be in vain. strip only strips out debug data. This is not debug data, but, basically, halfway-compiled C++ code, with the final compilation happening as part of the link cycle. If you want to "get rid of them", don't use LTO.

EDIT: it's possible that some gcc/binutils configurations leave LTO sections in the target binary. I looked into what Fedora's default rpmbuild configuration does; it builds with LTO by default but does not suffer from the same executable bloat.

It turns out that Fedora's rpmbuild executes a brp-strip-lto script that boils down to this:

sh -c "$STRIP -p -R .gnu.lto_* -R .gnu.debuglto_* -N __gnu_lto_v1 \"\$@\"" ARG0

The key options are the two -R options, which remove the sections whose names match the given patterns. It's unclear what the __gnu_lto_v1 symbol that -N removes is.
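As a hedged illustration, the same invocation can be assembled from a script. The minimal Python sketch below builds the argument list for the Fedora command above and, optionally, runs it. It assumes GNU strip from binutils (whose -p, -R and -N options are used here) and the ELF section names from that command; the function names are hypothetical. Whether GNU strip can remove the `__GNU_LTO,...` sections from a Mach-O binary on macOS is not established here.

```python
import subprocess

# Section-name patterns and the symbol taken from the Fedora command above.
LTO_SECTION_PATTERNS = [".gnu.lto_*", ".gnu.debuglto_*"]
LTO_MARKER_SYMBOL = "__gnu_lto_v1"


def lto_strip_command(files, strip="strip"):
    """Build the argv for a GNU strip call that drops LTO sections."""
    argv = [strip, "-p"]  # -p: preserve the files' timestamps
    for pattern in LTO_SECTION_PATTERNS:
        argv += ["-R", pattern]  # -R: remove sections matching the pattern
    argv += ["-N", LTO_MARKER_SYMBOL]  # -N: remove this one symbol
    return argv + list(files)


def strip_lto(files, strip="strip"):
    """Run the command; raises CalledProcessError if strip fails."""
    # Passing a list (no shell) keeps the '*' patterns from being
    # glob-expanded before strip sees them.
    subprocess.run(lto_strip_command(files, strip), check=True)


if __name__ == "__main__":
    # Only prints the command; "a.out" is a placeholder file name.
    print(lto_strip_command(["a.out"]))
```

Building the argument list separately from running it makes it easy to inspect the exact command before letting it modify a binary in place.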

Grantee answered 27/1, 2021 at 22:8 Comment(6)
Based on the cited quotes, the bloat should only be in the intermediary binaries, not in the final build target binary. Correct? - Bumbling
There might be options or settings to control that. LTO is enabled by default on Fedora 33's build toolchain, but the final binaries are not excessively large. But if LTO metadata does make it into the binary, strip won't touch it. - Grantee
Thanks for the explanation of what it is, but as @Bumbling says, it seems that the final target should not contain these sections any more. Perhaps they remain because a full static link was impossible due to OSX libSystem being unavailable as a static lib? - Dastard
Might be some OS-specific stuff. I added some more info I dug up after perusing what Fedora does in its default LTO-enabled build configuration. - Grantee
Thank you for your time, I'll try some similar strip -R flags on my binaries and post back. And yes, the problem is, I think, specific to Mac: I'm using the same GNU toolchain as on Linux (instead of e.g. clang), but the LTO issues only appear on Mac. It also seems that your statement "strip only strips out debug data" might be inexact seeing what that Fedora build does, so we are all learning something :D - Dastard
You might like to use clang. In my experience, it does not suffer from this problem. - Minuscule
