Why does GCC not do function dead code elimination with LTO when the object file is compiled with -O0?

Example:

notmain.c

int __attribute__ ((noinline)) notmain(int i) {
    return i + 1;
}

int notmain2(int i) {
    return i + 2;
}

main.c

int notmain(int);

int main(int argc, char **argv) {
    return notmain(argc);
}

I use noinline to ensure that what happens is not a side effect of whether notmain is inlined.

Compile the object file with -O1, link with -O3, and disassemble:

gcc -c -flto -O1 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

Outcome: notmain is present and notmain2 is not:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       e9 f0 00 00 00          jmp    1139 <notmain>
    1049:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000001139 <notmain>:
    1139:       8d 47 01                lea    0x1(%rdi),%eax
    113c:       c3                      ret

However, if I instead do:

gcc -c -flto -O0 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

then both are present:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       e9 f0 00 00 00          jmp    1139 <notmain>
    1049:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000001139 <notmain>:
    1139:       f3 0f 1e fa             endbr64
    113d:       55                      push   %rbp
    113e:       48 89 e5                mov    %rsp,%rbp
    1141:       89 7d fc                mov    %edi,-0x4(%rbp)
    1144:       8b 45 fc                mov    -0x4(%rbp),%eax
    1147:       83 c0 01                add    $0x1,%eax
    114a:       5d                      pop    %rbp
    114b:       c3                      ret

000000000000114c <notmain2>:
    114c:       f3 0f 1e fa             endbr64
    1150:       55                      push   %rbp
    1151:       48 89 e5                mov    %rsp,%rbp
    1154:       89 7d fc                mov    %edi,-0x4(%rbp)
    1157:       8b 45 fc                mov    -0x4(%rbp),%eax
    115a:       83 c0 02                add    $0x2,%eax
    115d:       5d                      pop    %rbp
    115e:       c3                      ret

So my question is: what does -O1 change in the notmain.o object file that allows the optimization to be done (or, conversely, what does -O0 change that prevents it)?

I also tried to bisect which exact -O1 optimization is responsible. man gcc lists all the flags that -O1 enables, so I passed them explicitly:

gcc -c -flto -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse -fforward-propagate -fguess-branch-probability -fif-conversion -fif-conversion2 -finline-functions-called-once -fipa-modref -fipa-profile -fipa-pure-const -fipa-reference -fipa-reference-addressable -fmerge-constants -fmove-loop-invariants -fmove-loop-stores -fomit-frame-pointer -freorder-blocks -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types -fssa-backprop -fssa-phiopt -ftree-bit-ccp -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-pta -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra -ftree-ter -funit-at-a-time notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

but notmain2 is still present.

I tried to observe the LTO data with:

lto-dump -dump-body=notmain2 notmain.o

but I don't see anything that would clearly make a difference. With -O1:

Gimple Body of Function: notmain2
int notmain2 (int i)
{
  int _2;

  <bb 2> [local count: 1073741824]:
  _2 = i_1(D) + 2;
  return _2;

}

With -O0:

Gimple Body of Function: notmain2
int notmain2 (int i)
{
  int D.4724;
  int _2;

  <bb 2> :
  _2 = i_1(D) + 2;

  <bb 3> :
<L0>:
  return _2;

}

Tested on Ubuntu 23.04, GCC 12.2.0.

Asked by Nichrome, 17/7/2023 at 7:59. Comments (4):
Protochordate: "I use noinline to ensure...": the attribute is not visible where the function is used, so it does not affect anything. Why do you care that a function is not removed when compiling with -O0?
Muticous: I am not sure I understand the question. Are you asking why -O0 ("disable all optimizations") does not optimize away dead code?
Nichrome: @Muticous yes, but -O0 is set on the object file, not at the final link. The final link does get -O3 and could therefore do dead function elimination. I don't understand how the produced notmain.o file differs between -O0 and -O1.
Nichrome: @Protochordate I want to understand the link process better, and specifically how the produced notmain.o objects differ. There isn't a huge use case beyond understanding it better.

If all else fails, read the manual.

Note that it is generally ineffective to specify an optimization level option only at link time and not at compile time, for two reasons. First, compiling without optimization suppresses compiler passes that gather information needed for effective optimization at link time. Second, some early optimization passes can be performed only at compile time and not at link time.

Answered by Conductor, 17/7/2023 at 8:30. Comments (2):
Nichrome: While this touches on my question, I would hope for something more specific. What exactly is the difference between notmain.o built with -O1 and with -O0 that leads to DCE vs. no DCE in the final link? Can that difference be observed with objdump, lto-dump, or some other tool?
Conductor: @CiroSantilliOurBigBook.com I'm afraid you will have to do this kind of research yourself. The public documentation is sparse, so you will have to read the source, or find someone who wrote that source and can share some insights. Most of us just rely on the documentation, which clearly says "don't do that".
