Why does GCC not do function dead code elimination with LTO when the object file is compiled with -O0?

Example:

notmain.c

int __attribute__ ((noinline)) notmain(int i) {
    return i + 1;
}

int notmain2(int i) {
    return i + 2;
}

main.c

int notmain(int);

int main(int argc, char **argv) {
    return notmain(argc);
}

I use noinline to ensure that what happens is not a secondary effect of whether notmain is inlined or not.

Compile and disassemble with -O1:

gcc -c -flto -O1 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

Outcome: notmain is present and notmain2 is not:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       e9 f0 00 00 00          jmp    1139 <notmain>
    1049:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000001139 <notmain>:
    1139:       8d 47 01                lea    0x1(%rdi),%eax
    113c:       c3                      ret

However, if I instead compile notmain.c with -O0:

gcc -c -flto -O0 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

then both are present:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       e9 f0 00 00 00          jmp    1139 <notmain>
    1049:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000001139 <notmain>:
    1139:       f3 0f 1e fa             endbr64
    113d:       55                      push   %rbp
    113e:       48 89 e5                mov    %rsp,%rbp
    1141:       89 7d fc                mov    %edi,-0x4(%rbp)
    1144:       8b 45 fc                mov    -0x4(%rbp),%eax
    1147:       83 c0 01                add    $0x1,%eax
    114a:       5d                      pop    %rbp
    114b:       c3                      ret

000000000000114c <notmain2>:
    114c:       f3 0f 1e fa             endbr64
    1150:       55                      push   %rbp
    1151:       48 89 e5                mov    %rsp,%rbp
    1154:       89 7d fc                mov    %edi,-0x4(%rbp)
    1157:       8b 45 fc                mov    -0x4(%rbp),%eax
    115a:       83 c0 02                add    $0x2,%eax
    115d:       5d                      pop    %rbp
    115e:       c3                      ret

So my question is: what does -O0 change in the notmain.o object file that prevents this optimization from being done?

I also tried to bisect which exact optimization from -O1 is responsible. man gcc lists all the flags that -O1 enables, so I passed them individually instead of -O1:

gcc -c -flto -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse -fforward-propagate -fguess-branch-probability -fif-conversion -fif-conversion2 -finline-functions-called-once -fipa-modref -fipa-profile -fipa-pure-const -fipa-reference -fipa-reference-addressable -fmerge-constants -fmove-loop-invariants -fmove-loop-stores -fomit-frame-pointer -freorder-blocks -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types -fssa-backprop -fssa-phiopt -ftree-bit-ccp -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-pta -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra -ftree-ter -funit-at-a-time notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

but notmain2 is still present.

I tried to inspect the LTO bytecode with:

lto-dump -dump-body=notmain2 notmain.o

but I don't see anything that would clearly make a difference. With -O1:

Gimple Body of Function: notmain2
int notmain2 (int i)
{
  int _2;

  <bb 2> [local count: 1073741824]:
  _2 = i_1(D) + 2;
  return _2;

}

With -O0:

Gimple Body of Function: notmain2
int notmain2 (int i)
{
  int D.4724;
  int _2;

  <bb 2> :
  _2 = i_1(D) + 2;

  <bb 3> :
<L0>:
  return _2;

}

Tested on Ubuntu 23.04, GCC 12.2.0.
