Does GCC LTO perform cross-file dead code elimination?

Say I have a function

void do_something() {
    //....
    #ifdef FEATURE_X
        feature_x();
    #endif
    //....
}

I can compile and run this with no problems; if I want the feature I can pass -D FEATURE_X and it works.

However, what if I would like to put do_something into another file, without having to recompile that file each time I change the option? If everything were in the same file, I assume that

const int FEATURE_X=0;

void do_something() {
    //....
    if(FEATURE_X) {
        feature_x();
    }
    //....
}

will be handled properly by dead code elimination, which removes the call. If I put do_something in another file, without LTO,

extern const int FEATURE_X;

void do_something() {
    //....
    if(FEATURE_X) {
        feature_x();
    }
    //....
}

the code will not be removed, because the compiler has no way of knowing the value. So, with link-time optimization enabled, can the compiler detect the value of FEATURE_X at link time, determine whether the code is used, and remove it if appropriate?

Nebo answered 3/10, 2012 at 19:16 Comment(0)

GCC does perform cross-module removal of unreachable functions, but it will not be able to determine that the code is dead in your last test case, because the constant value of FEATURE_X is determined too late.

If you use the -D approach, or put const int FEATURE_X=0; into every module, then yes, the code will be eliminated.
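
A minimal sketch of the second option, assuming the flag lives in a shared header (inlined here so the snippet is a single file). The header name feature_config.h and the counter feature_x_calls are hypothetical and exist only for illustration. One caveat for plain C: a file-scope const int has external linkage, so repeating const int FEATURE_X=0; in several modules would produce duplicate definitions at link time. Declaring it static const (or using an enum constant) avoids that; in C++ a namespace-scope const already has internal linkage.

```c
/* Contents of a hypothetical feature_config.h, inlined here so the
 * sketch compiles as a single file. */
static const int FEATURE_X = 0;   /* static: each module gets its own copy */

/* Stand-in for the real feature; the counter exists only so the effect
 * can be observed from outside. */
static int feature_x_calls = 0;

static void feature_x(void) {
    feature_x_calls++;
}

/* The value of FEATURE_X is visible while this module is compiled, so an
 * optimizing build can drop the branch without any help from LTO. */
int do_something(void) {
    if (FEATURE_X) {
        feature_x();
    }
    return feature_x_calls;   /* number of times feature_x ran so far */
}
```

The trade-off is the one the question wanted to avoid: every module that includes the header has to be recompiled when the value changes.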

Transfix answered 7/11, 2012 at 17:54 Comment(2)
Awesome, thanks. As a note, using backticks instead of regular quotes causes the surrounded text to be typeset as "code" (monospace, etc.). - Nebo
It seems that recent versions of gcc do determine the constant value in time to eliminate dead code now. - Dahl
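
Since the answer and the last comment disagree, the behaviour may depend on the GCC version. A small test case along the following lines could be used to check it on a given toolchain. This is only a sketch: the file names config.c and do_something.c are hypothetical, both files are shown in one block for brevity, and the body of feature_x is a placeholder.

```c
#include <stdio.h>

/* ---- do_something.c (hypothetical file name) ---- */
extern const int FEATURE_X;      /* value unknown while compiling this file */

/* noinline keeps feature_x as a separate symbol, so a surviving call is
 * easy to spot in the objdump output. */
__attribute__((noinline)) void feature_x(void) {
    puts("feature_x called");
}

/* Returns 1 if the feature branch was taken, 0 otherwise. */
int do_something(void) {
    if (FEATURE_X) {
        feature_x();
        return 1;
    }
    return 0;
}

/* ---- config.c (hypothetical file name) ---- */
const int FEATURE_X = 0;         /* the only file to recompile when toggling */
```

Split the block into the two files, add a main.c that calls do_something(), compile all three with gcc -flto -O3 -c, link the objects with gcc -flto -O3, and inspect objdump -d a.out. If neither a call to feature_x nor the feature_x symbol appears, the branch was eliminated at link time on that toolchain.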

Example showing that LTO leads to dead code elimination

Test setup:

notmain.c

int notmain(int i) {
    return i + 1;
}

int notmain2(int i) {
    return i + 2;
}

main.c

int notmain(int);

int main(int argc, char **argv) {
    return notmain(argc);
}

Control experiment without LTO

Compile and disassemble without LTO:

gcc -O3 -c notmain.c
gcc -O3 notmain.o main.c
objdump -d a.out

The output contains:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       e9 f7 00 00 00          jmp    1140 <notmain>
    1049:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000001140 <notmain>:
    1140:       f3 0f 1e fa             endbr64
    1144:       8d 47 01                lea    0x1(%rdi),%eax
    1147:       c3                      ret
    1148:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
    114f:       00

0000000000001150 <notmain2>:
    1150:       f3 0f 1e fa             endbr64
    1154:       8d 47 02                lea    0x2(%rdi),%eax
    1157:       c3                      ret

so the useless notmain2 was not removed.

We can also look at the object size:

size a.out

which outputs:

   text    data     bss     dec     hex filename
   1304     544       8    1856     740 a.out

Furthermore, as a bonus, we note that the function call is not inlined:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       e9 f7 00 00 00          jmp    1140 <notmain>
    1049:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

Observe LTO doing DCE

gcc -c -flto -O3 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

The output contains neither the notmain nor the notmain2 symbol. Everything is fully inlined into main, which in a single instruction adds 1 to rdi, the first argument, and puts the result into the return register eax:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       8d 47 01                lea    0x1(%rdi),%eax
    1047:       c3                      ret
    1048:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)

Inlining is also mentioned at: Link-time optimization and inline

Beauty. Checking size:

size a.out

outputs:

   text    data     bss     dec     hex filename
   1217     544       8    1769     6e9 a.out

and we see that the text size is smaller, as desired, due to inlining and dead code elimination.

LTO does DCE even when inlining doesn't happen

In the above example, it is not clear whether function DCE happens only when inlining is involved. So let's test it with a noinline version of notmain:

int __attribute__ ((noinline)) notmain(int i) {
    return i + 1;
}

Compile and disassemble:

gcc -c -flto -O3 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

The output contains:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       e9 f7 00 00 00          jmp    1140 <notmain>
    1049:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000001140 <notmain>:
    1140:       8d 47 01                lea    0x1(%rdi),%eax
    1143:       c3                      ret

and no notmain2. Therefore, the useless notmain2 was removed even though notmain wasn't.

Function removal does not happen when notmain.c is compiled with -O0

I don't understand exactly why. See: Why GCC does not do function dead code elimination with LTO when compiling the object file with -O0?

Tested on Ubuntu 23.04 amd64, GCC 12.2.0.

Emad answered 15/7, 2023 at 8:18 Comment(0)
