Why do common C compilers include the source filename in the output?

I have learnt from this recent answer that gcc and clang include the source filename somewhere in the binary as metadata, even when debugging is not enabled.

I can't really understand why this would be a good idea. Besides the tiny privacy risk, it also happens when optimizing for the size of the resulting binary (-Os), which seems inefficient.

Why do the compilers include this information?

Hedgerow answered 5/9, 2015 at 12:41 Comment(5)
It's not just GCC; Clang does it too (as does any compiler toolchain that produces ELF binaries following the specification). – Badoglio
@Badoglio I admit I simply grepped it instead of reading all 60 pages, but I found FILE mentioned only on page 25 of that document, and it doesn't say it's mandatory ("Conventionally, the symbol’s name gives the name of the source file associated with the object file"). – Hedgerow
I didn't read all 60 pages either. But when it comes to standards, "conventionally" means "you should probably do this because people might rely on it". At the end of the day, if you're given a spec, it's easier to just follow it to the letter (since your users may decide to use the most esoteric features it describes) than to try to weasel your way out of implementing things you don't have to. After all, GNU is the land of extreme amounts of extra features. – Badoglio
However, it was probably too strong to claim that "any" compiler toolchain will implement STT_FILE; it just seems likely that most popular compilers would, because some programmers need that feature. – Badoglio
@Badoglio Makes sense, thanks for your explanations and for the answer. – Hedgerow

GCC includes the filename mainly for debugging purposes: it lets a programmer identify which source file a given symbol comes from, as (tersely) outlined in the ELF spec p1-17 and expanded upon in some Oracle docs on linking.

An example of using STT_FILE symbols is given in this SO question.

I'm still confused why both GCC and Clang include it even when you specify -g0, but you can stop it from being included with -s. I couldn't find any explanation for this, nor an "official reason" why STT_FILE is part of the ELF specification (which is very terse).

Badoglio answered 5/9, 2015 at 13:42 Comment(0)

I have learnt from this recent answer that gcc includes the source filename somewhere in the binary as metadata, even when debugging is not enabled.

Not quite. In modern ELF object files, the file name is indeed a symbol of type FILE:

$ readelf -a bignum.o    # Source bignum.c
[...]
Symbol table (.symtab) contains 36 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS bignum.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    8
     9: 00000000000003f0   172 FUNC    GLOBAL DEFAULT    1 add
    10: 00000000000004a0   104 FUNC    GLOBAL DEFAULT    1 copy

However, once stripped, the symbol is gone:

$ strip bignum.o
$ readelf --all bignum.o | grep bignum.c
$

So, to protect your privacy, strip the executable or compile/link with -s.

Uptown answered 5/9, 2015 at 12:55 Comment(4)
Why "not quite"? I still count this as "included in the binary", although you correctly point out that it is embedded in a way that makes it easy to remove. My question was about the motivation, anyway. – Hedgerow
@FedericoPoloni Only a very subtle reason for "not quite": is a symbol table metadata? It is required for a successful link. Debug data, however, is not; it lives in its own ELF sections and can be removed. It's not really a big issue and shouldn't stand in the way of understanding. – Uptown
Although the symbol table in general is required for linking, this particular symbol isn't. So it's essentially using the symbol table as a place to hold some metadata. – Choric
Stripping all of the symbols doesn't seem like a valid solution, because then the object file cannot be linked. I've tried using --as-needed as well, but it still keeps particular symbols with filenames in them. I think a better solution would be to find a way to prevent them from getting into the .o in the first place. – Bumblebee
