gcc vs clang: inlining a function with -fPIC

next():
        movq    last@GOTPCREL(%rip), %rdx
        movl    (%rdx), %eax
        addl    $1, %eax
        movl    %eax, (%rdx)
        ret
index(int):
        pushq   %rbx
        movl    %edi, %ebx
        call    next()@PLT      ## next() not inlined, call through PLT
        movl    %ebx, %ecx
        sall    %cl, %eax
        popq    %rbx
        ret

next():                                 # @next()
        movq    last@GOTPCREL(%rip), %rcx
        movl    (%rcx), %eax
        incl    %eax
        movl    %eax, (%rcx)
        retq
index(int):                             # @index(int)
        movq    last@GOTPCREL(%rip), %rcx
        movl    (%rcx), %eax
        incl    %eax                    ## next() was inlined!
        movl    %eax, (%rcx)
        movl    %edi, %ecx
        shll    %cl, %eax
        retq

I don't think the standard goes into that much detail. Roughly, it only says that a name with external linkage refers to the same entity in every translation unit. That makes clang's version correct.

From that point on, to the best of my knowledge, we're outside the standard. Compilers' choices differ on what they consider useful -fPIC output.

Note that g++ -c -std=c++11 -O3 -fPIE outputs:

0000000000000000 <_Z4nextv>:
   0:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 6 <_Z4nextv+0x6>
   6:   83 c0 01                add    $0x1,%eax
   9:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # f <_Z4nextv+0xf>
   f:   c3                      retq   

0000000000000010 <_Z5indexi>:
  10:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 16 <_Z5indexi+0x6>
  16:   89 f9                   mov    %edi,%ecx
  18:   83 c0 01                add    $0x1,%eax
  1b:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # 21 <_Z5indexi+0x11>
  21:   d3 e0                   shl    %cl,%eax
  23:   c3                      retq

So GCC does know how to optimize this. It just chooses not to when using -fPIC. But why? I can see only one explanation: make it possible to override the symbol during dynamic linking, and see the effects consistently. The technique is known as symbol interposition.

In a shared library, if index calls next, then since next is globally visible, gcc has to consider the possibility that next could be interposed. So it calls it through the PLT. With -fPIE, however, the code is destined for an executable, whose own symbols cannot be interposed, so gcc enables the optimization.

So is clang wrong? No. But gcc seems to provide better support for symbol interposition, which is handy for instrumenting code. The price is some overhead if you build an executable with -fPIC instead of -fPIE.
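To make the scenario concrete, here is a sketch of the kind of source that would produce the listings above. It is a reconstruction inferred from the assembly, not the original file, and check_without_interposition is a hypothetical helper added only to show the expected values.

```cpp
// foo.cxx: reconstruction inferred from the assembly (an assumption, not the original file)
#include <cassert>

int last;   // default visibility: reached through the GOT under -fPIC

// Exported with default visibility, so it can be interposed in a shared library.
int next() {
  return ++last;
}

int index(int scale) {
  // gcc -fPIC keeps this as a call through the PLT; clang, and gcc with
  // -fPIE or -fno-semantic-interposition, inline the increment here.
  return next() << scale;
}

// Hypothetical helper: what every variant computes when nothing interposes next().
void check_without_interposition() {
  last = 0;
  assert(next() == 1);
  assert(index(2) == 8);   // next() returns 2, and 2 << 2 == 8
  assert(last == 2);
}
```

If this were built into a shared library with gcc -fPIC and another library defining next() came first in the lookup order (for instance through LD_PRELOAD), index() would call the replacement. With the inlined variants, index() keeps using the local body and only outside callers of next() see the replacement.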

Additional notes:

In this blog entry from one of the gcc developers, he mentions, near the end of the post:

While comparing some benchmarks to clang, I noticed that clang actually ignore ELF interposition rules. While it is bug, I decided to add -fno-semantic-interposition flag to GCC to get similar behaviour. If interposition is not desirable, ELF's official answer is to use hidden visibility and if the symbol needs to be exported define an alias. This is not always practical thing to do by hand.

Following that lead landed me on the x86-64 ABI spec. In section 3.5.5, it does mandate that all calls to globally visible functions go through the PLT (it goes as far as defining the exact instruction sequence to use depending on the code model).

So, though it does not violate the C++ standard, ignoring semantic interposition seems to violate the ABI.

Last word: didn't know where to put this, but it might be of interest to you. I'll spare you the dumps, but my tests with objdump and compiler options showed that:

On the gcc side of things:

gcc -fPIC: accesses to last go through the GOT, calls to next() go through the PLT.
gcc -fPIC -fno-semantic-interposition: last goes through GOT, next() is inlined.
gcc -fPIE: last is IP-relative, next() is inlined.
(In other words, -fPIE implies -fno-semantic-interposition.)

On the clang side of things:

clang -fPIC: last goes through GOT, next() is inlined.
clang -fPIE: last goes through GOT, next() is inlined.

And a modified version that compiles to IP-relative, inlined on both compilers:

// foo.cxx
int last_ __attribute__((visibility("hidden")));
extern int last __attribute__((alias("last_")));

int __attribute__((visibility("hidden"))) next_()
{
  return ++last_;
}
// This one is ugly, because alias needs the mangled name. Could extern "C" next_ instead.
extern int next() __attribute__((alias("_Z5next_v")));

int index(int scale) {
  return next_() << scale;
}

Basically, this states explicitly that, although the symbols are still available globally, the code itself uses hidden versions of them, which ignore any kind of interposition. Both compilers then fully optimize the accesses, regardless of the options passed.
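The comment in the code mentions that the mangled name could be avoided with extern "C". A minimal sketch of that variant follows, assuming GCC or Clang on an ELF target (the alias attribute is not available everywhere). The file name and the check_aliases helper are hypothetical and only there to show the expected behaviour.

```cpp
// foo_c_linkage.cxx: hypothetical variant of foo.cxx (assumes GCC or Clang, ELF target)
#include <cassert>

int last_ __attribute__((visibility("hidden")));
extern int last __attribute__((alias("last_")));

// C linkage keeps the symbol name unmangled, so the alias target is simply "next_".
extern "C" __attribute__((visibility("hidden"))) int next_()
{
  return ++last_;
}
extern int next() __attribute__((alias("next_")));

int index(int scale) {
  return next_() << scale;
}

// Hypothetical helper: the exported names and the hidden ones share state.
void check_aliases() {
  last_ = 0;
  assert(next() == 1);     // the exported alias runs the hidden body
  assert(next_() == 2);    // same function, same counter
  assert(index(1) == 6);   // next_() returns 3, and 3 << 1 == 6
  assert(last == 3);       // last and last_ name the same object
}
```

The trade-off is that next_ now lives in the flat C namespace of the object file. Since it is hidden, it should not leak out of the shared library, but it can still clash with another next_ at static link time.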
