C++ virtual functions: Can the linker remove entries in the virtual function table which aren't called?
Asked Answered
G

1

11

This question is a kind of followup to eliminate unused virtual functions, which does not go deep enough for my interest.

The problem: When defining classes that have virtual functions, the compiler allocates storage for the virtual function table, and stores pointers to the functions in the table. This causes the linker to keep the code of those functions, regardless of whether they are ever called. This could potentially cause a lot of dead code to be retained in the executable, even when the compiler optimization settings demand elimination of dead code.

Now, if nowhere in the executable there is a call of a particular virtual function (or, in other words, an access to the respective slot of the virtual function table), the corresponding function pointer could be omitted from the virtual function table, and the linker would remove the function's code, with possible further omissions of other code that becomes unreferenced.

Obviously, this can't be done by the compiler, since it only becomes clear at link time whether a particular virtual function is called (assuming static linking - it is clear that it can't be done with dynamic linking). I'm not familiar enough with linkers in order to tell whether the compiler can emit virtual function tables in such a way that the linker can selectively elide individual unused entries in the table.

Basically, my train of thought is this: A function pointer in a virtual function table is a reference to a function which the linker uses to determine that the function's code needs to be retained in the executable. In a similar way, a virtual function call is a reference to a particular slot in all virtual function tables that derive from the class whose virtual function is getting called. Could this kind of referencing be communicated to the linker in such a way that it can elide a virtual function table slot when there are zero references to it?

Note that this isn't the same as replacing a virtual function call with a direct call when the compiler can determine the call target at compile time. I know that some compilers can do that, but that's a different case because the function actually gets called, and it is the overhead of virtual function dispatch that is removed. In my case I want the entire code removed for functions that aren't called.

If I had control over all class definitions, I could manually eliminate all virtual functions which aren't called. But that is unrealistic when using libraries.

Is this something that can be done with "link time optimization" or "whole program optimization"? Are there compilers which successfully do that?

Gilbye answered 3/6, 2015 at 10:14 Comment(1)
Common linkers don't provide the and operator, but it would easy to do.Agoraphobia
D
2

The problem with the dead code is that the compiler cannot possibly be sure that the code is dead from the perspective of dynamic libraries. An executable can dynamically include a library that uses the dead code (derives from classes owning the dead code).

In addition to that, changing the structure of the v-table during link-time might work perfectly fine if the executable is the only one making function calls. However, if a dynamic library makes any calls, it will have a different understanding of the v-table and it will call the wrong function.

Because of these facts, and on face value not much (if any) performance is gained, optimising linkers are very unlikely to have this feature.

De-virtualisation of virtual functions is actually related to this, and safe optimising linkers can only de-virtualise a very small number of function calls. For instance, it can only de-virtualise the function if it can guarantee that no dynamic library can play any part in the callstack.

edit @curiousguy has brought up a case where the compiler is able to be a bit more liberal with optimising, and that is when the linker can know that no external code knows about the class. An example of this is a class with file scope.

Dishearten answered 1/10, 2015 at 12:32 Comment(11)
Who said anything about changing vtable layout?Agoraphobia
How can you remove entries without changing the layout?Dishearten
I see your point. You can keep the layout the same... But to what end? Calling the deleted function will still not work. Though calling other functions is still safe. However optimisations have to be 100% safe.Dishearten
A vtable offset is like offset of a variable in a struct, it is a compile time constant. You cannot change it later. The only way to change vtable layout is to perform the optimisation before code generation. If you know that a member function cannot be called virtually, you can omit it in the vtable, have a null pointer (or pointer to abort) instead in the vtable, without changing vtable layout and breaking virtual calls to other functions. If the function is not called either virtual or non-virtually you can omit its code.Agoraphobia
f.ex. p->foo() produces movq -24(%rbp), %rax movq (%rax), %rax movq (%rax), %rax movq -24(%rbp), %rdx movq %rdx, %rdi call *%rax see goo.gl/0Gs9AeAgoraphobia
A non member function can be called either by name with a postfix expression or by having its address taken. The compiler knows that and can remove functions not called either way, from reachable code. see goo.gl/IGm8JZAgoraphobia
Well, it could change they offset later. They just don't because it's way too hard. The question is about what linked are able to do, not what is trivial to do. But you've missed the point. I'm saying it's not possible because the linker can't know a virtual function is never called (when you take into consideration dynamic libraries). Non-member functions need to be exported to be visible so they can be culled more easily. I don't believe they need to be exported to be called via polymorphic dispatch. Are you able to provide an example where the compiler knows it can cull a virtual function?Dishearten
If the code needs to be changed, then this is more than "linking", this is recompiling. This is not what a normal linker does. If you allow the linker to do that, you can just remove the concept of linking and just give all the source files to the compiler at once and officially do global optimisation. This is all I was trying to say about fixing offsets.Agoraphobia
Let us continue this discussion in chat.Agoraphobia
There are linkers that can change/add code. The Microsoft linker is one example.Dishearten
The feature is called Link-Time Code Generation (or Whole Program Optimisation). I don't know exactly what situations it kicks in. As you identified, this question is probably about the implementation as a whole, not necessarily about the linker, in any case.Dishearten

© 2022 - 2024 — McMap. All rights reserved.