Why don't compilers devirtualize calls for a final class when inlining?

struct base {
    virtual void vcall() = 0;
};

struct foo final : base {
    void vcall() final;
};

void call_base(base& b) {
    b.vcall();
}

void call_foo(foo& f) {
    call_base(f);
}

void call_foo_directly(foo& f) {
    f.vcall();
}

clang 16 produces:

call_base(base&):
        mov     rax, qword ptr [rdi]
        jmp     qword ptr [rax]
call_foo(foo&):
        mov     rax, qword ptr [rdi]
        jmp     qword ptr [rax]
call_foo_directly(foo&):
        jmp     foo::vcall()@PLT

GCC and MSVC produce the same result, so it's not a problem limited to clang. Shouldn't it be possible for call_foo to contain a non-virtual call to foo::vcall() too? Is this a missed optimization, or is it possible for the call to be virtual?

See live example on Compiler Explorer.
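To make the premise concrete, here is a minimal sketch that gives foo::vcall a definition so the call target becomes observable. The calls counter and the run_demo helper are assumptions added for illustration and are not part of the original snippet. Because foo is final, every path below must end up in foo::vcall, which is why a direct call in call_foo would be semantically valid.

```cpp
struct base {
    virtual void vcall() = 0;
};

// Hypothetical definition: count calls so the dispatch target is observable.
struct foo final : base {
    int calls = 0;
    void vcall() final { ++calls; }
};

// Static type is base&, so this is a virtual call in the source.
void call_base(base& b) {
    b.vcall();
}

// Static type is foo&, and foo is final: the only possible target is foo::vcall.
void call_foo(foo& f) {
    call_base(f);
}

void call_foo_directly(foo& f) {
    f.vcall();
}

// Hypothetical helper: returns how many times foo::vcall ran.
int run_demo() {
    foo f;
    call_foo(f);            // reaches foo::vcall through base&
    call_foo_directly(f);   // reaches foo::vcall through foo&
    return f.calls;
}
```

Both paths have the same observable behavior; the question is only whether the compiler emits a direct or an indirect call for call_foo.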

Rateable answered 19/6, 2023 at 21:4 Comment(5)
It's probably not worth the effort. Typically you don't know the dynamic type when dealing with polymorphism. Note that while foo might be final, other things can derive from base.Maggoty
@NathanOliver-IsonStrike wouldn't you know the dynamic type automatically when inlining call_base into call_foo? The compiler is clearly able to make this optimization locally.Rateable
Most likely a phase-ordering issue that would make this harder than it seems -- the compiler(s) are probably doing the virtual-to-direct optimization before inlining, since that order would then allow them to inline the now direct calls. Probably the payoff from that is better than the benefit you could get from this.Komi
Considering clang doesn't devirtualize base& b = f; b.vcall() but does devirtualize ((base&) f).vcall(), seems like a missed optimization.Beaty
Devirtualization after inlining seems to be known as a commonly missed optimization: lists.llvm.org/pipermail/llvm-dev/2019-May/132222.html, gcc.gnu.org/bugzilla/show_bug.cgi?id=91771, gcc.gnu.org/bugzilla/show_bug.cgi?id=89924Fourth
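The two forms contrasted in the comment about base& b = f; versus ((base&) f) can be written out side by side. This is only a sketch: the function names and the int return value are assumptions added so the result is observable. Both functions behave identically at run time, so any difference in the generated code is purely a matter of which form the optimizer sees through; the clang behavior described in the comment is not re-verified here.

```cpp
struct base {
    virtual int vcall() = 0;
};

// Hypothetical definition returning a value so the call target is observable.
struct foo final : base {
    int vcall() final { return 42; }
};

// Form 1: bind a named base reference first, then call through it.
int call_via_named_ref(foo& f) {
    base& b = f;
    return b.vcall();
}

// Form 2: cast inside the call expression itself.
int call_via_cast(foo& f) {
    return ((base&) f).vcall();
}
```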

The compiler does try, but there needs to be something to inline. If a function has no implementation, it's just an empty call, and that's what gets compiled; adding final just prevents the use of override later. To compile this with optimization enabled, volatile is kinda required so that everything isn't optimized away.

Run this in godbolt.

struct base {
    volatile int num = 111;
    virtual void vcall() = 0;
};

struct foo final : base {
    void vcall() {
        num += 222;
    }
};

void call_base(base& b) {
    b.vcall();
}
void call_foo(foo& f) {
    call_base(f);
}

void call_foo_directly(foo& f) {
    f.vcall();
}

void main_func(void) {
    foo val;
    call_foo(val);
    call_foo_directly(val);
}

This is the partial disassembly from clang 15 with -O3 (-O2 gives the same result); Visual Studio (MSVC) couldn't inline call_foo.

main_func():                          # @main_func()
        mov     dword ptr [rsp - 8], 111
        add     dword ptr [rsp - 8], 222
        add     dword ptr [rsp - 8], 222
        ret
Distend answered 20/6, 2023 at 19:17 Comment(11)
You can inspect optimizations by not defining functions, so the += 222 volatile stuff isn't really necessary here. You can tell a virtual call apart from a non-virtual call by whether it's a direct or indirect call. When declaring an object in main and making virtual calls through it, the compiler does optimize it, but that's not the scenario I'm curious about. final means that we 100% know that when calling through the derived class, we can make direct calls instead of virtual calls, because there cannot be any overriding function.Rateable
???, this isn't Java. If you remove the volatile, clang will compile main_func to just ret, and there is no main in the example; just because the function has "main" in the name doesn't make it the entry point. You should check out what final means in C++.Distend
I didn't say that this is Java. The compiler can't optimize out anything that doesn't have a definition in the current TU. Just look at godbolt.org/z/G6vWP8vWf and you will see that there are two non-virtual calls in main. You can tell that it's non-virtual because it's a direct call instruction call foo::vcall()@PLT, so we are not making a call to an address fetched from the vtable.Rateable
@Distend Your answer misses the point of the question. The compiler ought to devirtualize the call even if the definitions of the functions aren't available in the current TU. Giving it any definition, whether with volatile access or without, masks whether the optimization that OP is interested in is applied. Inlining the devirtualized call is not the goal.Fourth
@JanSchultke can you explain to me what you meant by "when inlining"?Distend
call_base is obviously being inlined into call_foo judging by the assembly, but no devirtualization takes place even though it should be possible.Rateable
I'm not sure about that. How can the compiler optimize something that is not implemented? With an empty function like here, most of it is removed; it's really odd to me that the compiler would remove chunks of code without making sure that they are actually unnecessary.Distend
@Distend it doesn't eliminate the call, but it devirtualizes the call. Look at the assembly! A virtual call consists of a mov from the vtable and an indirect call; a non-virtual call is simply a call instruction with a constant address. This question is not about optimizing functions away through inlining, it's about devirtualizing function calls (although devirtualizing is a pre-requisite for eliminating calls to empty functions, so the two are related a little bit).Rateable
I don't think that I can give you the solution you are looking for, so I'll leave it here; as a final note, if you only care about the "extra" calls in the assembly, just make everything inline.Distend
@JanSchultke: Wording nitpick, the mov rax, [rdi] is loading a pointer to the vtable, not from the vtable itself. Loading a function pointer from the vtable (into RIP) happens via the memory operand for the memory-indirect jmp qword ptr [rax]. On RISC ISAs like AArch64, there'd be two loads and a jump with a register operand. But yes, fully inlining is something compilers can do in this case where they fail to devirtualize. :/ It's somewhat interesting that the same caller does definitely see through the virtualization when inlining is possible.Natterjack
@SrPanda: Yes, of course if you have small functions that are visible at compile time, the best thing is for them to be fully inlined as well as devirtualized. We don't expect that there is a workaround to get compilers to do this missed optimization, that's why the question is phrased as "why don't they?", asking what about compiler internals makes it hard for them to do this when they can inline in the same case. We know they devirtualize in some cases. It's not the call that we're trying to avoid, it's the extra load and the fact that it's indirect (call r/m64 vs call rel32)Natterjack
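The difference between the two call forms discussed in these comments (a load of the vtable pointer, a load of the function pointer from the vtable, then an indirect jump, versus a single direct call) can be modeled by hand. This is a rough sketch under stated assumptions: the object, vtable, and vfunc types and all function names are hypothetical, and real ABIs lay out vtables as an implementation detail that this model only approximates.

```cpp
// Hypothetical hand-written model of virtual dispatch; all names are illustrative.
struct object;
using vfunc = int (*)(object&);

struct vtable {
    vfunc vcall;          // slot 0: pointer to the implementation
};

struct object {
    const vtable* vptr;   // first member, standing in for the hidden vtable pointer
    int value;
};

int foo_vcall(object& self) {
    return self.value + 1;
}

const vtable foo_vtable{&foo_vcall};

// Virtual-style call: two dependent loads followed by an indirect call.
int call_virtual(object& o) {
    const vtable* vt = o.vptr;   // corresponds to: mov rax, qword ptr [rdi]
    vfunc fn = vt->vcall;        // load from the vtable (folded into jmp qword ptr [rax] on x86-64)
    return fn(o);                // indirect call through the loaded pointer
}

// Devirtualized call: the target is known at compile time, so no loads are needed.
int call_direct(object& o) {
    return foo_vcall(o);
}
```

Both functions return the same value for an object whose vptr points at foo_vtable; devirtualization amounts to proving that this is always the case and emitting call_direct's form instead.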
