Does Rust devirtualize trait object function calls?

devirtualize: to change a virtual/polymorphic/indirect function call into a static function call due to some guarantee that the change is correct -- source: myself

Given a simple trait object, &dyn ToString, created with a statically known type, String:

fn main() {
    let name: &dyn ToString = &String::from("Steve");
    println!("{}", name.to_string());
}

Does the call to .to_string() use <String as ToString>::to_string() directly? Or only indirectly via the trait's vtable? If indirectly, would it be possible to devirtualize this call? Or is there something fundamental that hinders this optimization?

The motivating code for this question is much more complicated; it uses async trait functions and I'm wondering if returning a Box<dyn Future> can be optimized in some cases.

Kiethkiev answered 14/12, 2020 at 13:59 Comment(4)
I would bet it isn't devirtualized, as trait objects are typically used with dynamic dispatch on purpose, while the typical idiom is to use generics when possible. It would be superb if this optimization were actually implemented. - Sinkage
The general answer is "it depends": if the compiler is able to resolve the concrete type, then it will optimize the indirection away. - Golly
Related: What are the actual runtime performance costs of dynamic dispatch? - Golly
Yes, the compiler won't be stopped by the mere fact that something is &dyn Foo or Box<dyn Bar>; see here for an example. - Voluptuous

Does Rust devirtualize trait object function calls?

No.

Rust is a language; it doesn't do anything, it only prescribes semantics.

In this specific case, the Rust language neither requires nor forbids devirtualization, so an implementation is permitted to do it.


At the moment, the only stable implementation is rustc, with the LLVM backend -- though you can use the cranelift backend if you feel adventurous.

You can test your code against this implementation on the playground: select "Show LLVM IR" instead of "Run", and "Release" instead of "Debug". You should then be able to check that there is no virtual call.

A revised version of the code isolates the cast to trait + dynamic call to make it easier:

#[inline(never)]
fn to_string(s: &String) -> String {
    let name: &dyn ToString = s;
    name.to_string()
}

fn main() {
    let name = String::from("Steve");
    let name = to_string(&name);
    println!("{}", name);
}

When run on the playground, this yields, among other things:

; playground::to_string
; Function Attrs: noinline nonlazybind uwtable
define internal fastcc void @_ZN10playground9to_string17h4a25abbd46fc29d4E(%"std::string::String"* noalias nocapture dereferenceable(24) %0, %"std::string::String"* noalias readonly align 8 dereferenceable(24) %s) unnamed_addr #0 {
start:
; call <alloc::string::String as core::clone::Clone>::clone
  tail call void @"_ZN60_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$5clone17h1e3037d7443348baE"(%"std::string::String"* noalias nocapture nonnull sret dereferenceable(24) %0, %"std::string::String"* noalias nonnull readonly align 8 dereferenceable(24) %s)
  ret void
}

Here you can clearly see that the call to ToString::to_string has been replaced by a direct call to <String as Clone>::clone: a devirtualized call.

The motivating code for this question is much more complicated; it uses async trait functions and I'm wondering if returning a Box<dyn Future> can be optimized in some cases.

Unfortunately, you cannot draw any conclusion from the above example.

Optimizations are finicky. In essence, most optimizations are akin to pattern-matching and replacing with regexes: differences that look benign to a human may completely throw off the pattern-matching and prevent the optimization from applying.

The only way to be certain that the optimization is applied in your case, if it matters, is to inspect the emitted assembly.
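As a rough illustration of how a small difference can change the picture, here is a sketch in which the concrete type is no longer visible where the dynamic call happens. The helper name render is hypothetical and not part of the question. Whether a given compiler version still manages to devirtualize something like this is exactly what you would have to check in the emitted assembly.

```rust
use std::fmt::Display;

// Hypothetical helper: the concrete type behind `value` is not known inside
// this function, and `#[inline(never)]` keeps callers from exposing it
// through inlining.
#[inline(never)]
pub fn render(value: &dyn Display) -> String {
    // `to_string` comes from the blanket `ToString` impl for `Display` types;
    // the underlying `Display::fmt` call is expected to go through the vtable.
    value.to_string()
}

fn main() {
    // Two different concrete types reach the same call inside `render`.
    let number = 42_i32;
    let name = String::from("Steve");
    println!("{}", render(&number));
    println!("{}", render(&name));
}
```

Compared with the to_string example above, the only structural change is that the cast to a trait object and the dynamic call now live in different functions.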

But, really, in this case, I'd be more worried about the memory allocation than about the virtual call. A virtual call is about 5ns of overhead -- though it does inhibit a number of optimizations -- whereas a memory allocation (and the eventual deallocation) routinely costs 20ns - 30ns.
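To make the two costs concrete, here is a minimal sketch of the pattern the question alludes to. The names Fetch, Constant, NoopWake and poll_once are all hypothetical, and the future is built with std::future::ready rather than a real async function, so treat it as an assumption about what the motivating code looks like. Each call to fetch both allocates a box and returns a trait object whose poll is a dynamic call.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical trait in the style of an "async trait function":
// every call hands back a boxed, type-erased future.
trait Fetch {
    fn fetch(&self) -> Pin<Box<dyn Future<Output = u32>>>;
}

struct Constant(u32);

impl Fetch for Constant {
    fn fetch(&self) -> Pin<Box<dyn Future<Output = u32>>> {
        // `Box::pin` requests a heap allocation, unless the optimizer
        // manages to remove it.
        Box::pin(std::future::ready(self.0))
    }
}

// A waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls the boxed future once and returns its value if it is already ready.
fn poll_once(mut fut: Pin<Box<dyn Future<Output = u32>>>) -> Option<u32> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    // This `poll` is a call on a trait object, so it is dynamic by default.
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

fn main() {
    let source = Constant(7);
    println!("{:?}", poll_once(source.fetch()));
}
```

If both the allocation and the dynamic poll matter in your code, this is the shape to look for in the emitted assembly.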

Proa answered 14/12, 2020 at 17:18 Comment(1)
If rustc/LLVM has enough context to eliminate the virtual call, it can also eliminate the allocation; at least that's what happens in a simple example. I'm not saying the OP should rely on that, but there's at least hope that the allocation will be eliminated along with the virtual call and end up not being something that would destroy performance. - Voluptuous

Does the call to .to_string() use <String as ToString>::to_string() directly? Or only indirectly via the trait's vtable?

We can test this case by writing two functions, one that uses dyn ToString, and one that uses the concrete type String directly:

pub fn dyn_to_string() {
    let name: &dyn ToString = &String::from("Steve");
    println!("{}", name.to_string());
}

pub fn concrete_to_string() {
    let name: &String = &String::from("Steve");
    println!("{}", name.to_string());
}

And now we can view the generated assembly:

playground::dyn_to_string:
    ...
    callq   *<alloc::string::String as core::clone::Clone>::clone@GOTPCREL(%rip)
    movq    %rbx, 24(%rsp)
    leaq    <alloc::string::String as core::fmt::Display>::fmt(%rip), %rax

As you can see, dyn_to_string is optimized to use <String as Clone>::clone directly instead of indirectly through a vtable: it was devirtualized. In fact, the two functions compile to identical code, so concrete_to_string is emitted as an alias of dyn_to_string:

.set playground::concrete_to_string, playground::dyn_to_string

However, to answer the broader question:

Does Rust devirtualize trait object function calls?

It depends. The compiler cannot always perform devirtualization. It did in the above code, but in other cases it might not. You should not expect that a trait object call will be devirtualized. Generics are monomorphized, so their calls are statically dispatched by construction: a guaranteed zero-cost abstraction. Trait objects carry no such guarantee.
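A minimal sketch of that difference, using a hypothetical Shape trait and helper functions that are not part of the question:

```rust
// Hypothetical trait and type, used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Generic version: a separate copy is compiled for each concrete `S`,
// so `area` is resolved statically.
fn total_area_generic<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Trait-object version: a single compiled function; each `area` call is
// dynamic unless the optimizer can prove the concrete type.
fn total_area_dyn(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let squares = [Square(1.0), Square(2.0)];
    let refs: [&dyn Shape; 2] = [&squares[0], &squares[1]];
    println!("{}", total_area_generic(&squares[..]));
    println!("{}", total_area_dyn(&refs[..]));
}
```

Both functions return the same result; the difference is only in how the call to area is resolved, and therefore in what the optimizer has to prove before it can inline it.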

Macaco answered 14/12, 2020 at 17:18 Comment(0)
