What are the actual runtime performance costs of dynamic dispatch?
There's some background on this topic in the Rust book section on static and dynamic dispatch, but the tl;dr is that calling a method on a trait reference (and a few other situations, such as calling through a function pointer) results in dynamic rather than static dispatch.

What is the actual runtime cost of this, after optimizations have been applied?

For example, imagine this set of structs & traits:

struct Buffer;
struct TmpBuffer;
struct TmpMutBuffer;

impl BufferType for Buffer { ... }
impl BufferType for TmpBuffer { ... }
impl BufferType for TmpMutBuffer { ... }

impl Buffer2D for BufferType { ... }

impl Buffer2DExt for Buffer2D { ... }

Notice that the traits here are implemented on traits themselves.

What is the calling cost of dynamic dispatch to invoke a method from Buffer2DExt on a struct reference?

The recent question What are Rust's exact auto-dereferencing rules? covers the dereferencing rules; are those rules applied at compile time or at runtime?

Histogenesis answered 20/2, 2015 at 5:7 Comment(2)
Invoking any method on a struct reference will be static dispatch, not dynamic dispatch, as you already know the type and therefore the exact function to call. – Coldshoulder
The dereferencing rules are applied at compile time. – Gable
Disclaimer: the question is fairly open-ended, therefore this answer might be incomplete. Treat it with an even bigger grain of salt than you would normally.

Rust uses a simple "virtual table" to implement dynamic dispatch. This strategy is also used in C++, for which you can see a (somewhat dated) study here.
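
As a rough sketch of what that representation looks like (using today's `dyn Trait` syntax, which postdates this answer; the trait and type are made up, and the assert is only there to make the layout visible):

use std::mem::size_of;

trait Draw {
    fn draw(&self);
}

struct Point;

impl Draw for Point {
    fn draw(&self) {}
}

fn main() {
    // A `&dyn Draw` is a "fat" pointer: one word pointing at the data,
    // one word pointing at the vtable (drop glue, size, alignment, and
    // one function pointer per trait method).
    assert_eq!(size_of::<&dyn Draw>(), 2 * size_of::<&Point>());

    let p = Point;
    let d: &dyn Draw = &p;
    // This call loads the `draw` function pointer out of the vtable
    // and calls it indirectly.
    d.draw();
}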

The cost of indirection

Virtual dispatch introduces indirection, which has a cost for multiple reasons (a sketch contrasting static and dynamic dispatch follows the list):

  • indirection is opaque: this inhibits inlining and constant propagation which are key enablers for many compiler optimizations
  • indirection has a runtime cost: if incorrectly predicted, you are looking at pipeline stalls and expensive memory fetches
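
A minimal sketch contrasting the two forms of dispatch (the trait and types are invented for illustration; whether the dynamic call actually stays indirect depends on what the optimizer can prove):

trait Transform {
    fn apply(&self, x: u32) -> u32;
}

struct AddOne;

impl Transform for AddOne {
    fn apply(&self, x: u32) -> u32 { x + 1 }
}

// Static dispatch: monomorphized per concrete `T`; the call is direct,
// so the optimizer is free to inline it and constant-fold.
fn run_static<T: Transform>(t: &T, x: u32) -> u32 {
    t.apply(x)
}

// Dynamic dispatch: the call goes through the vtable, which is opaque
// to the optimizer unless it manages to devirtualize it.
fn run_dynamic(t: &dyn Transform, x: u32) -> u32 {
    t.apply(x)
}

fn main() {
    let t = AddOne;
    println!("{}", run_static(&t, 1));  // typically folds down to 2
    println!("{}", run_dynamic(&t, 1)); // may remain an indirect call
}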

Optimizing indirection

Compilers, however, muddy the water by trying their best to optimize the indirection away.

  • devirtualization: sometimes the compiler can resolve the virtual table look-up at compile time (usually, because it knows the concrete type of the object); if so, it can therefore use a regular function call rather than an indirect one, and optimize away the indirection
  • probabilistic devirtualization: last year Honza Hubička introduced a new optimization in gcc (read the 5-part series as it is very instructive). The gist of the strategy is to build the inheritance graph to make an educated guess at the potential type(s), and then use a pattern like if v.hasType(A) { v.A::call() } elif v.hasType(B) { v.B::call() } else { v.virtual-call() }; special-casing the most likely types means regular calls in this case, and therefore inlined/constant-propagated/full-goodies calls.

This latter strategy could be fairly interesting in Rust due to the coherence rules and privacy rules, as it should have more cases where the full "inheritance" graph is provably known.
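
To make the gist concrete, here is a hand-rolled analogue of that pattern in Rust, using `Any` for the type test; this is purely illustrative, as the compiler optimization operates on the generated code and needs no such trait:

use std::any::Any;

trait Shape {
    fn area(&self) -> f64;
    fn as_any(&self) -> &dyn Any;
}

struct Circle { radius: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.radius * self.radius }
    fn as_any(&self) -> &dyn Any { self }
}

// Special-case the most likely concrete type first: that branch becomes
// a direct, inlinable call, and only the fallback pays for the vtable.
fn area_of(shape: &dyn Shape) -> f64 {
    if let Some(circle) = shape.as_any().downcast_ref::<Circle>() {
        circle.area() // statically dispatched, can be inlined
    } else {
        shape.area()  // indirect vtable call
    }
}

fn main() {
    let c = Circle { radius: 1.0 };
    println!("{}", area_of(&c));
}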

The cost of monomorphization

In Rust, you can use compile-time polymorphism instead of run-time polymorphism; the compiler will emit one version of the function for each unique combination of compile-time parameters it takes. This, itself, has a cost:

  • compilation time cost: more code to be produced, more code to be optimized
  • binary size cost: the produced binary will end up larger, a typical size/speed trade-off
  • run-time cost: possibly, the larger code size might lead to more cache misses at the CPU level

The compiler might be able to merge specialized functions that end up with the same implementation (because of phantom types, for example); however, it is still more than likely that the produced binaries (executables and libraries) will end up larger.
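
For illustration, a generic function like the one below (a made-up example) is compiled once per concrete type it is instantiated with:

// One specialized copy of this function is emitted for every
// concrete `T` it is called with.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut max = items[0];
    for &item in &items[1..] {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    // Two instantiations end up in the binary:
    // `largest::<i32>` and `largest::<f64>`.
    println!("{}", largest(&[1, 5, 3]));
    println!("{}", largest(&[1.0, 5.0, 3.0]));
}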

As usual with performance, you have to measure in your case what is more beneficial.

Outfielder answered 20/2, 2015 at 9:25 Comment(0)
