Disclaimer: the question is fairly open-ended, therefore this answer might be incomplete. Treat it with an even bigger grain of salt than you would normally.
Rust uses a simple "virtual table" to implement dynamic dispatch. The same strategy is used in C++, for which you can see a study here, though the study is a bit dated.
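As a concrete illustration of the representation (a minimal sketch; the trait name is mine): a reference to a trait object in Rust is a "fat" pointer carrying both a data pointer and a vtable pointer, so it is twice the size of a plain reference.

```rust
use std::mem::size_of;

trait Draw {
    fn draw(&self);
}

fn main() {
    // `&dyn Draw` bundles a data pointer and a vtable pointer:
    // two machine words instead of one.
    assert_eq!(size_of::<&dyn Draw>(), 2 * size_of::<&u8>());
}
```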
The cost of indirection
Virtual dispatch induces indirection, which has a cost for multiple reasons:
- indirection is opaque: this inhibits inlining and constant propagation which are key enablers for many compiler optimizations
- indirection has a runtime cost: if incorrectly predicted, you are looking at pipeline stalls and expensive memory fetches
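To make the two call shapes concrete, here is a minimal sketch (the trait and type names are mine, not from the question):

```rust
trait Greet {
    fn hello(&self) -> String;
}

struct English;

impl Greet for English {
    fn hello(&self) -> String {
        "hello".to_string()
    }
}

// Static dispatch: the concrete type is a compile-time parameter, so the
// call is direct and a prime candidate for inlining and constant propagation.
fn greet_static<G: Greet>(g: &G) -> String {
    g.hello()
}

// Dynamic dispatch: the call goes through the trait object's vtable. The
// callee is opaque to the optimizer, and a mispredicted indirect branch
// means pipeline stalls at run time.
fn greet_dyn(g: &dyn Greet) -> String {
    g.hello()
}

fn main() {
    assert_eq!(greet_static(&English), "hello");
    assert_eq!(greet_dyn(&English as &dyn Greet), "hello");
}
```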
Optimizing indirection
Compilers, however, muddy the waters by trying their best to optimize indirection away.
- devirtualization: sometimes the compiler can resolve the virtual table look-up at compile time (usually, because it knows the concrete type of the object); if so, it can use a regular function call rather than an indirect one, and optimize away the indirection
- probabilistic devirtualization: last year Honza Hubička introduced a new optimization in gcc (read the 5-part series as it is very instructive). The gist of the strategy is to build the inheritance graph to make an educated guess at the potential type(s), and then use a pattern like
if v.hasType(A) { v.A::call() } else if v.hasType(B) { v.B::call() } else { v.virtual-call() }
Special-casing the most likely types means regular calls in those cases, and therefore inlined/constant-propagated/full-goodies calls.
This latter strategy could be fairly interesting in Rust due to the coherence rules and privacy rules, as it should have more cases where the full "inheritance" graph is provably known.
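A hand-rolled version of that guess pattern can be sketched in Rust with `std::any::Any` (the shapes and the `as_any` helper are my own illustration; the compiler would do this on generated machine code, not in source):

```rust
use std::any::Any;

trait Shape {
    fn area(&self) -> f64;
    fn as_any(&self) -> &dyn Any;
}

struct Circle { radius: f64 }
struct Square { side: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.radius * self.radius }
    fn as_any(&self) -> &dyn Any { self }
}

impl Shape for Square {
    fn area(&self) -> f64 { self.side * self.side }
    fn as_any(&self) -> &dyn Any { self }
}

// Try the likely concrete types first (direct, inlinable calls) and only
// fall back to the vtable look-up when none of them matches.
fn area_guessing(shape: &dyn Shape) -> f64 {
    if let Some(c) = shape.as_any().downcast_ref::<Circle>() {
        c.area() // direct call on the concrete type
    } else if let Some(s) = shape.as_any().downcast_ref::<Square>() {
        s.area() // direct call on the concrete type
    } else {
        shape.area() // regular virtual call
    }
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Circle { radius: 1.0 }),
        Box::new(Square { side: 2.0 }),
    ];
    assert!((area_guessing(shapes[0].as_ref()) - shapes[0].area()).abs() < 1e-12);
    assert_eq!(area_guessing(shapes[1].as_ref()), 4.0);
}
```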
The cost of monomorphization
In Rust, you can use compile-time polymorphism instead of run-time polymorphism; the compiler will emit one version of the function for each unique combination of compile-time parameters it takes. This, itself, has a cost:
- compilation time cost: more code to be produced, more code to be optimized
- binary size cost: the produced binary will end up being larger, a typical size/speed trade-off
- run-time cost: possibly, the larger code size might lead to cache misses at the CPU level
The compiler might be able to merge together specialized functions that end up having the same implementation (because of phantom types, for example); however, it is still more than likely that the produced binaries (executables and libraries) will end up larger.
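A minimal sketch of monomorphization (the function is my own example): each distinct type parameter produces its own specialized copy of the machine code in the binary.

```rust
// A generic function: the compiler emits one specialized copy of the
// machine code per concrete type it is instantiated with.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut max = items[0];
    for &item in &items[1..] {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    // Two instantiations -> two copies in the binary:
    // one for `largest::<i32>`, one for `largest::<f64>`.
    assert_eq!(largest(&[1, 5, 3]), 5);
    assert_eq!(largest(&[1.5, 0.5]), 1.5);
}
```

The dynamic-dispatch alternative (`&[&dyn Trait]`) would emit a single copy of the function, trading code size for the indirection costs described above.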
As usual with performance, you have to measure which approach is more beneficial in your case.