Performance.
Imagine a set of classes that override a virtual base method:
class Base {
public virtual int func(int x) { return 0; }
}
class ClassA: Base {
public override int func(int x) { return x + 100; }
}
class ClassB: Base {
public override int func(int x) { return x + 200; }
}
Now imagine you want to call the func method:
Base foo = new ClassB();
//...sometime later...
int x = foo.func(42);
Look at what the CPU has to actually do:
mov ecx, [bfunc$] -- load the address of the "ClassB.func" method from the VMT
push 42 -- push the 42 argument
call ecx -- call ClassB.func
No problem? No, problem!
The assembly isn't that hard to follow:
mov ecx, [bfunc$]
: This has to reach into memory, into the object's Virtual Method Table (VMT), to get the address of the overridden func method. The CPU begins the fetch of the data from memory, and then it continues on:
push 42
: Push the argument 42 onto the stack for the call to the function. No problem, that can run right away, and then we continue to:
call ecx
: Call the address of the ClassB.func function. ← STALL!!!
That's a problem. The address of the ClassB.func function has not come back from the VMT yet, which means the CPU doesn't know where to go next. Ideally it would follow the call and keep speculatively executing instructions while it waits for the address of ClassB.func to arrive from memory. But it can't, so we wait.
If we are lucky, the data is already in the L2 cache. Getting a value out of the L2 cache and into a place where it can be used takes 12-15 cycles. Until that load completes, the CPU cannot know where to go next.
The CPU is stalled for 12-15 cycles.
Our program is stuck doing nothing for 12-15 cycles.
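If you want to put a number on it yourself, here is a rough sketch of a micro-benchmark. The Plain class, the loop helpers, and the iteration count are mine, invented for illustration; a serious measurement would use something like BenchmarkDotNet, and a modern JIT can sometimes devirtualize calls it can prove safe, so treat the result as indicative only:

using System;
using System.Diagnostics;

class Base { public virtual int func(int x) { return 0; } }
class ClassA : Base { public override int func(int x) { return x + 100; } }
class ClassB : Base { public override int func(int x) { return x + 200; } }

// Same work, but with no virtual dispatch involved.
class Plain { public int func(int x) { return x + 200; } }

static class Program
{
    const int N = 100_000_000;

    static int VirtualLoop(Base obj)
    {
        int sum = 0;
        for (int i = 0; i < N; i++)
            sum += obj.func(i);      // indirect call through the VMT
        return sum;
    }

    static int DirectLoop(Plain obj)
    {
        int sum = 0;
        for (int i = 0; i < N; i++)
            sum += obj.func(i);      // direct call, eligible for inlining
        return sum;
    }

    static void Main(string[] args)
    {
        // Pick the concrete type at runtime so the JIT can't prove what obj is.
        Base obj = args.Length > 0 ? new ClassA() : (Base)new ClassB();
        Plain plain = new Plain();

        VirtualLoop(obj);            // warm up both paths so JIT time isn't measured
        DirectLoop(plain);

        var sw = Stopwatch.StartNew();
        int a = VirtualLoop(obj);
        Console.WriteLine($"virtual:     {sw.ElapsedMilliseconds} ms ({a})");

        sw.Restart();
        int b = DirectLoop(plain);
        Console.WriteLine($"non-virtual: {sw.ElapsedMilliseconds} ms ({b})");
    }
}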
The CPU core has 7 execution engines, and the main job of the CPU is keeping those 7 pipelines full of work. That means:
- re-ordering your machine code on the fly so independent instructions can run early
- starting fetches from memory as soon as possible, so it can move on to other things
- executing 100, 200, 300 instructions ahead; it can be running 17 iterations ahead in your loop, across multiple function calls and returns
- using a branch predictor to guess which way a comparison will go, so that it can keep executing ahead while it waits; if it guesses wrong it has to throw that work away, but the predictor is not stupid - it's right about 94% of the time
Your CPU has all this power and capability, and it's just STALLED FOR 15 CYCLES!?
This is awful. This is terrible. And you suffer this penalty every time you call a virtual method - whether or not you actually overrode it.
Our program is 12-15 cycles slower on every method call because the language designer made virtual methods opt-out rather than opt-in.
This is why Microsoft decided to not make all methods virtual by default: they learned from Java's mistakes.
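To see what opt-in buys on the C# side, here is a minimal sketch (the class names are invented for illustration). A method is non-virtual unless you ask for virtual, so the compiler emits a direct call that the JIT can inline; you only pay for the indirect VMT call when you opt in, and marking a class sealed lets the JIT devirtualize even those calls when it knows the concrete type:

class Widget
{
    // Non-virtual by default: callers get a direct call,
    // and the JIT is free to inline this body entirely.
    public int Twice(int x) { return x * 2; }

    // Explicitly opting in to the indirect, VMT-based dispatch described above.
    public virtual int Scale(int x) { return x * 3; }
}

// 'sealed' promises there are no further overrides, so calls made
// through a reference typed as FastWidget can be devirtualized.
sealed class FastWidget : Widget
{
    public override int Scale(int x) { return x * 4; }
}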
Someone ported Android to C#, and it was faster
In 2012, the Xamarin people ported all of Android's Dalvik (i.e. Java) to C#. From them:
Performance
When C# came around, Microsoft modified the language in a couple of significant ways that made it easier to optimize. Value types were introduced to allow small objects to have low overheads and virtual methods were made opt-in, instead of opt-out which made for simpler VMs.
(emphasis mine)
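To make the value-type half of that quote concrete, here is a minimal sketch (the types are invented for illustration). An unboxed struct carries no object header and no VMT pointer, so it costs nothing beyond its fields:

// A value type: stored inline (on the stack or inside its container),
// with no object header and no VMT pointer.
struct PointStruct { public int X; public int Y; }

// A reference type: a heap allocation with a header and a VMT pointer,
// always reached through a reference.
class PointClass { public int X; public int Y; }

An array of 1000 PointStruct values is one contiguous block of 8000 bytes of integers; an array of 1000 PointClass references points at 1000 separately allocated objects, each carrying its own header.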