How is a virtual generic method call implemented?

I'm interested in how the CLR implements calls like this:

abstract class A {
    public abstract void Foo<T, U, V>();
}

A a = ...
a.Foo<int, string, decimal>(); // <=== ?

Does this call cause some kind of hash map lookup, with the type parameter tokens as keys and the compiled generic method specializations (one shared by all reference types, and separate code for each value type) as values?

Ernst asked 4/7, 2011 at 15:41

I didn't find much exact information about this, so much of this answer is based on the excellent 2001 paper on .NET generics (published even before .NET 1.0 came out!), one short note in a follow-up paper, and what I gathered from the SSCLI 2.0 source code (even though I wasn't able to find the exact code for calling virtual generic methods).

Let's start simple: how is a non-generic non-virtual method called? By calling the method code directly, so the compiled code contains the direct address. The compiler gets the method address from the method table (see the next paragraph). Can it be that simple? Well, almost. The fact that methods are JITed makes it a little more complicated: what is actually called is either code that compiles the method and only then executes it (if it wasn't compiled yet), or a single instruction that transfers control directly to the compiled code (if it already exists). I'm going to ignore this detail from here on.

Now, how is a non-generic virtual method called? As with polymorphism in languages like C++, there is a method table accessible from the this pointer (reference). Each derived class has its own method table containing its methods. So, to call a virtual method: take the this reference (passed in as a parameter), follow it to the method table, look up the correct entry in it (the entry number is fixed for a given method), and call the code that entry points to. Calling methods through interfaces is slightly more complicated, but not interesting for us now.
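A rough C# simulation of that lookup, for illustration only: the real method table is an internal runtime structure that C# code cannot build or index. SimType, SimObject, Dispatch and FooSlot are hypothetical names. Each simulated type holds an array of delegates, and a virtual call follows the object's type pointer and invokes the delegate in a fixed slot.

```csharp
using System;

// Hypothetical stand-in for a type's method table: one delegate per virtual slot.
sealed class SimType
{
    public string Name { get; }
    public Func<SimObject, string>[] Slots { get; }

    public SimType(string name, Func<SimObject, string>[] slots)
    {
        Name = name;
        Slots = slots;
    }
}

// Every simulated object points at its type's table, like the type pointer stored in each CLR object.
sealed class SimObject
{
    public SimType Type { get; }
    public SimObject(SimType type) { Type = type; }
}

static class Dispatch
{
    // Slot index for the simulated virtual method Foo; fixed for that method.
    public const int FooSlot = 0;

    // A virtual call: this -> method table -> fixed slot -> invoke.
    public static string CallFoo(SimObject self) => self.Type.Slots[FooSlot](self);

    public static readonly SimType Base =
        new SimType("Base", new Func<SimObject, string>[] { obj => "Base.Foo" });

    public static readonly SimType Derived =
        new SimType("Derived", new Func<SimObject, string>[] { obj => "Derived.Foo" });
}
```

Calling Dispatch.CallFoo(new SimObject(Dispatch.Derived)) returns "Derived.Foo": the call site is the same for both types, and only the table reached through the object differs.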

Now we need to know about code sharing. Code can be shared between two “instances” of the same method if every reference-type argument of one corresponds to any reference-type argument of the other, and every value-type argument is exactly the same type. So, for example, C<string>.M<int>() shares code with C<object>.M<int>(), but not with C<string>.M<byte>(). Type parameters of types and type parameters of methods are treated the same way here. (The original paper from 2001 mentions that code can also be shared when both arguments are structs with the same layout, but I'm not sure this is true in the actual implementation.)
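The sharing rule can be written down as a key function: collapse every reference-type argument to one placeholder, keep value-type arguments as they are, and treat instantiations with equal keys as able to share native code. This is a sketch under assumptions: CodeSharing is a made-up helper, typeof(object) stands in for the runtime's internal placeholder type (CoreCLR names it System.__Canon), and the same-layout-struct idea from the 2001 paper is not modeled.

```csharp
using System;
using System.Linq;

static class CodeSharing
{
    // Stand-in for "any reference type". Assumption: typeof(object) plays the role
    // of the runtime's internal System.__Canon placeholder.
    static readonly Type Canon = typeof(object);

    // Reference types collapse to Canon; value types keep their exact identity.
    // Simplification: generic structs such as KeyValuePair<string, int> are not
    // canonicalized recursively here.
    public static Type Canonicalize(Type t) => t.IsValueType ? t : Canon;

    // Key under which native code could be shared; for C<X>.M<Y>() pass { X, Y }.
    public static string ShareKey(params Type[] typeArgs) =>
        string.Join(",", typeArgs.Select(a => Canonicalize(a).FullName));

    public static bool CanShare(Type[] first, Type[] second) =>
        ShareKey(first) == ShareKey(second);
}
```

With this key, { string, int } and { object, int } (the C<string>.M<int>() and C<object>.M<int>() case) compare equal, while { string, byte } does not.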

Let's make an intermediate step on our way to generic methods: non-generic methods in generic types. Because of code sharing, the code needs to get the type arguments from somewhere (e.g. for code like new T[]). For this reason, each instantiation of a generic type (e.g. C<string> and C<object>) has its own type handle, which contains the type arguments and also the method table. Ordinary methods can reach this type handle (technically a structure confusingly called MethodTable, even though it contains more than just the method table) through the this reference. Two kinds of methods can't do that: static methods and methods on value types. For those, the type handle is passed in as a hidden argument.
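A small example of why shared code still needs the exact type arguments. C<T> and its methods are illustrative names. Even if C<string> and C<object> run the same native code for MakeArray, each call has to create an array of its own element type; the instance method can find T through this, while the static method has no this, which is where the hidden type handle argument comes in.

```csharp
using System;

class C<T>
{
    // Instance method: the runtime can reach C<T>'s type handle through `this`.
    public Array MakeArray(int length) => new T[length];

    // Static method: there is no `this`, so shared code needs T supplied another way
    // (the hidden type handle argument described above).
    public static Array MakeArrayStatic(int length) => new T[length];
}
```

new C<string>().MakeArray(1) yields a string[], whereas new C<object>().MakeArray(1) yields an object[].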

For non-virtual generic methods, the type handle is not enough, so they get a different hidden argument, a MethodDesc, that contains the type arguments. Also, the compiler can't store the instantiations in the ordinary method table, because that table is static. So it creates a second, separate table for generic methods, indexed by type arguments, and gets the method address from there if an entry with compatible type arguments already exists, or creates a new entry otherwise.
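A toy model of that second, lazily filled table, assuming string keys made of the method name plus canonicalized type arguments. InstantiationTable, GetOrCompile and the string "code" values are hypothetical: the real runtime stores MethodDesc structures and native code addresses, and its actual data structures are not modeled here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of a per-method instantiation table; all names are hypothetical.
sealed class InstantiationTable
{
    readonly Dictionary<string, string> _entries = new Dictionary<string, string>();

    // Number of simulated JIT compilations so far.
    public int CompileCount { get; private set; }

    // Reference types share one entry; value types each get their own.
    static string CanonName(Type t) => t.IsValueType ? (t.FullName ?? t.Name) : "__Canon";

    // Look up (method, canonical type arguments); "compile" only on a miss.
    public string GetOrCompile(string method, params Type[] typeArgs)
    {
        string key = method + "<" + string.Join(",", typeArgs.Select(t => CanonName(t))) + ">";
        if (!_entries.TryGetValue(key, out var code))
        {
            CompileCount++;
            code = "native code for " + key;
            _entries[key] = code;
        }
        return code;
    }
}
```

Requesting M with string and then with Uri compiles once, because both map to M<__Canon>; requesting it with int adds a second entry.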

Virtual generic methods are now simple: the compiler doesn't know the concrete type, so it has to use the method table at run time. The normal method table can't be used, so it has to look in the special table for generic methods. Of course, the hidden argument containing the type arguments is still passed.
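The question's scenario in runnable form, with one change: Foo returns a string instead of void so its result can be observed. B and the returned text are illustrative. The override is chosen from the runtime type of a, and the concrete type arguments still reach it; whether the native code behind the call is shared is not visible from C#.

```csharp
using System;

abstract class A
{
    public abstract string Foo<T, U, V>();
}

class B : A
{
    // Reports the type arguments it actually received at run time.
    public override string Foo<T, U, V>() =>
        $"B.Foo<{typeof(T).Name}, {typeof(U).Name}, {typeof(V).Name}>";
}
```

With A a = new B();, the call a.Foo<int, string, decimal>() returns "B.Foo<Int32, String, Decimal>".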

One interesting tidbit I learned while researching this: because the JIT compiler is very lazy, the following (completely useless) code works:

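using System.Collections.Generic;

static class Lifter
{
    // Each level requests a new instantiation, Lift<List<T>>. The JIT compiles only
    // the instantiations actually reached, so e.g. Lift<int>(2) returns a List<List<int>>.
    public static object Lift<T>(int count) where T : new()
    {
        if (count == 0)
            return new T();

        return Lift<List<T>>(count - 1);
    }
}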

The equivalent C++ code causes the compiler to give up with a stack overflow.

Radiotelegram answered 4/7, 2011 at 23:07
@Ian, yeah, that was another of the sources I used; I forgot to mention it. – Radiotelegram

Yes. The code for a specific type is generated at run time by the CLR, which keeps a hashtable (or something similar) of implementations.

Page 372 of CLR via C#:

When a method that uses generic type parameters is JIT-compiled, the CLR takes the method's IL, substitutes the specified type arguments, and then creates native code that is specific to that method operating on the specified data types. This is exactly what you want and is one of the main features of generics. However, there is a downside to this: the CLR keeps generating native code for every method/type combination. This is referred to as code explosion. This can end up increasing the application's working set substantially, thereby hurting performance. Fortunately, the CLR has some optimizations built into it to reduce code explosion. First, if a method is called for a particular type argument, and later, the method is called again using the same type argument, the CLR will compile the code for this method/type combination just once. So if one assembly uses List, and a completely different assembly (loaded in the same AppDomain) also uses List, the CLR will compile the methods for List just once. This reduces code explosion substantially.

Uremia answered 4/7, 2011 at 15:45
Thanks. I can't reproduce a slowdown in the map lookup as the number of generic method specializations grows with different combinations of value-type type parameters... – Ernst
Look at the material I added from the book CLR via C#. – Uremia
The book is wrong. Specific code is generated only for value type arguments (which prevents boxing), and reference types all share the same machine code. – Learning
This is from back in 2004 and beta 2; any chance of something more recent? – Uremia

EDIT

I have now come across https://msdn.microsoft.com/en-us/library/sbh15dya.aspx, which clearly states that generics over reference types reuse the same code, so I would accept that as the definitive authority.

ORIGINAL ANSWER

I see two conflicting answers here, both with references supporting their side, so I will try to add my two cents.

First, CLR via C# by Jeffrey Richter, published by Microsoft Press, is as valid a source as an MSDN blog, especially as the blog is already outdated (for more of his books, take a look at http://www.amazon.com/Jeffrey-Richter/e/B000APH134; one must agree that he is an expert on Windows and .NET).

Now let me do my own analysis.

Clearly, two generic types that have different reference type arguments cannot share the same code.

For example, List<TypeA> and List<TypeB> cannot share the same code, as this would make it possible to add an object of TypeA to a List<TypeB> via reflection, and the CLR is strongly typed on generics as well (unlike Java, in which only the compiler validates generics, while the underlying JVM has no knowledge of them).

This applies not only to types but to methods as well, since, for example, a generic method with type parameter T can create an object of type T (nothing prevents it from creating a new List<T>, for example), in which case reusing the same code would cause havoc.

Furthermore, the GetType method is not overridable, and it always returns the correct generic type, proving that each type argument indeed has its own code. (This point is even more important than it looks: the CLR and the JIT work based on the type object of each object, as returned by GetType(), which simply means that there must be a separate type object for each type argument, even for reference types.)
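The GetType observation can be reproduced in a few lines. TypeIdentity is a hypothetical helper. The check shows that every closed type has its own type object, all built from one open definition; on its own it does not reveal whether the JIT-compiled method bodies are separate.

```csharp
using System;

static class TypeIdentity
{
    // GetType() returns the exact closed type, e.g. List<string> rather than List<object>.
    public static Type RuntimeType(object o) => o.GetType();

    // Two closed types built from the same open definition, e.g. List<>.
    public static bool SameDefinition(Type a, Type b) =>
        a.IsGenericType && b.IsGenericType &&
        a.GetGenericTypeDefinition() == b.GetGenericTypeDefinition();
}
```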

Another issue would result from code reuse: the is and as operators would no longer work correctly, and in general all kinds of casting would have serious problems.
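Whether is and as keep working can be checked directly. CastChecks is a made-up helper. Each instantiation below uses only reference-type arguments, yet each tests against its own T, because the exact type argument is available at run time.

```csharp
using System;

static class CastChecks
{
    // "is T" has to test against the exact T of this instantiation.
    public static bool IsA<T>(object o) => o is T;

    // "as T" requires a reference type, hence the class constraint.
    public static T AsA<T>(object o) where T : class => o as T;
}
```

CastChecks.IsA<string>("text") is true, while CastChecks.IsA<Uri>("text") is false and CastChecks.AsA<Uri>("text") returns null.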

NOW TO ACTUAL TESTING:

I have tested it with a generic type that contained a static member: I created two objects with different type arguments, and the static fields were clearly not shared, clearly proving that code is not shared even for reference types.
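A reconstruction of the test described above; the original code was not posted, so Counter<T> is a hypothetical name. Each closed type gets its own copy of the static field. Static fields are data, though, so this demonstrates separate per-type data rather than separate native code.

```csharp
class Counter<T>
{
    // One copy of this field exists per closed type: Counter<string>, Counter<object>, ...
    public static int Created;

    public Counter() { Created++; }
}
```

After constructing two Counter<string> objects and one Counter<object>, Counter<string>.Created is 2 and Counter<object>.Created is 1.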

EDIT:

See http://blogs.msdn.com/b/csharpfaq/archive/2004/03/12/how-do-c-generics-compare-to-c-templates.aspx on how it is implemented:

Space Use

The use of space is different between C++ and C#. Because C++ templates are done at compile time, each use of a different type in a template results in a separate chunk of code being created by the compiler.

In the C# world, it's somewhat different. The actual implementations using a specific type are created at runtime. When the runtime creates a type like List, the JIT will see if that has already been created. If it has, it merely uses that code. If not, it will take the IL that the compiler generated and do appropriate replacements with the actual type.

That's not quite correct. There is a separate native code path for every value type, but since reference types are all reference-sized, they can share their implementation.

This means that the C# approach should have a smaller footprint on disk, and in memory, so that's an advantage for generics over C++ templates.

In fact, the C++ linker implements a feature known as “template folding”, where the linker looks for native code sections that are identical, and if it finds them, folds them together. So it's not as clear-cut as it would seem to be.

As one can see, the CLR "can" reuse the implementation for reference types, as current C++ compilers do; however, there is no guarantee of that. For unsafe code using stackalloc and pointers it is probably not the case, and there might be other such situations as well.

However, what we do need to know is that in the CLR type system they are treated as different types: static constructors are called separately, static fields are separate, type objects are separate, and an object with type argument T1 should not be able to access a private field of another object with type argument T2 (although an object can indeed access private fields of another object of the same type).

Guadalcanal answered 4/11, 2013 at 6:03
You're confusing two different things: sharing of type data structures (specifically MethodTable and EEClass, which include static fields) and sharing of JITed native code. The data structures clearly can't be shared; you are right about that. But the native code can be, and is, shared. Just because reflection doesn't allow you to call the code incorrectly doesn't mean the checks are done in the native code. You can verify that the code is indeed shared by using SOS.dll in WinDbg to get the address of the native code for the same method in e.g. List<TypeA> and List<TypeB>. – Radiotelegram
@Radiotelegram See my test: static fields should be unique to a type, not to an instance, so if the code were shared, static fields of different types would share their values; and the fact that they are two different types indicates the same. – Guadalcanal
@Radiotelegram Nevertheless, the JIT can share any code it wants, but that is implementation-specific, and in general you should assume it does not use the same code (remember that the JIT has the right to reuse any code, even non-generic code, so that means nothing; and there is code that cannot be shared even by generics, such as GetType() and static members). – Guadalcanal
@Radiotelegram I think you are the one confusing two different things: the creation of different types and code for the different types, versus the JIT sharing of code, which is an implementation-specific JIT optimization unrelated to generic types (though for generic types it possibly does more code sharing than for anything else, as they are more suitable for it), and as a result it is irrelevant to our discussion. – Guadalcanal
Then I don't understand what exactly you mean by saying “List<TypeA> and List<TypeB> cannot share the same code”. If you're not talking about the JITted code, then what code are you talking about? Static fields are data, not code. – Radiotelegram
@Radiotelegram Again, code sharing is not part of the CLI specification; it is just an implementation-specific optimization, and it even applies in non-generic situations. For example, two methods with similar code that differ only in type (similar to a generic method, except that the types do not inherit from a common base class) can share code without being generic. Another case is a generic and a non-generic class (such as List and List<T>): if the code is the same, it can be reused even though they are not both generic. – Guadalcanal
On the other hand, generic methods cannot always share code, even for reference types. Say we have a class MyType<T> where T : MyBase, with code that calls T.MyMethod(), and two type arguments A and B, both inheriting from MyBase, but in one of them MyMethod() is virtual while in the other it is not. In this case we clearly can't share the code, as they need totally different code (one calls the overridden method, if any, and the other does not), and there are probably other examples as well, not to mention unsafe code and/or stackalloc... – Guadalcanal
In your example, either the code wouldn't compile at all (if MyBase doesn't have MyMethod) or it would ignore the non-virtual method (if MyBase has a virtual MyMethod), so the same code can be reused. In any case, if “it's an implementation detail whether code is shared” is true, then “code cannot be shared” has to be false. – Radiotelegram
@Radiotelegram I don't understand: if MyBase is an interface, then you are free to declare the method virtual or not, and even with an actual base class, MyMethod may or may not be virtual (if it is virtual, B can declare it new; if the base method is non-virtual, A declares it new virtual). Either way, stackalloc cannot be shared anyway. – Guadalcanal
@Radiotelegram You misunderstood me. I have never said that sharing code is never possible; in fact it is possible even for non-generic code, as I showed. What I did say is that the specification for implementing generics does not include code sharing. A JIT compiler might use code sharing for some generic methods, just as it can for non-generic ones (however, the type object, including the static data, cannot be shared), but it is implementation-specific and subject to change at any time. So the basic answer to how generics are implemented is: "they create a full separate type for every generic argument". – Guadalcanal
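The MyType<T> scenario from this exchange can be tried directly in C#. In this sketch, MyBase, A, B and MyType<T> follow the names used in the comments, while Describe and the returned strings are invented. B hides the virtual method with new; inside MyType<T>, the call t.Describe() is bound to the method declared on MyBase, so it dispatches virtually to an override if there is one and never reaches B's hiding method.

```csharp
class MyBase
{
    public virtual string Describe() => "MyBase";
}

class A : MyBase
{
    public override string Describe() => "A override";
}

class B : MyBase
{
    // Hides rather than overrides; only calls through a B-typed expression see it.
    public new string Describe() => "B new";
}

class MyType<T> where T : MyBase
{
    // Bound at compile time to MyBase.Describe, a virtual call for any T.
    public string Call(T t) => t.Describe();
}
```

new MyType<A>().Call(new A()) returns "A override", and new MyType<B>().Call(new B()) returns "MyBase", even though new B().Describe() returns "B new".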
