Why does adding beforefieldinit drastically improve the execution speed of generic classes?

Asked 28/1, 2013 at 20:41 Answered 29/1, 2013 at 16:3

I'm working on a proxy, and for generic classes with a reference type parameter it was very slow, especially for generic methods (about 400 ms vs 3200 ms for trivial generic methods that just returned null). I decided to see how it would perform if I rewrote the generated class in C#, and it performed much better, with about the same performance as my non-generic class code.

Here is the C# class I wrote (note that I changed my naming scheme, but not by much):

namespace TestData
{
    public class TestClassProxy<pR> : TestClass<pR>
    {
        private InvocationHandler<Func<TestClass<pR>, object>> _0_Test;
        private InvocationHandler<Func<TestClass<pR>, pR, GenericToken, object>> _1_Test;
        private static readonly InvocationHandler[] _proxy_handlers = new InvocationHandler[] {
            new InvocationHandler<Func<TestClass<pR>, object>>(new Func<TestClass<pR>, object>(TestClassProxy<pR>.s_0_Test)),
            new GenericInvocationHandler<Func<TestClass<pR>, pR, GenericToken, object>>(typeof(TestClassProxy<pR>), "s_1_Test")
        };

        public TestClassProxy(InvocationHandler[] handlers)
        {
            if (handlers == null)
            {
                throw new ArgumentNullException("handlers");
            }
            if (handlers.Length != 2)
            {
                throw new ArgumentException("Handlers needs to be an array of 2 parameters.", "handlers");
            }
            this._0_Test = (InvocationHandler<Func<TestClass<pR>, object>>)(handlers[0] ?? _proxy_handlers[0]);
            this._1_Test = (InvocationHandler<Func<TestClass<pR>, pR, GenericToken, object>>)(handlers[1] ?? _proxy_handlers[1]);
        }


        private object __0__Test()
        {
            return base.Test();
        }

        private object __1__Test<T>(pR local1) where T:IConvertible
        {
            return base.Test<T>(local1);
        }

        public static object s_0_Test(TestClass<pR> class1)
        {
            return ((TestClassProxy<pR>)class1).__0__Test();
        }

        public static object s_1_Test<T>(TestClass<pR> class1, pR local1) where T:IConvertible
        {
            return ((TestClassProxy<pR>)class1).__1__Test<T>(local1);
        }

        public override object Test()
        {
            return this._0_Test.Target(this);
        }

        public override object Test<T>(pR local1)
        {
             return this._1_Test.Target(this, local1, GenericToken<T>.Token);
        }
    }
}

This compiles in release mode to the same IL as my generated proxy. Here is the class that it's proxying:

namespace TestData
{
    public class TestClass<R>
    {
        public virtual object Test()
        {
            return default(object);
        }

        public virtual object Test<T>(R r) where T:IConvertible
        {
            return default(object);
        }
    }
}

There was one exception: I was not setting the beforefieldinit attribute on the generated type. I was only setting the following attributes: public auto ansi

Why did using beforefieldinit make the performance improve so much?

(The only other difference was that I wasn't naming my parameters, which really didn't matter in the grand scheme of things. The names of methods and fields are scrambled to avoid collisions with real methods. GenericToken and InvocationHandler are implementation details that are irrelevant for the sake of argument.

GenericToken is used purely as a typed data holder, as it allows me to send "T" to the handler.

InvocationHandler is just a holder for the delegate field Target; there is no actual implementation behind it.

GenericInvocationHandler uses a call-site technique, like the DLR, to rewrite the delegate as needed to handle the different generic arguments passed.)
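
For readers who want something concrete, a "typed data holder" of this kind could look roughly like the sketch below. The post does not show the real GenericToken, so every member here (TokenType, the private constructor, the Recover helper) is an assumption made for illustration only.

```csharp
using System;

// Hypothetical shape: the original post does not show these types.
public abstract class GenericToken
{
    // Lets a handler that only sees the base type find out what T was.
    public abstract Type TokenType { get; }
}

public sealed class GenericToken<T> : GenericToken
{
    // One shared instance per T, so passing the token allocates nothing per call.
    public static readonly GenericToken<T> Token = new GenericToken<T>();

    private GenericToken() { }

    public override Type TokenType { get { return typeof(T); } }
}

public static class GenericTokenDemo
{
    // The handler receives the non-generic base type and recovers T from it.
    public static Type Recover(GenericToken token)
    {
        return token.TokenType;
    }

    public static void Main()
    {
        Console.WriteLine(Recover(GenericToken<string>.Token)); // prints System.String
    }
}
```

This matches how the proxy above passes GenericToken<T>.Token where a plain GenericToken parameter is expected, but the actual implementation may differ.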

EDIT: Here is the test harness:

private static void RunTests(int count = 1 << 24, bool displayResults = true)
{
    var tests = Array.FindAll(Tests, t => t != null);
    var maxLength = tests.Select(x => GetMethodName(x.Method).Length).Max();

    for (int j = 0; j < tests.Length; j++)
    {
        var action = tests[j];
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            action();
        }
        sw.Stop();
        if (displayResults)
        {
            Console.WriteLine("{2}  {0}: {1}ms", GetMethodName(action.Method).PadRight(maxLength),
                              ((int)sw.ElapsedMilliseconds).ToString(), j);
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
}

private static string GetMethodName(MethodInfo method)
{
    return method.IsGenericMethod
            ? string.Format(@"{0}<{1}>", method.Name, string.Join<Type>(",", method.GetGenericArguments()))
            : method.Name;
}

And in a test I do the following:

Tests[0] = () => proxiedTestClass.Test();
Tests[1] = () => proxiedTestClass.Test<string>("2");
Tests[2] = () => handClass.Test();
Tests[3] = () => handClass.Test<string>("2");
RunTests(100, false);
RunTests();

Where Tests is a Func<object>[20], and proxiedTestClass is the class generated by my assembly, and handClass is the one I generated by hand. RunTests is called twice, once to "warm" things up, and again to run it and print to the screen. I mostly took this code from a post here by Jon Skeet.

Fielder answered 28/1, 2013 at 20:41 Comment(4)

Can you show us your measuring code? It's possible beforefieldinit just moved the initialization outside of your measurements. – Manama 28/1, 2013 at 21:28

I just added the test harness code. It's not really a complicated one; it's tested in a microloop. – Fielder 28/1, 2013 at 21:38

If you post a complete repro code file on pastebin I'll look into it tomorrow. I'm interested. – Bogor 28/1, 2013 at 21:47

It's a bit rough as a sln. However, you can download it from bitbucket: bitbucket.org/mburbea/delegatedproxy/downloads If you'd like to reproduce the slowness, change ProxiedTypeBuilder.cs line 130 to TypeAttributes.AutoClass | TypeAttributes.Public); – Fielder 28/1, 2013 at 22:32

As stated in ECMA-335 (CLI specification), part I, section 8.9.5:

The semantics of when and what triggers execution of such type initialization methods, is as follows:

A type can have a type-initializer method, or not.

A type can be specified as having a relaxed semantic for its type-initializer method (for convenience below, we call this relaxed semantic BeforeFieldInit).

If marked BeforeFieldInit then the type’s initializer method is executed at, or sometime before, first access to any static field defined for that type.

If not marked BeforeFieldInit then that type’s initializer method is executed at (i.e., is triggered by):

a. first access to any static field of that type, or

b. first invocation of any static method of that type, or

c. first invocation of any instance or virtual method of that type if it is a value type or

d. first invocation of any constructor for that type.

Also, as you can see from Michael's code above, TestClassProxy has only one static field: _proxy_handlers. Notice that it is used in only two places:

In the instance constructor
And in the static field initializer itself

So when BeforeFieldInit is specified, the type-initializer will be called only once: in the instance constructor, right before the first access to _proxy_handlers.

But if BeforeFieldInit is omitted, the CLR will place a call to the type-initializer before every TestClassProxy static method invocation, static field access, etc.

In particular, this happens on every invocation of the s_0_Test and s_1_Test<T> static methods.

Of course, as stated in ECMA-334 (C# Language Specification), section 17.11:

The static constructor for a non-generic class executes at most once in a given application domain. The static constructor for a generic class declaration executes at most once for each closed constructed type constructed from the class declaration (§25.1.5).

But in order to guarantee this, the CLR has to check (in a thread-safe manner) whether the class is already initialized.

And these checks decrease performance.
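
The "at most once for each closed constructed type" rule quoted above can be observed with a small sketch. The names InitLog, Holder<T> and InitDemo are made up for this illustration and are not part of the question's code.

```csharp
using System;

public static class InitLog
{
    // Counts how many times a Holder<T> static constructor has run.
    public static int Count;
}

public class Holder<T>
{
    // An explicit static constructor means the C# compiler does not emit beforefieldinit.
    static Holder() { InitLog.Count++; }

    public static void Touch() { }
}

public static class InitDemo
{
    public static int Run()
    {
        Holder<string>.Touch();  // runs the initializer for Holder<string>
        Holder<object>.Touch();  // runs the initializer for Holder<object>
        Holder<string>.Touch();  // already initialized, nothing runs again
        return InitLog.Count;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 2
    }
}
```

Both Holder<string> and Holder<object> are initialized separately, which is why the runtime needs some way to tell, at each static call, which closed type it is dealing with and whether that one has been initialized.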

PS: You might be surprised that the performance issues will be gone once you change s_0_Test and s_1_Test<T> to be instance methods.

Bashful answered 29/1, 2013 at 15:55 Comment(2)

What I don't understand is why the calls are added to the static methods, even if those are JITted after the type initializer has already been called. – Manama 29/1, 2013 at 16:11

Very strange, but it seems to be true. In the C# version of the class, moving the assignment of the handlers array into the .cctor makes the performance awful. If I use a value type like int, the performance is better. However, since the CLR generates an entire class, it can probably store some value to say it's already inited. For reference types, it must have no storage for checking whether it's inited. – Fielder 29/1, 2013 at 18:45

First, if you want to learn more about beforefieldinit, read Jon Skeet's article C# and beforefieldinit. Parts of this answer are based on that and I'll repeat the relevant bits here.

Second, your code does very little, so overhead will have a significant impact on your measurements. In real code, the impact is likely to be much smaller.

Third, you don't need to use Reflection.Emit to control whether a class has beforefieldinit. You can disable that flag in C# by adding a static constructor (e.g. static TestClassProxy() {}).
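
As a quick check of that third point, the flag can be read back through reflection. This is a minimal sketch; the class names WithInitializerOnly, WithStaticCtor and FlagDemo are invented for the example.

```csharp
using System;
using System.Reflection;

public class WithInitializerOnly
{
    // Static field initializer only: the C# compiler marks the type beforefieldinit.
    public static readonly object Value = new object();
}

public class WithStaticCtor
{
    public static readonly object Value = new object();

    // An explicit (even empty) static constructor removes beforefieldinit.
    static WithStaticCtor() { }
}

public static class FlagDemo
{
    public static bool HasBeforeFieldInit(Type type)
    {
        return (type.Attributes & TypeAttributes.BeforeFieldInit) != 0;
    }

    public static void Main()
    {
        Console.WriteLine(HasBeforeFieldInit(typeof(WithInitializerOnly))); // True
        Console.WriteLine(HasBeforeFieldInit(typeof(WithStaticCtor)));      // False
    }
}
```

The same TypeAttributes.BeforeFieldInit value is what a Reflection.Emit based generator would pass when defining the type.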

Now, what beforefieldinit does is govern when the type initializer (a method called .cctor) is called. In C# terms, the type initializer contains all static field initializers and the code from the static constructor, if there is one.

If you don't set that flag, the type initializer will be called when either an instance of the class is created or any of the static members of the class are referenced. (Taken from the C# spec, using CLI spec here would be more accurate, but the end result is the same.^*)

What this means is that without beforefieldinit, the compiler is tightly constrained in when it may call the type initializer. It can't decide to call it a bit earlier, even if doing so would be more convenient (and would result in faster code).

Knowing this, we can look at what's actually happening in your code. The problematic cases are static methods, because that's where the type initializer might be called. (Instance constructor is another one, but you're not measuring that.)

I focused on the method s_1_Test(). And because I don't actually need it to do anything, I simplified it (to make the generated native code shorter) to:

public static object s_1_Test<T>(TestClass<pR> class1, pR local1) where T:IConvertible
{
    return null;
}

Now, let's look at the disassembly in VS (in Release mode), first without the static constructor, that is with beforefieldinit:

00000000  xor         eax,eax
00000002  ret

Here, the result is set to 0 (it's done in a somewhat obfuscated manner for performance reasons) and the method returns. Very simple.

What happens with the static constructor (i.e. without beforefieldinit)?

00000000  sub         rsp,28h
00000004  mov         rdx,rcx
00000007  xor         ecx,ecx
00000009  call        000000005F8213A0
0000000e  xor         eax,eax
00000010  add         rsp,28h
00000014  ret

This is much more complicated. The real problem is the call instruction, which presumably calls a function that invokes the type initializer if necessary.

I believe this is the source of the performance difference between the two situations.

The added check is necessary because your type is generic and you're using it with a reference type as a type parameter. In that case, the JITted code for different generic versions of your class is shared, but the type initializer has to be called for each generic version. Moving the static methods to another, non-generic type would be one way to solve the issue.
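
A rough sketch of that last suggestion follows. The names Target<R>, TargetProxy<R>, CallBaseTest and TargetProxyStatics are hypothetical stand-ins for the question's types, and I have not measured whether this removes the check on any particular runtime.

```csharp
using System;

public class Target<R>
{
    public virtual object Test() { return null; }
}

public class TargetProxy<R> : Target<R>
{
    // Explicit static constructor, so the type does not get beforefieldinit.
    static TargetProxy() { }

    // Instance method: calling it on a reference type is not an initializer trigger.
    internal object CallBaseTest() { return base.Test(); }
}

// Non-generic holder for the static entry point; it has no type initializer of its own.
public static class TargetProxyStatics
{
    public static object Test<R>(Target<R> instance)
    {
        return ((TargetProxy<R>)instance).CallBaseTest();
    }
}

public static class StaticsDemo
{
    public static void Main()
    {
        object result = TargetProxyStatics.Test(new TargetProxy<string>());
        Console.WriteLine(result == null); // prints True
    }
}
```

The trade-off is that the instance helper has to be at least internal rather than private, since the static method now lives outside the proxy class.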

^* Unless you do something crazy like calling an instance method on null using call (and not callvirt, which throws for null).

Manama answered 29/1, 2013 at 16:3 Comment(6)

Isn't JITed x86 code always shared between AppDomains? If yes the cctor check must always be present. – Bogor 29/1, 2013 at 16:34

@Bogor Not according to MSDN: “If an assembly is not loaded domain-neutral, it must be JIT-compiled in every application domain in which it is loaded.” – Manama 29/1, 2013 at 16:36

It's JIT perf bug, then. A big benefit could be derived from a small amount of work for the JIT team. – Bogor 29/1, 2013 at 16:48

@Bogor I figured it out: it's because of another kind of sharing: between different versions of generic type. (E.g. TestClassProxy<object> and TestClassProxy<string> share JIT, but have to be initialized separately.) So it's actually not a perf bug. – Manama 29/1, 2013 at 18:53

Right, the CLR shares code between reference type instantiations, but clearly it stores different static variables for each constructed type (e.g. MyClass<string>.SomeProp has different storage than MyClass<Object>.SomeProp). Why doesn't the "class was inited" value also get its own storage for ref types? – Fielder 29/1, 2013 at 18:59

@MichaelB It does (I assume), but there has to be some code to check it. And I think that's what the called function does. – Manama 29/1, 2013 at 19:4
