Multiple types in one dynamic assembly is way slower than multiple dynamic assemblies with one type each

So I'm emitting some dynamic proxies via DefineDynamicAssembly, and while testing I found that:

  • One type per dynamic assembly: fast, but uses a lot of memory
  • All types in one dynamic assembly: very very slow, but uses much less memory

In my test I generate 10,000 types, and the one-type-per-assembly code runs about 8-10 times faster. The memory usage is completely in line with what I expected, but how come the time to generate the types is that much longer?

Edit: Added some sample code.

One assembly:

var an = new AssemblyName( "Foo" );
var ab = AppDomain.CurrentDomain.DefineDynamicAssembly( an, AssemblyBuilderAccess.Run );
var mb = ab.DefineDynamicModule( "Bar" );

for( int i = 0; i < 10000; i++ )
{                
    var tb = mb.DefineType( "Baz" + i.ToString( "000" ) );
    var met = tb.DefineMethod( "Qux", MethodAttributes.Public );
    met.SetReturnType( typeof( int ) );

    var ilg = met.GetILGenerator();
    ilg.Emit( OpCodes.Ldc_I4, 4711 );
    ilg.Emit( OpCodes.Ret );

    tb.CreateType();
}

One assembly per type:

for( int i = 0; i < 10000; i++ )
{
    var an = new AssemblyName( "Foo" );
    var ab = AppDomain.CurrentDomain.DefineDynamicAssembly( an,
                                                            AssemblyBuilderAccess.Run );
    var mb = ab.DefineDynamicModule( "Bar" );

    var tb = mb.DefineType( "Baz" + i.ToString( "000" ) );
    var met = tb.DefineMethod( "Qux", MethodAttributes.Public );
    met.SetReturnType( typeof( int ) );

    var ilg = met.GetILGenerator();
    ilg.Emit( OpCodes.Ldc_I4, 4711 );
    ilg.Emit( OpCodes.Ret );

    tb.CreateType();
}
Dictate answered 14/11, 2017 at 21:48 Comment(1)
Can you share some details about your build environment and .NET Framework version? I tried this on an old version (.NET 3.5, release build) and got the following times: Time for one assembly: 03.6230775; Time for many assemblies: 02.7712719. That is a much smaller difference than what you are seeing. However, on the first run I saw a much larger difference, closer to yours in fact, which suggests something related to JIT profiling might be going on. - Kimbra

On my PC, in LINQPad using C# 7.0, the one-assembly version takes about 8.8 seconds and the one-assembly-per-type version about 2.6 seconds. In the one-assembly version most of the time is spent in DefineType and CreateType, whereas in the one-assembly-per-type version it is mainly spent in DefineDynamicAssembly and DefineDynamicModule.
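A breakdown like this can be reproduced with a Stopwatch around each phase. The following is only a minimal sketch for the single-assembly case: the class and method names (EmitPhaseTimer, Measure) are hypothetical, and it uses the static AssemblyBuilder.DefineDynamicAssembly (available from .NET Framework 4.5) in place of the AppDomain.CurrentDomain.DefineDynamicAssembly call from the question. Absolute numbers will differ per machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical harness (not from the original posts): times each
// Reflection.Emit phase separately for the single-assembly case.
public static class EmitPhaseTimer
{
    // Returns elapsed time for { DefineType, method + IL emission, CreateType }.
    public static TimeSpan[] Measure(int count)
    {
        var define = new Stopwatch();
        var emit = new Stopwatch();
        var create = new Stopwatch();

        var ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("Foo"), AssemblyBuilderAccess.Run);
        var mb = ab.DefineDynamicModule("Bar");

        for (int i = 0; i < count; i++)
        {
            define.Start();
            var tb = mb.DefineType("Baz" + i);
            define.Stop();

            emit.Start();
            var met = tb.DefineMethod("Qux", MethodAttributes.Public);
            met.SetReturnType(typeof(int));
            var ilg = met.GetILGenerator();
            ilg.Emit(OpCodes.Ldc_I4, 4711);
            ilg.Emit(OpCodes.Ret);
            emit.Stop();

            create.Start();
            tb.CreateType();
            create.Stop();
        }

        return new[] { define.Elapsed, emit.Elapsed, create.Elapsed };
    }

    public static void Main()
    {
        var t = Measure(10000);
        Console.WriteLine("DefineType:     {0}", t[0]);
        Console.WriteLine("Method + IL:    {0}", t[1]);
        Console.WriteLine("CreateType:     {0}", t[2]);
    }
}
```

Running Measure with a few different counts (for example 2500, 5000, 10000) should show which phase stops scaling linearly.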

DefineType checks that there are no name conflicts, which is a Dictionary lookup. If the Dictionary is empty, this amounts to little more than a null check.

The majority of the time is spent in CreateType. I can't see exactly where, but adding types to a single module appears to require extra time.

Creating multiple modules slows the whole process down, but then most of the time is spent creating the modules and in DefineType, which has to scan every module for a duplicate name and so performs a growing number of null checks, up to 10,000. With a unique module per type, CreateType is very fast.
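For reference, the one-module-per-type variant can be sketched as below. This is an illustration rather than the exact code that was measured: the names ModulePerType and Build are hypothetical, and it assumes .NET Framework, where a dynamic assembly may contain several dynamic modules (newer runtimes may reject a second DefineDynamicModule call).

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical sketch: one dynamic assembly, one module per type.
// Assumes .NET Framework, which allows multiple dynamic modules per assembly.
public static class ModulePerType
{
    public static Type[] Build(int count)
    {
        var ab = AppDomain.CurrentDomain.DefineDynamicAssembly(
            new AssemblyName("Foo"), AssemblyBuilderAccess.Run);
        var types = new Type[count];

        for (int i = 0; i < count; i++)
        {
            // Module names must be unique within the assembly.
            var mb = ab.DefineDynamicModule("Bar" + i);
            var tb = mb.DefineType("Baz" + i);
            var met = tb.DefineMethod("Qux", MethodAttributes.Public);
            met.SetReturnType(typeof(int));

            var ilg = met.GetILGenerator();
            ilg.Emit(OpCodes.Ldc_I4, 4711);
            ilg.Emit(OpCodes.Ret);

            types[i] = tb.CreateType();
        }

        return types;
    }

    public static void Main()
    {
        var sw = Stopwatch.StartNew();
        var types = Build(10000);
        sw.Stop();
        Console.WriteLine("{0} types in {1}", types.Length, sw.Elapsed);
    }
}
```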

Jari answered 15/11, 2017 at 0:16 Comment(3)
Hmm, I tried one assembly with multiple modules containing one type each. It is a bit faster, but uses the same amount of memory as the one-type-per-assembly approach. The dictionary seems unlikely to cause all this, but I tried clearing its internal dictionary after each type anyway; no luck. - Dictate
What internal dictionary do you mean? If you are doing multiple types in one assembly, the dictionary is needed to track the types created. - Jari
Just an idea I got from #2504145, though pretty pointless as what was a list is now a dictionary anyway... - Dictate

I looked into why defining multiple modules in one assembly is slower than creating a new assembly with one module each time, using these pieces of code:

Single-Assembly Scenario:

        var an = new AssemblyName("Foo");
        var ab = AppDomain.CurrentDomain.DefineDynamicAssembly(an, AssemblyBuilderAccess.Run);
        for (int i = 0; i < 10000; i++)
        {
            ab.DefineDynamicModule("Bar" + i.ToString("000"));
        }

Multi-Assembly Scenario:

        var an = new AssemblyName("Foo");
        for (int i = 0; i < 10000; i++)
        {
            var ab = AppDomain.CurrentDomain.DefineDynamicAssembly(an, AssemblyBuilderAccess.Run);
            ab.DefineDynamicModule("Bar");
        }
  1. I found that around 20% of the time (50% in the multiple-assemblies example) is spent in the underlying code that goes through all module names to check for conflicts. This part is understandable and expected.
  2. With one assembly, another 60%-80% of the time is spent in the CLI's DefineDynamicModule(). With multiple assemblies, however, this method never gets called; instead, other methods account for the remaining 50%.

Let's go deeper inside the ECMA-335 documentation for CLI.

II.6 An assembly is a set of one or more files deployed as a unit.

Page 140

So we understand now that an assembly is essentially a package and modules are the main components. That being said:

II.6 A module is a single file containing executable content in the format specified here. If the module contains a manifest then it also specifies the modules (including itself) that constitute the assembly. An assembly shall contain only one manifest amongst all its constituent files.

Page 140

Based on this information, we know that when we create the assembly, we automatically add one module to the assembly as well. This is why we never get a hit on the CLI's DefineDynamicModule() function if we keep creating new assemblies. Instead, we get a hit on the CLI's GetInMemoryAssemblyModule() method to retrieve information about the Manifest Module (the module that is created automatically).

So here we have a little performance gain; with one assembly, we get 10001 modules, but with multiple assemblies, we get a total of 10000 modules. Not much though, so this one extra module should not be the main reason behind this.

II.6.5 When an item is in the current assembly, but is part of a module other than the one containing the manifest, the defining module shall be declared in the manifest of the assembly using the .module extern directive.

Page 146

and

II.6.7 The manifest module, of which there can only be one per assembly, includes the .assembly directive. To export a type defined in any other module of an assembly requires an entry in the assembly’s manifest.

Page 146

Therefore, each time you create a new module, you are actually adding a new file to an archive and then modifying the first file of the archive to reference the new module. Essentially, in the single-assembly code we add 10000 modules and also edit the first (manifest) module 10000 times. This isn't the case with the multi-assembly code, in which only the automatically generated first module of each assembly is edited, 10000 times in total.

This is the overhead we see, and on my system it grows roughly quadratically: doubling the number of modules roughly quadruples the time.

(5000 = 1.5s, 10000 = 6s, 20000 = 25s)
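These three timings are enough to estimate the growth rate. Assuming the time behaves like n^k, the exponent k follows from any two measurements as log(t2/t1) / log(n2/n1). The small sketch below (the GrowthEstimate helper is hypothetical) applies this to the numbers above; both estimates come out close to 2, which points to roughly quadratic growth rather than linear or exponential growth.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: estimates k in "time ~ n^k" from two measurements.
public static class GrowthEstimate
{
    public static double Exponent(double n1, double t1, double n2, double t2)
    {
        return Math.Log(t2 / t1) / Math.Log(n2 / n1);
    }

    public static void Main()
    {
        // Timings quoted in the answer: 5000 = 1.5s, 10000 = 6s, 20000 = 25s.
        double k1 = Exponent(5000, 1.5, 10000, 6.0);
        double k2 = Exponent(10000, 6.0, 20000, 25.0);

        Console.WriteLine(k1.ToString("F2", CultureInfo.InvariantCulture)); // 2.00
        Console.WriteLine(k2.ToString("F2", CultureInfo.InvariantCulture)); // 2.06
    }
}
```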

With your code, however, the bottleneck is the CLR's unmanaged SetMethodIL function, called from the TypeBuilder.CreateTypeNoLock() method, and I haven't found anything in the documentation about this yet.

Unfortunately, it is hard to decompile and understand CLR.dll to see what actually happens there. As a result, at this stage we are just making guesses based on the information Microsoft has published.

Verada answered 22/11, 2017 at 23:20 Comment(4)
Nicely investigated! However, in my single-assembly test I'm just creating one module, not one for each type... I did try one assembly with multiple modules too, and it's a bit faster, but still way slower than one type per assembly. - Dictate
@Chris, yeah, I mentioned that the bottleneck is SetMethodIL, supposedly called for the type constructor. I don't know yet why this slows things down. - Verada
Does the time SetMethodIL takes change depending on the scenario? Seems like that should be unrelated unless something funky is going on... - Dictate
Long story, I need to update the post. Give me an hour or something. - Verada
