I looked into why defining multiple modules in one assembly is slower than creating a new assembly with one module each time, using these two pieces of code:
Single-Assembly Scenario:
var an = new AssemblyName("Foo");
var ab = AppDomain.CurrentDomain.DefineDynamicAssembly(an, AssemblyBuilderAccess.Run);
for (int i = 0; i < 10000; i++)
{
    ab.DefineDynamicModule("Bar" + i.ToString("000"));
}
Multi-Assembly Scenario:
var an = new AssemblyName("Foo");
for (int i = 0; i < 10000; i++)
{
    var ab = AppDomain.CurrentDomain.DefineDynamicAssembly(an, AssemblyBuilderAccess.Run);
    ab.DefineDynamicModule("Bar");
}
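For anyone who wants to reproduce the comparison, here is a minimal timing sketch built around the two loops above. It assumes the .NET Framework, where `AppDomain.CurrentDomain.DefineDynamicAssembly` is available and an assembly may hold more than one dynamic module. The class name `ModuleTiming` and its two helper methods are hypothetical names made up for this illustration, and the numbers it prints will depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Reflection.Emit;

public static class ModuleTiming
{
    // Hypothetical helper: defines n modules inside ONE dynamic assembly
    // and returns the elapsed wall-clock time.
    public static TimeSpan SingleAssembly(int n)
    {
        var sw = Stopwatch.StartNew();
        var ab = AppDomain.CurrentDomain.DefineDynamicAssembly(
            new AssemblyName("Foo"), AssemblyBuilderAccess.Run);
        for (int i = 0; i < n; i++)
        {
            // Module names must be unique within the assembly.
            ab.DefineDynamicModule("Bar" + i);
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Hypothetical helper: defines n dynamic assemblies with one module each.
    public static TimeSpan MultiAssembly(int n)
    {
        var sw = Stopwatch.StartNew();
        var an = new AssemblyName("Foo");
        for (int i = 0; i < n; i++)
        {
            var ab = AppDomain.CurrentDomain.DefineDynamicAssembly(
                an, AssemblyBuilderAccess.Run);
            ab.DefineDynamicModule("Bar");
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        // Doubling n each time makes the growth rate easy to read off.
        foreach (int n in new[] { 5000, 10000, 20000 })
        {
            Console.WriteLine("n = {0,6}: single = {1}, multi = {2}",
                n, SingleAssembly(n), MultiAssembly(n));
        }
    }
}
```

Stopwatch timings like these only show the overall trend; the percentage breakdown discussed next came from a profiler, not from this kind of measurement.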
- I found that about 20% of the time (50% in the multi-assembly example) is spent in the underlying code going through all module names to check for conflicts. This part is understandable and expected.
- When using one assembly, another 60%-80% of the time is spent in the CLI's DefineDynamicModule(). However, when using multiple assemblies, this method never gets called; instead, other methods account for the remaining 50%.
Let's look more closely at the ECMA-335 specification for the CLI.
II.6 An assembly is a set of one or more files deployed as a unit.
Page 140
So we understand now that an assembly is essentially a package and modules are the main components. That being said:
II.6 A module is a single file containing executable content in the format specified here. If the module contains a manifest then it also specifies the modules (including itself) that constitute the assembly. An assembly shall contain only one manifest amongst all its constituent files.
Page 140
Based on this information, we know that when we create the assembly, we automatically add one module to the assembly as well. This is why we never get a hit on the CLI's DefineDynamicModule() function if we keep creating new assemblies. Instead, we get a hit on the CLI's GetInMemoryAssemblyModule() method, which retrieves information about the manifest module (the module that is created automatically).
So here we have a little performance gain; with one assembly, we get 10001 modules, but with multiple assemblies, we get a total of 10000 modules. Not much though, so this one extra module should not be the main reason behind this.
II.6.5 When an item is in the current assembly, but is part of a module other than the one containing the manifest, the defining module shall be declared in the manifest of the assembly using the .module extern directive.
Page 146
and
II.6.7 The manifest module, of which there can only be one per assembly, includes the .assembly directive. To export a type defined in any other module of an assembly requires an entry in the assembly’s manifest.
Page 146
Therefore, each time you create a new module, you are actually adding a new file to an archive and then modifying the first file of the archive to reference the new module. Essentially, in the single-assembly code we add 10000 modules and also edit the first module 10000 times. This isn't the case with the multi-assembly code, in which we only edit the automatically generated first module of each assembly, 10000 times in total.
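This is the overhead we see. It grows much faster than linearly on my system, roughly quadratically rather than exponentially: each doubling of the module count about quadruples the time.
(5000 = 1.5s, 10000 = 6s, 20000 = 25s)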
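Taking these three timings at face value, the growth rate can be estimated from the ratios between successive measurements. This is only a rough check on three data points from one machine, and it assumes the time follows a power law in the number of modules n:

```latex
\[
\frac{t(10000)}{t(5000)} = \frac{6}{1.5} = 4,
\qquad
\frac{t(20000)}{t(10000)} = \frac{25}{6} \approx 4.17
\]
\[
t(n) \propto n^{k}
\;\Longrightarrow\;
k = \log_2 \frac{t(2n)}{t(n)},
\qquad
\log_2 4 = 2,
\quad
\log_2 4.17 \approx 2.06
\]
```

An exponent close to 2 is what you would expect if each new module triggers work proportional to the number of modules already defined, such as the name-conflict scan mentioned earlier.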
With your code, however, the bottleneck is the unmanaged CLR's SetMethodIL function, called from the CreateTypeNoLock.CreateTypeNoLock() method, and I couldn't find anything about it in the documentation yet.
Unfortunately, it is hard to decompile CLR.dll and understand what actually happens there, so at this stage we are just making guesses based on the information Microsoft has published.