How did Microsoft create assemblies that have circular references?
Question (score 111, 9 answers)

In the .NET BCL there are circular references between:

  • System.dll and System.Xml.dll
  • System.dll and System.Configuration.dll
  • System.Xml.dll and System.Configuration.dll

Here's a screenshot from .NET Reflector that shows what I mean:

[Screenshot from .NET Reflector showing these references (image not preserved)]

How Microsoft created these assemblies is a mystery to me. Is a special compilation process required to allow this? I imagine something interesting is going on here.
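To check the same thing programmatically instead of in Reflector, one option is to ask each assembly for its references with `Assembly.GetReferencedAssemblies()` and look for pairs that point at each other. The sketch below splits this into a reflection helper and a pure function over a name graph. The class name `RefCycles` is illustrative, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class RefCycles
{
    // Build "assembly simple name -> simple names of the assemblies it references".
    // Assumes no two input assemblies share a simple name (ToDictionary would throw).
    public static Dictionary<string, string[]> BuildGraph(IEnumerable<Assembly> assemblies)
    {
        return assemblies.ToDictionary(
            a => a.GetName().Name,
            a => a.GetReferencedAssemblies().Select(r => r.Name).ToArray());
    }

    // Return every pair (A, B) where A references B and B references A.
    // Each pair is reported once, with the ordinally smaller name first.
    public static List<(string, string)> MutualPairs(Dictionary<string, string[]> graph)
    {
        var pairs = new List<(string, string)>();
        foreach (var entry in graph)
        {
            foreach (string target in entry.Value)
            {
                if (string.CompareOrdinal(entry.Key, target) < 0 &&
                    graph.TryGetValue(target, out var backRefs) &&
                    backRefs.Contains(entry.Key))
                {
                    pairs.Add((entry.Key, target));
                }
            }
        }
        return pairs
            .OrderBy(p => p.Item1, StringComparer.Ordinal)
            .ThenBy(p => p.Item2, StringComparer.Ordinal)
            .ToList();
    }
}
```

Fed a hand-written graph containing the three pairs listed above, `MutualPairs` reports exactly those three. On .NET Framework, passing the assemblies that contain `System.Uri`, `System.Xml.XmlDocument` and `System.Configuration.ConfigurationManager` should reproduce what Reflector shows. On .NET Core and later the BCL is factored differently, so results there may not match the question.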

Asked by Decorate on 22/8/2009 at 17:26. Comments (7):
Very good question. I've never actually taken the time to inspect this, but I'm curious to know the answer. Indeed, it seems like Dykam has provided a sensible one. – Imtiaz
Why are those DLLs not merged into one if they all require each other? Is there any practical reason for that? – Florentinaflorentine
Interesting question... I'd like to know Eric Lippert's answer to this one! And as Andreas said, I wonder why they didn't put everything in the same assembly... – Marketing
Well, if one assembly needs to be updated, they won't need to touch the other ones. That's the only reason I see. Interesting question, though. – Madame
Take a look at this presentation (asmmeta files): msakademik.net/academicdays2005/Serge_Lidin.ppt – Diarrhoea
@Andreas Petersson: my guess is that assemblies are loaded lazily, so something using mscorlib might not necessarily use the configuration or XML APIs, in which case less memory is devoted to storing the IL. – Decorate
@Mehrdad: the link you pointed to is gone, but this has it: web.archive.org/web/20100806233100/http://www.msakademik.net/… – Asturias
Answer (score 60)

I can only speak to how the Mono project does this. The principle is quite simple, though it makes for messy code.

They first compile System.Configuration.dll without the part that needs the reference to System.Xml.dll. After this, they compile System.Xml.dll the normal way. Now comes the magic: they recompile System.Configuration.dll, this time with the part that needs the reference to System.Xml.dll. Now there's a successful compilation with the circular reference.

In short:

  • A is compiled without the code needing B and the reference to B.
  • B is compiled.
  • A is recompiled, this time with the code needing B and the reference to B.
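A minimal sketch of these three steps using conditional compilation. It assumes a hypothetical symbol `HAVE_B` and hypothetical assemblies `A.dll` (namespace `LibA`) and `B.dll` (namespace `LibB`). This is not Mono's actual source, only the shape of the technique. Both files appear in one block, and the comments show which `csc` command belongs to which pass.

```csharp
// ---- A.cs (hypothetical assembly A) ----
// Pass 1: csc /target:library /out:A.dll A.cs
// Pass 3: csc /target:library /out:A.dll /define:HAVE_B /reference:B.dll A.cs
namespace LibA
{
    public class Widget
    {
        public string Name { get; set; } = "widget";

#if HAVE_B
        // The part "needing B": only compiled in pass 3, once B.dll exists.
        public LibB.Formatter CreateFormatter() => new LibB.Formatter(this);
#endif
    }
}

// ---- B.cs (hypothetical assembly B) ----
// Pass 2: csc /target:library /out:B.dll /reference:A.dll B.cs
namespace LibB
{
    public class Formatter
    {
        private readonly LibA.Widget _widget;

        public Formatter(LibA.Widget widget) { _widget = widget; }

        public string Format() => "[" + _widget.Name + "]";
    }
}
```

Passes 1 and 3 produce an assembly with the same name. For unsigned assemblies, B.dll was built against the pass-1 A.dll, but it should still bind to the pass-3 A.dll as long as the members B uses keep their signatures.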
Answered by Eldreda on 22/8/2009 at 17:43. Comments (2):
It's blocked by Visual Studio, but it can be done using the command-line compiler (csc.exe) directly. See my answer. – Hypoglossal
I know. Mono's main build system isn't Visual Studio. I'd guess Microsoft's isn't either. – Eldreda
Answer (score 36)

RBarryYoung and Dykam are onto something. Microsoft uses an internal tool that uses ILDASM to disassemble assemblies, strip all internal/private members and method bodies, and recompile the IL (using ILASM) into what is called a 'dehydrated assembly' or metadata assembly. This is done every time the public interface of an assembly changes.

During the build, the metadata assemblies are used instead of the real ones. That way the cycle is broken.
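Here is what "dehydrated" means concretely. The stub keeps every public signature a consumer compiles against and drops everything else. The sketch below is hand-written C# standing in for what the internal tool reportedly produces at the IL level. The answer gives no public reference for that tool, and the `LibA`/`LibB` names are hypothetical. `throw null` is simply a compact body that compiles for any return type. Working in IL is convenient here: as far as I know, ILASM records `.assembly extern` references by name and does not need the referenced files to exist. A comparable, publicly documented feature in newer C# compilers is reference-assembly output via `/refout` or `/refonly`.

```csharp
// Hypothetical metadata-only ("dehydrated") stubs for assemblies A and B.
// Public surface only: no private members and no real method bodies.
namespace LibA
{
    public class Widget
    {
        public Widget() { }
        public string Name { get { throw null; } set { } }
        public LibB.Formatter CreateFormatter() { throw null; }
    }
}

namespace LibB
{
    public class Formatter
    {
        public Formatter(LibA.Widget widget) { }
        public string Format() { throw null; }
    }
}
```

In a build, the real A would be compiled against B's stub and the real B against A's stub. Neither then needs the other's real output first. Calling a stub method at run time just throws a `NullReferenceException`, which is fine because stubs are only ever compiled against, never shipped.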

Answered by Dossier on 22/8/2009 at 17:58. Comments (8):
Interesting answer. Do you have any links? – Masonite
I am trying to find an external reference to the tool. I do not think it is published outside Microsoft, but the concept is simple: disassemble, strip internals, reassemble. – Dossier
Agreed, interesting answer. Some links to back this up would be good. – Decorate
Yes, that's indeed the way it is done (from personal experience). – Sutherlan
If they did indeed strip all internal and private stuff, it would not be possible to decompile these assemblies and see what the private classes are, what others they reference, etc. That is, however, very much possible, with the exception of methods decorated with the attribute MethodImpl(MethodImplOptions.InternalCall), which are only a few. Perhaps I misunderstood your point. A link that backs up your story would be helpful, too. – Mowbray
True. We do not need the internal and private stuff in assembly A to be able to reference its public members when compiling assembly B. Your point about InternalCall methods stands, but is not relevant to this discussion. Sorry, but there is no link. To my knowledge, this is not documented publicly. – Dossier
But these are strong-named assemblies. These dehydrated metadata-only assemblies would have different cryptographic hashes. Perhaps the compiler doesn't check this and it's a runtime concept only. Or maybe it's a special compiler too. – Decorate
They are not strong-name signed until after the build (they are delay signed), so the dehydrated assemblies are not signed. – Dossier
Answer (score 27)

It can be done the way Dykam described, but Visual Studio blocks you from doing it.

You'll have to use the command-line compiler csc.exe directly.

  1. csc /target:library ClassA.cs

  2. csc /target:library ClassB.cs /reference:ClassA.dll

  3. csc /target:library /out:ClassA.dll ClassA.cs ClassC.cs /reference:ClassB.dll


//ClassA.cs
namespace CircularA {
    public class ClassA {
    }
}


//ClassB.cs
using CircularA;
namespace CircularB {
    public class ClassB : ClassA  {
    }
}


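//ClassC.cs (compiled into ClassA.dll in step 3)
using CircularB; // ClassB lives in namespace CircularB, inside ClassB.dll
namespace CircularA {
    class ClassC : ClassB {
    }
}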
Answered by Hypoglossal on 22/8/2009 at 17:50. Comments (5):
You can do this in Visual Studio too, though it's quite awkward. The basic way is to use #if and remove the reference using Solution Explorer, reversing that in the third step. Another way I'm thinking of is a third project file that includes the same files but has different references. This would work because you can specify the build order. – Eldreda
As far as I know, anyway; I can't test it here. – Eldreda
I would really like to see that. From what I experimented with here, the moment you try to Add Reference, the IDE stops you. – Hypoglossal
I know. But consider a third project that has neither that reference nor the #if symbol, and is referenced by the second, which is referenced by the first. No cycle. But the third uses the code of the first and outputs to the first assembly's location. An assembly can easily be replaced by another with the same specs. But I think strong naming can cause a problem with this method. – Eldreda
It's a little like Srdjan's answer, though a different method. – Eldreda
Answer (score 19)

It's pretty easy to do in Visual Studio as long as you don't use project references. Try this:

  1. Open Visual Studio
  2. Create 2 Class Library projects "ClassLibrary1" & "ClassLibrary2".
  3. Build
  4. From ClassLibrary1 add a reference to ClassLibrary2 by browsing to the dll created in step 3.
  5. From ClassLibrary2 add a reference to ClassLibrary1 by browsing to the dll created in step 3.
  6. Build again (Note: if you make changes in both projects you would need to build twice to make both references "fresh")
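Steps 4 and 5 create file references rather than project references. In a classic (non-SDK-style) .csproj, such a reference looks roughly like the fragment below. The relative path and the `bin\Debug` output folder are assumptions about a default project layout. MSBuild only sees a path to a DLL here, not another project, so there is no project-to-project edge for Visual Studio to find a cycle in. That is presumably why this route is allowed.

```xml
<!-- In ClassLibrary1.csproj: a file reference to ClassLibrary2's build output -->
<ItemGroup>
  <Reference Include="ClassLibrary2">
    <HintPath>..\ClassLibrary2\bin\Debug\ClassLibrary2.dll</HintPath>
  </Reference>
</ItemGroup>
```

ClassLibrary2.csproj would contain the mirror-image reference to ClassLibrary1.dll.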

So this is how you do it. But seriously... don't you EVER do it in a real project! If you do, Santa won't bring you any presents this year.

Answered by Lashaun on 24/8/2009 at 3:07. Comments (1):
The only exception being if it is between December 26 and 31 and the presents have already been secured. – Haletta
Answer (score 6)

I guess it could be done by starting with an acyclic set of assemblies and using ILMerge to then coalesce the smaller assemblies into logically related groups.

Answered by Aldarcie on 22/8/2009 at 17:29.
Answer (score 4)

Well, I've never done it on Windows, but I have done it in a lot of the compile-link-RTL environments that served as the practical progenitors for it. What you do is first make stub "targets" without the cross-references, then link, then add the circular references, then re-link. Linkers generally do not care about circular references or following reference chains; they only care about being able to resolve each reference on its own.

So if you have two libraries, A and B, that need to reference each other, try something like this:

  1. Link A without any refs to B.
  2. Link B with refs to A.
  3. Link A, adding in the refs to B.

Dykam makes a good point: it's compile, not link, in .NET, but the principle remains the same. Make your cross-referenced sources, with their exported entry points, but with all but one of them having their references to the others stubbed out. Build them like that. Then unstub the external references and rebuild them. This should work even without any special tools; in fact, this approach has worked on every operating system I have ever tried it on (about 6 of them). Though obviously something that automates it would be a big help.

Answered by Girardo on 22/8/2009 at 17:44. Comments (2):
The principle is right. However, in the .NET world linking is done dynamically and isn't a problem. It's the compilation step where this solution is needed. – Eldreda
Sorry to correct you again :P. But the referencing (linking) at compile time happens in the .NET world, which is everything derived from that specific ECMA spec: Mono, DotGNU and .NET. Not Windows itself. – Eldreda
Answer (score 1)

One possible approach is to use conditional compilation (#if) to first compile a System.dll that doesn't depend on those other assemblies, then compile the other assemblies, and finally recompile System.dll to include the parts depending on Xml and Configuration.

Answered by Harragan on 22/8/2009 at 17:42. Comments (2):
Unfortunately this doesn't allow you to conditionally reference an assembly (I wish it were possible; it would really help in one of my projects...) – Marketing
Conditional references can easily be done by editing the .csproj file. Just add a Condition attribute to the <Reference> element. – Harragan
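This sketch shows what that could look like for the two-pass System.dll idea above. It assumes a hypothetical `BuildPass` property passed on the command line (`msbuild /p:BuildPass=2`) and a hypothetical `HAVE_XML` symbol that guards the code needing System.Xml. `Condition` attributes and `DefineConstants` are standard MSBuild; the property and symbol names are made up for illustration.

```xml
<!-- Pass 1: msbuild /p:BuildPass=1    Pass 2: msbuild /p:BuildPass=2 -->
<!-- Place after the PropertyGroup that sets DefineConstants, so this appends instead of being overwritten -->
<PropertyGroup Condition="'$(BuildPass)' == '2'">
  <DefineConstants>$(DefineConstants);HAVE_XML</DefineConstants>
</PropertyGroup>
<ItemGroup>
  <!-- The reference only exists in the second pass -->
  <Reference Include="System.Xml" Condition="'$(BuildPass)' == '2'" />
</ItemGroup>
```

The source files would then wrap the System.Xml-dependent code in `#if HAVE_XML`. Pass 1 builds cleanly without System.Xml, and pass 2 adds both the code and the reference.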
Answer (score 0)

Technically, it's possible that these were not compiled at all, and assembled by hand. These are low level libraries, after all.

Answered by Maxwellmaxy on 6/10/2009 at 17:02. Comments (1):
Not really. There isn't much low-level stuff in it, only basic stuff. What made you think it would be low level? The runtime and corlib are low level, relatively. They're still plain C or C++, though the JIT contains low-level stuff. – Eldreda
Answer (score 0)

Agreed. asmmeta.exe is like ildasm, but omits all the IL (just ret) and some privates, though sometimes privates are needed, for example for struct sizes.
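This sketch shows why a stub might still need a private member. For a struct, the private fields are the layout, so a stub that drops them describes a differently sized type. The code compares a hypothetical real struct with two stub variants using `Marshal.SizeOf`. All type and namespace names are made up. The post does not say exactly which consumer relies on the size, so treat this as an illustration of the layout difference rather than of asmmeta's actual rules.

```csharp
using System.Runtime.InteropServices;

namespace Real
{
    public struct Handle
    {
        private long _value;                       // private, but it defines the layout
        public Handle(long value) { _value = value; }
        public long Value => _value;
    }
}

namespace StubKeepingField
{
#pragma warning disable 0169 // the stub's field is intentionally never used
    public struct Handle
    {
        private long _value;                       // kept so the layout matches Real.Handle
        public Handle(long value) { throw null; }
        public long Value { get { throw null; } }
    }
#pragma warning restore 0169
}

namespace StubDroppingField
{
    public struct Handle                           // same public surface, no fields at all
    {
        public Handle(long value) { throw null; }
        public long Value { get { throw null; } }
    }
}

public static class LayoutCheck
{
    // Marshaled sizes of the real struct and the two stub variants, in that order.
    public static int[] Sizes() => new[]
    {
        Marshal.SizeOf<Real.Handle>(),
        Marshal.SizeOf<StubKeepingField.Handle>(),
        Marshal.SizeOf<StubDroppingField.Handle>(),
    };
}
```

The stub that keeps the private field matches the real struct's size. The one that drops it does not, even though their public surfaces are identical.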

The more general idea is that of a multi-pass build, which Microsoft has relied on heavily forever.

The stripped-down ildasm output can be thought of as a "header" file, in a system that does not really have them.

First, visit each directory (with massive parallelism!) running ilasm. Then visit each directory again (again with massive parallelism) running csc. After csc, in the same pass, run the ildasm-like tool to output the "headers" again, and compare them with the originals. If there are any mismatches, the build is broken: a developer failed to update the header. At that point it is too late to just patch it up without restarting the build (though with a proper dependency graph, most directories will not be affected).

This is also a way to upgrade versions easily: the ilasm-like code can use names for version numbers. Though this is really a minor outcome of a multi-pass build.
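This is a rough sketch of the "compare the headers" step. It uses reflection instead of an ildasm-like tool: dump an assembly's public surface into sorted text lines, then diff them against a checked-in baseline. The class name `PublicSurface`, the line format, and storing the baseline as plain text are all assumptions. `Assembly.GetExportedTypes` and `Type.GetMembers` are standard reflection APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class PublicSurface
{
    // Describe the public API of an assembly as sorted text lines (a crude "header").
    public static List<string> Describe(Assembly assembly)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance |
                                   BindingFlags.Static | BindingFlags.DeclaredOnly;
        var lines = new List<string>();
        foreach (Type type in assembly.GetExportedTypes())
        {
            lines.Add("type " + type.FullName);
            foreach (MemberInfo member in type.GetMembers(flags))
                lines.Add("member " + type.FullName + "::" + member);
        }
        // Reflection does not guarantee member order, so sort for a stable comparison.
        lines.Sort(StringComparer.Ordinal);
        return lines;
    }

    // Compare a checked-in baseline with a freshly generated surface.
    // An empty result means the "headers" still match; otherwise the build should fail.
    public static List<string> Diff(IEnumerable<string> baseline, IEnumerable<string> current)
    {
        var removed = baseline.Except(current, StringComparer.Ordinal).Select(s => "- " + s);
        var added = current.Except(baseline, StringComparer.Ordinal).Select(s => "+ " + s);
        return removed.Concat(added).ToList();
    }
}
```

In a build, `Describe`'s output for each assembly would serve as the checked-in baseline. After compiling, a non-empty `Diff` means someone changed the public surface without updating the baseline, which is the broken-build case described above.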

Answered by Tenet on 4/11/2021 at 1:22.

© 2022 - 2024 — McMap. All rights reserved.