Determine whether .NET assemblies were built from the same source
C

7

30

Does anyone know of a way to compare two .NET assemblies to determine whether they were built from the "same" source files?

I am aware that there are some differencing utilities available, such as the plugin for Reflector, but I am not interested in viewing differences in a GUI; I just want an automated way to compare a collection of binaries to see whether they were built from the same (or equivalent) source files. I understand that multiple different source files could produce the same IL, and realise that the process would only be sensitive to differences in the IL, not the original source.

The main obstacle to just comparing the byte streams for the two assemblies is that .NET includes a field called "MVID" (Module Version Identifier) in the assembly. This appears to have a different value for every compilation, so if you build the same code twice the resulting assemblies will differ.

A related question is, does anyone know how to force the MVID to be the same for each compilation? This would avoid us needing to have a comparison process that is insensitive to differences in the value of the MVID. A consistent MVID would be preferable, as this means that standard checksums could be used.

The background behind this is that a third-party company is responsible for independently reviewing and signing off our releases, prior to us being permitted to release to Production. This includes reviewing the source code. They want to independently confirm that the source code we give them matches the binaries that we earlier built, tested and currently plan to deploy. We are looking for a process that allows them to independently build the system from the source we supply them with, and then compare the checksums against the checksums for the binaries we have tested.

BTW. Please note that we are using continuous integration, automated builds, source control etc. The issue is not related to an internal lack of control over what source files went into a given build. The issue is that a third party is responsible for verifying that the source we give them produces the same binaries that we have tested and plan to put into Production. They should not be trusting any of our internal systems or controls, including the build server or the source code control system. All they care about is getting the source associated with the build, performing the build themselves, and verifying that the outputs match what we say we are deploying.

The runtime speed of the comparison solution is not particularly important.

thanks

Cubby answered 31/5, 2010 at 0:36 Comment(4)
If the ONLY difference is the MVID, surely it would always appear at the same position in the byte stream and you could have your difference algorithm ignore those byte positions?Sweven
Yes, that's correct, but I would need to know the structure of the file in order to ignore this field. Do you know of a reference on the format?Cubby
Is that even possible? Couldn't different source code (C#, VB.NET, whatever) result in the same binary (or IL code for that matter)? It might not make a functional difference then, but would be still a difference. EDIT: Whoops, sorry. Just saw now that they rebuild and then compare the binaries.Aubarta
They have the source, and they will build it, and the files will be almost identical... so I fail to see why they have to compare the presumably identical files to use the one provided by you rather than the version they've built.Thaumaturgy
C
10

It's not too painful to use command-line tools to filter out MVID and date-time stamps from a text representation of the IL. Suppose file1.exe and file2.exe are built from the same sources:

c:\temp> ildasm /all /text file1.exe | find /v "Time-date stamp:" | find /v "MVID" > file1.txt

c:\temp> ildasm /all /text file2.exe | find /v "Time-date stamp:" | find /v "MVID" > file2.txt

c:\temp> fc file1.txt file2.txt

Comparing files file1.txt and FILE2.TXT

FC: no differences encountered
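For comparing a whole collection of binaries, the same filtering can be scripted rather than piped through find and fc. The following is only a minimal Python sketch of that idea, not a tested verification tool: it assumes the unfiltered dump from `ildasm /all /text` has already been saved to a text file, and the names `VOLATILE_MARKERS`, `normalized_digest` and `same_il` are hypothetical helpers introduced for illustration.

```python
import hashlib

# Substrings marking ildasm dump lines that differ between otherwise identical
# builds. These two come from the commands above; extend the tuple if other
# volatile lines show up in your dumps (assumption: one marker per line is enough).
VOLATILE_MARKERS = ("Time-date stamp:", "MVID")


def normalized_digest(dump_text, markers=VOLATILE_MARKERS):
    """Return the SHA-256 hex digest of an ildasm text dump with volatile lines removed."""
    kept = [
        line
        for line in dump_text.splitlines()
        if not any(marker in line for marker in markers)
    ]
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()


def same_il(dump_a, dump_b, markers=VOLATILE_MARKERS):
    """True when two dumps are identical once the volatile lines are ignored."""
    return normalized_digest(dump_a, markers) == normalized_digest(dump_b, markers)
```

Each party could then publish `normalized_digest(...)` for every assembly and compare the resulting lists, instead of diffing the dumps pairwise.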

Canberra answered 7/7, 2010 at 13:51 Comment(2)
I don't think this is entirely the most robust method for reasons I can't yet discern the root cause of. To discover this I basically built my source, and copied the deploy folder where everything is copied. Then I deleted the contents of the deploy folder and rebuilt the source. I generated the disassembly texts using your technique but found differences between the two beyond all the filtering options you and others provide.Layamon
... It looks like certain GUIDs are updating. ".field /*04000027*/ static assembly valuetype '<PrivateImplementationDetails>{A310135E-980F-48EA-A97F-FB0E9C30EA63}'/*0200000F*//'__StaticArrayInitTypeSize=6'/*02000010*/ '$$method0x600001d-1' at I_00002CE0" Our build is somewhat complicated, merging CLI C++ interop with .NET and C# and spans some 60+ projects. It's unfortunate there's no way to fix the IDs used in the generation.Layamon
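One possible workaround for the changing GUIDs, offered only as an untested assumption, is to mask them in the disassembly text before comparing. The Python sketch below replaces the GUID in the `<PrivateImplementationDetails>{...}` type name quoted above with a fixed token; `mask_private_impl_guids` is a hypothetical helper, and whether the rest of such a dump then matches is not something this thread confirms.

```python
import re

# Matches the compiler-generated type name quoted in the comment above, e.g.
# <PrivateImplementationDetails>{A310135E-980F-48EA-A97F-FB0E9C30EA63}
PRIVATE_IMPL_GUID = re.compile(
    r"(<PrivateImplementationDetails>)"
    r"\{[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}"
)


def mask_private_impl_guids(dump_text):
    """Replace the per-build GUID in <PrivateImplementationDetails>{...} with a fixed token."""
    return PRIVATE_IMPL_GUID.sub(r"\1{GUID}", dump_text)
```

Masking only hides the name. Any other per-build values in the dump would still show up as differences and need to be looked at separately.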
M
9

When comparing class libraries with ILDasm v4.0.319.1, it seems that image base is not initialized. To avoid mismatches, use a revised solution:

ildasm /all /text assembly.dll
| find /v "// Time-date stamp:"
| find /v "// MVID:"
| find /v "// Checksum:"
| find /v "// Image base:"
> assembly.dasm

Entry point (image base) is actually interesting information for executable assemblies, and would have to be verified carefully. Injecting a new image base is a common way to make a program do something else entirely. In my case, I am trying to verify the consistency of multi-threaded builds, so it's safe to skip over the entry point.

A note on performance: I took an 8MB DLL that was built for AnyCPU and ran ILDasm on it. The resulting file was 251MB, roughly 31 times the size of the input, and took several minutes to generate.

Mcmahon answered 22/11, 2011 at 15:56 Comment(0)
O
8

I have used Jerry Currry's solution on .NET 4 assemblies and found that there is now a third item that varies on each build: the checksum. Isn't it surprising to find a checksum inside an assembly? I would think that adding the checksum of a file inside that file will change the checksum...

Anyway, the modified command is:

ildasm /all /text "assembly.dll"
| find /v "// Time-date stamp:"
| find /v "// MVID:"
| find /v "// Checksum:"
> assembly.dasm

Note that I have also changed the search strings a bit by adding the slashes, in order to avoid unintentional matches. The command is split across several lines here for readability; it should be entered on a single line. File names will need double quotes around them if they contain spaces.

Ought answered 19/5, 2011 at 8:1 Comment(0)
T
3

There are a few ways to do this, depending on the amount of work you're willing to do and the importance of performance and/or accuracy. One way, as Eric J. pointed out, is to compare the assemblies in binary, excluding the parts that change on every compilation. This solution is easy and fast but could give you a lot of false negatives. A better way is to drill down using reflection. If performance is critical, you can start by comparing the types and, if they match, move on to the member definitions. If the type and member definitions are all equal, you can go further and examine the actual IL of each method, obtained through the GetILAsByteArray method. Again, you're going to find differences even if the source is the same but was compiled with slightly different flags or a different version of the compiler. I'd say that the best solution is to use a continuous integration tool that tags the build with the changeset number from your source control (you are using one, right?).
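As a rough illustration of the binary approach, the Python sketch below zeroes two header fields whose positions are fixed by the PE/COFF format, the COFF TimeDateStamp and the optional-header CheckSum, so that two files can be compared byte for byte afterwards. It is a sketch under stated assumptions, not a complete solution: it does not touch the MVID, which lives in the #GUID stream of the CLI metadata and requires parsing that metadata to locate, and it ignores other places (such as debug directory entries) that can also carry timestamps. The function name `mask_pe_header_fields` is hypothetical.

```python
import struct


def mask_pe_header_fields(image):
    """Return a copy of a PE image with the COFF TimeDateStamp and optional-header CheckSum zeroed."""
    data = bytearray(image)
    if data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew, stored at offset 0x3C of the DOS header, points at the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The COFF header follows the 4-byte signature:
    # Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes).
    struct.pack_into("<I", data, pe_offset + 8, 0)
    # The optional header starts after the 20-byte COFF header; CheckSum sits at
    # offset 64 within it for both PE32 and PE32+ images.
    struct.pack_into("<I", data, pe_offset + 24 + 64, 0)
    return bytes(data)
```

Hashing the masked bytes of both builds would then show whether anything other than those two fields differs. With the MVID still in place, a mismatch is expected until that field is handled as well.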

A related article

Thaumaturgy answered 31/5, 2010 at 1:3 Comment(1)
(Q. edited to include additional detail) You and Eric J are correct regarding ignoring the variant portion of the file. This is simple if the format is documented, but I've not found a ref yet. Do you know of one? Regarding using reflection, we are inclined towards the simplest solution, because the external party will need to understand and test the utility. If it's provided by the dev team, there will be greater suspicion of it than if the software were provided by a fourth party. Ignoring a few bytes in the file will be simpler than using reflection.Cubby
P
3

You can use Mono.Cecil with a small modification to solve the problem. I did it; you can read how here: http://groups.google.com/group/mono-cecil/browse_thread/thread/6ab42df05daa3a/49e8b3b279850f13#49e8b3b279850f13

Regards Florian

Pathe answered 18/6, 2011 at 22:2 Comment(0)
N
1

You can use the Reflector Diff AddIn here.

Notion answered 13/3, 2014 at 8:12 Comment(0)
W
0

Another solution to consider:

Source code information is stored when binaries are compiled in debug mode. You can then check whether the PDB matches the EXE and whether the line information in the PDB matches the source code.

Wherever answered 18/11, 2010 at 7:29 Comment(0)
