Why is the binary output not equal when compiling again?
P

4

37

I'm using a build script to compile several C# projects. The binary output is copied to a result folder, overwriting the previous version of the files, and then added/committed to subversion.

I noticed that the binary output of the compilation is different even when there was no change to the source or environment at all. How is this possible? Isn't the binary result supposed to be exactly equal for the same input?

I'm not intentionally using any kind of special timestamps anywhere, but does the compiler (Microsoft, the one included in .NET 4.0) possibly add timestamps itself?

The reason I'm asking is I'm committing the output to subversion, and due to the way our build server works the checked in changes trigger a rebuild, causing the once again modified binary files to be checked in in a circle.

Popp answered 19/1, 2012 at 14:15 Comment(4)
Versioning both source and binaries sounds redundant to me; wouldn't you be better off keeping only the sources under Subversion? You could try aggregating assemblies as needed via solutions, avoiding the need to version build outputs (I do something similar in a SourceSafe environment)Mossberg
@alex Due to the vast size of the project and how our teams work, this isn't easy in my case, but I'll definitely try to walk in that direction.Popp
I created a request to MS, please upvote: visualstudio.uservoice.com/forums/121579-visual-studio-2015/…Jairia
Alex Nolasco's answer contains link to documentation on deterministic builds. What more do you need?Milagro
D
34

ANOTHER UPDATE:

Since 2015 the compiler team has been making an effort to get sources of non-determinism out of the compiler toolchain, so that identical inputs really do produce identical outputs. See the "Concept-determinism" tag on the Roslyn GitHub repository for more details.


UPDATE: This question was the subject of my blog in May 2012. Thanks for the great question!


How is this possible?

Very easily.

Isn't the binary result supposed to be exactly equal for the same input?

Absolutely not. The opposite is true. Every time you run the compiler you should get a different output. Otherwise how could you know that you'd recompiled?

The C# compiler embeds a freshly generated GUID in an assembly on every compilation, thereby guaranteeing that no two compilations produce exactly the same result.
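To make the effect concrete, here is a toy model in Python. It is not the compiler's actual code: `toy_compile` is a hypothetical stand-in that prefixes the "compiled" bytes with a 16-byte build ID. With a fresh random ID per run, identical input yields different output; deriving the ID from a hash of the content (an assumption here, along the lines suggested in the comments) makes the output repeatable.

```python
import hashlib
import uuid


def toy_compile(source: str, deterministic: bool = False) -> bytes:
    """Toy stand-in for a compiler: prefix the 'code' with a 16-byte build ID."""
    payload = source.encode("utf-8")
    if deterministic:
        # Derive the ID from the content, so equal input gives equal output.
        build_id = uuid.UUID(bytes=hashlib.sha256(payload).digest()[:16])
    else:
        # Fresh random ID on every run, like a new GUID per compilation.
        build_id = uuid.uuid4()
    return build_id.bytes + payload


source = "class Program { static void Main() {} }"
# Random IDs differ between runs, so the outputs differ.
print(toy_compile(source) == toy_compile(source))
# Content-derived IDs are identical, so the outputs match.
print(toy_compile(source, True) == toy_compile(source, True))
```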

Moreover -- even without the GUID, the compiler makes no guarantees whatsoever that two "identical" compilations will produce the same results.

In particular, the order in which the metadata tables are populated is highly dependent on details of the file system; the C# compiler starts generating metadata in the order in which the files are given to it, and that can be subtly changed by a variety of factors.
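The order sensitivity can be illustrated with another toy sketch in Python. Again, this is not how the compiler builds its metadata tables: `toy_metadata_digest` is a hypothetical function that simply hashes file names in the order they are supplied, and the file names are made up. It shows why two directory listings of the same files can give different results, and why sorting the inputs first removes that particular source of variation.

```python
import hashlib


def toy_metadata_digest(file_names):
    """Hash the names in the order given; a toy stand-in for order-sensitive output."""
    h = hashlib.sha256()
    for name in file_names:
        # A separator byte keeps adjacent names from running together.
        h.update(name.encode("utf-8") + b"\0")
    return h.hexdigest()


# The same three files, as two different listings might return them.
listing_a = ["B.cs", "A.cs", "C.cs"]
listing_b = ["A.cs", "C.cs", "B.cs"]

# Different order, different digest.
print(toy_metadata_digest(listing_a) == toy_metadata_digest(listing_b))
# Sorting first makes the result independent of listing order.
print(toy_metadata_digest(sorted(listing_a)) == toy_metadata_digest(sorted(listing_b)))
```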

due to the way our build server works the checked in changes trigger a rebuild, causing the once again modified binary files to be checked in in a circle.

I'd fix that if I were you.

Despumate answered 19/1, 2012 at 14:30 Comment(10)
I remember gcc producing identical binaries (not sure if it is guaranteed), so the .NET behavior surprised me. It makes sense though.Popp
@mafutrct: People are indeed sometimes surprised by this. For example, the government agencies that review the code that goes into gambling machines have the expectation that they should be able to get the source code and the binary from the vendor, and recompile the source code themselves and get an identical binary, as a "proof" that the binary and the sources match. Unfortunately for them, proof that a binary matches its sources is not a service that the C# team has ever claimed to provide, and so they are having to find another solution.Despumate
@EricLippert: That's very interesting. I was not able to find any information on your example via web search. Is there an article online somewhere about that?Woodson
@Brian: I'm not aware of any.Despumate
I've occasionally wondered whether the compiler preserves the GUID property of each Type that is emitted into the metadata... I've seen cases where the GUID changes between recompilations as well as cases where it stays the same.Villager
@Eric: regarding "how could you know you've recompiled" - you would see a new created/modified timestamp on the new DLLs. Another way is to clean and build as default. I think even if it is not the default, there should be some way to allow identical binary generation.Holton
@Eric: just to add, the Mvid could instead be defined as a hash of the code. Being defined in the CLI specification as generating a different binary in itself is a design decision and not a reason for not being able to generate identical binaries. If I feed in identical input to the same compiler on the same machine, there is no reason I could not design the CLI to produce identical binary output by defining the Mvid differently as cited above.Holton
Is it possible to add a feature to MSBuild that is similar to Resharper 10's "build only when necessary" feature? Basically, don't even build a project, even if the referenced DLLs were touched but their public signatures haven't changed.Jairia
Please upvote: visualstudio.uservoice.com/forums/121579-visual-studio-2015/…Jairia
This answer is now obsolete with the availability of the /deterministic switch.Holton
S
13

Yes, the compiler includes a timestamp. Additionally, in some cases the compiler will auto-increment the assembly version number. I haven't seen any guarantee anywhere that the binary result is meant to be identical.

(Note that if the source is already in Subversion, I'd generally steer clear of also adding the binary files in there. I'd usually only include releases of third-party libraries. It depends on exactly what you're doing though.)

Syncretize answered 19/1, 2012 at 14:16 Comment(3)
Is there an easy way to avoid that?Popp
The binary files are used as input for a different project that resides in the same repository. (Actually, a lot more projects.)Popp
You could make a copy of the binary files somewhere other than the output directory, link against the copy, and place the copy of the binary files (but not the output directory itself) in version control. Of course, if you do that you have to decide whose responsibility it is to update that binary folder. If the two projects are very closely related, you could also just add the project itself to your solution file and then add a reference to the project. I think VS will probably handle that scenario intelligently.Woodson
P
9

As mentioned by others, the compiler generates a distinct build every time, hence the different result. What you are looking for is the ability to create deterministic builds, and this is now included as part of the Roslyn compiler.

Roslyn command line options

/deterministic Produce a deterministic assembly (including module version GUID and timestamp)

Read more about this feature: https://github.com/dotnet/roslyn/blob/master/docs/compilers/Deterministic%20Inputs.md
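Once the output is deterministic, the build script from the question could avoid the commit loop by only overwriting a file in the result folder when its content has actually changed. A minimal sketch in Python, assuming the script can be extended this way; `file_digest` and `copy_if_changed` are hypothetical helper names, not part of any build tool:

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def copy_if_changed(built: Path, target: Path) -> bool:
    """Copy built over target only when the content differs.

    Returns True if a copy was made, False if target was already identical.
    """
    if target.exists() and file_digest(built) == file_digest(target):
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(built, target)
    return True
```

For example, `copy_if_changed(Path("bin/Release/Lib.dll"), Path("result/Lib.dll"))` (both paths made up) would leave the result folder untouched after a rebuild with unchanged sources, so Subversion would see nothing to commit.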

Pegues answered 23/6, 2017 at 21:24 Comment(2)
I see this is available for MSBuild with VS2015. Can this also be done with devenv.com? I can't seem to find a switch. The reason I ask is MSBuild does not support extensions like Installer Projects but devenv command line does.Holton
Found that you can, by adding a property group to the .csproj: <Project> <PropertyGroup> <Deterministic>true</Deterministic> </PropertyGroup> </Project>, as per gist.github.com/aelij/b20271f4bd0ab1298e49068b388b54ae Holton
C
2

As far as I know, only MS binaries are different on every compile. 20 or so years ago, it wasn't like that. The MS binaries were the same after every compile (assuming the source code was the same).

Carnallite answered 17/2, 2016 at 3:13 Comment(0)