Alternative to binaries in Subversion
O

15

23

Some of my colleagues are convinced that committing build artefacts to the subversion repository is a good idea. The argument is that this way, installation and update on the test machines is easy - just "svn up"!

I'm sure there are weighty arguments against this bad practice, but all I can think of are lame ones like "it takes up more room". What are the best, killer reasons to not do this? And what other approaches should we use instead?

This is for Java code if that makes a difference. Everything is compiled from Eclipse (with no automated PDE builds).

When I say add the build artifacts, I mean a commit would look like this:

"Added the new Whizbang feature"

 M src/foo/bar/Foo.java
 M bin/Foo.jar

Each code change has the corresponding generated jar file.

Om answered 2/4, 2009 at 11:43 Comment(6)
Why the down votes? It's a common issue - I'm not saying it's a good idea!Om
Wow! It's starting to look like "binaries in subversion" is actually not considered an anti-pattern! This is quite a shock to me.Om
Not sure, but doesn't it often result in conflicts? If I build the jar myself and then update, it's often hard to 'merge' the two. Maybe put them in a special dir where people that build the jars themselves won't put them.Chevaldefrise
you never merge the binaries, you store the latest version only and don't try to think of them as mergeable. You have the same issue with all other binary sources too.Paterfamilias
Merge I guess here would be when you change the build output locally (by compiling stuff) and then "svn up": you will merge your colleagues' changes into your own, and it will almost certainly give a conflict.Om
"weighty arguments against this bad practice, but all I can think of are lame ones" I'm slowly realizing that if you can't give any good reasons yourself, then it might actually be just fine. Even if it doesn't "feel" right, it might be that feeling that is wrong.Chromite
O
24

In my opinion the code repository should only contain source code as well as third party libraries required to compile this source code (also the third party libraries might be retrieved with some dependency management tool during the build process). The resulting binaries should not get checked in along with the source code.

I think the problem in your case is that you don't have proper build scripts in place. That's why building a binary from the sources involves manual work like starting up Eclipse, importing the project, adjusting classpaths, etc.

If there are build scripts in place, getting the binaries can be done with a command like:

svn update; ant dist
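As a rough sketch of how that one-liner could be wrapped for test machines (the function name and the injectable `run` parameter are my own, not part of any tool), something like this would do. It assumes the `svn` and `ant` executables are on the PATH and that the build file has a `dist` target:

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# The two steps from the command above: refresh the working copy, then build.
# "dist" is the Ant target named above; your build file may use another name.
STEPS: List[List[str]] = [["svn", "update"], ["ant", "dist"]]


def update_and_build(
    workdir: str,
    run: Optional[Callable[[Sequence[str], str], int]] = None,
) -> bool:
    """Run each step inside workdir and stop at the first one that fails."""
    if run is None:
        # Default runner: execute the real command and return its exit code.
        def run(cmd: Sequence[str], cwd: str) -> int:
            return subprocess.run(list(cmd), cwd=cwd).returncode

    for cmd in STEPS:
        if run(cmd, workdir) != 0:
            return False
    return True
```

Unlike the plain `;` in the shell version, this stops when the update fails, so a test machine never builds from a half-updated working copy.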

I think the most important reason not to check in the binaries along with the source is the resulting size of your repository. This will cause:

  • A larger repository, and possibly too little space on the versioning system server
  • Lots of traffic between versioning system server and the clients
  • Longer update times (imagine you do an SVN update from the internet...)

Another reason might be:

  • Source code is easily comparable, so lots of the features of a versioning system do make sense. But you can't easily compare binaries...

Also your approach as described above introduces a lot of overhead in my opinion. What if a developer forgets to update a corresponding jar file?

Octans answered 2/4, 2009 at 11:58 Comment(6)
Excellently formed argument. You hit the nail on the head with the svn up; ant dist comment.Om
I completely disagree with the "svn update; ant dist" comment. If I put a binary through an expensive testing process, that's exactly the binary I want to deploy onto my servers, not something that's probably similar. Binaries do need to be archived. Maybe not in the repository, but archived.Jovi
Your answer doesn't take into account that VCS is a time machine. In 5 or 10 years, nothing guarantees that the source will compile cleanly with current to date compilers, not even that the same tested binary will be generated.Neglect
@Jim T: Yes, I agree that binaries should get archived, but not together with the source... what I wanted to show with my post was that it is easily possible to reproduce the artifacts if you have a proper build script in placeOctans
@Juliano: But checking in the resulting binaries alone does not solve that problem. What you would have to do here is to check in the compiler and all required products to translate your source...Octans
@Juliano: but are you going to version a virtual machine image of your build environment? What about the VM software itself? etc.Glyph
P
16

Firstly, Subversion (and all others nowadays) are not source code control managers (I always thought SCM means Software Configuration Management), but version control systems. That means they store changes to the stuff you store in them, it doesn't have to be source code, it could be image files, bitmap resources, configuration files (text or xml), all kinds of stuff. There's only 1 reason why built binaries shouldn't be considered as part of this list, and that's because you can rebuild them.

However, think why you would want to store the released binaries in there as well.

Firstly, it's a system to assist you, not to tell you how you should build your applications. Make the computer work for you, instead of against you. So what if storing binaries takes up space? You have hundreds of gigabytes of disk space and super fast networks. It's not a big deal to store binary objects in there anymore (whereas ten years ago it might have been a problem, which is perhaps why people think of binaries in SCM as a bad practice).

Secondly, as a developer, you might be comfortable with using the system to rebuild any version of an application, but the others who might use it (e.g. QA, test, support) might not. This means you'd need an alternative system to store the binaries, and really, you already have such a system: it's your SCM! Make use of it.

Thirdly, you assume that you can rebuild from source. Obviously you store all the source code in there, but you don't store the compiler, the libraries, the SDKs, and all the other dependent bits that are required. What happens when someone comes along and asks "can you build me the version we shipped 2 years ago, a customer has a problem with that version". 2 years is an eternity nowadays, do you even have the same compiler you used back then? What happens when you check all the source out only to find that the newly updated SDK is incompatible with your source and fails with errors? Do you wipe your development box and reinstall all the dependencies just to build this app? Can you even remember what all the dependencies were?!

The last point is the big one, to save a few k of disk space, you might cost yourself days if not weeks of pain. (And Sod's law also says that whichever app you need to rebuild will be the one that required the most obscure, difficult to set up dependency you were ever glad to get rid of)

So store the binaries in your SCM, don't worry over trivialities.

PS. we stick all binaries in their own 'release' directory per project, then when we want to update a machine, we use a special 'setup' project that consists of nothing but svn:externals. You export the setup project and you're done as it fetches the right things and puts them into the right directory structure.
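To make the 'setup' project idea a bit more concrete, here is a small sketch that renders the value of an svn:externals property from a list of release directories. The URLs and directory names are made up for illustration, and the `URL LOCALDIR` order is the one Subversion 1.5 and later accept (older clients expect the two fields the other way round):

```python
from typing import Iterable, Tuple


def externals_definition(entries: Iterable[Tuple[str, str]]) -> str:
    """Render (url, local_dir) pairs as svn:externals lines, one per entry."""
    return "\n".join(f"{url} {local_dir}" for url, local_dir in entries)


# Hypothetical layout: each project keeps its binaries in its own 'release' dir.
SETUP_EXTERNALS = externals_definition([
    ("http://svn.example.com/repos/app/release", "app"),
    ("http://svn.example.com/repos/reports/release", "reports"),
])
```

The resulting text could be saved to a file and applied to the setup project's root with `svn propset svn:externals -F externals.txt .`, after which an export of the setup project pulls in each release directory.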

Paterfamilias answered 2/4, 2009 at 13:8 Comment(4)
Nicely argued, even if it does go against my gut feeling. +1Om
Your 3rd point argues for putting the SDKs, compilers, etc in SCM, not (just) the build products. If you can't rebuild that source from 2 years ago then you already can't offer the customer a fix because you can't rebuild the product with altered source.Courcy
true, but generally you just need to get the binaries out again - possibly to test or re-ship. Once there, if you can't rebuild them for a fix, you're still in no worse a position than you were if you didn't store the binaries.Paterfamilias
Maybe binaries should be added only to tags/release/x.y/binaries. Since tags are not supposed to be changed, then binaries will fit nicely.Contributory
D
6

A continuous integration server like Hudson would have the ability to archive build artifacts. It doesn't help your argument with "why not" but at least it is an alternative.

Detrusion answered 2/4, 2009 at 11:48 Comment(1)
+1 interesting. Might be a better solution than putting binaries in tags, which I suggest in my own answer.Glyph
N
5

I'm sure there are weighty arguments against this bad practice

You have the wrong presumption that committing "build artifacts" to the version control is a bad idea (unless you wrongly phrased your question). It is not.

It is ok, and very important indeed, to keep what you call "build artifacts" in version control. More than that, you should also keep compilers and anything else used to transform the set of source files to a finished product.

In five years from now, you'll certainly be using different compilers and different build environments, that may happen to not be able to compile today's version of your project, for whatever reason. What could be a simple small change to fix a bug in a legacy version, will transform into a nightmare of porting that old software to current compilers and build tools, just to recompile a source file that had a one-line change.

So, there is no reason you should be so afraid of storing "build artifacts" in version control. What you may want to do is to keep them in separate places.

I suggest separating them like:

 ProjectName
 |--- /trunk
 |    |--- /build
 |    |    |--- /bin        <-- compilers go here
 |    |    |--- /lib        <-- libraries (*.dll, *.jar) go here
 |    |    '--- /object     <-- object files (*.class, *.jar) go here
 |    '--- /source          <-- sources (*.java) go here
 |         |--- package1    <-- sources (*.java) go here
 |         |--- package2    <-- sources (*.java) go here

You have to configure your IDE or your build scripts to place object files in /ProjectName/trunk/build/object (perhaps even recreating the directory structure under .../source).

This way, you give your users the option to checkout either /ProjectName/trunk to get the full building environment, or /ProjectName/trunk/source to get the source of the application.

In ../build/bin and ../build/lib you must place the compilers and libraries that were used to compile the final product, the ones used to ship the software to the user. In 5 or 10 years, you will have them there, available for your use in some eventuality.

Neglect answered 2/4, 2009 at 12:18 Comment(4)
I mean the output from builds, not the inputs (be them compilers or 3rd party libraries).Om
Ah, and not even well-known ticketed outputs, but everyday "single bug fix or feature" outputs.Om
Yes, it is important to keep the output of builds just like the inputs. At least of some milestones, if you want to save some space.Neglect
I think it would be a good idea to store the build environment for your sources somehow, maybe in a separate repository, because just storing compiler and libraries is not enough. Imagine you compiled using a legacy version of gcc, like 3.3, and kept only the compiler. Try to run that on a recent Linux distro :-)Ppm
H
5

"committing build artifacts to the subversion repository" can be a good idea if you know why.

It is a good idea for a release management purpose, more specifically for:

1/ Packaging issue

If a build artifact is not just an exe (or a dll or...), but also:

  • some configuration files
  • some scripts to start/stop/restart your artifact
  • some sql to update your database
  • some sources (compressed into a file) to facilitate debugging
  • some documentation (javadoc compressed in a file)

then it is a good idea to have a build artifact and all those associated files stored in a VCS.
(Because it is not anymore just a matter of "re-building" the artifact, but also of "retrieving" all those extra files that will make that artifact run)

2/ Deployment issue

Suppose you need to deploy many artifacts in different environments (test, homologation, pre-production, production).
If:

  • you produce many build artifacts
  • those artifacts take a long time to recreate from scratch

then having those artifacts in a VCS is a good idea, in order to avoid recreating them.
You can just query them from environment to environment.

But you need to remember:

  • 1/ you cannot store every artifact you make in the VCS: the intermediate builds you make for continuous integration purposes must not be stored in the VCS (or you end up with a huge repository with many useless versions of the binaries).
    Only the versions needed for homologation and production purposes need to be referenced.
    For intermediate builds, you need an external repository (Maven or a shared directory) in order to publish and test those builds quickly.

  • 2/ you should not store them in the same Subversion Repository, since your development is committed (revision number) much more often than your significant builds (the ones deemed worthy of homologation and production deployment)
    That means the artifacts stored in that second repository must have a naming convention for the tag (or for a property) in order to easily retrieve the revision number of the development from which they have been built.
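One possible naming convention for point 2, sketched under my own assumptions: the pattern `<name>-<version>-r<revision>` and the helper names below are hypothetical, not something Subversion prescribes. The idea is only that the development revision can be read back from the tag without consulting anything else:

```python
import re
from typing import NamedTuple


class ArtifactTag(NamedTuple):
    name: str
    version: str
    revision: int  # revision of the development repository the build came from


# Hypothetical convention: <name>-<version>-r<revision>, e.g. whizbang-1.4.0-r1234
_TAG_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z0-9_]+)-(?P<version>\d+(?:\.\d+)*)-r(?P<revision>\d+)$"
)


def format_tag(name: str, version: str, revision: int) -> str:
    """Build the tag name under which a significant build is stored."""
    return f"{name}-{version}-r{revision}"


def parse_tag(tag: str) -> ArtifactTag:
    """Recover name, version and source revision from a tag name."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"tag does not follow the naming convention: {tag!r}")
    return ArtifactTag(match["name"], match["version"], int(match["revision"]))
```

For example, `parse_tag("whizbang-1.4.0-r1234").revision` gives back 1234, the revision to check out from the development repository when a deployed build needs investigating.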

Hideout answered 2/4, 2009 at 15:3 Comment(0)
F
4

In my experience, storing JARs in SVN can end in a mess.
I think it is better to keep the JAR files in a Maven repository manager like Nexus.
This also has the advantage that you can use a dependency management tool like Maven or Ivy.

Footlocker answered 2/4, 2009 at 12:8 Comment(0)
S
4

Binaries, especially your own, but also third party, have no place in a source control tool like SVN.

Ideally you should have build scripts to build your own binaries (these can then be automated with one of the many fine automatic build tools that can check the source straight out of SVN).

For third party binaries you will need a dependency management tool like Maven2. You can then set up a local Maven repository to handle all third party binaries (or just rely on the public ones). The local repo can also manage your own binaries.

Smacking answered 2/4, 2009 at 12:13 Comment(0)
G
4

Putting the binaries in the trunk or branches is definitely overkill. Besides taking up space like you mention, it also leads to inconsistencies between source and binaries. When you refer to revision 1234, you don't want to wonder whether that means "the build resulting from the source at revision 1234" vs "the binaries in revision 1234". The same rule of avoiding inconsistencies applies to auto-generated code. You should not version what can be generated by the build.

OTOH I'm more or less OK with putting binaries in tags. This way it is easy for other projects to use the binaries of other projects via svn:externals, without needing to build all these dependencies. It also enables testers to easily switch between tags without needing a full build environment.

To get binaries in tags, you can use this procedure:

  1. check out a clean working copy
  2. run the build script and evaluate any test results
  3. if the build is OK, svn add the binaries
  4. instead of committing to the trunk or branch, tag directly from your working copy like this: svn copy myWorkingCopyFolder myTagURL
  5. discard the working copy to avoid accidental commits of binaries to the trunk or branch

We have a tagbuild script to semi-automate steps 3 and 4.
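This is not our actual tagbuild script, just a minimal sketch of the command sequence for steps 1 to 4. The function and parameter names are invented for the example, and the build command is passed in because it is project-specific:

```python
from typing import List, Sequence


def tag_build_commands(
    source_url: str,
    working_copy: str,
    build_command: Sequence[str],
    binaries_dir: str,
    tag_url: str,
    message: str,
) -> List[List[str]]:
    """Return the commands for steps 1-4, in order, without running them."""
    return [
        # 1. check out a clean working copy
        ["svn", "checkout", source_url, working_copy],
        # 2. run the build script (test results still need a human or CI check)
        list(build_command),
        # 3. schedule the freshly built binaries for addition
        ["svn", "add", f"{working_copy}/{binaries_dir}"],
        # 4. tag straight from the working copy instead of committing
        ["svn", "copy", working_copy, tag_url, "-m", message],
    ]
```

Step 5 is then just deleting the working copy directory. Returning the commands rather than executing them keeps the sketch easy to review, and a wrapper can run them one by one and stop if the build in step 2 fails.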

Glyph answered 2/4, 2009 at 13:8 Comment(0)
P
2

One good reason would be to quickly get an executable running on a new machine. In particular if the build environment takes a while to set up. (Load compilers, 3rd party libraries and tools, etc.)

Principe answered 2/4, 2009 at 11:53 Comment(0)
U
1

Checking in significant binaries violates a usage principle of source code/SVN, namely that files in source control should possess a meaningful property of difference.

Today's source file is meaningfully different from yesterday's source file; a diff will produce a set of changes which make sense to a human reader. Today's picture of the front of the office does not possess a meaningful diff with regard to yesterday's picture of the office.

Because things like images do not possess the concept of difference, WHY are you storing them in a system which exists to record and store the differences between files?

Revision based storage is about storing histories of changes to files. There is no meaningful change history in the data of (say) JPEG files. Such files are stored just as well in a plain directory.

More practically, storing large files - build output files - in SVN makes checkout slow. The potential to abuse SVN as a generalised binary repository is there. It all seems fine at first - because there aren't many binary files. Of course, the number of files increases as time passes; I've seen modules which take hours to check out.

It is better to store large associated binary files (and output files) in a directory structure and refer to them from the build process.

Underarm answered 2/4, 2009 at 12:42 Comment(4)
because you still want to store all the correct sources in one place - not 'get the source code, then download the images from file server x', chances are those images will have been deleted long ago.Paterfamilias
images can still be differenced, just not with the tools you use for text files - eg, you want to tell the difference between 2 jpegs? check them both out and look at them side by side. easy. Oh, and the check-in comment should help massively.Paterfamilias
You need to retain control of your associated binary files. Retaining control does not require those files being placed in SVN.Underarm
Yes. JPEG files have a meaningful difference - but not as a diff. Revision based storage is about storing histories of changes to files. There is no meaningful change history in the data of JPEG files. Such files are stored just as well in a plain directory.Underarm
M
1

On my projects, I usually have post-build hooks to build from a special working copy on the server, namely in a path reachable from an HTTP browser. That means, after every commit, anyone [who can read the internal web] can easily download the relevant binaries. No consistency problems, instant updating + a path towards automated testing.

Materialize answered 2/4, 2009 at 13:25 Comment(0)
L
1

Version control should have everything you need to do: svn co and then build. It shouldn't have intermediates or final product, as that defeats the purpose. You can create a new project in SVN for the result and version the binary result separately (for releases and patches if needed).

Lyceum answered 2/4, 2009 at 14:13 Comment(0)
P
0

Do you mean you have the sources plus the result of the build in the same repository ?

This is a good argument for a daily build, with versioned build scripts in a separate repository. Binaries in the repository are not bad in themselves, but sources plus the result of the build looks bad to me.

If you build several binaries and don't notice a build breakage somewhere, then you end up with binaries from different revisions, and you are preparing yourself for some subtle bug chase.
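A cheap guard against that situation, sketched here on the assumption that each build step records the revision it was built from (for instance by capturing the output of `svnversion`); the function names and the dictionary layout are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List


def is_consistent(built_from: Dict[str, int]) -> bool:
    """True when every artifact was built from the same source revision."""
    return len(set(built_from.values())) <= 1


def group_by_revision(built_from: Dict[str, int]) -> Dict[int, List[str]]:
    """Group artifact names by the revision they were built from."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for artifact, revision in sorted(built_from.items()):
        groups[revision].append(artifact)
    return dict(groups)
```

If `is_consistent` returns False, `group_by_revision` shows which binaries are stale, so the mixed set can be rejected before anyone deploys it.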

Advocate for a daily, separately versioned autobuild script, rather than just arguing against binaries + code.

Pita answered 2/4, 2009 at 11:48 Comment(0)
V
0
  • Subversion is a Source Control Manager -> Binaries are not source
  • If you use the "svn up" command to update production, can every developer with commit permission update, modify, or break production?

Alternatives: Use continuous integration like Hudson or Cruise Control.

View answered 2/4, 2009 at 11:51 Comment(4)
By production machines I meant "production-like" integration testing machines. Sorry about that, I've edited it now.Om
Subversion is not an SCM, it is a Version Control System. Read: svnbook.red-bean.com/en/1.5/svn.intro.whatis.htmlNeglect
As a blanket statement "Binaries" are not source is not really correct. There are many resources that need version control that are purely binary, such as images. However in this particular case it seems that the included files are superfluous.Digitalize
@Neglect - You are right, my English is not as good as yours :-) @Peter M - You are right too, but in the context of this answer, binaries are the artifacts that result from a build. I agree with you about images and other resources; for me it's right to put them in the VCS.View
O
0

I think the feeling of having done something wrong when binary files are committed to the VCS comes from the basic idea that one should never put redundant things in an archive, which is motivated by resource economy and the drawbacks of managing the same data twice.

That is why: If you can easily reconstruct your archived state of work from the other files of that certain version, like with simple recompiling or installing standard setups, you should not commit such binaries, but rather commit something like a README or INSTALL file. If the difficulties or risk of failing to reconstruct is too much, do commit.

Olimpiaolin answered 3/6, 2013 at 14:25 Comment(1)
The dispute will be about judging the risk and how easy both ways areOlimpiaolin
