How good is Subversion at storing lots of binary files? [closed]

I'm looking for a place to put a few GB of documents (mostly .doc and .xls). My team already has a Subversion server set up for managing the documents we create, so I'd prefer to use that if possible. How well will Subversion handle all this extra stuff? Most of it is legacy information and will only ever have one version, but it is possible that a few documents could be updated.

I've been warned that SVN isn't particularly lots-of-big-binary-files-friendly. I'm wary of trying it to see whether it works since they'll always be in the repository history even if I later delete them.

Any alternatives? We'll need the ability to comment on and/or tag documents, but we can use a Delicious-like service combined with the URLs for the documents in SVN (or similar).

Later edit: I'm not so worried about diffs on the binaries since, as stated above, they won't change much. I'm OK with a slight hassle if they do; it's no worse than SharePoint.

Miskolc answered 11/2, 2009 at 20:34 Comment(0)

There's a difference between lots of big binary files, and a big number of binary files.

In my experience, SVN is fine with individual binary files of several hundred megabytes. The only problems I've seen begin with individual files of around a gigabyte. Operations fail for unclear reasons, possibly because SVN doesn't cope well with network-related problems.

I am not aware of any SVN problems related to the number of binary files, beyond the fact that they can't be merged and that binary files often can't be stored efficiently as deltas (although SVN does use deltas for them).

So:

  • 1,000 files of 1 MB = fine
  • 100 files of 10 MB = fine
  • 10 files of 100 MB = fine
  • 1 file of over 1,000 MB = not a good idea

I would hope the size of your documents fits into one of the fine categories :)
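A quick way to see which of these categories a document collection falls into is to scan it for oversized individual files before importing anything. A minimal sketch in Python, assuming the roughly 1 GB per-file danger zone from the list above; `find_large_files` and `LARGE_FILE_BYTES` are hypothetical names, and the cut-off is only this answer's rule of thumb, not an SVN limit:

```python
import os

# Hypothetical cut-off based on the rule of thumb above: single files of
# roughly 1 GB are where trouble reportedly starts. Not an SVN setting.
LARGE_FILE_BYTES = 1000 * 1024 * 1024


def find_large_files(root, limit=LARGE_FILE_BYTES):
    """Walk `root`; return ([(path, size), ...] for files above `limit`,
    largest first, and the total size of every file scanned)."""
    large = []
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            total += size
            if size > limit:
                large.append((path, size))
    large.sort(key=lambda item: item[1], reverse=True)
    return large, total
```

For example, `find_large_files("/path/to/documents")` returns the offending files (largest first) and the total byte count, so you can tell "lots of files" apart from "a few huge files" before anything lands in the repository history.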

Fifine answered 11/2, 2009 at 20:34 Comment(3)
I was hoping this distinction was true, but I wasn't sure. – Miskolc
Apparently, the "fact that revisions are not stored as deltas" is not true, according to the other answers. Could you change that? – Onitaonlooker
It takes a lot of RAM to store the files, so maybe your web server is giving up (if served via Apache). I used to get errors on my little VM, and they went away after I allocated more RAM. Newer versions will apparently be better. – Jugurtha

At my previous company we set up Subversion to store CAD files; files of up to 100 MB were stored in it. If many people add big files at once, the Subversion web server can become a bottleneck. Incremental commits, however, were perfectly OK.

Subversion stores binary deltas. In fact, on the server side, binary and text files are treated exactly the same when storing deltas. See the "Binary delta encoding improvements" section of http://subversion.tigris.org/svn_1.4_releasenotes.html, which explicitly says "Subversion uses the xdelta algorithm to compute differences between strings of bytes" (not strings of characters).
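To make "differences between strings of bytes" concrete, here is a toy copy/insert delta in Python. It is not xdelta and not Subversion's on-disk format; it is a simplified block-matching sketch (the `BLOCK` size and function names are hypothetical) showing why a small edit to a binary file can be stored as a few "copy this range from the old version" instructions plus a handful of new bytes, with no notion of lines or characters involved:

```python
BLOCK = 16  # hypothetical block size; real delta tools choose this differently


def make_delta(source: bytes, target: bytes, block: int = BLOCK):
    """Encode `target` as ops against `source`:
    ("copy", offset, length) reuses bytes from source,
    ("insert", data) carries new bytes literally."""
    # Index every aligned block of the source by its contents.
    index = {}
    for off in range(0, len(source) - block + 1, block):
        index.setdefault(source[off:off + block], off)

    ops = []
    pending = bytearray()  # literal bytes not yet emitted
    i = 0
    while i < len(target):
        chunk = target[i:i + block]
        src_off = index.get(chunk) if len(chunk) == block else None
        if src_off is None:
            pending.append(target[i])
            i += 1
            continue
        # Extend the match forward as far as source and target agree.
        length = block
        while (i + length < len(target) and src_off + length < len(source)
               and target[i + length] == source[src_off + length]):
            length += 1
        if pending:
            ops.append(("insert", bytes(pending)))
            pending.clear()
        ops.append(("copy", src_off, length))
        i += length
    if pending:
        ops.append(("insert", bytes(pending)))
    return ops


def apply_delta(source: bytes, ops) -> bytes:
    """Rebuild the target from the source and the delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += source[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

Inserting six bytes into the middle of a 4 KB binary blob yields two copy operations and one small insert, so only a few bytes of new data need storing; the algorithm never asks whether the content is text.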

As an experiment, I stored 10 versions of a CAD file (a CATIA part file). For each version I made minor modifications to the part and then checked the server-side repository size. The total size after about 10 revisions was about 1.2x, where x is the original file size.
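Anyone wanting to repeat this kind of measurement on their own files could record the repository directory's size after each commit. A minimal Python sketch; `dir_size_bytes` and `growth_summary` are hypothetical helpers, and the repository path is whatever you passed to `svnadmin create`:

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path`, for example a
    repository directory created with `svnadmin create`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # skip symlinks, count real files only
                total += os.path.getsize(full)
    return total


def growth_summary(original_size, repo_sizes):
    """`repo_sizes[k]` is the repository size measured after revision k+1.
    Returns (final size / original size, average growth per later revision
    / original size)."""
    final_ratio = repo_sizes[-1] / original_size
    if len(repo_sizes) < 2:
        return final_ratio, 0.0
    growth = repo_sizes[-1] - repo_sizes[0]
    per_revision = growth / (len(repo_sizes) - 1) / original_size
    return final_ratio, per_revision
```

With the numbers reported here, a final ratio of about 1.2 after 10 revisions means the nine revisions after the first added about 0.2x between them, roughly 0.02x each, assuming the first commit stored close to 1x (repository overhead and compression will shift this somewhat).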

Remember to set the svn:needs-lock property. In my experience, the best way is to use auto-props to set svn:needs-lock based on file extension.
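For reference, auto-props live in each user's client-side Subversion config file, so every team member needs the same entries. A sketch assuming the .doc/.xls files from the question; the MIME types are the conventional ones for those formats, and the file locations are the usual defaults:

```ini
# Client-side Subversion config: ~/.subversion/config on Unix-like systems,
# %APPDATA%\Subversion\config on Windows.
[miscellany]
enable-auto-props = yes

[auto-props]
# Require a lock before editing formats that cannot be merged.
*.doc = svn:needs-lock=*;svn:mime-type=application/msword
*.xls = svn:needs-lock=*;svn:mime-type=application/vnd.ms-excel
```

These settings only affect files added or imported afterwards; files already in the repository need `svn propset svn:needs-lock '*' FILE`. Subversion 1.8 and later can also store the same patterns in an inheritable `svn:auto-props` property on a directory, which avoids per-client setup.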

Infamous answered 11/2, 2009 at 20:34 Comment(0)

We built our Subversion client exactly for this, since we did really big design/consulting jobs that needed version control. We never had any problems with it.

Schug answered 11/2, 2009 at 20:34 Comment(0)

It depends on how often the files are updated. SVN can't merge binary files, so every time there's a conflict you'll have pain. Otherwise it's just storage and retrieval, and while it's not as good at that as with text, it still handles it just fine.
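Whether a conflict can be merged comes down to how the tool classifies the file. A hedged Python sketch of two rules of thumb: a NUL-byte sniff like the one Git applies, and the Subversion book's description of how svn:mime-type marks a file as binary. Both function names are hypothetical, and Subversion's own detection (done when a file is added without an explicit svn:mime-type) differs in detail:

```python
def looks_binary(data: bytes, sniff: int = 8000) -> bool:
    """NUL-byte heuristic: content is treated as binary if a zero byte
    appears in its first `sniff` bytes (Git's diff machinery uses a check
    like this on the first 8000 bytes)."""
    return b"\x00" in data[:sniff]


def svn_treats_as_binary(mime_type):
    """Rough rule from the Subversion book: a file whose svn:mime-type is set
    to a non-text type (generally one not starting with "text/", with a few
    exceptions ignored here) is handled as binary, so no line-based merge."""
    return mime_type is not None and not mime_type.startswith("text/")
```

A legacy .doc file typically contains NUL bytes near the start of its header, and auto-props would give it the application/msword type, so either way a conflicting edit leaves you picking one whole version instead of merging.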

Stockroom answered 11/2, 2009 at 20:34 Comment(0)

I personally use Mercurial for such tasks. I've used it to store several hundred gigabytes of media. Yes, it takes up some disk space, but disk space is cheap. With Mercurial you also get the benefit of it being distributed: when you do a "checkout" (a clone, as it's known in Mercurial), you get the whole repository, not just a snapshot. If your server ever dies, you're still in business.

Grizzly answered 11/2, 2009 at 20:34 Comment(1)
Quick question: how do you deal with cloning multi-GB repositories every time you need to create a new working copy? – Folacin

From what I've seen, Git is very fast compared to Subversion, and I've heard it's somewhat faster than Mercurial, but only by a bit. However, I haven't specifically tested it with large binary files, or with lots of them.

That being said, given the way Git tracks changes, I would imagine it's very efficient at dealing with binary files.
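For context on "the way Git tracks changes": Git names every file version by a hash of its full contents, so identical files are stored once, while a one-byte edit produces a completely new object; deltas only come in later, when objects are packed. Whether that makes Git efficient with large binaries is exactly what this answer speculates about; the Python sketch below (function name hypothetical, default SHA-1 object format assumed) only shows how content is identified:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a file's contents (SHA-1 object format):
    SHA-1 over the header "blob <size>\\0" followed by the raw bytes."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

Running `git hash-object <file>` on the same bytes prints the same ID. The same .xls committed twice in different folders shares one object; changing a single cell yields a new full object until `git gc` or a push packs it as a delta.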

I can say this for sure, though: once I got used to Git, there's no way I'd choose to go back to Subversion. When I have to work with Subversion repositories, I still use Git through git-svn. This way I get all the advantages of distributed version control, but still have really nice support for pushing commits back to the central Subversion repository.

Cuenca answered 11/2, 2009 at 20:34 Comment(8)
I'm a huge Git fan, but we already have SVN infrastructure, and we don't have it for Git here. If SVN won't work, so be it, but if it will, I'll take the free admin! – Miskolc
This is a straight question: what's so great about Git? – Furfuraceous
Instead of trying to explain it here, watch this very opinionated talk from the creator of Git: youtube.com/watch?v=4XpnKHJAok8 Yes, it's opinionated, but I happen to agree with the opinion. All I can say is give it a real chance, a couple of weeks at least, and you'll understand. – Cuenca
Try telling us what it's really like with binaries instead of imagining what it might be like. I can imagine Git doesn't work with Microsoft files whatsoever; that would be just as stupid a statement as your 'answer' was. – Jugurtha
In my case SVN worked better than Git. I was working on a very large PHP web project that had lots of binary files scattered across the directories. SVN shallow checkout worked very well for us. Git sparse checkout did not work for us; see #11214795 – Pesthouse
For using Git with large binary files, you should check out the Git Large File Storage extension, which is now available. – Hanoi
Git is good for text files but definitely does not handle binary files very well. – Kindle
@Hanoi I may have missed something, but it doesn't appear that Git LFS supports anything like the delta algorithm that SVN has? – Production

Well, it's going to take up a lot of space storing all that in Subversion, I'll tell you that much. Subversion doesn't store binary files as deltas the way it stores text files. It'll probably take up as much space as just storing the binary files on your hard drive, plus the repository overhead.

You may be able to use a server-side TiddlyWiki to store the URLs to the documents within Subversion.

If they are mostly .doc and .xls files, there's also Microsoft's SharePoint.

Airworthy answered 11/2, 2009 at 20:34 Comment(6)
You are correct, sir, which is the big problem we have at work. There are other versioning systems being released that handle binary files AND deltas. – Lindsylindy
SharePoint would be tough, if only because it would take me weeks to upload all of these files individually. – Miskolc
Huh? One of the main selling points of Subversion over CVS is that Subversion DOES do deltas on binary files. – Oshiro
Maybe something has changed since I started using it. Can you point me to some documentation on this? Thanks, Andy! – Airworthy
@leeand00: Here's an article that talks about SVN storage: ibm.com/developerworks/java/library/j-svnbins.html – Sulfatize
It's late, but for other readers, here is where delta storage of binaries is documented: svnbook.red-bean.com/en/1.6/svn.forcvs.binary-and-trans.html – Alarcon

© 2022 - 2024 — McMap. All rights reserved.