I want to separate binary files (media) from my code repositories. Is it worth it? If so, how can I manage them?

Our repositories are getting huge because we have tons of media (hundreds of 1 MB JPEGs, hundreds of PDFs, etc.).

Because of this, developers who check out certain repositories have to wait an abnormally long time.

Has anyone else had this dilemma before? Am I going about it the right way by separating code from media? Here are some issues/worries I had:

  • If I migrate these into a media server, I'm afraid it might be a pain for developers to use. Instead of updating one server, a developer will now have to update two if the task involves both programming logic and media updates.
  • If I migrate these into a media server, I'll still have to revision control the media, no? So the developer would have to commit code updates and commit media updates.
  • How would the developer test locally? I could make my site use absolute URLs, e.g. src="http://media.domain.com/site/blah/image.gif", but this wouldn't work locally. I assume I'd have to change my site templating to detect whether it's running locally/in development or in production and, based on that, change the BASE_URL.
  • Is it worth all the trouble? We deal with about 100-150 sites (not a dozen or so major ones), so we have around 100-150 repositories. We won't have the time or resources to change existing sites, so we could only implement this on brand new sites.
  • I would still have to keep the scripts that generate media (PDF generators) and the generated media in the code repository, right? It would be a huge pain to update all those PDF generators to POST files to external media servers, and an extra pain to take caching into account.
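For the third concern, the BASE_URL switch could be as small as the following Python sketch. It is only an illustration under assumptions: the `SITE_ENV` environment variable, the `/media` local path and the helper names are hypothetical; only the `media.domain.com` host comes from the question.

```python
import os

# Hypothetical mapping from environment name to media base URL.
# Only the production host is taken from the question; "/media" is a placeholder
# for media served from the developer's own checkout.
MEDIA_BASE_URLS = {
    "local": "/media",
    "production": "http://media.domain.com/site",
}


def media_base_url(env=None):
    """Return the media base URL for the given (or current) environment."""
    # SITE_ENV is a hypothetical variable; default to local development.
    env = env or os.environ.get("SITE_ENV", "local")
    try:
        return MEDIA_BASE_URLS[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None


def media_url(path, env=None):
    """Join the environment's base URL with a media path."""
    return media_base_url(env).rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    print(media_url("blah/image.gif", env="production"))
    print(media_url("blah/image.gif", env="local"))
```

Templates would then call something like `media_url("blah/image.gif")` instead of hard-coding the host, so the same template works on a developer machine and in production, with only the environment setting changing.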

I'd appreciate any insight into the questions I have regarding managing media and code.

Vesuvianite answered 21/10, 2010 at 17:57
You might want to tell us which VCS you're using. (I'm very curious which modern VCS would get slow due to lots of big files in its repository.) - Rostov
Ahh, you got me. We are actually (yes, yes, yes, the 1980s) still using CVS. We haven't had the time or resources to migrate to Git yet. - Vesuvianite
It's good that you're asking about this now, before you dive into a tool transition. How often, if ever, do those binary assets change? Their modification patterns will affect the strategy you want to use. - Scenography
@Novelocrat: image updates are made daily across dozens of sites. - Vesuvianite
OK, so a history in something like Git would grow huge, even if each site had its own repository. Next question: how tightly coupled are the code and the media? In particular, do media content changes correspond directly to code changes? - Scenography
Often, when developers are given a task, it includes both programming AND media changes at once: front-end changes, programming changes and media content are usually all updated in the same task. It's a very common, day-to-day thing. - Vesuvianite
Note that SVN stores binaries as diffs, too, while CVS must store them in full, version by version, because it doesn't have a binary diff algorithm. CVS will make your repository's disk space explode if you check in a lot of binary stuff. - Rostov

First, yes, separating media and generated content (like the generated PDFs) from source control is a good idea, because of:

  • disk space and checkout time (as you describe in your question)
  • the fact that these files don't benefit from the features CVS actually provides (no diff, no merge; only labels and branches apply to them)

That said, any transition of this kind is costly to put in place.
You need to separate the release management process (generating the right files in the right places) from the development process (getting the right material from one or two reference repositories to develop and update your projects).

Binaries generally fall into two categories:

  • non-generated binaries:
    They are best kept in an artifact repository (Nexus, for instance), under a label that matches the label used for the text sources in the VCS
  • generated binaries (like your pdf):
    ideally, they shouldn't be kept in any repository, but only generated during the release management phase in order to be deployed.
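To make the two categories concrete, here is a minimal Python sketch of a release step, under stated assumptions. The `ARTIFACT_BASE` host, the bundle URL layout and the convention that each generator script takes an output directory as its only argument are all hypothetical; this is not Nexus's actual URL scheme. The point is only that non-generated media is looked up by the same label (CVS tag) as the code, while PDFs are produced during the release instead of being checked in.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical artifact-repository layout: one media bundle per site per label.
# Adapt this to whatever artifact store you actually use.
ARTIFACT_BASE = "https://artifacts.example.com/media"


def media_bundle_url(site, label):
    """URL of the non-generated media bundle tagged with the same label as the code."""
    return f"{ARTIFACT_BASE}/{site}/{label}/{site}-media-{label}.zip"


def generate_pdfs(generator_scripts, out_dir):
    """Run PDF generator scripts at release time, writing into out_dir.

    Assumes (hypothetically) that each generator is a Python script accepting
    an output directory as its only argument. The results are deployed,
    never committed to the VCS.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for script in generator_scripts:
        # check=True stops the release if any generator fails.
        subprocess.run([sys.executable, str(script), str(out)], check=True)
    return sorted(out.iterdir())
```

A release of tag `REL_1_2` would then check out the code at that tag, download and unpack `media_bundle_url(site, "REL_1_2")` next to it, and call `generate_pdfs(...)` into the deployment directory, so developers only ever pull the label they need.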
Liger answered 22/10, 2010 at 6:17
