"Layering" git repository

Asked 12/5, 2011 at 11:37 Answered 12/5, 2011 at 12:25

I've been using git daily for a while now, and this time I've run into a problem that I'll try to describe.

I have a repository that holds an entire website structure, with the web root at the root of the repository. Everything was fine while it was the repository for a single site. However, that same repo is now used for several sites: basically the same site in different languages, with minor template tweaks, different graphics, etc. Those things are naturally versioned.

There is a master branch, which holds the original source code of the site, and I'd like master (or some other branch) to hold the code that is universal across all sites, as there will eventually be changes that are too site-specific to include in the universal part of the repo.

Next, there is a branch for every site that uses this source code. All those branches (say, site1, site2, and site3) were created from the master branch, and each site clones the correct branch.

Well, it seemed like a good idea, until I started making changes everywhere.

If I made a change on the site1 branch and needed to copy that change to the site2 branch, I would cherry-pick the commit from one branch to the other. Merging is out of the question there, as there are other changes on the site1 branch which do not belong on the site2 branch. Is there some other, more elegant solution for this kind of situation, or is this exactly what cherry-picking is for?

Now, the real "problem" for me is when I change master, and then I want to copy all those changes to all branches. Naturally, considering the fact that all branches are descendants of master, and that I do want those changes in all site* branches, I switch to each branch and merge master.

This creates a pretty nasty-looking history for all branches. Each round of merges complicates the graph considerably, which leads me to two conclusions:

either this way of layering branches can work as long as I watch my step, don't do anything stupid, and don't try to make any sense of the all-branches history graph,
or there has to be some better, more appropriate way to do it.

To illustrate my "problem", here is an image of the graph that I got after creating those branches, adding a few branch-specific commits, cherry-picking a few of them, adding and merging one commit from master to all branches, adding a commit or two to specific branches, and then doing one more master-to-all merge.

[Image: sort of not-so-simple history graph]

I don't know, I like simplicity, and maybe I'm not used to seeing hard-to-follow graphs like this one (which will only grow in complexity with every following merge, I'm afraid).

I guess I could do cherry-picking all the way, and have neat history graph, but that doesn't sound right either, since I might do several commits in a row, and then forget to pick one of them to all other branches...

So... Any ideas, experiences, or suggestions that you wouldn't mind sharing?

UPDATE: I chose the solution described in my comment on the accepted answer. Thanks to everyone who contributed!

UPDATE 2: Even though it's not tightly related to this question, I recently stumbled upon this model of branching that appears to be suitable for pretty much any organized development cycle, with Git as the underlying DVCS. It's a really good read. Recommended.

Undertaking answered 12/5, 2011 at 11:37 Comment(2)

Too little time for an answer, but have a look at rebase. – Rollway 12/5, 2011 at 11:39

Merging may not be as bad as you think. Fundamentally, a merge means "incorporate everything from another branch into this one", which sounds like what you want to do. You might like to use git merge --log to include a list of the subjects of the merged commits in your merge commit message, and then use git log --first-parent (or gitk --first-parent) to view your history on the siteX branches. – Murrell 12/5, 2011 at 12:30
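As a rough illustration of the suggestion in the comment above, the following sketch builds a throwaway repository, merges the trunk into a site branch with --log, and counts the first-parent history. The file names, commit messages, and the $trunk variable (which stands in for master, whatever the default branch is called locally) are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: throwaway repo showing `git merge --log` and `--first-parent`.
# All file names and commit messages here are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > index.html
git add index.html
git commit -q -m "Base site"
trunk=$(git symbolic-ref --short HEAD)   # stands in for 'master'

# A site branch with one site-specific commit.
git checkout -q -b site1
echo "site1 tweak" > site1.txt
git add site1.txt
git commit -q -m "site1: template tweak"

# A universal change on the trunk.
git checkout -q "$trunk"
echo "fix" >> index.html
git commit -q -am "Universal fix"

# Merge the trunk into the site branch; --log lists the merged subjects
# in the merge commit message.
git checkout -q site1
git merge -q --log --no-edit "$trunk"

# Follow only the site branch's own line of history.
git log --first-parent --oneline
```

With --first-parent, the merge shows up as a single entry on site1 and the commits that came in from the trunk are hidden behind it, which may make a per-site history easier to read.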

Alternate answer:

You could move the abstraction from branch level to repository level. Create one main repo with a master branch. Clone this repository for each site. When changes are made on the master branch, pull these changes into each site repo. This way you will only need one master branch per repo.
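A minimal sketch of that layout, assuming a shared repository called main_site and a per-site clone called en_site (both names, the files, and the local paths are made up for the example; a real setup would most likely use remote URLs):

```shell
#!/bin/sh
# Sketch only: one shared repo plus one clone per site.
# Repo names, file names, and paths are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Shared repository holding the universal code.
git init -q main_site
cd main_site
git config user.email "dev@example.com"
git config user.name "Dev"
echo "universal code" > index.html
git add index.html
git commit -q -m "Initial universal site"
cd ..

# One clone per site; 'origin' points back at the shared repo.
git clone -q main_site en_site
cd en_site
git config user.email "dev@example.com"
git config user.name "Dev"
echo "english banner" > banner.txt
git add banner.txt
git commit -q -m "en_site: add English banner"
cd ..

# Later, a universal change lands in the shared repo...
cd main_site
echo "universal fix" >> index.html
git commit -q -am "Universal fix"
cd ..

# ...and the site repo pulls it in on top of its own commits.
cd en_site
git pull -q --no-rebase --no-edit origin HEAD
git log --oneline
```

Each site repo then carries only its own commits plus merges from the shared repo, so no single repository has to show every site's history at once.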

Original answer:

When the master branch has been changed, you could rebase the other branches onto the updated master branch. Let's assume you have pl_site based on some commit on master and that master has changed:

 o---o---o---o---o  master
         \
          o---o---o---o---o  pl_site

After you have rebased pl_site, it will look like:

 o---o---o---o---o  master
                  \
                   o'---o'---o'---o'---o'  pl_site

Note that the commits on the rebased version of pl_site are new. pl_site now contains the changes that were made on master.

Commands:

$ git checkout pl_site
$ git rebase master
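To see that the rebased commits really are new objects, one could try the two commands in a scratch repository. This is only a sketch: the files and commit messages are invented, and $trunk stands in for master so the script works whatever the local default branch is called.

```shell
#!/bin/sh
# Sketch only: rebase a site branch onto an updated trunk in a scratch repo.
# File names and commit messages are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > index.html
git add index.html
git commit -q -m "Base site"
trunk=$(git symbolic-ref --short HEAD)   # stands in for 'master'

# pl_site forks from the trunk and gets its own commit.
git checkout -q -b pl_site
echo "polski" > pl.txt
git add pl.txt
git commit -q -m "pl_site: add Polish text"
before=$(git rev-parse HEAD)

# The trunk moves on.
git checkout -q "$trunk"
echo "fix" >> index.html
git commit -q -am "Universal fix"

# Replay pl_site's commits on top of the updated trunk.
git checkout -q pl_site
git rebase -q "$trunk"
after=$(git rev-parse HEAD)

echo "before rebase: $before"
echo "after rebase:  $after"
```

The two hashes differ because the replayed commit has a new parent, which is also why a branch that has already been cloned by the sites cannot be rebased without those clones noticing.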

Hillhouse answered 12/5, 2011 at 12:17 Comment(7)

This only gets you so far. A year from now, that rebase might not work, and you might end up destroying history in order to get it to do so. (That is, the first commit of pl_site beyond master won't necessarily always apply to any future version of master.) – Murrell 12/5, 2011 at 12:23

That doesn't solve the real problem, rebase just destroys the commit history so it looks pretty. – Electromotive 12/5, 2011 at 12:28

@Dietrich: It's the history that looks ugly, so either you tolerate that it is destroyed, or you keep it ugly (or you find a tool that just prints it pretty :D) – Rollway 12/5, 2011 at 12:30

Added an alternate answer. The problem is really that there is only one repo and the abstractions are on branch level, not on repository level. One repo for each site feels more logical. – Hillhouse 12/5, 2011 at 12:32

Yes, but the history diagram in the question is only for one site, except for de_site which is only one line. – Electromotive 12/5, 2011 at 12:37

@Magnus: I was thinking more along the lines of your alternate solution. Basically, I'd have to create a master repo, and then, say, create a new repo, add the master repo as a remote, pull master from that remote, and keep pulling from master after every update, while pushing all this (and new changes on top of this master-based repo) into the new repo? Sounds pretty straightforward. – Undertaking 12/5, 2011 at 12:39

@mr.b: Yes, it should be pretty straightforward. Just play with it until you find a solution that you are happy with. Remember that you still have the original repo if something goes wrong :) – Hillhouse 12/5, 2011 at 12:41

I don't have a good answer for you, because your problem is complicated and has complicated solutions.

Option 1: Refactor

You said that the different sites are "basically the same site". So move them to different projects, and keep the main_site in a project by itself. The other sites will then include main_site as a subproject.

So, for the banner...

en_site/
en_site/images/banner.jpg
en_site/master/
en_site/master/images/banner.jpg

Your web site code, configuration script, deployment script, or whatever will make sure that images/banner.jpg is chosen over master/images/banner.jpg. Maybe when you deploy the site master/images gets copied first and then images gets copied over it, maybe you do something more sophisticated.
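The simple "copy master first, then copy the site's files over it" variant could look roughly like this. It is a sketch under assumptions: the deploy directory and the extra logo.jpg file are invented for the example, and the image files are plain text stand-ins.

```shell
#!/bin/sh
# Sketch only: overlay deployment where site files win over shared ones.
# The deploy/ directory and logo.jpg are hypothetical additions.
set -e
work=$(mktemp -d)
mkdir -p "$work/en_site/master/images" "$work/en_site/images" "$work/deploy/images"

# Shared files (text stand-ins for real images).
echo "shared banner" > "$work/en_site/master/images/banner.jpg"
echo "shared logo"   > "$work/en_site/master/images/logo.jpg"
# Site-specific override for the banner only.
echo "english banner" > "$work/en_site/images/banner.jpg"

# 1. Copy the shared files first.
cp -R "$work/en_site/master/images/." "$work/deploy/images/"
# 2. Copy the site-specific files second, overwriting any shared file
#    with the same name.
cp -R "$work/en_site/images/." "$work/deploy/images/"

ls "$work/deploy/images"
```

After both copies, the deployed banner.jpg is the English one, while logo.jpg, which the site does not override, still comes from master.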

This might be a lot of work. However, when you look at the history, you'll get something like this:

en_site: A -> B -> C -> D
de_site: E -> F -> G -> H -> I
main_site: J -> K -> L -> M -> N -> O

Option 2: Use Darcs

In Darcs, you can move patches from branch to branch. Some commercial VCSs can probably do this too. So your branches would look like this:

master: patch1 patch2 patch3 patch4
en_site: patch1 patch2 patch3 patch4 en1 en2 en3
de_site: patch1 patch2 patch3 patch4 de1 de2 de3

Suppose that you want to port patch en2 to the German site.

de_site: patch1 patch2 patch3 patch4 de1 de2 de3 en2

Voila. However, this is not as clean as it looks. Darcs aficionados will point out that this patch model matches our conceptual model of "moving a patch to another branch". That glosses over the fact that you'll still have to test to make sure that the en2 patch doesn't break everything when you put it on de_site.

For example, what if en2 makes a change to the same part of the code as de1? What then? You have to merge manually, no matter what VCS you are using. For every obvious case like this, there is another case which the VCS won't detect and you'll have to check it yourself.

My experience

When I first started using git, it seemed like git merge was magic. However, no amount of VCS trickery is going to hide the fact that your site has some very complicated interdependencies. You can either refactor your site to remove the interdependencies, or hope that your VCS history doesn't become so complicated that you can no longer understand it.

Choosing between new branches, new projects, and refactoring things into libraries is a delicate tradeoff. Maintaining a large collection of patchsets is much more work than maintaining a large collection of projects which all use a common library, so the (large) amount of work necessary to refactor may pay off quickly. Or it may not.

Electromotive answered 12/5, 2011 at 12:25 Comment(1)

I'm at a crossroads regarding exactly what you described. So either I'm going to refactor and implement those per-site specifics in some sensible way so they can override each other, or I'm going to choose a not-so-simple-yet-maintainable VCS-based solution, while keeping the code intact. Thanks for the ideas. – Undertaking 12/5, 2011 at 12:41
