How does Git solve the merging problem? [closed]

Asked 4/3, 2009 at 21:43 Answered 10/3, 2009 at 16:6

SVN made branching much easier by making branches really cheap, but merges remain a real problem in SVN - one that Git supposedly solves.

Does Git achieve this, and how?

(disclaimer: All I know about Git is based on the Linus lecture - total git noob here)

Isoniazid answered 4/3, 2009 at 21:43 Comment(1)

In response to: "SVN made branching much easier by making branches really cheap". Are you sure you didn't accidentally replace Git with SVN? I know one of the big features Git boasts is cheap branching... I've heard branching in SVN is a nightmare because much of it is manual (make a new directory with the branched content, etc.). – Kuehn 1/3, 2012 at 21:12

Git will not prevent conflicts in merges, but it can reconcile histories even when they do not share any common ancestor.
(It does so through the grafts file (.git/info/grafts), a list, one entry per line, of a commit followed by its parents, which you can modify for that "reconciliation" purpose.)
So it is pretty powerful right there.

But to really get a glimpse of "how merges have been thought through", you can start by turning to Linus himself, and realize that this issue is not so much about the "algorithm":

Linus: Me personally, I want to have something that is very repeatable and non-clever. Something I understand or tells me that it can't do it.
And quite frankly, merging single-file history without taking all the other files' history into account makes me go "ugh".

The important part of a merge is not how it handles conflicts (which need to be verified by a human anyway if they are at all interesting), but that it should meld the history together right so that you have a new solid base for future merges.

In other words, the important part is the trivial part: the naming of the parents, and keeping track of their relationship. Not the clashes.

And it looks like 99% of SCM people seem to think that the solution to that is to be more clever about content merges. Which misses the point entirely.

So Wincent Colaiuta adds (emphasis mine):

There is no need for fancy metadata, rename tracking and so forth.
The only thing you need to store is the state of the tree before and after each change.

What files were renamed? Which ones were copied? Which ones were deleted? What lines were added? Which ones were removed? Which lines had changes made inside them? Which slabs of text were copied from one file to another?
You shouldn't have to care about any of these questions and you certainly shouldn't have to keep special tracking data in order to help you answer them: all the changes to the tree (additions, deletes, renames, edits etc) are implicitly encoded in the delta between the two states of the tree; you just track what is the content.

Absolutely everything can (and should) be inferred.

Git breaks the mould because it thinks about content, not files.
It doesn't track renames, it tracks content. And it does so at a whole-tree level.
This is a radical departure from most version control systems.
It doesn't bother trying to store per-file histories; it instead stores the history at the tree level.
When you perform a diff you are comparing two trees, not two files.

The other fundamentally smart design decision is how Git does merges.
The merging algorithms are smart but they don't try to be too smart. Unambiguous decisions are made automatically, but when there's doubt it's up to the user to decide.
This is the way it should be. You don't want a machine making those decisions for you. You never will want it.
That's the fundamental insight in the Git approach to merging: while every other version control system is trying to get smarter, Git is happily self-described as the "stupid content manager", and it's better for it.
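
To make the "track content, infer everything else" idea concrete, here is a minimal sketch in Python. It is not Git's implementation: `snapshot` and `diff_trees` are hypothetical helper names, a tree is flattened to a plain path-to-ID dictionary, and only exact-content renames are inferred. The one detail taken from Git itself is the blob ID, which is the SHA-1 of a short header followed by the file bytes.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash content the way Git hashes a blob: SHA-1 of 'blob <size>\\0' + bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Hypothetical flattened 'tree': maps each path to the ID of its content."""
    return {path: blob_id(data) for path, data in files.items()}


def diff_trees(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Infer what happened between two snapshots without any stored metadata."""
    removed = {p: h for p, h in old.items() if p not in new}
    added = {p: h for p, h in new.items() if p not in old}
    modified = sorted(p for p in old if p in new and old[p] != new[p])

    # A path that vanished and a path that appeared with identical content
    # is reported as a rename (exact matches only in this sketch).
    renames = []
    for new_path, new_hash in sorted(added.items()):
        for old_path, old_hash in sorted(removed.items()):
            if old_hash == new_hash:
                renames.append((old_path, new_path))
                del removed[old_path]
                break

    renamed_targets = {target for _, target in renames}
    return {
        "renamed": renames,
        "added": sorted(p for p in added if p not in renamed_targets),
        "deleted": sorted(removed),
        "modified": modified,
    }
```

Nothing about the rename is recorded anywhere; it falls out of comparing the two states of the tree, which is the point the quote is making.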

Shanly answered 4/3, 2009 at 22:24 Comment(7)

This strikes me as a feature intended to help you recover from past mistakes. While that is a noble and good thing, it doesn't really help you not make the mistake in the first place. – Ligate 5/3, 2009 at 4:36

can you further explain what's a tree? git n00b here. – Holster 14/3, 2009 at 23:8

hansen_j, for more on git trees read newartisans.com/2008/04/git-from-the-bottom-up.html – Acanthous 14/3, 2009 at 23:58

@hansen j : a tree is a list of blobs (SHA1-referenced contents) or sub-trees, together with their names. Note that two files with the same content/size will have the same SHA1. The tree will still list 2 files (because of the 2 different names), but Git will only store the unique content once! – Shanly 15/3, 2009 at 0:6

@Shanly "Every other version control system" - Is that still correct? Don't Mercurial and Bazaar also do what Git does? Would it not be more accurate (at least now in 2011) to now say "Centralized version control systems?" – Duchy 18/7, 2011 at 14:16

@Mike: they usually store more information for managing merges, mainly around rename detection, like the hg addremove (thread.gmane.org/gmane.comp.version-control.git/177146/…), even though rename detection is still vehemently opposed by Linus (article.gmane.org/gmane.comp.version-control.git/177315). They all do merges, but Git tries to keep it simpler than the others. – Shanly 18/7, 2011 at 14:33

@Mike: plus Git is the only one to be a content manager. All the others are file managers. See blog.daemon.com.au/blog-post/know-subversion-git-or-mercurial for more. – Shanly 18/7, 2011 at 20:40

It is now generally agreed that the 3-way merge algorithm (perhaps with enhancements such as rename detection and handling of more complicated history), which takes into account the version on the current branch ('ours'), the version on the merged branch ('theirs'), and the version at the common ancestor of the merged branches ('ancestor'), is (from a practical point of view) the best way to resolve merges. In most cases, and for most of the contents, a tree-level merge (deciding which version of a file to take) is enough; there is rarely a need to deal with content conflicts, and when there is, the diff3 algorithm is good enough.
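
As a rough illustration of the tree-level part of a 3-way merge, here is a minimal Python sketch. It assumes each tree is a plain dictionary mapping a path to a content ID (a missing path means the file does not exist on that side); `merge_trees` is a hypothetical name, not a Git function, and content-level merging of the conflicting files is deliberately left out.

```python
def merge_trees(ancestor: dict[str, str],
                ours: dict[str, str],
                theirs: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Pick a version per path; report paths that need a content-level merge."""
    merged, conflicts = {}, []
    for path in sorted(set(ancestor) | set(ours) | set(theirs)):
        base = ancestor.get(path)
        a = ours.get(path)
        b = theirs.get(path)
        if a == b:
            result = a            # both sides agree (or both deleted the file)
        elif a == base:
            result = b            # only 'theirs' changed it: take theirs
        elif b == base:
            result = a            # only 'ours' changed it: take ours
        else:
            conflicts.append(path)  # both changed it differently: ask a human
            continue
        if result is not None:      # None means the file is deleted in the result
            merged[path] = result
    return merged, conflicts
```

Only the paths returned in the conflict list would go on to a line-level merge such as diff3; everything else is settled by comparing three IDs.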

To use 3-way merge you need to know the common ancestor of the merged branches (the so-called merge base). For this you need to know the full history between those branches. What Subversion before (the current) version 1.5 was lacking (without third-party tools such as SVK or svnmerge) was merge tracking, i.e. remembering, for a merge commit, which parents (which commits) were used in the merge. Without this information it is not possible to correctly calculate the common ancestor in the presence of repeated merges.

Consider the following diagram:

---.---a---.---b---d---.---1
        \        /
         \-.---c/------.---2

(which will probably get mangled... it would be nice to have the ability to draw ASCII-art diagrams here).
When we were merging commits 'b' and 'c' (creating commit 'd'), the common ancestor was the branching point, commit 'a'. But when we want to merge commits '1' and '2', now the common ancestor is commit 'c'. Without storing merge information we would have to conclude wrongly that it is commit 'a'.
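
The effect of recording (or not recording) the second parent of 'd' can be checked with a small Python sketch. The graph below encodes the diagram; the names `root`, `x1`, `x2`, `y1` and `y2` are made up here for the unnamed '.' commits, and `merge_bases` is a naive illustration rather than Git's actual algorithm.

```python
def ancestors(parents: dict[str, list[str]], commit: str) -> set[str]:
    """Every commit reachable from `commit` via parent links, itself included."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def merge_bases(parents: dict[str, list[str]], x: str, y: str) -> list[str]:
    """Common ancestors that are not themselves ancestors of another common ancestor."""
    common = ancestors(parents, x) & ancestors(parents, y)
    return sorted(
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    )


# The diagram, with 'd' recorded as a merge of 'b' and 'c'.
history = {
    "root": [], "a": ["root"],
    "x1": ["a"], "b": ["x1"],
    "y1": ["a"], "c": ["y1"],
    "d": ["b", "c"],
    "x2": ["d"], "1": ["x2"],
    "y2": ["c"], "2": ["y2"],
}

# The same history when the merge parent of 'd' is not stored.
history_without_merge_info = dict(history, d=["b"])
```

With the full graph, `merge_bases(history, "1", "2")` gives `['c']`; with the second parent of 'd' dropped, `merge_bases(history_without_merge_info, "1", "2")` falls back to `['a']`, which is the wrong answer described above.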

Subversion (prior to version 1.5), and earlier CVS, made merging hard because you had to calculate the common ancestor yourself, and give information about the ancestor manually when doing a merge.

Git stores information about all parents of a commit (more than one parent in the case of a merge commit) in the commit object. This way you can say that Git stores a DAG (directed acyclic graph) of revisions, storing and remembering the relationships between commits.

(I am not sure how Subversion deals with the issues mentioned below)

Additionally, merging in Git can deal with two further complications: file renames (when one side renamed a file and the other didn't; we want to get the rename, and we want the changes applied to the correct file) and criss-cross merges (more complicated history, where there is more than one common ancestor).

File renames during a merge are handled by heuristic, similarity-score-based rename detection (both similarity of file contents and similarity of pathnames are taken into account). Git detects which files correspond to each other in the merged branches (and ancestor(s)). In practice it works quite well for real-world cases.
Criss-cross merges (see the definition at the revctrl.org wiki), and the presence of multiple merge bases, are handled by the recursive merge strategy, which generates a single virtual common ancestor.
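
To give a feel for similarity-score-based rename detection, here is a hedged Python sketch. It is not Git's heuristic: it scores content only (pathname similarity is ignored), uses `difflib.SequenceMatcher` from the standard library as a stand-in for the scoring, and the 0.5 threshold and the name `detect_renames` are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Line-based similarity in [0, 1]; a stand-in for a real scoring function."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def detect_renames(deleted: dict[str, str],
                   added: dict[str, str],
                   threshold: float = 0.5) -> list[tuple[str, str]]:
    """Pair each deleted path with the most similar added path above the threshold."""
    pairs = []
    remaining = dict(added)  # added paths not yet claimed by a rename
    for old_path, old_text in sorted(deleted.items()):
        best = max(
            remaining,
            key=lambda p: similarity(old_text, remaining[p]),
            default=None,
        )
        if best is not None and similarity(old_text, remaining[best]) >= threshold:
            pairs.append((old_path, best))
            del remaining[best]
    return pairs
```

A file that was renamed and lightly edited on one side still scores high against its old content, so changes made to the old path on the other side can be applied to the new path.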

Dubois answered 4/3, 2009 at 21:43 Comment(1)

I tried improving the diagram, by formatting it as a block quote ... I hope I didn't break it due to insufficient understanding, my apologies in that case. – Contempt 7/3, 2009 at 20:48

Answers above are all correct, but I think they miss the centerpoint of git's easy merges for me. An SVN merge requires you to keep track and remember what's been merged and that's a huge PITA. From their docs:

svn merge -r 23:30 file:///tmp/repos/trunk/vendors

Now that's not killer, but if you forget whether it's 23-30 inclusive or 23-30 exclusive, or whether you've already merged some of those commits, you're hosed and you've got to go figure out the answers to avoid repeating or missing commits. God help you if you branch a branch.

With git it's just git merge and all this happens seamlessly, even if you've cherry-picked a couple commits or done any number of fantastical git-land things.

Othilie answered 10/3, 2009 at 16:6 Comment(2)

I think you're forgetting about the merge tracking that svn has gained recently. – Isoniazid 10/3, 2009 at 16:56

That's true, I haven't had much experience with the new merge stuff. From a distance it looks kludgy: "once a --reintegrate merge is done from branch to trunk, the branch is no longer usable for further work. It's not able to correctly absorb new trunk changes..." Better than nothing, certainly. – Othilie 11/3, 2009 at 17:18

As far as I know, the merging algorithms are not any smarter than those in other version control systems. However, because of git's distributed nature, there is no need for centralized merging efforts. Every developer can rebase or merge small changes from other developers into his tree at any time, thus the conflicts that arise tend to be smaller.

Manoff answered 4/3, 2009 at 22:1 Comment(0)


Git just makes it more difficult to screw up everyone else's repository with a bad merge.

The only real benefit is that Git is much, much faster at merging because everything is done locally and it's written in C.

SVN, properly used, is perfectly usable.

Ligate answered 4/3, 2009 at 22:23 Comment(1)

Git also does diffing differently. It looks at the content difference, rather than at file-by-file line edits. – Stinky 5/3, 2009 at 1:11
