How do contents of git index evolve during a merge (and what's in the index after a failed merge)?

I have a fuzzy idea of what the git index contains as one does git-adds and git-commits, but I don't have a clue of what happens to these contents when one does a git-merge. I'm particularly interested in learning what the index holds when a merge fails (e.g. due to some conflict).

Hapte answered 23/1, 2014 at 13:25 Comment(2)
merge and rebase are similar in some points. Maybe you can see what happens by doing git rebase -i, replacing every pick with edit, and then observing the modifications in .git. - Northern
@Asenar: the similarity arises because rebase is a series of cherry-pick operations internally, and for each cherry-picked commit, git uses the merge machinery to apply the changes implied by that commit. - Cassius

For any given path, the index can hold up to four versions, distinguished by a "stage number" from 0 (zero) through 3. I'll call them "slots", as if all four were always there for every entry and could simply be indexed (this makes them easier to think about), although actually the extra versions are introduced dynamically only when needed. These "virtual slots" can be "empty", meaning the file does not exist in that version.

(Actually, once an entry is created in the index, it's marked with a flag bit, CE_REMOVE, if needed. This gets hairy because a whole directory full of files can be marked "removed" and then a file can be created with the name of the previous directory and marked "added". Let's just pretend we have fixed slots, there-but-empty, instead. :-) )

Slot #0 is the "normal", un-conflicted, all-is-well entry. It contains a bunch of cache data, the path name, and the blob-ID (the SHA-1) for the file stored in the repository.
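If you'd like to look at a slot-0 entry yourself, here is a minimal sketch. It assumes git is on your PATH and that a throwaway directory from mktemp is acceptable; the file name x.txt and the variable names are just placeholders I picked.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical setup, not from the question)
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo 'hello' > x.txt
git add x.txt

# Each line of output is: <mode> <blob-id> <stage> TAB <path>
entry=$(git ls-files --stage -- x.txt)
echo "$entry"

# Pull out the blob ID and the stage number from the index entry
blob_in_index=$(echo "$entry" | awk '{print $2}')
stage=$(echo "$entry" | awk '{print $3}')

# Compare with the ID git computes directly from the file's contents
blob_from_file=$(git hash-object x.txt)
echo "stage=$stage index=$blob_in_index file=$blob_from_file"
```

The two blob IDs should agree, and the stage column should read 0, since nothing is conflicted.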

When a merge succeeds, it's all "business as usual", so the only special case is a conflicted merge. A merge is "conflicted" when slots 1, 2, and/or 3 are non-empty. Skipping over most of the mechanics, what happens is this. The merge uses the "newest" name for all the slots, and:

  • Slot zero is left empty (you can't "commit" until you resolve the conflict, by which time this slot won't be empty anymore unless you really want the file to be removed).
  • Slot 1 ("base") is filled with the common ancestor version. If the file is new (in both revisions), this slot is empty.
  • Slot 2 ("ours") is filled with the target (HEAD, unless you're manually invoking some of the underlying merge machinery) version. If the file was removed in HEAD / target-of-merge, this slot is empty instead.
  • Slot 3 ("theirs") is filled with the being-merged-in version. If the file was removed in the being-merged-in revision, this slot is empty.

Once you resolve the conflict and "git add", the #0 slot gets filled in with whatever you "add", wiping out the entries in #1 through #3—or, if you "git rm" the conflicted file, the other stage entries are still removed, but now the #0 slot remains empty, which also resolves the conflict.
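To watch the slots fill and empty, something like the following should do. It is only a sketch: the branch names ours-branch and theirs-branch, the file f.txt, and the stages helper are all made up for the illustration, and it assumes a scratch directory from mktemp.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Helper (hypothetical): print the stage numbers present for one path, space-separated
stages() {
    git ls-files --stage -- "$1" | awk '{printf "%s%s", sep, $3; sep=" "}'
}

# Common ancestor
printf 'one\ntwo\nthree\n' > f.txt
git add f.txt
git commit -q -m base
base=$(git rev-parse HEAD)

# "Ours": change line 2 one way
git checkout -q -b ours-branch
printf 'one\nTWO (ours)\nthree\n' > f.txt
git commit -q -am ours

# "Theirs": change the same line another way, starting from the base
git checkout -q -b theirs-branch "$base"
printf 'one\nTWO (theirs)\nthree\n' > f.txt
git commit -q -am theirs

# Merge; the conflict makes git merge exit non-zero, hence "|| true"
git checkout -q ours-branch
git merge theirs-branch || true

during=$(stages f.txt)
echo "stages during conflict: $during"

# Resolve by editing the file and adding it: slot 0 is filled, 1-3 go away
printf 'one\nTWO (merged)\nthree\n' > f.txt
git add f.txt
after=$(stages f.txt)
echo "stages after git add: $after"
```

Replacing the final edit-and-add with git rm f.txt would clear slots 1 through 3 as well, but leave no entry for the path at all.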

More concretely, then, suppose you have a common ancestor that has (among others) these two files:

gronk
flibby

You're on branch cleanup and you've renamed gronk to bleem, and edited both that and flibby. You decide to git merge work, where they modified gronk but did not rename it, and removed flibby. Some other file(s) merged cleanly.

The index will contain three versions of bleem and two versions of flibby:

$ git checkout cleanup
Switched to branch 'cleanup'
$ git merge work
CONFLICT (modify/delete): flibby deleted in work and modified
in HEAD. Version HEAD of flibby left in tree.
Auto-merging bleem
CONFLICT (content): Merge conflict in bleem
Automatic merge failed; fix conflicts and then commit the result.
$ git ls-files --stage
100644 4362aba7f3b7abf2da0d0ed558cbf5bc0d12e4b0 1   bleem
100644 49db92a61392e9fd691c4af6e1221f408452a128 2   bleem
100644 04b399c8fe321902ce97a1538248878756678ca2 3   bleem
100644 366b52546711401122b791457793a38c033838dd 1   flibby
100644 6fecb1480f45faaabc31b18c91262d03d3767cde 2   flibby
100644 7129c6edb96d08bb44ca1025eb5ae41d41be8903 0   x.txt

You can see the original (base) version of bleem with git show :1:bleem. That was called gronk in the base version (and in work as well, in this case), but now it's called bleem because git believes you renamed gronk to bleem in cleanup. (Git finds the renames between the merge-base and HEAD and then applies the same renaming to work if necessary, as in this case.)

Likewise, you can see the work version with git show :3:bleem or git show work:gronk, and the HEAD version with any of: git show HEAD:bleem, git show cleanup:bleem, or git show :2:bleem (slot 2 contains the HEAD aka cleanup version, and is named according to the name in HEAD).

For flibby, though, since it was removed in work, there is no "theirs" (slot 3) version.
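The whole scenario can be reproduced in a scratch repository. This is a sketch under a few assumptions: the file contents are invented (ten short lines, so that rename detection has enough in common to match gronk with bleem), the stages helper is my own, and your git's merge output may be worded differently from the transcript above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Helper (hypothetical): stage numbers present for one path, space-separated
stages() {
    git ls-files --stage -- "$1" | awk '{printf "%s%s", sep, $3; sep=" "}'
}

# Common ancestor with gronk and flibby
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > gronk
printf 'flibby %s\n' 1 2 3 > flibby
git add gronk flibby
git commit -q -m base
base=$(git rev-parse HEAD)

# cleanup: rename gronk to bleem, edit both files
git checkout -q -b cleanup
git mv gronk bleem
printf 'line %s\n' 1 2 3 4 'five (cleanup)' 6 7 8 9 10 > bleem
printf 'flibby %s\n' 1 'two (cleanup)' 3 > flibby
git add -A
git commit -q -m cleanup

# work: edit gronk (no rename), remove flibby
git checkout -q -b work "$base"
printf 'line %s\n' 1 2 3 4 'five (work)' 6 7 8 9 10 > gronk
git rm -q flibby
git commit -q -am work

git checkout -q cleanup
git merge work || true   # conflicts expected

bleem_stages=$(stages bleem)
flibby_stages=$(stages flibby)
echo "bleem: $bleem_stages / flibby: $flibby_stages"

# Line 5 of each version of bleem, read straight from the index slots
base_line=$(git show :1:bleem | sed -n 5p)
ours_line=$(git show :2:bleem | sed -n 5p)
theirs_line=$(git show :3:bleem | sed -n 5p)
echo "base:   $base_line"
echo "ours:   $ours_line"
echo "theirs: $theirs_line"
```

Note that :1:bleem and :3:bleem work even though neither the base nor work ever had a file by that name; the index entries carry the renamed path.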

To resolve the conflicts, you need only tell git add or git rm to update the slot-zero entry and remove the 1-through-3 entries. Of course, with git add, what goes into slot 0 is whatever is in the work directory now, so you generally have to edit the files first.

Incidentally, I labeled slots 2 and 3 "ours" and "theirs" above. This is how git checkout treats them as well: git checkout --ours and git checkout --theirs write version 2 or 3 into the work directory. Unlike most checkouts, though, they leave slots 1 through 3 in place, so you still have to git add the result to fill slot 0 and resolve the conflict. However, in a rebase, the HEAD branch is actually the branch being rebased-on-to, and the "theirs" version is your branch-being-rebased. So the ours/theirs terminology is not really that great, in my opinion: it's too easy to get it backwards during a rebase.
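The rebase swap is easy to check by reading slots 2 and 3 directly while a rebase is stopped on a conflict. A minimal sketch, assuming a scratch repository and two invented branches named upstream and topic:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'base\n' > f.txt
git add f.txt
git commit -q -m base
base=$(git rev-parse HEAD)

# The branch we will rebase onto
git checkout -q -b upstream
printf 'upstream change\n' > f.txt
git commit -q -am upstream

# "Your" branch, forked from the base, touching the same line
git checkout -q -b topic "$base"
printf 'topic change\n' > f.txt
git commit -q -am topic

# Rebase topic onto upstream; it stops on the conflict and exits non-zero
git rebase upstream || true

# Slot 2 is what --ours refers to, slot 3 is what --theirs refers to
slot2=$(git show :2:f.txt)
slot3=$(git show :3:f.txt)
echo "slot 2 (--ours):   $slot2"
echo "slot 3 (--theirs): $slot3"
```

Here slot 2 holds the upstream text and slot 3 holds the topic text, even though topic is the branch you were working on. Run git rebase --abort in the scratch repository if you want to back out.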

I should also note that git checkout -m -- path will "re-create" a merge conflict for that path, if you're in the middle of a conflicted merge, by erasing slot 0 and "resurrecting" the versions in slots 1-3 as needed (and writing the conflicted merge file to the working directory, obeying any change in your merge.conflictstyle setting as well).
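Here is a sketch of that round trip: resolve a conflict with git add, then undo the resolution with git checkout -m. The repository layout, branch names (mine, other), and the stages helper are assumptions made up for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Helper (hypothetical): stage numbers present for one path, space-separated
stages() {
    git ls-files --stage -- "$1" | awk '{printf "%s%s", sep, $3; sep=" "}'
}

printf 'one\ntwo\nthree\n' > f.txt
git add f.txt
git commit -q -m base
base=$(git rev-parse HEAD)

git checkout -q -b mine
printf 'one\nTWO (mine)\nthree\n' > f.txt
git commit -q -am mine

git checkout -q -b other "$base"
printf 'one\nTWO (other)\nthree\n' > f.txt
git commit -q -am other

git checkout -q mine
git merge other || true   # conflict expected

# Resolve (perhaps too hastily) by taking some text and adding it
printf 'one\nTWO (hasty resolution)\nthree\n' > f.txt
git add f.txt
resolved=$(stages f.txt)

# Change of heart: bring the conflict back for this one path
git checkout -m -- f.txt
recreated=$(stages f.txt)
markers=$(grep -c '^<<<<<<<' f.txt)

echo "after git add:      $resolved"
echo "after checkout -m:  $recreated"
echo "conflict-marker blocks in f.txt: $markers"
```

The work-directory copy of f.txt is overwritten with the conflict-marked version, so the hasty resolution is gone unless you saved it elsewhere.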

Cassius answered 23/1, 2014 at 14:40 Comment(1)
This is a very nice answer. - Reluctant
