The full answer is complicated. Edward Thomson's covers much of it. Here is considerably more detail.
Let's start, though, with this: git mergetool
runs—I should say, you run it—after all the rest of git merge
is done. Your merge tools do not even enter the picture until git merge
has completed (and failed due to conflicts). This changes a lot of the way you will think about these.
How (recursive and resolve) merge works
The user issues a git merge
command.
So far so good.
Git applies some git-specific algorithm to automatically merge the two modified files.
Whoops, no, we've already derailed and the train may be heading off the cliff. :-)
The first step at this point is to choose a merge strategy. Let's pick the default (-s recursive
) strategy. If we pick some other strategy, the next step may be different (it is entirely different for -s ours
, and somewhat different for -s octopus
, but none of those are interesting right now anyway).
The next step is to find all the merge bases. With any luck there is only one. We'll come back to the recursion issue later. There could be no merge base, though. Older versions of Git used an empty tree as a fake merge base. Newer ones—2.9 or later—demand that you add --allow-unrelated-histories
here (and then proceed in the same way). With an empty tree, every file is added, in both non-base commits.
If there is one merge base, it might be the same as either branch tip. If so, there is no merge to perform. There are two sub-cases here as well, though. There may be nothing to merge, because the merge base is the other commit and the other commit is "behind" (is an ancestor of) the current commit. In this case, Git always does nothing. Or, the other commit may be ahead of (a descendant of) the current commit. In this case, Git normally does a fast-forward operation, unless you specify --no-ff
. In both cases (fast-forward or --no-ff
), no actual merging happens. Instead, the further-ahead commit gets extracted. It either becomes the current commit (fast-forward merge: whatever branch you are on, it now points to the further-ahead commit), or Git makes a new commit using that commit's tree, and the new commit become the current commit.
A real merge: merging one merge base with two commits
We are now at a phase where we have a single merge base commit B, and two commits L (local or left-side, --ours
) and R (remote or right-side, --theirs
). Now, the two normal (-s recursive
and -s resolve
) strategies do a pair of git diff --name-status
operations with rename detection enabled, to see if there are files in the B-to-L change that change their names, and if there are files in the B-to-R change that change their names. This also finds out if there are newly added files in either L or R, and if files are deleted in either L or R. All of this information is combined to produce file identities, so that Git knows which sets of changes to combine. There may be conflicts here: a file whose path was PB in the base, but is now both PL and PR, has a rename/rename conflict, for instance.
Any conflicts at this point—I call them high level conflicts—lie outside the domain of file-level merging: they will make Git end this merge process with a conflict, regardless of whatever else occurs. In the meantime, though, we end up with "identified files", as I said above, without quite defining it. Loosely, what this means is that just because some path P got changed, doesn't mean it's a new file. If there was a file base
in the base commit B, and it's now called renamed
in L but still called base
in R, Git will use the new name, but compare B:base with L:renamed and B:base with R:base when Git goes to combine changes at the file level.
In other words, the file identity we compute at this stage tells us (and Git) which files in B match which files in L and/or R. This identity is not necessarily by path name. It's just usually the case that all three paths match.
There are a few small tweaks you can insert during this first diff
phase:
Renormalization (merge.renormalize
): you can make Git apply text conversions from .gitattributes
and/or core.eol
settings. The .gitattributes
settings include the ident
filter and any smudge and clean filters (though only the smudge direction applies here).
(I assumed Git did this early, since it may affect rename detection. I have not actually tested this, though, and I just looked through the Git source and it seems to not use this at this stage. So perhaps merge.renormalize
does not apply here, even though a smudge filter could radically rewrite a file. Consider a filter-pair that encrypts and decrypts, for instance. This is probably a bug, albeit a small one. Fortunately EOL conversion has no effect at all on similarity index values.)
You can set the similarity index for when Git will consider files to be renamed, or disable rename detection entirely. This is the -X find-renames=n
extended strategy option, previously called the rename threshold. It is the same as the git diff -M
or --find-renames
option.
Git currently has no way to set the "break" threshold a la git diff -B
. This also affects file identity computation, but if you can't set it, it doesn't really matter. (You probably should be able to set it: another minor buglet.)
Merging individual files
Now that we have our files identified and have decided which ones match up with which other ones, we finally proceed to the file-merging level. Note that here, if you are using the built-in merge driver, the remaining settable diff options will start to matter.
Let me quote this bit again, as it's relevant:
Git applies some ... algorithm to automatically merge the two modified files. For this purpose it creates a BASE, LOCAL, OTHER and BACKUP version of the file.
There are three (not four) files involved at this point, but Git does not create any of them. They are the files from B, L, and R. These three files exist as blob objects in the repository. (If Git is renormalizing files, it does have to create the renormalized ones as blob objects at this point, but then they live in the repository, and Git just sort of pretends they were in the original commits.)
The next step is quite critical, and it is where the index comes into the picture. The hash IDs of those three blob objects are HB, HL, and HR. Git gets ready to place these three hashes into the index, in slots 1, 2, and 3 respectively, but now uses the rules described in the git read-tree
documentation under the 3-Way Merge section:
- If all three hashes are equal, the file is already merged and nothing happens: the hash goes into slot zero. Even if only the second and third hashes are equal, the file is still already merged: both L and R make the same change with respect to B. The new hash goes into slot zero and the file-merge is complete.
- If HB = HL and HB ≠ HR, the right side (remote/other/
--theirs
) file should be the result. This hash goes into slot zero and the file-merge is complete.
- If HB ≠ HL and HB = HR, the left side (local/
--ours
) file should be the result. This hash goes into slot zero and the file-merge is complete.
- This leaves only the case where all three hashes differ. Now the files really do need to be merged. Git places all three hashes into the three index slots.
There are a few special cases that can apply at this point, all having to do with higher-level conflicts. It's possible that one or two index slots are left empty for some path names, because the index is carefully managed in a way that keeps it synchronized with the work-tree (so that it can play its role as a cache that speeds up Git a lot). But in principle, especially when we are concerned with merge drivers, we can think of this as just "all three slots"—they just may be three slots spread across several names, in the case of renamed files.
Invoking merge drivers (.gitattributes
)
At this point, we have an actual file-level merge to perform. We have three input files. Their actual contents are stored in the repository, as blob objects. Their hash IDs are stored in the index, in slots 1 through 3 (usually of a single index entry, but in the case of renames, maybe using more than one index entry). We may now:
Use git's built in file merge (which is also available as an external command, git merge-file
).
The built in file merge works directly from the index (though if we want to run it via git merge-file
we must extract the blobs into the file system). It extracts the files, does its thing to merge them, and optionally—depending on extended-strategy-options -X ours
or -X theirs
—writes conflict markers as well. It drops its final result into the work-tree, under whatever path name Git chose as the final path name, and is finished.
Use a merge driver (via .gitattributes
). A merge driver is run with arguments. However, these arguments are constructed by having Git extract the three blob objects to three temporary files.
The arguments are expanded from whatever we put in as %O
, %A
, %B
, %L
, and %P
. These argument letters don't quite match what we have been using: %O
is the name of the base file, %A
is the name of the left-side / local / --ours
version, %B
is the name of the right-side / other / remote / --theirs
version, %L
is the conflict-marker-size
setting (default 7), and %P
is the path that Git wants to use to save the final result in the work-tree.
Note that %O
, %A
, and %B
are all the names of temporary files that Git created (to hold the blob contents). None of them match %P
. Git expects the merge driver to leave the result of the merge in the path %A
(which Git will then rename to %P
on its own).
In all cases, the merged file goes into the work-tree, at this point. If the merge went well, the higher-numbered slots in the index are cleaned out: Git, in effect, runs git add
on the work-tree file, writing the data into the repository as a blob object, and getting a hash ID that goes into slot zero. If the merge failed with conflicts, the higher-numbered slots remain in place; slot zero is left empty.
The end result of all this is that the work-tree holds the merged files, perhaps with conflict markers, and the index holds the result of the merge, perhaps with conflicts that should be resolved.
Using git mergetool
This works much the same way as a merge driver. Aside from running only after the merge has completed with its results in the index and work-tree, though, the main differences are:
git mergetool
will make extra copies of files (the .orig
files).
- It knows exactly how to run each known tool, i.e., what arguments to pass to make that tool do something useful. There is no equivalent to a driver
%O
placeholder, for instance.
- It can run commands on all the as-yet-unmerged files in some directory.
In fact, git mergetool
is a big shell script: it uses git ls-files -u
to find the unmerged index entries, and git checkout-index
to extract each stage from the index. It even has special cases for the higher level conflicts such as add/add or rename/delete.
There's an additional driver shell-script fragment per known tool: look in
$ ls $(git --exec-path)/mergetools
to see all the individual tool drivers. These are passed a flag, $base_present
, for handling add/add conflicts. (They are sourced, i.e., run with . "$MERGE_TOOLS_DIR/$tool"
, so that they can override shell functions defined in the script.)
For unknown tools, you use the shell's variable names $BASE
, $LOCAL
, and $REMOTE
to know where the script has put the three files extracted from the index, and you write your result to $MERGED
(which is in fact the work-tree name for the file). The script does this:
setup_user_tool () {
merge_tool_cmd=$(get_merge_tool_cmd "$tool")
test -n "$merge_tool_cmd" || return 1
diff_cmd () {
( eval $merge_tool_cmd )
}
merge_cmd () {
( eval $merge_tool_cmd )
}
}
i.e., eval
s your tool command in a sub-shell, so that you cannot override things the way the known tools can.
Recursive merge
When Git needs to perform a recursive merge ...
Most of this question is kind of moot at this point. A merge tool never sees this situation at all, because git mergetool
is invoked after Git itself has finished the recursive merge and left the result in the index and work-tree. However, merge drivers do get a say here.
When the -s recursive
merge strategy is merging merge-bases to make a new "virtual commit", it invokes another git merge
—well, more precisely, just calls itself recursively—on the merge base commits (but see below). This inner git merge
knows that it's being invoked recursively, so when it is about to apply a .gitattributes
merge driver, it checks the recursive =
setting there. This determines whether the merge driver is used again, or some other merge driver is used for the inner merge. For the built-in merge driver, Git turns off the extended strategy options, i.e., neither -X ours
nor -X theirs
is in effect.
When an inner merge completes, its result—all the files that would be left in the work-tree, were this not an inner, recursive merge—is actually saved as a real commit. This is true even if there were unresolved conflicts. These unresolved conflicts may even contain conflict markers. Nonetheless, this is the new "virtual merge base" commit, and it is a true commit; it just has no external name by which you can find its commit hash.
If there are three or more merge bases at this particular level, rather than just two merge bases, this new virtual merge base is now merged with the next remaining merge base, iteratively. Logically, Git could use a divide-and-conquer strategy here: if there were 32 merge bases initially, it could merge them two at a time to produce 16 commits, merge those two at a time to produce 8, and so on. Aside from doing ceil(log2(N)) merges instead of N-1 merges, though, it's not clear that this would buy much: it's already quite rare to have N > 1.