You're being bitten by git log's cheap and sleazy implementation of --follow, plus the fact that git log often doesn't even look inside merges.
Fundamentally, --follow works internally by changing the name of the file it's looking for. It does not remember both names, so when the linearization algorithm (breadth first search via priority queue) goes down the other leg of the merge, it has the wrong name. You are correct that the order of commit visits matters since it's when Git deduces a rename that Git changes the name of the file it's searching for.
In this graph (it looks like you ran the script several times because the hashes changed—the hashes here are from the first sample):
* 06b5bb7 Merge branch 'feature'
|\
| * 07ccfb6 Change
* | 448ad99 Move
|/
* 31eae74 First commit
git log will visit commit 06b5bb7, and put 448ad99 and 07ccfb6 on the queue. With the default topo order it will next visit 448ad99, examine the diff, and see the rename. It is now looking for a.txt instead of b.txt. Commit 448ad99 is selected, so git log will print it to the output; and Git adds 31eae74 to the visit queue. Next, Git visits 07ccfb6, but it is now looking for a.txt so this commit is not selected. Git adds 31eae74 to the visit queue (but it's already there so this is a no-op). Finally, Git visits 31eae74; comparing that commit's tree to the empty tree, Git finds an added a.txt so this commit gets selected.
Note that had Git visited 07ccfb6 before 448ad99, it would have selected both, because at the start it is looking for b.txt.
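To make the order dependence concrete, here is a toy model of a walk that tracks a single file name. This is a sketch, not Git's actual code: the function name follow_walk and the CHANGES table are made up for illustration, and the per-commit change lists are assumptions reconstructed from the walkthrough above (in particular, that 07ccfb6 touches the file under the name b.txt). The visit order is passed in directly rather than computed from a priority queue.

```python
def follow_walk(visit_order, changes, start_name):
    """Return the commits selected by a walk that tracks only ONE file name."""
    name = start_name
    selected = []
    for commit in visit_order:
        for change in changes[commit]:
            kind = change[0]
            if kind == "rename" and change[2] == name:
                selected.append(commit)
                name = change[1]  # switch to the old name; the new one is forgotten
                break
            if kind in ("modify", "add") and change[1] == name:
                selected.append(commit)
                break
    return selected


# Assumed change lists, chosen to match the walkthrough (not read from a repo).
CHANGES = {
    "06b5bb7": [],                              # merge: not diffed by default
    "448ad99": [("rename", "a.txt", "b.txt")],  # (kind, old name, new name)
    "07ccfb6": [("modify", "b.txt")],
    "31eae74": [("add", "a.txt")],
}

order_seen = ["06b5bb7", "448ad99", "07ccfb6", "31eae74"]
order_swapped = ["06b5bb7", "07ccfb6", "448ad99", "31eae74"]

print(follow_walk(order_seen, CHANGES, "b.txt"))     # ['448ad99', '31eae74']
print(follow_walk(order_swapped, CHANGES, "b.txt"))  # ['07ccfb6', '448ad99', '31eae74']
```

The only state carried from one commit to the next is name, which is why swapping two visits changes the result.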
The -m flag works by "splitting" a merge into two separate internal "virtual commits" (with the same tree, but with the (from ...) added to their "names" so as to be able to tell which virtual commit resulted from which parent). This has the side effect of retaining both of the split merges and looking at their diffs (since the result of splitting this merge is two ordinary non-merge commits). So now (note that this uses your new repository, with its new and different hashes, from the second sample) Git visits commit 36c80a8 (from 1a07e48), diffs 1a07e48 vs 36c80a8, sees a change to b.txt and selects the commit, and puts 1a07e48 on the visit queue. Next, it visits commit 36c80a8 (from 05116f1), diffs 05116f1 vs 36c80a8, and puts 05116f1 on the visit queue. The rest is fairly obvious from here.
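The splitting step can be pictured with a small sketch. Again this is only a model: split_merge is a hypothetical helper, not anything in Git, and the parent order (1a07e48 first) is an assumption based on the order of the visits described above.

```python
def split_merge(commit, parents):
    """Split a merge into one (label, parent) pair per parent, -m style."""
    if len(parents) < 2:
        # Ordinary (non-merge) commits are left alone.
        return [(commit, parents[0] if parents else None)]
    return [(f"{commit} (from {parent})", parent) for parent in parents]


# Each virtual commit is then diffed against its single parent,
# exactly as an ordinary non-merge commit would be.
for label, parent in split_merge("36c80a8", ["1a07e48", "05116f1"]):
    print(f"visit {label}: diff {parent} vs 36c80a8, then queue {parent}")
```

Because each virtual commit has exactly one parent, the usual one-parent diff applies to it, and that is what lets the merge show up in the output at all.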
How can I display cleanly all of the commits that changed a file, following through renames?
The answer for Git is that you can't, at least not using what is built in to Git.
You can (sometimes) get a little closer by adding --cc or -c to your git log command. This makes git log look inside merge commits, doing what Git calls a combined diff. But this doesn't necessarily work anyway because, hidden away in a different part of the documentation, is this key sentence:
Note that combined diff lists only files which were modified from all parents.
Here is what I get with --cc added (note, the ... is literally there, in git log's output):
$ git log --graph --oneline --follow --cc -- b.txt
* e5a17d7 (HEAD -> master) Merge branch 'feature'
|\
| |
...
* | 52e75c9 Move
|/
| diff --git a/a.txt b/b.txt
| similarity index 100%
| rename from a.txt
| rename to b.txt
* 7590cfd First commit
diff --git a/a.txt b/a.txt
new file mode 100644
index 0000000..e965047
--- /dev/null
+++ b/a.txt
@@ -0,0 +1 @@
+Hello
Fundamentally, though, you'd need git log to be much more aware of file renames at merge commits, and to have it look for the old name down any leg using the old file name, and the new name down any leg using the new name. This would require that git log use (most of) the -m option internally on each merge (i.e., split each merge into N separate diffs, one per parent, so as to find which legs have what renames) and then keep a list of which name to use down which branches of merges. But when the forks come back together, i.e., when the multiple legs of the merge (which becomes a fork in our reverse direction) rejoin, it's not clear which name is the correct name to use!
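Here is a sketch of what such per-leg bookkeeping might look like, and where it gets stuck. Nothing like this exists in Git; names_per_commit and children_first are hypothetical helpers. The graph is the first sample's, the one rename is the one in 448ad99, and I am assuming 07ccfb6 contains no rename.

```python
from collections import defaultdict

# First sample's graph: commit -> parents.
PARENTS = {
    "06b5bb7": ["448ad99", "07ccfb6"],
    "448ad99": ["31eae74"],
    "07ccfb6": ["31eae74"],
    "31eae74": [],
}
RENAMES = {"448ad99": ("a.txt", "b.txt")}  # commit -> (old name, new name)


def children_first(tip, parents):
    """Order commits so that every commit comes before its parents."""
    seen, post = set(), []

    def visit(commit):
        if commit in seen:
            return
        seen.add(commit)
        for parent in parents[commit]:
            visit(parent)
        post.append(commit)

    visit(tip)
    return post[::-1]


def names_per_commit(tip, start_name, parents, renames):
    """Carry a set of candidate names down every leg, undoing renames."""
    names = defaultdict(set)
    names[tip].add(start_name)
    for commit in children_first(tip, parents):
        for name in sorted(names[commit]):
            rename = renames.get(commit)
            carried = rename[0] if rename and rename[1] == name else name
            for parent in parents[commit]:
                names[parent].add(carried)
    return dict(names)


names = names_per_commit("06b5bb7", "b.txt", PARENTS, RENAMES)
print(sorted(names["31eae74"]))  # ['a.txt', 'b.txt']
```

The two legs arrive at 31eae74 carrying different names, and this model has no rule for choosing between them. That is the rejoin problem in miniature.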