`git log --follow --graph` skips commits

Setup

git version 2.11.0.windows.1

Here is a bash snippet to reproduce my test repository:

git init

# Create a file
echo Hello > a.txt
git add a.txt
git commit -m 'First commit'

# Change it on one branch
git checkout -b feature
echo Hi > a.txt
git commit -am 'Change'

# Rename it on the other
git checkout master
git mv a.txt b.txt
git commit -m 'Move'

# Merge both changes
git merge --no-edit feature

At the end, git log --graph --pretty=oneline --abbrev-commit prints:

*   06b5bb7 Merge branch 'feature'
|\
| * 07ccfb6 Change
* | 448ad99 Move
|/
* 31eae74 First commit

Problem

Now, I want to get the full log for b.txt (ex-a.txt).
git log --graph --pretty=oneline --abbrev-commit --follow -- b.txt prints:

...
* | 1a07e48 Move
|/
* 5ff73f6 First commit

As you can see, the Change commit is not listed, even though it did modify the file.

I think I have tracked it down to the implicit use of --topo-order by --graph, since adding --date-order brings the commit back, but that might just be chance.

Additionally, adding -m shows the merge commit (which is fine) and the Change commit, but then the merge commit is duplicated:

*   36c80a8 (from 1a07e48) Merge branch 'feature'
|\
| | 36c80a8 (from 05116f1) Merge branch 'feature'
| * 05116f1 Change
* | 1a07e48 Move
|/
* 5ff73f6 First commit

Question

What am I missing to explain the weird behaviour I'm witnessing?
How can I display cleanly all of the commits that changed a file, following through renames?

Tuberculate answered 29/9, 2017 at 11:2

You're being bitten by git log's cheap and sleazy implementation of --follow, plus the fact that git log often doesn't even look inside merges.

Fundamentally, --follow works internally by changing the name of the file it's looking for. It does not remember both names, so when the linearization algorithm (breadth first search via priority queue) goes down the other leg of the merge, it has the wrong name. You are correct that the order of commit visits matters since it's when Git deduces a rename that Git changes the name of the file it's searching for.

In this graph (it looks like you ran the script several times because the hashes changed—the hashes here are from the first sample):

*   06b5bb7 Merge branch 'feature'
|\
| * 07ccfb6 Change
* | 448ad99 Move
|/
* 31eae74 First commit

git log will visit commit 06b5bb7, and put 448ad99 and 07ccfb6 on the queue. With the topological order that --graph implies, it next visits 07ccfb6 (note that the graph lists it before 448ad99). At this point Git is still looking for b.txt, and this commit only touches a.txt, so it is not selected. Git adds 31eae74 to the visit queue. Next, Git visits 448ad99, examines the diff, and sees the rename. Commit 448ad99 is selected, so git log will print it to the output, and from here on Git is looking for a.txt instead of b.txt. Git adds 31eae74 to the visit queue (but it's already there so this is a no-op). Finally, Git visits 31eae74; comparing that commit's tree to the empty tree, Git finds an added a.txt so this commit gets selected.

Note that had Git visited 448ad99 before 07ccfb6, it would have selected both, because by the time it reached 07ccfb6 it would already be looking for a.txt. That is presumably why --date-order brought the commit back for you.
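
The order dependence can be modelled with a small shell sketch. This is a toy, not Git's implementation: the commits are hand-written strings using the hashes from the first sample, and the `visit` and `follow` functions and the `HASH STATUS PATH [NEWPATH]` record format are made up for illustration. The only thing it keeps from the real behaviour is that a single name is tracked and is swapped when a rename is seen.

```shell
#!/bin/sh
# Toy model of --follow's single-name tracking. This is NOT Git's code:
# commits are hand-written strings "HASH STATUS PATH [NEWPATH]", where
# STATUS is A (added), M (modified) or R (PATH renamed to NEWPATH).

# visit HASH STATUS PATH [NEWPATH]: process one commit, updating the
# globals $name (the one path being searched for) and $selected.
visit() {
  case $2 in
    R) # A rename is selected only if its new name is the tracked one;
       # after that, only the old name is searched for.
       if [ "$4" = "$name" ]; then selected="$selected $1"; name=$3; fi ;;
    *) if [ "$3" = "$name" ]; then selected="$selected $1"; fi ;;
  esac
}

# follow START_NAME COMMIT...: visit the commits in the order given.
follow() {
  name=$1; shift
  selected=""
  for entry in "$@"; do
    visit $entry   # unquoted on purpose: split the entry into fields
  done
  echo "${selected# }"
}

change="07ccfb6 M a.txt"
move="448ad99 R a.txt b.txt"
first="31eae74 A a.txt"

# "Change" visited before "Move": it is checked against b.txt and dropped.
follow b.txt "$change" "$move" "$first"   # prints: 448ad99 31eae74

# "Move" visited first: the name is already a.txt when "Change" comes up.
follow b.txt "$move" "$change" "$first"   # prints: 448ad99 07ccfb6 31eae74
```

The same three commits give different results purely because of the visit order, which is the effect described above.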

The -m flag works by "splitting" a merge into two separate internal "virtual commits" (with the same tree, but with the (from ...) added to their "names" so as to be able to tell which virtual commit resulted from which parent). This has the side effect of retaining both of the split merges and looking at their diffs (since the result of splitting this merge is two ordinary non-merge commits). So now—note that this uses your new repository with its new different hashes in the second sample—Git visits commit 36c80a8 (from 1a07e48), diffs 1a07e48 vs 36c80a8, sees a change to b.txt and selects the commit, and puts 1a07e48 on the visit queue. Next, it visits commit 36c80a8 (from 05116f1), diffs 05116f1 vs 36c80a8, and puts 05116f1 on the visit queue. The rest is fairly obvious from here.

How can I display cleanly all of the commits that changed a file, following through renames?

The answer for Git is that you can't, at least not using what is built in to Git.

You can (sometimes) get a little closer by adding --cc or -c to your git log command. This makes git log look inside merge commits, doing what Git calls a combined diff. But this doesn't necessarily work anyway because, hidden away in a different part of the documentation, is this key sentence:

Note that combined diff lists only files which were modified from all parents.

Here is what I get with --cc added (note, the ... is literally there, in git log's output):

$ git log --graph --oneline --follow --cc -- b.txt
*   e5a17d7 (HEAD -> master) Merge branch 'feature'
|\  
| | 
... 
* | 52e75c9 Move
|/  
|   diff --git a/a.txt b/b.txt
|   similarity index 100%
|   rename from a.txt
|   rename to b.txt
* 7590cfd First commit
  diff --git a/a.txt b/a.txt
  new file mode 100644
  index 0000000..e965047
  --- /dev/null
  +++ b/a.txt
  @@ -0,0 +1 @@
  +Hello

Fundamentally, though, you'd need git log to be much more aware of file renames at merge commits, and to have it look for the old name down any leg using the old file name, and the new name down any leg using the new name. This would require that git log use (most of) the -m option internally on each merge—i.e., split each merge into N separate diffs, one per parent, so as to find which legs have what renames—and then keep a list of which name to use down which branches of merges. But when the forks come back together, i.e., when the multiple legs of the merge (which becomes a fork in our reverse direction) rejoin, it's not clear which name is the correct name to use!
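
If you already know every name the file has had, one partial workaround is to drop --follow and pass all of the names as pathspecs. This does not follow renames at all, it just selects commits that touch any of the listed paths, so it only helps when the names are known up front. The sketch below rebuilds the question's repository in a scratch directory; the `demo` identity and the `repo` and `subjects` variables are placeholders chosen for this example.

```shell
#!/bin/sh
set -e
# Rebuild the question's repository in a scratch directory.
repo=$(mktemp -d)
cd "$repo"
# Placeholder identity so the commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is "master"
echo Hello > a.txt
git add a.txt
git commit -q -m 'First commit'

git checkout -q -b feature
echo Hi > a.txt
git commit -q -am 'Change'

git checkout -q master
git mv a.txt b.txt
git commit -q -m 'Move'
git merge -q --no-edit feature

# No --follow: both the old and the new name are given as pathspecs.
git log --graph --oneline -- a.txt b.txt

# Subject lines only, for scripting.
subjects=$(git log --format=%s -- a.txt b.txt)
```

On this repository the listing should include both Change and Move. The merge commit shows up as well, since it differs from each parent in at least one of the two paths.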

Wiener answered 29/9, 2017 at 15:27 Comment(1)
Shoot. Well, thank you for that comprehensive breakdown of the issue anyway :) - Tuberculate
