As noted in comments, the --follow option must precede the stand-alone -- that indicates the end of the options list.
Even if the follow renames seems to work now, when I add --grep="rename" --invert-grep to remove the "rename" commit, I get 0 results
That makes sense (but is a bug of sorts),1 because of the way --follow works. The issue here is that Git doesn't have any kind of file history at all. All that Git has is the set of commits that are in the repository. The commits are the history:
Each commit is numbered, by its big ugly hash ID, which is unique to that one particular commit. No other commit—in any Git repository2—has that hash ID.
Each commit has a full snapshot of every file.
Each commit also stores the hash ID of a previous commit—or, for a merge commit, two or more previous commits.
So these numbers string commits together, backwards:
... <-F <-G <-H
The uppercase letters here stand in for the actual commit hash IDs, by which Git finds the commits. Each commit has a "backwards-pointing arrow" coming out of it—the stored hash ID of the previous commit—so that if we could just remember the hash ID of the last commit in the chain, we could have Git work backwards through the chain.
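To make the backwards-pointing arrows concrete, here is a minimal sketch that builds a throwaway repository with three commits and reads the parent hash stored inside the newest one. The temporary directory, the file name file.txt and the commit messages are placeholders I made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository in a temporary directory (hypothetical setup).
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Three commits in a row, standing in for F, G and H.
for name in F G H; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

# The raw commit object lists its snapshot (tree) and its parent hash.
git cat-file -p HEAD

# The stored parent hash is exactly what HEAD~1 resolves to.
stored_parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
resolved_parent=$(git rev-parse HEAD~1)
echo "stored:   $stored_parent"
echo "resolved: $resolved_parent"
```

The two printed hashes should match: "walking backwards" is nothing more than reading that parent line over and over.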
A branch name just tells Git which commit is the last commit in that branch:
            I--J   <-- feature1
           /
...--F--G--H
           \
            K--L   <-- feature2
Here, commit J is the last commit on one of the feature branches and commit L is the last commit on the other. Note that commits up through H are on both branches (and quite likely on the main or master branch as well).
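The picture above can be rebuilt in a scratch repository. This is only a sketch: the commits are empty ones whose messages are the letters from the drawing, and the commit helper function is something I defined for brevity, not a Git command.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Hypothetical helper: an empty commit whose message is a single letter.
commit() { git commit -q --allow-empty -m "$1"; }

commit F; commit G; commit H
base=$(git symbolic-ref --short HEAD)   # whatever the initial branch is called

git checkout -q -b feature1
commit I; commit J
git checkout -q "$base"
git checkout -q -b feature2
commit K; commit L

# Each branch name resolves to the last commit on that branch.
tip1=$(git log -1 --format=%s feature1)
tip2=$(git log -1 --format=%s feature2)
# The newest commit reachable from both names is where they fork.
fork=$(git log -1 --format=%s "$(git merge-base feature1 feature2)")
echo "feature1 -> $tip1, feature2 -> $tip2, shared up through $fork"
```

Nothing in the repository records "the commits of feature1" as a list. Git finds J from the name and then follows parent links.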
The git log command simply works through the commits, one at a time, starting from whatever "last commit" you choose. The default "last commit" is the one at the tip of whatever branch you have checked out right now. Git starts there and works backwards, one commit at a time.
The -M option to git diff, which is short for --find-renames, enables rename detection in git diff. The --follow option to git log does the same for git log, but also takes the name of one single file to look for. (Giving the -M option to git log makes it use the rename detector at each diff, but since it's not looking for one specific file, that just affects the -p or --name-status style of output. With --follow, git log is looking for that one specific file, as we'll see in a moment.)
The rename detector works this way:
You give Git two commits: before and after, or old and new, or, say, F and G. (You can put the new commit on the left side, and the old one on the right, but git log itself always puts older on left, newer on right.)
You have Git compare the snapshots in these two commits.
Some files in those commits are 100% identical: they have the same name and the same content. Git's internal storage system has de-duplicated these files, which makes it very easy for git diff or git log to decide that these files are the same, so it can skip right over them if appropriate.
Other files have the same names but different contents. Git assumes, by default, that if the two files have the same name, such as path/to/file.ext (note that the embedded slashes are just part of the file's name), they represent the "same file", even if the contents have changed. So that file is modified, from the old / left-side commit to the new / right-side commit. If you ask for --name-status, you'll get M, modified, as the status for that file name.
Sometimes, the left-side commit has a file named, say, fileL, and the right-side commit doesn't have that file at all. That file is deleted, apparently, in the change from old (left) to new (right). With --name-status you would get D for the status.
Sometimes, the right-side commit has a file named, say, fileR, and the left-side commit just doesn't. That file is newly added, apparently, and with --name-status you would get A for the status.
But what if fileL on the left and fileR on the right should be considered to be "the same file"? That is, what if we renamed fileL to fileR? This is where Git's rename detector comes in. Given a deleted/added pair like this, maybe the content of fileL is sufficiently close to, or exactly the same as, the content of fileR. If:
- you have turned on the rename detector, which will actually do this content-checking, and
- the content-checking says "exactly the same" (very fast to know due to the de-duplication) or "sufficiently similar" (much slower, but enabled by the same rename-detector switch),
then, and only then, Git will declare that fileL was renamed to become fileR. The --name-status output will include R, the similarity index value, and the two file names, rather than the single file name that matches in both left and right side commits.
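You can watch the detector being switched off and on with a two-commit scratch repository. This is a sketch under my own setup: the names fileL and fileR come from the text above, the file content is arbitrary, and --no-renames is used to force the detector off regardless of any diff.renames setting.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'line one\nline two\nline three\n' > fileL
git add fileL
git commit -q -m "add fileL"

git mv fileL fileR
git commit -q -m "rename fileL to fileR"

# Detector off: the change looks like an unrelated delete plus add.
without=$(git diff --no-renames --name-status HEAD~1 HEAD)
echo "$without"
# D	fileL
# A	fileR

# Detector on: the identical content is paired up as one rename.
with=$(git diff -M --name-status HEAD~1 HEAD)
echo "$with"
# R100	fileL	fileR
```

The 100 after the R is the similarity index. Here the content is byte-for-byte identical, so Git never has to run the slower "sufficiently similar" comparison.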
Now that you know how the rename detector works, and that it has to be switched on, you can see how --follow works. Remember that with git log, you can give it a file name, and tell it not to show commits that don't modify that particular file.3 The result is that you only see commits that do modify that file: a subset of the set of all commits that git log visits. So let's say you run git log --follow -- newpath/my-file.php:
git log walks through history, one commit at a time, backwards, as usual.
At each commit, it compares this commit (newer, on right) against its parent (older, on left). Without --follow it would still do this, but just look to see if the file you named was changed (M status, from git diff --name-status) or added or deleted (A, D).4 But with --follow, it also looks for an R status.
If the file was changed (has M or A or D status), git log prints out this commit; if not, it just suppresses the printout. With --follow, we add the R status and, if that happens, the two file names. If the status is R, well, git log has been looking for newpath/my-file.php before. But now it knows that, as of the parent commit, the file was called oldpath/my-file.php. (Note, again, that there is no folder here. The file's name is the whole string, including all the slashes.)
So, with --follow, which turns on the rename detector, git log can get a renamed status and therefore see that the file gets renamed. It's also looking for one specific file name, in this case newpath/my-file.php. If it detects a rename, git log not only prints the commit, but also changes the one name it is looking for. Now, instead of newpath/my-file.php, from the parent commit on backwards, it is looking for oldpath/my-file.php.
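The name-switching described above is easy to see in a small scratch repository. In this sketch the paths are the ones used in the text, while the file contents and commit messages are invented for the demonstration.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

mkdir oldpath
printf 'one\ntwo\nthree\n' > oldpath/my-file.php
git add oldpath/my-file.php
git commit -q -m "add file"

printf 'four\n' >> oldpath/my-file.php
git commit -q -am "edit under old name"

mkdir newpath
git mv oldpath/my-file.php newpath/my-file.php
git commit -q -m "rename file"

printf 'five\n' >> newpath/my-file.php
git commit -q -am "edit under new name"

# Without --follow, only commits that touch the new name are shown.
plain=$(git log --oneline -- newpath/my-file.php | wc -l | tr -d ' ')
# With --follow, git log switches to the old name at the rename commit.
followed=$(git log --oneline --follow -- newpath/my-file.php | wc -l | tr -d ' ')

echo "without --follow: $plain commits"     # 2
echo "with --follow:    $followed commits"  # 4
git log --oneline --follow --name-status -- newpath/my-file.php
```

In the final listing the rename commit shows up with an R status and both names. That is the point at which git log stops looking for newpath/my-file.php and starts looking for oldpath/my-file.php.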
1The --follow code itself is ... not very good; the whole implementation needs to be reworked, which would probably fix this better than the simpler hack I'm thinking of.
2Technically, some other Git repository could have a different commit that re-uses that hash ID, as long as you never introduce the two commits to each other. In practice, you won't find one, though.
3The --follow option can only follow one file name. Without --follow, you can give git log more than one name, or the name of a "directory" even though Git doesn't really store directories at all. Without --follow the git log code operates on generic pathspecs. With --follow, it only handles one file name. That's a limitation imposed by the algorithm Git is using here.
4It could also have T, type-changed, and I think that would count. The full set of status letters is ABCDMRTUX but X indicates a bug in Git, U can only occur during an unfinished merge, B can only occur with git diff with the -B option, and C and R can only occur with the --find-copies and --find-renames (-C and -M) options enabled. Note that git diff may automatically enable --find-renames based on your diff.renames setting, but git log won't.
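The one-name limit from footnote 3 can be checked directly. This is a small sketch in a scratch repository: a.txt, b.txt and follow-error.txt are placeholder names of mine.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo a > a.txt
git add a.txt
git commit -q -m "add a.txt"
echo b > b.txt
git add b.txt
git commit -q -m "add b.txt"

# Without --follow, several pathspecs are fine.
two_paths=$(git log --oneline -- a.txt b.txt | wc -l | tr -d ' ')
echo "two pathspecs, no --follow: $two_paths commits"

# With --follow, Git refuses anything other than a single pathspec.
if git log --oneline --follow -- a.txt b.txt >/dev/null 2>follow-error.txt; then
    follow_status=accepted
else
    follow_status=rejected
fi
echo "two pathspecs with --follow: $follow_status"
cat follow-error.txt
```

On the Git versions I know of, the message saved in follow-error.txt says that --follow requires exactly one pathspec.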
The bugs in --follow
This process of removing some commits from git log's output is called History Simplification. There is a long section in the documentation that describes this, and it begins with this rather odd claim:
Sometimes you are only interested in parts of the history, for example
the commits modifying a particular <path>. But there are two parts of
History Simplification, one part is selecting the commits and the other
is how to do it, as there are various strategies to simplify the
history.
What this weird phrasing ("one part is selecting the commits and the other is how to do it") is trying to get at is that with history simplification enabled, git log sometimes doesn't even walk some commits. In particular, consider a merge commit, where two strings-of-commits come together:
          C--...--K
         /         \
...--A--B           M--N--O   <-- branch
         \         /
          D--...--L
To show all commits, git log will have to walk commit O, then N, then M, then both K and L (in some order), then all the commits before K and all the commits before L going back to C and D, and then rejoin a single thread at commit B and keep going from there, backwards.
If we're not going to show all commits, though, maybe (just maybe) at commit M we could just go back to only commit K or only commit L and ignore the other "side" of the merge entirely. That will save a lot of work and time, and avoid showing you stuff that's irrelevant. This is usually a really good thing.
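Here is a sketch of a case where skipping one side of a merge visibly changes the output. The setup is my own: a side branch changes file.txt and then changes it back, while the other side only touches an unrelated file, so at the merge file.txt matches both parents.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

printf 'v1\n' > file.txt
git add file.txt
git commit -q -m "add file.txt"
base=$(git symbolic-ref --short HEAD)

# Side branch: change the file, then change it back.
git checkout -q -b side
printf 'v2\n' > file.txt
git commit -q -am "change file.txt"
printf 'v1\n' > file.txt
git commit -q -am "change file.txt back"

# Original branch: an unrelated commit, so the merge is a real merge.
git checkout -q "$base"
printf 'x\n' > other.txt
git add other.txt
git commit -q -m "unrelated change"
git merge -q --no-edit side

# Default simplification follows only a parent whose file.txt matches the merge.
simplified=$(git log --oneline -- file.txt | wc -l | tr -d ' ')
# --full-history walks both sides of the merge.
full=$(git log --oneline --full-history -- file.txt | wc -l | tr -d ' ')

echo "default:        $simplified commits"
echo "--full-history: $full commits"
```

With the default simplification I would expect to see only the commit that added the file. With --full-history the two side-branch commits should appear as well, since Git now actually walks that leg.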
For --follow, however, it's often a pretty bad thing. This is one of --follow's issues: sometimes Git will go down the "wrong leg" when doing this kind of simplification. Adding --full-history avoids this, but we immediately stumble into another problem. The --follow option has only one file name. If we have a rename in one of the two legs of the merge, but not in the other, and git log goes down the rename leg first, it may look for the wrong name when it goes down the other leg.
If the file is renamed in both legs, so that it's renamed from M back to K and from M back to L, or if Git happens to go down the correct leg in the first place and you don't care about the other leg, everything works. But it's something to be aware of. (This is not the problem that's hitting you with --grep, or it would occur without --grep.)
I think the bug you are seeing is that --grep is firing off "too early", as it were. The --grep option works by eliminating, from git log's output, any commit that has (--invert-grep) or lacks (--grep without --invert-grep) some particular text in its commit message. Suppose, then, that the rename commit (the one that causes git log --follow to know to use the name oldpath/my-file.php) gets skipped by your --grep option. Git won't see the R status, and won't know to change the name from newpath/my-file.php to oldpath/my-file.php. So git log --follow will keep looking for the new path, and you'll get only those commits that both meet the grep criteria and modify a file with the new name.
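If you want to check this theory on your own Git version, a scratch repository along these lines should do. The paths match the ones in the question, but the commit messages and contents are invented, and I am deliberately not promising an exact count for the filtered run since it depends on how your Git behaves.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

mkdir oldpath newpath
printf 'one\ntwo\nthree\n' > oldpath/my-file.php
git add oldpath/my-file.php
git commit -q -m "add file"
printf 'four\n' >> oldpath/my-file.php
git commit -q -am "edit under old name"
git mv oldpath/my-file.php newpath/my-file.php
git commit -q -m "rename file"
printf 'five\n' >> newpath/my-file.php
git commit -q -am "edit under new name"

# Baseline: --follow alone reaches all four commits.
followed=$(git log --oneline --follow -- newpath/my-file.php | wc -l | tr -d ' ')

# Same walk, but hiding every commit whose message mentions "rename".
filtered=$(git log --oneline --follow --grep=rename --invert-grep \
    -- newpath/my-file.php | wc -l | tr -d ' ')

echo "--follow only:        $followed commits"
echo "--follow plus --grep: $filtered commits"
git log --oneline --follow --grep=rename --invert-grep -- newpath/my-file.php
```

If the explanation above is right, the filtered run loses more than just the rename commit: the two commits made under the old name go missing too, because git log never learned the old name.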
This bug could be fixed by having git log --follow run the diff engine anyway, even if it's going to skip the commit for other reasons. But more generally --follow needs a complete rewrite: it has a bunch of weird special case code threaded through the diff engine just to make this one case work. It needs to handle multiple path names and/or pathspecs, and work with --reverse and other options. It needs a way to stack old and new names onto commit paths, so that with --full-history, going down both legs of merges, it knows which path to be looking for. Note that this has other implications: what if, going down both legs of a merge, there are different renames? If there was a rename/rename conflict that someone fixed manually in the merge, how do we deal with that?