How to REALLY show logs of renamed files with git
git
H

6

188

I'm relatively new to Git. I used Subversion (SVN) before.

I noticed that most of the graphical Git front-ends and IDE plugins don't seem to be able to display the history of a file if the file has been renamed. When I use

git log --follow

on the command line, I can see the whole log across renames.
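The difference can be seen in a throwaway repository. The following is a minimal sketch; the file names old.txt and new.txt, the commit messages, and the temporary directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare path-limited history with and without --follow.
# All names here (old.txt, new.txt, the identity) are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'first line\n' > old.txt
git add old.txt
git commit -q -m "Add old.txt"

git mv old.txt new.txt
git commit -q -m "Move old.txt to new.txt"

printf 'second line\n' >> new.txt
git commit -q -am "Edit new.txt"

# Without --follow, history stops at the commit that introduced the new path.
plain=$(git log --oneline -- new.txt | wc -l | tr -d ' ')
# With --follow, Git continues past the rename to the old path.
followed=$(git log --oneline --follow -- new.txt | wc -l | tr -d ' ')

echo "without --follow: $plain commits"
echo "with --follow:    $followed commits"
```

Note that --follow only works with a single path.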

According to Linus Torvalds (alternative link) the --follow switch is a "SVN noob" pleaser; serious Git users don't use it:

--follow is a total hack, meant to just satisfy ex-SVN users who never knew anything about things like parenthood or nice revision graphs anyway.

It's not totally fundamental, but the current implementation of "--follow" is really a quick preprocessing thing bolted onto the revision walking logic, rather than being anything really integral.

It literally was designed as a "SVN noob" pleaser, not as a "real git functionality" thing. The idea was that you'd get away from the (broken) mindset of thinking that renames matter in the big picture.

How do the hardcore Git users get the history of a file when it was renamed? What is the 'real' way to do this?

Hazeghi answered 21/4, 2011 at 11:55 Comment(12)
Are you looking for something more hardcore than using git mv oldfile newfile which maintains the history for the renamed file?Rahman
@David Hall: git mv oldfile newfile doesn't cause the rename to be recorded at all - it's just the same as deleting one file and adding another. git only works out renames and copies from the state of the tree at each commit after the fact.Stepdaughter
@Mark thanks - didn't know that. But am I correct that using the mv command gives git enough of a helping hand that the history will be intact, whereas renaming in other ways (e.g. outside of git) might break the history?Rahman
@David Hall: If you rename the file with another tool outside git (e.g. /bin/mv oldfile newfile), but then do git add newfile; git rm oldfile, the result is indistinguishable from that of git mv oldfile newfile.Stepdaughter
This ideology falls apart if you ever move a file to a new repository, in which case the inability to move its entire history may be a major issue. Although of course there are limits to how much true history can really come with a file in a complex project.Vada
Note: git log --follow improves a bit with git 2.9 (June 2016): see my answer belowMargarito
As of v2.15, you may want to experiment with --color-moved when you diff.Shadbush
the arrogance of that comment is outstanding... git might be good for some things, but it's a complete clusterfck in many other cases, as in keeping track of history when you re-organize filesJoletta
The link to Linus' email is dead. You can see it in git.661346.n2.nabble.com/…Ecotype
Torvalds built a simple-minded snapshot-based version control program with poor support for renaming and then arrogantly dismissed the valuable ideas and expectations of people who have not only worked with but even implemented systems with good rename support, insinuating that they are inexperienced people who have not thought everything through like he has.Antiproton
For instance, he claimed that, aha, you idiots who want renaming have not thought about the case when two branches introduce different objects by the same name and then have to merge. But I solved the problem in the Meta-CVS version control system: there would be a conflict on the directory structure that you would resolve by giving the files different names, or possibly deprecating one of the objects (so it's not in the directory structure). You could combine the files into one and choose one of the two objects to have that name and content, removing the other, and such.Antiproton
It seems the official Git FAQ once included a related summary: Why does Git not track renames?Curtal
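The claim made in the comments above, that git mv and a rename outside Git followed by git add and git rm are indistinguishable, can be checked by comparing the staged trees. This is only a sketch: the two temporary repositories and the helper make_repo are hypothetical, while oldfile and newfile are the names used in the comments.

```shell
#!/bin/sh
# Sketch: both ways of renaming produce the same staged tree.
set -eu

# Hypothetical helper: create a repository with one committed file.
make_repo() {
    dir=$(mktemp -d)
    git -C "$dir" init -q
    git -C "$dir" config user.name "Example"
    git -C "$dir" config user.email "example@example.com"
    printf 'some content\n' > "$dir/oldfile"
    git -C "$dir" add oldfile
    git -C "$dir" commit -q -m "Add oldfile"
    echo "$dir"
}

a=$(make_repo)
b=$(make_repo)

# Repository a: rename with git mv.
git -C "$a" mv oldfile newfile

# Repository b: rename outside Git, then stage the result.
mv "$b/oldfile" "$b/newfile"
git -C "$b" add newfile
git -C "$b" rm -q oldfile

# write-tree prints the hash of the tree that the index describes.
tree_a=$(git -C "$a" write-tree)
tree_b=$(git -C "$b" write-tree)

echo "git mv:          $tree_a"
echo "mv + add + rm:   $tree_b"
```

Identical tree hashes mean the resulting commits would record exactly the same content, with no extra rename information in either case.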
P
86

I think that the general drive behind Linus' point is that—and take this with a pinch of salt—hardcore Git users don't ever care about the history of a "file". You put content in a Git repository because the content as a whole has a meaningful history.

A file rename is a small special case of "content" moving between paths. You might have a function that moves between files which a Git user might track down with the "pickaxe" functionality (e.g., log -S).
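As a rough illustration of the pickaxe, the sketch below moves a function from one file to another and then asks which commits changed the number of occurrences of its name. The function name compute_total and the files a.c and b.c are invented for the example.

```shell
#!/bin/sh
# Sketch: track a function across files with the pickaxe (git log -S).
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

cat > a.c <<'EOF'
int compute_total(void) { return 42; }
int other(void) { return 0; }
EOF
git add a.c
git commit -q -m "Add compute_total to a.c"

# Move the function into a different file in a single commit.
printf 'int other(void) { return 0; }\n' > a.c
printf 'int compute_total(void) { return 42; }\n' > b.c
git add a.c b.c
git commit -q -m "Move compute_total to b.c"

# A commit that does not touch the function at all.
printf 'int unrelated(void) { return 1; }\n' >> a.c
git commit -q -am "Add unrelated function"

# -S lists commits where the number of occurrences of the string
# changed in some file, regardless of which path it lives in.
hits=$(git log -S'compute_total' --format=%s)
printf '%s\n' "$hits"
```

No path is given to git log here, which is the point: the search follows the content, not a file name.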

Other "path" changes include combining and splitting files; Git doesn't really care which file you consider renamed and which one you consider copied (or renamed and deleted). It just tracks the complete content of your tree.

Git encourages "whole tree" thinking whereas many version control systems are very file-centric. This is why Git refers to "paths" more often than it refers to "filenames".

Perseverance answered 21/4, 2011 at 12:11 Comment(8)
Hi Charles, Thanks for your answer. It seems that I use git very much the same way I used SVN. Although I understand that git is very different than other version control systems, many concepts in git still appear strange to me... I should probably finish that git book I recently bought.Hazeghi
Linus' point was that a "proper" gui would be able to track chunks of code across files, and he was hoping we would have such tools by now. Unfortunately, we still don't have that luxury, and --follow is still useful.Prisca
Does git actually give you a solution other than --follow for this?Rubbico
@Rubbico not currently. I suppose the "proper" way Linus is thinking of is to log <path>, show the oldest commit in the log, manually identify where the content came from, then repeat from log <old-path>. This would assume your commits are small for it to be practical - but then again, tiny commits are already a huge boost to quality of life.Facilitate
I would argue that "whole tree" thinking is enhanced by --follow being the default. What I mean is when I want to see the history of the code within a file, I really don't usually care whether the file was renamed or not, I just want to see the history of the code, regardless of renames. So in my opinion it makes sense for --follow to be the default because I don't care about individual files; --follow helps me to ignore individual file renames, which are usually pretty inconsequential.Taejon
So... if I decide to care about the "content" instead of the file, how do I print the commits relevant to the content that's currently in this file? I would be delighted if git tracked it all across different files for me and reported a log of all the changes -- I just don't see how to get it.Diao
The problem is when the filename actually has relevance. Such as a user wondering what happened to a file that existed in a project that spans 20 years of history. When such a user attempts to upgrade to a newer version and a file they had modified years ago is not there at all, how do they find out where their local changes now need to be applied?Potful
@DreadQuixadhal this is the situation I'm currently experiencing. Did you find a solution for it?Lorianne
R
42

I have exactly the same issue that you are facing. Even though I can't give you an answer, I suggest reading this email Linus wrote back in 2005; it is very pertinent and might give you a hint about how to handle the problem:

…I'm claiming that any SCM that tries to track renames is fundamentally broken unless it does so for internal reasons (ie to allow efficient deltas), exactly because renames do not matter. They don't help you, and they aren't what you were interested in anyway.

What matters is finding "where did this come from", and the git architecture does that very well indeed - much better than anything else out there. …

I found it referenced by this blog post, which could also be useful for you to find a viable solution:

In the message, Linus outlined how an ideal content tracking system may let you find how a block of code came into the current shape. You'd start from the current block of code in a file, go back in the history to find the commit that changed the file. Then you inspect the change of the commit to see if the block of code you are interested in is modified by it, as a commit that changes the file may not touch the block of code you are interested in, but only some other parts of the file.

When you find that before the commit the block of code did not exist in the file, you inspect the commit deeper. You may find that it is one of the many possible situations, including:

  1. The commit truly introduced the block of code. The author of the commit was the inventor of that cool feature you were hunting its origin for (or the guilty party who introduced the bug); or
  2. The block of code did not exist in the file, but five identical copies of it existed in different files, all of which disappeared after the commit. The author of the commit refactored duplicated code by introducing a single helper function; or
  3. (as a special case) Before the commit, the file that currently contains the block of the code you are interested in itself did not exist, but another file with nearly identical contents did exist, and the block of the code you are interested in, together with all the other contents in the file existed back then, did exist in that other file. It went away after the commit. The author of the commit renamed the file while giving it a minor modification.

In git, Linus's ultimate content tracking tool does not yet exist in a fully automated fashion. But most of the important ingredients are available already.
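The third situation in the list above (a rename with a minor modification) can be identified by hand with commands Git already has. The sketch below is one possible way to do it, not the workflow the blog post prescribes; the file names and the temporary repository are hypothetical.

```shell
#!/bin/sh
# Sketch: find where a path's content came from, one step at a time.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'line one\nline two\nline three\n' > old.txt
git add old.txt
git commit -q -m "Add old.txt"

# Rename the file and give it a minor modification in the same commit.
git mv old.txt new.txt
printf 'line four\n' >> new.txt
git add new.txt
git commit -q -m "Move to new.txt with a small edit"

# Step 1: the oldest commit that touches the current path.
oldest=$(git log --format=%H -- new.txt | tail -n 1)

# Step 2: inspect that commit with rename detection (-M) enabled.
origin=$(git diff-tree -r -M --name-status --no-commit-id "$oldest")
printf '%s\n' "$origin"

# Step 3: extract the old path from the R<score> line and continue
# the history from the parent commit under that name.
old_path=$(printf '%s\n' "$origin" | awk -F'\t' '$1 ~ /^R/ { print $2 }')
git log --oneline "$oldest^" -- "$old_path"
```

For a file renamed several times, steps 1 to 3 would be repeated with each old path in turn.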

Please, keep us posted about your progress on this.

Rive answered 12/4, 2012 at 19:48 Comment(4)
Thanks for posting those articles. It wasn't until I read them that I fully grasped the idea of content history! I have been thinking about this the wrong way!Avruch
That email from Linus is great, thanks for posting this.Ubald
Fun fact, Git v2.15 adds --color-moved, a move toward that "ideal tracking system." I was playing with it to see it track moved lines within a file, but realized accidentally that it tracks moved lines across the entire diff.Shadbush
Linus explains a complex situation. But here we have a simple situation: a file was just renamed (or moved to another directory). Thus there should be a simple solution. I think that the issue comes from the fact that, contrary to Subversion, the user cannot instruct Git at commit time where a file comes from, and that --follow can be wrong (e.g., if 2 files have the same contents, or if there is a modification in addition to the file move).Discouragement
P
17

I noticed that most of the graphical git front-ends and IDE plugins don't seem to be able to display the history of a file if the file has been renamed

You'll be happy to know that some popular Git UI tools now support this. There are dozens of Git UI tools available, so I won't list them all, but for example:

  • Sourcetree, when viewing a file log (right-click on a file from a commit and choose Log selected... from the dropdown), has a checkbox "Follow renamed files" in the bottom left. If a file has had more than one rename, you need to jump to the "rename" commit and repeat this again.
  • TortoiseGit has a "follow renames" checkbox on the log window in the bottom left.


Prisca answered 8/9, 2014 at 14:8 Comment(4)
Sourcetree works great when a file has been renamed once; when it has been renamed twice, change details are not available for commits before the rename. I reported the bug here: jira.atlassian.com/browse/SRCTREE-5715Resident
gitk works great even when a file is renamed twice in history. The command looks like this "gitk --follow path/to/file"Resident
Nice!!! Would never have spotted "Follow renamed files" in SourceTree but for your answer.Catherine
SourceTree worked great for me in 2023. The only issue was that if the file had been renamed twice or more - I had to jump to the commit with the rename to see the previous logs and then repeat that for another rename. But all the logs were retrievable.Halfhearted
M
9

Note: Git 2.9 (June 2016) improves the "buggy" behavior of git log --follow quite a bit:

See commit ca4e3ca (30 Mar 2016) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 26effb8, 13 Apr 2016)

diffcore: fix iteration order of identical files during rename detection

If the two paths 'dir/A/file' and 'dir/B/file' have identical content and the parent directory is renamed, e.g. 'git mv dir other-dir', then diffcore reports the following exact renames:

renamed:    dir/B/file -> other-dir/A/file
renamed:    dir/A/file -> other-dir/B/file

(note the inversion here: B/file -> A/file, and A/file -> B/file)

While technically not wrong, this is confusing not only for the user, but also for git commands that make decisions based on rename information, e.g. 'git log --follow other-dir/A/file' follows 'dir/B/file' past the rename.

This behavior is a side effect of commit v2.0.0-rc4~8^2~14 (diffcore-rename.c: simplify finding exact renames, 2013-11-14): the hashmap storing sources returns entries from the same bucket, i.e. sources matching the current destination, in LIFO order.
Thus the iteration first examines 'other-dir/A/file' and 'dir/B/file' and, upon finding identical content and basename, reports an exact rename.
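The scenario from the commit message can be reproduced in a scratch repository. This is a sketch only: the file content and the temporary directory are made up, and the script simply prints whatever pairing the installed Git reports. According to the commit message, Git 2.9 or later should no longer swap A and B.

```shell
#!/bin/sh
# Sketch: rename a directory that holds two files with identical content
# and print the exact renames that diffcore reports.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

mkdir -p dir/A dir/B
printf 'identical content\n' > dir/A/file
printf 'identical content\n' > dir/B/file
git add dir
git commit -q -m "Add two identical files"

git mv dir other-dir
git commit -q -m "Move dir to other-dir"

# -M turns on rename detection; R100 marks an exact (100% similar) rename.
renames=$(git diff -M --name-status HEAD~1 HEAD)
printf '%s\n' "$renames"
```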


With Git 2.31 (Q1 2021), the file-level rename detection has been improved for diffcore.

See commit 350410f (29 Dec 2020), and commit 9db2ac5, commit b970b4e, commit ac14de1, commit 5c72261, commit 81c4bf0, commit ad8a1be, commit 00b8ccc, commit 26a66a6 (11 Dec 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit a5ac31b, 25 Jan 2021)

diffcore-rename: accelerate rename_dst setup

Signed-off-by: Elijah Newren

register_rename_src() simply references the passed pair inside rename_src.

In contrast, add_rename_dst() did something entirely different for rename_dst.
Instead of copying the passed pair, it made a copy of the second diff_filespec from the passed pair, referenced it, and then set the diff_rename_dst.pair field to NULL.
Later, when a pairing is found, record_rename_pair() allocated a full diff_filepair via diff_queue() and pointed its src and dst fields at the appropriate diff_filespecs.

This contrast between register_rename_src() for the rename_src data structure and add_rename_dst() for the rename_dst data structure is oddly inconsistent and requires more memory and work than necessary.
[...] This patch accelerated the setup time by about 65%, and final write back to the output queue time by about 50%, resulting in an overall drop of 3.5% on the execution time of rebasing a few dozen patches.

Margarito answered 14/4, 2016 at 6:46 Comment(0)
T
6

Here's how you do it:

git log -M --summary | grep rename | grep BASENAME

Just put the basename in there.

If there are too many results, you can also grep for each intermediate directory name in turn.
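A small sketch of that pipeline in a throwaway repository follows, together with one possible variant that also prints the commit for each rename. The file names are hypothetical, and the --diff-filter=R variant is a suggestion rather than part of the original command.

```shell
#!/bin/sh
# Sketch: list detected renames that mention a given basename.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'hello\n' > old.txt
git add old.txt
git commit -q -m "Add old.txt"

git mv old.txt new.txt
git commit -q -m "Move old.txt to new.txt"

# The pipeline from the answer, with new.txt as the basename.
summary=$(git log -M --summary | grep rename | grep new.txt)
printf '%s\n' "$summary"

# Variant (an assumption, not from the answer): show only commits that
# contain a rename, with the abbreviated hash printed before each entry.
listing=$(git log -M --diff-filter=R --name-status --format='commit %h %s')
printf '%s\n' "$listing"
```

The variant avoids the need for grep context options to recover the commit id.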

Thready answered 10/7, 2021 at 15:39 Comment(2)
Thanks for this! For future readers: I got warning: exhaustive rename detection was skipped due to too many files. \n warning: you may want to set your diff.renameLimit variable to at least 1800 and retry the command. so I did git config --global diff.renameLimit 1800 and ran the command again.Hyperion
Also, this command does not tell the commit id so I added a --before=20 to each grep to find that in the grep context. P.S: also found that --line-buffered option for grep helps with not waiting for the full results in the pipe so that speeds up feedback too!Hyperion
S
2

On Linux, I have verified that SmartGit and GitEye are able to follow renames when following the history of a particular file. However, unlike gitk and GitEye, SmartGit shows a separate file view and repository view (which contains the directory structure, but not the list of files contained within).

Slotnick answered 2/1, 2015 at 4:52 Comment(0)
