Can Git track the movement of a single function from one file to another? How?

Several times, I have come across the statement that, if you move a single function from one file to another file, Git can track it. For example, this entry says, "Linus says that if you move a function from one file to another, Git will tell you the history of that single function across the move."

But I know a little about Git's under-the-hood design, and I don't see how this is possible. So I'm wondering: is this a correct statement? And if so, how is this possible?

My understanding is that Git stores each file's contents as a Blob, and each Blob has a globally unique identity which arises from the SHA hash of its contents and size. Git then represents folders as Trees. Any filename information belongs to the Tree, not to the Blob, so a file rename for example shows up as a change to a Tree, not to a Blob.
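That model can be sketched in a few lines of Python, for illustration only. blob_id reproduces how Git names a blob (SHA-1 over a "blob <size>" header and a NUL byte, followed by the raw bytes; this is the value git hash-object prints). tree_of is a hypothetical, flattened stand-in for a tree that ignores file modes and subdirectories. A pure rename leaves the blob id unchanged; only the tree entry differs.

```python
import hashlib

def blob_id(content: bytes) -> str:
    # Git hashes a header ("blob", a space, the size in bytes, a NUL byte)
    # followed by the raw content; this matches `git hash-object`.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def tree_of(files: dict) -> dict:
    # Hypothetical flat stand-in for a Git tree: file name -> blob id.
    # Real trees also record file modes and nest subdirectories.
    return {name: blob_id(data) for name, data in files.items()}

source = b"def f():\n    return 1\n"
before = tree_of({"foo": source})
after = tree_of({"bar": source})  # same content under a new name

print(before["foo"] == after["bar"])  # True: the blob is untouched
print(before == after)                # False: only the tree changed
print(blob_id(b""))                   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Moving one function from foo into bar, by contrast, yields two new blob ids (one per edited file) and no object that names the function itself.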

So if I have a file called "foo" with 20 functions in it, and a file called "bar" with 5 functions in it, and I move one of the functions from foo into bar (resulting in 19 and 6, respectively), how can Git detect that I moved that function from one file to another?

From my understanding, this would cause 2 new blobs to exist (one for the modified foo and one for the modified bar). I realize a diff could be calculated to show that the function was moved from one file to the other. But I don't see how history about the function could possibly become associated with bar instead of foo (not automatically, anyway).

If Git were to actually look inside of single files, and compute a blob per function (which would be crazy / infeasible, because you'd have to know how to parse any possible language), then I could see how this might be possible.

So ... is the statement correct or not? And if it is correct, then what is lacking in my understanding?

Febrifugal answered 5/2, 2011 at 17:25 Comment(7)
I don't think it tracks "functions" but rather "chunks of code", so if you have a 30-line function and break it into two 15-line functions, it will track that in much the same way as if you moved the whole function. Someone correct me if I'm wrong, please. - Ectomy
My understanding (which may very well be wrong, and that's why I'm asking) is that every file corresponds to at most one Blob. So splitting one function into two smaller functions in the same file would simply cause the old Blob to be replaced with a new Blob. If that's correct, then it doesn't really track "chunks of code", because it never looks inside a file. In other words, its smallest granularity is one whole file. - Febrifugal
If you're just splitting the file in two (or more chunks), then it's possible to trick the move pointers in two (or more) branches into pointing at the same old file, so when you merge these branches you get the same file "renamed twice" (or more times), meaning two or more files share the same ancestor for their move. But for merely moving a little snippet from one large file to another large file, that trick won't work, as you've observed. Only AST-based (typically language-specific) tools can track refactoring like that with high precision. - Clerissa
Also, it's true, as some answer below says, that technically there's no parent-file pointer, but if you look at gitk when you both rename and change a file in the same commit, you see something like "similarity index 95% rename from src/foo.txt rename to src/bar.txt". That comes from the git-diff-index backend. So it tracks moves by (high) textual similarity. Basically, to help git track renames, you need intermediate commits with as few changes as possible besides the file renames. - Clerissa
So if you want to move a small chunk of a file to a new one, you (1) branch, (2) rename, (3) commit [quite important], (4) delete the large part of the file, leaving just the small chunk of interest, (5) commit again, (6) merge back into the mainline branch. That effectively creates a proper "file move" pointer, because there is one commit with high textual similarity (created at 3), and git has no trouble tracking any amount of deleted material if the file is not renamed in the same commit (created at 5). - Clerissa
Oh, and before you do the step 6 merge, you really need to change the file on the main branch, so as to force a modify/delete merge conflict. Otherwise your big file on the main branch will be entirely gone. Forcing that conflict is typically not an issue, since you want the lines you've moved to the small file gone from the big file anyway. But the order of operations is important here, so "step 5bis" is to delete the chunk in the big file before you do the merge. Git is not quite that magic, alas. - Clerissa
Btw, Raymond Chen disapproves of this method, even though it's the most intuitive one (except perhaps for step 5bis, where we had to make a "forward-looking" change before merging). Instead, Chen wants us to use the lower-level git commit-tree and git write-tree directly to set up the right result, which is more than the average git user can muster, I assure you. - Clerissa
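The "similarity index" in that gitk output is a score Git computes between a deleted path and an added path when deciding whether to report a rename; git diff -M uses 50% as its default threshold. The Python sketch below uses a deliberately simplified, line-based score (shared lines as a percentage of the longer version). Git's own rename scoring works on hashed chunks of bytes, so its numbers will differ; the helper names are hypothetical. It illustrates why a commit that only renames (step 3 above) is detected, while renaming and gutting the file in one commit is not.

```python
from collections import Counter

RENAME_THRESHOLD = 50  # percent; git diff -M uses 50% unless told otherwise

def similarity(old: str, new: str) -> int:
    # Simplified score: lines the two versions share (each occurrence
    # counted at most once), as a percentage of the longer version.
    a, b = old.splitlines(), new.splitlines()
    if not a and not b:
        return 100
    shared = sum((Counter(a) & Counter(b)).values())
    return 100 * shared // max(len(a), len(b))

def looks_like_rename(old: str, new: str) -> bool:
    return similarity(old, new) >= RENAME_THRESHOLD

big = "".join(f"line {i}\n" for i in range(20))
lightly_edited = big.replace("line 3\n", "line three\n")  # renamed, one line edited
trimmed = "".join(f"line {i}\n" for i in range(5))          # renamed and gutted at once

print(similarity(big, lightly_edited), looks_like_rename(big, lightly_edited))  # 95 True
print(similarity(big, trimmed), looks_like_rename(big, trimmed))                # 25 False
```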

This functionality is provided through git blame -C <file>.

The -C option makes git try to match chunks of text added to the file being blamed against the files modified in the same commit. Giving it twice (-C -C) additionally searches the other files in the commit that created the file, and giving it three times (-C -C -C) searches other files in any commit.

Try it yourself in a test repo with git blame -C, and you'll see that the block of code you just moved is attributed to the commit that introduced it in its original file.

From the git help blame manual page:

The origin of lines is automatically followed across whole-file renames (currently there is no option to turn the rename-following off). To follow lines moved from one file to another, or to follow lines that were copied and pasted from another file, etc., see the -C and -M options.
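To make the single -C case concrete, here is a rough Python sketch, not Git's implementation: the lines a commit added to the blamed file are matched against the parent-commit versions of other files touched by the same commit, and a matched block is attributed to its old location only if it contains enough alphanumeric characters. The default of 40 mirrors blame's documented lower bound for -C<num>; copy_origins and its inputs are hypothetical.

```python
from difflib import SequenceMatcher

def alnum_count(lines):
    """Number of alphanumeric characters in a block of lines."""
    return sum(ch.isalnum() for line in lines for ch in line)

def copy_origins(added_lines, candidates, min_alnum=40):
    """Attribute lines a commit added to the blamed file to an older location.

    added_lines: lines the commit introduced (what plain blame pins on it).
    candidates: {path: lines} standing in for the parent-commit versions of
        other files modified in the same commit (roughly one -C).
    Returns one entry per added line: (path, 1-based line number) or None."""
    origins = [None] * len(added_lines)
    for path, old_lines in candidates.items():
        sm = SequenceMatcher(None, old_lines, added_lines, autojunk=False)
        for a, b, size in sm.get_matching_blocks():
            # Ignore blocks too short to count as a meaningful move or copy.
            if size == 0 or alnum_count(added_lines[b:b + size]) < min_alnum:
                continue
            for k in range(size):
                if origins[b + k] is None:  # first matching candidate wins
                    origins[b + k] = (path, a + k + 1)
    return origins

foo_parent = [
    "def helper(values):",
    "    total = sum(v * v for v in values)",
    "    return total",
    "x = 1",
]
added_to_bar = [
    "def helper(values):",
    "    total = sum(v * v for v in values)",
    "    return total",
    "y = 2",
    "x = 1",
]
print(copy_origins(added_to_bar, {"foo.py": foo_parent}))
# [('foo.py', 1), ('foo.py', 2), ('foo.py', 3), None, None]
```

The lone "x = 1" also appears in foo.py, but it is far too short to be credited as a move, which is the same reason blame needs a threshold at all.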

Oran answered 19/5, 2012 at 11:53 Comment(7)
As a test, I created a repo with three files, added a line to file1, and committed. I then moved that line to file2 and committed again. Then to file3, and committed. git blame -C10 file3 then showed the first commit, where that line was added to file1, but I really wanted to see the most recent commit which moved that line (i.e., the commit which moved the line to file2). Is there any way to accomplish that? I got some useful information by using git log -S'my interesting line', but still not quite what I'm after. - Sitarski
@Sitarski it seems that plain git blame would be suitable for this. - Joscelin
@Joscelin It's 4 years later, so I don't remember what I was really trying to accomplish. But git blame would only show the most recent change to the line (whether a move or not), whereas my comment asked for the "most recent commit which moved that line" (presumably after some more commits changing the line have been made). - Sitarski
-CC and -CCC don't seem to work... Here on git version 2.15.0.rc0, I need to pass the isolated -C switch separately multiple times for it to have the documented effect. The documentation kind of indicates this, at least implicitly. Yet this answer and other comments indicate this has worked in the past. Hmm. - Sidestep
As of Git 2.15, there is, I think, a better way. - Ensepulcher
This isn't incredibly useful outside of a one-man shop because: "You can tweak the git blame algorithms with options like -M and -C to get it to try harder, but in practice, you don’t often have control over those options: The git blame may be performed on a server, and the results reported back to you on a web page." (continues) - Clerissa
"Or the git blame is performed by a developer sitting at another desk (whose command line options you don’t get to control), and poor Greg has to deal with all the tickets that get assigned to him from people who used the git blame output to figure out who introduced the line that’s causing problems." So the problem with this approach is that it relies entirely on the "receiver end" of git clone to figure out what happened: what got copied where, and by whom. - Clerissa
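For the question of which commit moved a line, the criterion behind git log -S can be sketched in Python: a commit is reported for a file when the number of occurrences of the search string in that file changes. Under that assumption, a move shows up as a commit where the count drops in one file and rises in another. The snapshot format (a list of commit-name and file-dict pairs) and the helper names are hypothetical, and real history is a graph, not a list.

```python
def pickaxe(history, needle):
    """Yield (commit, path, before, after) whenever the number of
    occurrences of `needle` in `path` changes between consecutive
    snapshots; this is the per-file criterion git log -S is documented to use."""
    previous = {}
    for commit, files in history:
        for path in sorted(set(previous) | set(files)):
            before = previous.get(path, "").count(needle)
            after = files.get(path, "").count(needle)
            if before != after:
                yield commit, path, before, after
        previous = files

def commits_that_moved(history, needle):
    """Commits where `needle` disappeared from one file and appeared in another."""
    deltas = {}
    for commit, _path, before, after in pickaxe(history, needle):
        deltas.setdefault(commit, []).append(after - before)
    return [c for c, d in deltas.items() if min(d) < 0 < max(d)]

line = "my interesting line\n"
history = [
    ("c1", {"file1": line, "file2": "", "file3": ""}),
    ("c2", {"file1": "", "file2": line, "file3": ""}),
    ("c3", {"file1": "", "file2": "", "file3": line}),
]
print(commits_that_moved(history, line))  # ['c2', 'c3']
```

The commit that first added the line (c1) only raises a count, so it is not reported as a move.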

As of Git 2.15, git diff now supports detection of moved lines with the --color-moved option. It works for moves across files.

Naturally, this only affects colorized (terminal) output. As far as I can tell, there is no option to mark moves in plain-text patch format, but that makes sense.

For default behavior, try

git diff --color-moved

The option also takes a mode argument; in Git 2.15 the modes are no, default, plain, zebra and dimmed_zebra (use git help diff to get the latest modes and their descriptions). For example:

git diff --color-moved=zebra

As to how it is done, you can glean some understanding from this email exchange by the author of the functionality.
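The basic idea can be illustrated with a small Python sketch of the simplest mode, plain: across the whole diff, an added line whose exact text was also removed somewhere is marked as moved, and so is the removed copy. This is an approximation for illustration; in recent Git versions the block-based modes such as zebra also group moved lines into blocks and skip blocks with too few alphanumeric characters. The (path, sign, text) input format and the function name are assumptions, not Git's internals.

```python
from collections import Counter

def classify_moves(diff_lines):
    """Label each line of a multi-file diff, in the spirit of
    --color-moved=plain.

    diff_lines: list of (path, sign, text) tuples, sign being "-" or "+".
    A removed line whose text is added somewhere in the diff is
    "moved-from"; an added line whose text is removed somewhere is
    "moved-to". Everything else is a plain "removed" or "added" line."""
    removed = Counter(text for _, sign, text in diff_lines if sign == "-")
    added = Counter(text for _, sign, text in diff_lines if sign == "+")
    labels = []
    for _path, sign, text in diff_lines:
        if sign == "-":
            labels.append("moved-from" if added[text] else "removed")
        else:
            labels.append("moved-to" if removed[text] else "added")
    return labels

diff = [
    ("foo.c", "-", "int twice(int x) {"),
    ("foo.c", "-", "    return 2 * x;"),
    ("foo.c", "-", "}"),
    ("bar.c", "+", "/* moved from foo.c */"),
    ("bar.c", "+", "int twice(int x) {"),
    ("bar.c", "+", "    return 2 * x;"),
    ("bar.c", "+", "}"),
]
for (path, sign, text), label in zip(diff, classify_moves(diff)):
    print(f"{label:>10}  {path}  {sign}{text}")
```

Note that a bare "}" would be flagged as moved even between unrelated hunks, which is one reason a block-based mode with a minimum size gives less noisy results.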

Ensepulcher answered 9/11, 2017 at 2:39 Comment(2)
Is there a way to configure git so that it applies the --color-moved option by default? - Sikata
@EugenKonkov Yes, use git config to set diff.colorMoved. - Ensepulcher

A bit of this functionality is in git gui blame (+ filename). It shows an annotation for each line of a file, indicating when it was created and when it was last changed. For code moved between files, it shows the commit in the original file as the creation, and the commit where it was added to the current file as the last change. Try it.

What I would really like is to pass git log a line-number range in addition to a file path, and have it show the history of that code block. There is no such option, if the documentation is right. Yes, from Linus' statement I too would think such a command should be readily available.
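Git has since gained this feature as git log -L <start>,<end>:<file> (and git log -L :<funcname>:<file>). The underlying idea can be sketched in Python: walk a file's versions from newest to oldest, diff each adjacent pair, note whether any change touched the tracked lines, and map the range back into the older version's line numbers. This is a simplified illustration that assumes a linear history of a single, never-renamed file; log_range and its input format are hypothetical, and it is not Git's line-log algorithm.

```python
from difflib import SequenceMatcher

def log_range(versions, start, end):
    """Commits (newest first) whose changes touched lines start..end
    (1-based, inclusive) of the newest version of a single file.

    versions: list of (commit, text) pairs for that file, oldest first."""
    touched = []
    lo, hi = start - 1, end  # 0-based, half-open range in the newer version
    pairs = list(zip(versions, versions[1:]))
    for (_, old_text), (new_commit, new_text) in reversed(pairs):
        sm = SequenceMatcher(None, old_text.splitlines(),
                             new_text.splitlines(), autojunk=False)
        changed, old_lo, old_hi = False, None, None
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag == "equal":
                # Unchanged lines: map the overlapping part straight back.
                a, b = max(j1, lo), min(j2, hi)
                if a >= b:
                    continue
                span = (i1 + a - j1, i1 + b - j1)
            else:
                # Inserted/replaced lines that overlap the range, or a
                # deletion that falls strictly inside it.
                hit = (j1 < hi and j2 > lo) if j2 > j1 else lo < j1 < hi
                if not hit:
                    continue
                changed = True
                if i1 == i2:  # pure insertion: no older lines to follow
                    continue
                span = (i1, i2)
            old_lo = span[0] if old_lo is None else min(old_lo, span[0])
            old_hi = span[1] if old_hi is None else max(old_hi, span[1])
        if changed:
            touched.append(new_commit)
        if old_lo is None:  # every tracked line was introduced by this commit
            return touched
        lo, hi = old_lo, old_hi
    touched.append(versions[0][0])  # the oldest version introduced the rest
    return touched

versions = [
    ("c1", "a\nb\nc\n"),
    ("c2", "a\nB\nc\n"),        # edits line 2
    ("c3", "x\na\nB\nc\n"),     # inserts a line at the top
    ("c4", "x\na\nB\nc\nd\n"),  # appends a line at the bottom
]
print(log_range(versions, 2, 3))  # ['c2', 'c1']
```

Commits c3 and c4 only shift or extend the file around the tracked lines, so they are skipped, just as git log -L leaves out commits that do not touch the range.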

Sunken answered 6/2, 2011 at 12:48 Comment(4)
I just now saw gui blame for the first time. Nice. I'm starting to think that perhaps this is what Linus meant: not that Git internally stores information saying that the function moved from one file to another, but that, given the information Git does store, you can determine that the function moved (like git gui blame does, or via a diff like I mentioned in the question). If so, my original understanding would be right: it is all about Commits, Trees and Blobs, and Git never looks inside a file. But that's enough information to let you detect a function move via analysis. Perhaps. - Febrifugal
Yes, I think this is it. The git backend knows nothing about the file contents (apart from maybe storing them a bit size-optimized as diffs), so the frontend tools have to do everything. - Bala
There just seems to be one problem... how do I walk through the history in chronological order? It's a bit top-posted... - Alumnus
@AgentFriday you might need to install that separately. On Ubuntu, for example, it's available in the git-gui package. - Bala

git doesn't actually track renames at all. A rename is just a delete and an add, that's all. Any tools that show renames reconstruct them from the history.

As such, tracking function renames is a simple matter of analyzing the diffs of all files in each commit after the fact. There's nothing fundamentally impossible about it: the existing rename tracking already handles "fuzzy" renames, in which some changes are made to the file as well as renaming it, and this requires looking at the contents of the files. It would be a simple extension to look for function renames as well.

I don't know if the base git tools actually do this however - they try to be language neutral, and function identification is very much not language neutral.
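As an illustration of that language-specific step, here is a Python-only sketch using the standard ast module: it extracts each top-level function's source from two snapshots of a set of files and reports functions whose exact text vanished from one file and appeared, under the same name, in another. The snapshot format and the function names are hypothetical; a real tool would also have to cope with edits made during the move, nested definitions, and other languages.

```python
import ast

def functions(source):
    """Top-level function name -> exact source text (Python files only)."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

def moved_functions(before, after):
    """before/after: {path: source} snapshots of the same set of files.
    Returns (name, old_path, new_path) for each function whose exact text
    disappeared from one file and appeared, under the same name, in another."""
    old = {(p, n): s for p, src in before.items() for n, s in functions(src).items()}
    new = {(p, n): s for p, src in after.items() for n, s in functions(src).items()}
    moves = []
    for (path, name), text in old.items():
        if (path, name) in new:
            continue  # still defined in its original file
        for (new_path, new_name), new_text in new.items():
            if (new_path != path and new_name == name and new_text == text
                    and (new_path, name) not in old):
                moves.append((name, path, new_path))
    return moves

before = {
    "foo.py": "def keep():\n    return 1\n\n\ndef helper(x):\n    return x * 2\n",
    "bar.py": "def other():\n    return 2\n",
}
after = {
    "foo.py": "def keep():\n    return 1\n",
    "bar.py": "def other():\n    return 2\n\n\ndef helper(x):\n    return x * 2\n",
}
print(moved_functions(before, after))  # [('helper', 'foo.py', 'bar.py')]
```

Git's blobs for foo.py and bar.py simply change; the knowledge that helper moved exists only in a tool like this one, which has to parse the language.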

Xhosa answered 5/2, 2011 at 17:53 Comment(2)
I wasn't referring to "function renames". Rather, I'm asking about the case of moving a subset of one file's text out of that file and into another file. - Febrifugal
You are right, but your comment is unclear, and its first few words suggest (to me) that you misunderstood the question; please edit it or something. On topic: git uses a line-based diff (its own built-in implementation, not the system diff), and that is all the power it has here. It can "track" a function rename, but it's not particularly smart about it: it's basically just a one-line diff, and you can track that. - Devito

git diff will show you that certain lines disappeared from foo and reappeared in bar. If there are no other changes to these files in the same commit, the change will be easy to spot.

A smarter git client would be able to show you how lines moved from one file to another, and a language-aware IDE would be able to associate this change with a particular function.

A very similar thing happens when a file is renamed: it just disappears under one name and reappears under another, but any reasonable tool is able to notice this and present it as a rename.

Lolly answered 5/2, 2011 at 17:48 Comment(7)
Is there an existing client that lets you display the history of a function? - Maryellen
William: you should try "git gui blame path/to/filename.ext" or "git blame -CCCw path/to/filename.ext" (the former has a pretty usable GUI, and the latter includes better diagnostics for hard moves and copies). Unfortunately, I think there's no way to pass "-CCCw" options to git gui blame. - Alvaroalveolar
Actually, "git gui blame" can be used to get the results of "git blame -CCCw" with git newer than 1.5.3, by selecting "Do full copy detection" from the right-click context menu after loading the file (I just checked the source file at /usr/share/git-gui/lib/blame.tcl). - Alvaroalveolar
@MikkoRantalainen Did -CC or -CCC ever work? They certainly don't seem to now (git version 2.15.0.rc0). - Sidestep
@Sidestep Do you get a warning message of some kind? It still seems to work with git version 2.7.4, and git help blame knows about -C: "When this option is given three times, the command additionally looks for copies from other files in any commit." - Alvaroalveolar
@MikkoRantalainen Yes, but that requires the option to be given multiple times, as discrete switches, i.e. -C -C or -C -C -C. Ganging up Cs in a single argument does not have the correct effect for 2 or 3 Cs in my version. -C is now documented as taking an optional numeric argument, so perhaps that was not always the case, and it may be what causes consecutive Cs not to have the desired effect (e.g. git tries and fails to interpret the 2nd C as a number). - Sidestep
@Sidestep I'm not sure -CCC ever worked correctly (perhaps older versions parsed it the same way and just failed to output an error message). I agree that with an optional argument the syntax -CCC might not be stable in the long run. As such, one probably wants to use git blame -C -C -C -w -- path/to/file.ext instead. - Alvaroalveolar
