git copy file, as opposed to `git mv`

I realize that git works by diff'ing the contents of files. I have some files that I want to copy. To absolutely prevent git from ever getting confused, is there some git command that can be used to copy the files to a different directory (not mv, but cp), and stage the files as well?

Robby answered 20/11, 2017 at 22:1 Comment(8)
git mv is a thing, but git cp is not a thing. (Robby)
How do you think git will get confused? (Electroscope)
It probably won't; I can just use cp, I guess? (Robby)
Yes... there's no danger here. git mv is just a shortcut (for mv, git add and git rm)... it's not about avoiding confusion. (Electroscope)
git cp would be nice for this use case: #16937859 (Albemarle)
Possible duplicate of Record file copy operation with Git (Serous)
There is a solution: #16937859 (Kinna)
There is another approach with Solution; I've already tested it, and it works well. (Kinna)

The short answer is just "no". But there is more to know; it just requires some background. (And as JDB suggests in a comment, I'll mention why git mv exists as a convenience.)

Slightly longer: you're right that Git will diff files, but you may be wrong about when Git does these file-diffs.

Git's internal storage model is that each commit is an independent snapshot of all the files in that commit. The version of each file that goes into the new commit, i.e., the data in the snapshot for that path, is whatever is in the index under that path at the time you run git commit.[1]
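
A quick way to check that git commit snapshots the index, not the work tree, is to stage one version of a file and edit it again before committing. This is a minimal sketch, assuming a POSIX shell with Git 2.x on the PATH; the scratch repository, identity, and file name are made up for the demo.

```shell
# Scratch repository; everything here is throwaway demo state.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # local identity so commits work
git config user.email "user@example.com"

echo "staged version" > notes.txt
git add notes.txt                          # the index now holds "staged version"
echo "work-tree version" > notes.txt       # edit again, but do not re-add

git commit -q -m "snapshot the index"

committed=$(git show HEAD:notes.txt)       # what went into the commit
on_disk=$(cat notes.txt)                   # what is in the work tree
echo "committed: $committed"
echo "on disk:   $on_disk"
```

The commit records "staged version"; the later edit is still sitting, uncommitted, in the work tree.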

The actual implementation, at the first level, is that each snapshotted file is captured in compressed form as a blob object in the Git database. The blob object is quite independent of every previous and subsequent version of that file, except for one special case: if a file's data have not changed, the new commit simply re-uses the old blob. So when you make two commits in a row, each of which holds 100 files, and only one file is changed, the second commit re-uses 99 previous blobs and needs to snapshot only one actual file into a new blob.[2]
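
The blob re-use can be observed directly: the blob ID recorded for an unchanged path is identical in consecutive commits, and it is simply a hash of the file's contents. A minimal sketch under the same assumptions (POSIX shell, Git 2.x, invented file names):

```shell
# Scratch repository; everything here is throwaway demo state.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # local identity so commits work
git config user.email "user@example.com"

printf 'unchanged\n' > a.txt
printf 'version 1\n' > b.txt
git add a.txt b.txt
git commit -q -m "first"

printf 'version 2\n' > b.txt               # only b.txt changes
git commit -q -a -m "second"

# <commit>:<path> names the blob stored for that path in that commit.
a_old=$(git rev-parse HEAD~1:a.txt)
a_new=$(git rev-parse HEAD:a.txt)
b_old=$(git rev-parse HEAD~1:b.txt)
b_new=$(git rev-parse HEAD:b.txt)
echo "a.txt: $a_old -> $a_new"             # same ID: the old blob is re-used
echo "b.txt: $b_old -> $b_new"             # new ID: a new blob was stored

# A blob ID depends only on the contents, so hashing the same bytes
# (without writing anything) reproduces a.txt's ID.
a_hash=$(printf 'unchanged\n' | git hash-object --stdin)
```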

Hence the fact that Git will diff files doesn't enter into making commits at all. No commit depends on a previous commit, other than to store the previous commit's hash ID (and perhaps to re-use exactly-matching blobs, but that's a side effect of them exactly matching, rather than a fancy computation at the time you run git commit).
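
To see how little a commit actually records about its predecessor, you can print a raw commit object. A minimal sketch, assuming a POSIX shell and Git 2.x; the files and messages are invented. The root commit has no parent line, and the second commit has exactly one:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # local identity so commits work
git config user.email "user@example.com"

echo one > f.txt
git add f.txt && git commit -q -m "first"
echo two > f.txt
git commit -q -a -m "second"

# A commit object is a short text record: a tree line (the snapshot),
# parent line(s), author and committer lines, then the message.
git cat-file -p HEAD

# grep -c prints 0 and exits non-zero when nothing matches; "|| true" keeps that harmless.
parents_of_second=$(git cat-file -p HEAD | grep -c '^parent ' || true)
parents_of_first=$(git cat-file -p HEAD~1 | grep -c '^parent ' || true)
echo "second commit parents: $parents_of_second, first commit parents: $parents_of_first"
```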

Now, all these independent blob objects can eventually take up a lot of space. At that point, Git can "pack" objects into a .pack file. It compares each object to some selected set of other objects (they may be earlier or later in history, and may have the same file name or different file names; in theory Git could even compress a commit object against a blob object or vice versa, though in practice it doesn't) and tries to find some way to represent many blobs using less disk space. But the result is still, at least logically, a series of independent objects, retrieved completely intact in their original form using their hash IDs. So even though the amount of disk space used goes down (we hope!), all of the objects are exactly the same as before.
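
Packing can be checked the same way: after git gc moves loose objects into a pack, an old blob is still retrieved by its original hash ID, with its original contents. A minimal sketch, assuming a POSIX shell, Git 2.x, and the seq utility; the file and its contents are made up so that the versions are similar enough to delta-compress:

```shell
# Scratch repository with a few similar versions of one file.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # local identity so commits work
git config user.email "user@example.com"

for i in 1 2 3; do
    seq 1 200 > data.txt                   # 200 identical lines each time...
    echo "edit $i" >> data.txt             # ...plus one line that differs
    git add data.txt
    git commit -q -m "edit $i"
done

old_blob=$(git rev-parse HEAD~2:data.txt)  # hash ID of the oldest version

git count-objects -v | grep -E '^(count|in-pack):'   # loose objects so far
git gc -q                                            # pack them into a .pack file
git count-objects -v | grep -E '^(count|in-pack):'   # now counted as in-pack

# The old version comes back intact: re-hashing its contents gives the same ID.
rehashed=$(git cat-file -p "$old_blob" | git hash-object --stdin)
echo "$old_blob"
echo "$rehashed"
```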

So when does Git compare files? The answer is: Only when you ask it to. The "ask time" is when you run git diff, either directly:

git diff commit1 commit2

or indirectly:

git show commit  # roughly, `git diff commit^@ commit`
git log -p       # runs `git show commit`, more or less, on each commit

There are a bunch of subtleties about this—in particular, git show will produce what Git calls combined diffs when run on merge commits, while git log -p normally just skips right over the diffs for merge commits—but these, along with some other important cases, are when Git runs git diff.
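
For an ordinary (non-merge) commit, the claim that git show is roughly a diff against the parent can be checked by comparing the two outputs. A minimal sketch, assuming a POSIX shell and Git 2.x; --format= is used here only to suppress the commit header, and blank lines are filtered out in case a separator line is printed:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # local identity so commits work
git config user.email "user@example.com"

printf 'one\n' > f.txt
git add f.txt && git commit -q -m "one"
printf 'one\ntwo\n' > f.txt
git commit -q -a -m "two"

# --format= leaves out the commit header; grep drops any blank separator line.
shown=$(git show --no-color --format= HEAD | grep -v '^$')
diffed=$(git diff --no-color HEAD^ HEAD | grep -v '^$')
if [ "$shown" = "$diffed" ]; then echo "same patch"; else echo "different"; fi
```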

It's when Git runs git diff that you can (sometimes) ask it to find, or not to find, copies. The -C flag, also spelled --find-copies=<number>, asks Git to find copies. The --find-copies-harder flag (which the Git documentation calls "computationally expensive") looks harder for copies than the plain -C flag. The -B (break inappropriate pairings) option affects -C. The -M aka --find-renames=<number> option also affects -C. The git merge command can be told to adjust its level of rename detection, but—at least currently—cannot be told to find copies, nor break inappropriate pairings.
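
This is also why, for the original question, a plain cp followed by git add is enough: whether the new file counts as a copy is decided later, when you diff. A minimal sketch, assuming a POSIX shell and Git 2.x, with made-up file names. Because the source file is unmodified in this change, plain -C reports b.txt as added, and only --find-copies-harder (or -C given twice) reports it as a copy of a.txt:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # local identity so commits work
git config user.email "user@example.com"

printf 'line %s\n' 1 2 3 4 5 > a.txt
git add a.txt
git commit -q -m "add a.txt"

cp a.txt b.txt                 # an ordinary copy, no special Git command
git add b.txt

# -C alone only considers files modified in this change as copy sources.
plain=$(git diff --cached --name-status -C)
# --find-copies-harder also considers unmodified files such as a.txt.
harder=$(git diff --cached --name-status -C --find-copies-harder)

echo "$plain"                  # A<TAB>b.txt
echo "$harder"                 # C100<TAB>a.txt<TAB>b.txt
```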

(One command, git blame, does somewhat different copy-finding and the above does not entirely apply to it.)


[1] If you run git commit --include <paths> or git commit --only <paths> or git commit <paths> or git commit -a, think of these as modifying the index before running git commit. In the special case of --only, Git uses a temporary index, which is a little bit complicated, but it still commits from an index: it just uses the special temporary one instead of the normal one. To make the temporary index, Git copies all the files from the HEAD commit, then overlays those with the --only files you listed. For the other cases, Git just copies the work-tree files into the regular index, then goes on to make the commit from the index as usual.
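
The --only case can be seen in a small experiment: the named path is committed from the work tree, while another path that was already staged stays staged and is left out of the commit. A minimal sketch, assuming a POSIX shell and Git 2.x; file names and contents are invented:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # local identity so commits work
git config user.email "user@example.com"

printf 'a1\n' > a.txt
printf 'b1\n' > b.txt
git add a.txt b.txt && git commit -q -m "base"

printf 'a2\n' > a.txt                      # changed in the work tree only
printf 'b2\n' > b.txt
git add b.txt                              # staged in the regular index

# Commit just a.txt, taken from the work tree via a temporary index.
git commit -q --only a.txt -m "only a.txt"

committed_a=$(git show HEAD:a.txt)             # a2
committed_b=$(git show HEAD:b.txt)             # still b1
still_staged=$(git diff --cached --name-only)  # b.txt remains staged
```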

[2] In fact, the actual snapshotting, storing the blob into the repository, happens during git add. This secretly makes git commit much faster, since you don't normally notice the extra time it takes to run git add before you fire up git commit.
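
Because git add already writes the blob, the object exists before any commit is made. A minimal sketch, assuming a POSIX shell and Git 2.x; git hash-object without -w only computes the ID, and git cat-file -e tests whether the object is present:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q                                # no commits needed, so no identity

printf 'hello\n' > greet.txt
id=$(git hash-object greet.txt)            # compute the blob ID only; nothing is written

if git cat-file -e "$id" 2>/dev/null; then before=present; else before=absent; fi
git add greet.txt                          # writes the blob and records it in the index
if git cat-file -e "$id" 2>/dev/null; then after=present; else after=absent; fi

# Index entry format: "<mode> <blob> <stage><TAB><path>"
staged_id=$(git ls-files -s greet.txt | cut -d' ' -f2)
echo "before add: $before, after add: $after, index points at $staged_id"
```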


Why git mv exists

What git mv old new does is, very roughly:

mv old new
git add new
git add old

The first step is obvious enough: we need to rename the work-tree version of the file. The second step is similar: we need to put the index version of the file into place. The third, though, is weird: why should we "add" a file we just removed? Well, git add doesn't always add a file: in this case it detects that the file was in the index but is no longer in the work tree, and removes it from the index.

We could also spell that third step as:

git rm --cached old

All we're really doing is taking the old name out of the index.
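
The rough equivalence can be tested by doing the rename both ways in two scratch repositories and comparing what git status reports. A minimal sketch, assuming a POSIX shell and Git 2.x; fresh_repo is a hypothetical helper defined just for this demo, and the second spelling uses git add old (git rm --cached old would do the same job here):

```shell
# Hypothetical helper: a fresh scratch repository with one committed file "old".
fresh_repo() {
    cd "$(mktemp -d)" || exit 1
    git init -q
    git config user.name "Example User"
    git config user.email "user@example.com"
    printf 'content\n' > old
    git add old && git commit -q -m "init"
}

fresh_repo
git mv old new
with_mv=$(git status --porcelain)

fresh_repo
mv old new
git add new
git add old                    # "old" is gone from the work tree: this stages its removal
by_hand=$(git status --porcelain)
by_hand_index=$(git ls-files)

echo "git mv:  $with_mv"       # both report the same staged rename
echo "by hand: $by_hand"
```

Both versions end up in the same state here because the work-tree and index copies of old were identical; the next example shows where that stops being true.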

But there's an issue here, which is why I said "very roughly". The index has a copy of each file that will be committed the next time you run git commit. That copy might not match the one in the work-tree. In fact, it might not even match the one in HEAD, if there is one in HEAD at all.

For instance, after:

echo I am a foo > foo
git add foo

the file foo exists in the work-tree and in the index. The work-tree contents and the index contents match. But now let's change the work-tree version:

echo I am a bar > foo

Now the index and work-tree differ. Suppose we want to move the underlying file from foo to bar but, for some strange reason,[3] we want to keep the index contents unchanged. If we run:

mv foo bar
git add bar

we'll get I am a bar inside the new index file. If we then remove the old version of foo from the index, we lose the I am a foo version entirely.

So, git mv foo bar doesn't really move-and-add-twice, or move-add-and-remove. Instead, it renames the work-tree file and renames the in-index copy. If the index copy of the original file differs from the work-tree file, the renamed index copy still differs from the renamed work-tree copy.
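
Replaying the foo/bar example shows the difference: after git mv, the index copy of bar still holds I am a foo, whereas the move-add-remove spelling leaves I am a bar in the index. A minimal sketch, assuming a POSIX shell and Git 2.x, in two throwaway repositories (no commits are needed, so no identity is configured):

```shell
# First repository: use git mv.
cd "$(mktemp -d)" || exit 1
git init -q
echo I am a foo > foo
git add foo
echo I am a bar > foo              # work tree and index now differ

git mv foo bar                     # renames the work-tree file and the index entry
mv_index=$(git show :bar)          # ":bar" names the index copy of bar
mv_worktree=$(cat bar)

# Second repository: the naive spelling, for contrast.
cd "$(mktemp -d)" || exit 1
git init -q
echo I am a foo > foo
git add foo
echo I am a bar > foo
mv foo bar
git add bar                        # stages the work-tree contents
git rm -q --cached foo             # drops the old name, and "I am a foo" with it
naive_index=$(git show :bar)

echo "git mv index: $mv_index / work tree: $mv_worktree"
echo "naive index:  $naive_index"
```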

It's very difficult to do this without a front-end command like git mv.[4] Of course, if you plan to git add everything, you don't need all of this in the first place. It's also worth noting that if git cp existed, it probably should copy the index version, not the work-tree version, when making the index copy. So git cp really should exist. There also should be a git mv --after option, à la Mercurial's hg mv --after. Both should exist, but currently don't. (There's less call for either of these, though, than there is for straight git mv, in my opinion.)
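
For illustration only, here is what a git mv --after could look like as a shell function built from plumbing: the file has already been renamed with plain mv, and the function moves the index entry, keeping its staged contents. git_mv_after is a hypothetical name, not a Git command; the sketch assumes a POSIX shell, Git 2.x (for the comma form of --cacheinfo), and a path with no merge conflicts:

```shell
# Hypothetical helper, not a real Git command: record a rename that was already
# done with plain mv, carrying OLD's index (staged) contents over to NEW.
git_mv_after() {
    old=$1 new=$2
    # Index entry format: "<mode> <blob> <stage><TAB><path>" (assumes no conflict stages).
    entry=$(git ls-files -s -- "$old")
    [ -n "$entry" ] || { echo "git_mv_after: $old is not in the index" >&2; return 1; }
    mode=$(printf '%s\n' "$entry" | cut -d' ' -f1)
    blob=$(printf '%s\n' "$entry" | cut -d' ' -f2)
    git update-index --add --cacheinfo "$mode,$blob,$new"   # NEW gets OLD's staged blob
    git update-index --force-remove "$old"                  # OLD leaves the index
}

# Demo in a scratch repository.
cd "$(mktemp -d)" || exit 1
git init -q
echo I am a foo > foo
git add foo
echo I am a bar > foo          # work tree differs from the index
mv foo bar                     # plain mv, done before telling Git
git_mv_after foo bar

index_bar=$(git show :bar)     # the staged version: I am a foo
worktree_bar=$(cat bar)        # the work-tree version: I am a bar
```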


[3] For this example, it's kind of silly and pointless. But if you use git add -p to carefully prepare a patch for an intermediate commit, and then decide that along with the patch, you would like to rename the file, it's definitely handy to be able to do that without messing up your carefully-patched-together intermediate version.

[4] It's not impossible: git ls-files --stage will get you the information you need from the index as it is right now, and git update-index allows you to make arbitrary changes to the index. You can combine these two, with some complex shell scripting or programming in a nicer language, to build something that implements git mv --after and git cp.
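
Along the lines of this footnote, a git cp that copies the index version into the index (and the work-tree file into the work tree) can be sketched with git ls-files --stage and git update-index. git_cp is a hypothetical function name, not part of Git; the sketch assumes a POSIX shell, Git 2.x, and a conflict-free source path, and it stages the copy without committing:

```shell
# Hypothetical helper, not a real Git command.
git_cp() {
    src=$1 dst=$2
    # Index entry format: "<mode> <blob> <stage><TAB><path>" (assumes no conflict stages).
    entry=$(git ls-files -s -- "$src")
    [ -n "$entry" ] || { echo "git_cp: $src is not in the index" >&2; return 1; }
    mode=$(printf '%s\n' "$entry" | cut -d' ' -f1)
    blob=$(printf '%s\n' "$entry" | cut -d' ' -f2)
    mkdir -p "$(dirname "$dst")"            # allow copying into a new directory
    cp -p "$src" "$dst" || return 1         # work-tree copy of the work-tree file
    git update-index --add --cacheinfo "$mode,$blob,$dst"   # index copy of the index entry
}

# Demo in a scratch repository.
cd "$(mktemp -d)" || exit 1
git init -q
echo I am a foo > foo
git add foo
echo I am a bar > foo                       # work tree differs from the index
git_cp foo other/foo

index_copy=$(git show :other/foo)           # I am a foo
worktree_copy=$(cat other/foo)              # I am a bar
index_orig=$(git show :foo)                 # the original entry is untouched
```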

Suetonius answered 20/11, 2017 at 22:47 Comment(6)
I think you might want to work in why there's no need for a git cp, because git mv is just shorthand for mv, git add and git rm. (Electroscope)
@JDB: done, though I actually think "git cp" would make sense (see the new section). (Suetonius)
There is a git cp in git extras. I don't know enough about how git works to make any statement about whether it does sane things. (Nedranedrah)
@dementedhedgehog: assuming you mean github.com/tj/git-extras/blob/master/bin/git-cp (found via google search for git-extras), it does not copy the index version (which I think would be the correct action), and it does run git commit at the end (which I think is inappropriate), so it's not what I would provide. But tastes differ. (Suetonius)
Hmm, interesting. (Nedranedrah)
@Suetonius you need to see the answer I am about to provide to this one... it is poetry in movement... or rather, in branching, anyway. (Supposititious)

This is hackish but it can be solved by tricking git itself by doing a rename on a separate branch and forcing git to keep both files on a merge.

git checkout -b rename-branch
git mv a.txt b.txt
git commit -m "Renaming file"
# if you did a git blame of b.txt, it would _follow_ a.txt history, right?
git checkout main
git merge --no-ff --no-commit rename-branch
git checkout HEAD -- a.txt # get the file back
git commit -m "Not really renaming file"

With a straight copy, you get this:

$ git log --graph --oneline --name-status
* 70f03aa (HEAD -> master) COpying file straight
| A     new_file.txt
* efc04f3 (first) First commit for file
  A     hello_world.txt
$ git blame -s new_file.txt
70f03aab 1) I am here
70f03aab 2) 
70f03aab 3) Yes I am
$ git blame -s hello_world.txt
^efc04f3 1) I am here
^efc04f3 2) 
^efc04f3 3) Yes I am

Using the rename on the side and getting the file back you get:

$ git log --oneline --graph master2 --name-status
*   30b76ab (HEAD, master2) Not really renaming
|\  
| * 652921f Renaming file
|/  
|   R100        hello_world.txt new_file.txt
* efc04f3 (first) First commit for file
  A     hello_world.txt
$ git blame -s new_file.txt
^efc04f3 hello_world.txt 1) I am here
^efc04f3 hello_world.txt 2) 
^efc04f3 hello_world.txt 3) Yes I am
$ git blame -s hello_world.txt
^efc04f3 1) I am here
^efc04f3 2) 
^efc04f3 3) Yes I am

The rationale: if you want to see the history of the original file, Git shows it without issues. If you want the history of the copy, Git follows the side branch where the rename happened, and from there it can jump to the original file, simply because the rename was recorded on that branch.
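
The recipe can be tried end to end in a throwaway repository. A minimal sketch, assuming a POSIX shell and Git 2.x; it reads the current branch name instead of assuming main, and uses git blame --porcelain, whose first field is the full commit ID, to see which commit b.txt's first line is attributed to. If the trick works as described, that should be the very first commit:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # local identity so commits work
git config user.email "user@example.com"

printf 'I am here\n\nYes I am\n' > a.txt
git add a.txt
git commit -q -m "First commit for file"
first=$(git rev-parse HEAD)
base=$(git symbolic-ref --short HEAD)      # "main" or "master", depending on config

git checkout -q -b rename-branch
git mv a.txt b.txt
git commit -q -m "Renaming file"
git checkout -q "$base"
git merge -q --no-ff --no-commit rename-branch
git checkout HEAD -- a.txt                 # get the original file back
git commit -q -m "Not really renaming file"

# First field of the porcelain output: the commit that line 1 is attributed to.
origin=$(git blame --porcelain b.txt | head -n 1 | cut -d' ' -f1)
echo "b.txt line 1 comes from $origin (first commit: $first)"
```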

Supposititious answered 5/4, 2023 at 18:37 Comment(1)
There's a longer version of this solution that takes 3 branches, but this one works with fewer steps. My only difference is that I already had my changes on 'main', so I had to git checkout -b rename_branch and then git reset --hard HEAD~ on main. (Then)
