Record file copy operation with Git

When I move a file in git using git-mv, the status shows that the file has been renamed, and even if I alter some portions git still considers it to be almost the same file (which is good because it lets me follow its history).

When I copy a file the original file has some history I'd like to associate with the new copy.

I have tried moving the file then trying to re-checkout in the original location - once moved git won't let me checkout the original location.

I have tried doing a filesystem copy and then adding the file - git lists it as a new file.

Is there any way to make git record a file copy operation in a similar way to how it records a file rename/move where the history can be traced back to the original file?

Heron answered 25/6, 2009 at 11:22 Comment(1)
You should consider accepting Robert's answer. It works perfectly.Adjective
S
128

Git does not do rename tracking or copy tracking, which means it doesn't record renames or copies. What it does instead is rename and copy detection. You can request rename detection in git diff (and git show) with the -M option. You can request additional copy detection among changed files by using the -C option instead, and you can request more expensive copy detection among all files with -C -C. See the git-diff manpage.

-C -C implies -C, and -C implies -M.

-M is a shortcut for --find-renames, -C means --find-copies and -C -C can also be spelled out as --find-copies-harder.
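To see the difference between these options, here is a minimal sketch that builds a throwaway repository and compares the output of the three modes. The file names original.txt and copy.txt and the scratch directory are made up for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Scratch repository in a temporary directory (hypothetical location).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# First commit: a file with enough content for similarity detection.
seq 1 50 > original.txt
git add original.txt
git commit -q -m "Add original.txt"

# Second commit: a plain filesystem copy; the original stays untouched.
cp original.txt copy.txt
git add copy.txt
git commit -q -m "Copy original.txt to copy.txt"

# No copy detection requested.
plain="$(git diff --name-status HEAD~ HEAD)"
# -C only considers files changed in the same commit as copy sources.
with_c="$(git diff -C --name-status HEAD~ HEAD)"
# --find-copies-harder (-C -C) also considers unmodified files as sources.
harder="$(git diff --find-copies-harder --name-status HEAD~ HEAD)"

printf 'plain:  %s\n-C:     %s\nharder: %s\n' "$plain" "$with_c" "$harder"
```

Because original.txt is not modified in the second commit, I would expect only the --find-copies-harder run to report a copy (status C); the other two should list copy.txt as a plain addition (status A).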

You can also configure git to always do rename detection by setting diff.renames to a boolean true value (e.g. true or 1), and you can request git to do copy detection too by setting it to copy or copies. See the git-config manpage.

Check also the -l option to git diff and the related config variable diff.renameLimit.
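As a small sketch of the configuration described above, the following sets both variables in a scratch repository. The limit value 2000 is an arbitrary example, not a recommendation.

```shell
#!/usr/bin/env bash
set -eu

# Scratch repository (hypothetical location); settings stay local to it.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Rename detection plus copy detection by default, as with -C.
git config diff.renames copies
# Example value for the limit that the -l option of git diff also controls.
git config diff.renameLimit 2000

renames_setting="$(git config --get diff.renames)"
limit_setting="$(git config --get diff.renameLimit)"
echo "diff.renames=$renames_setting diff.renameLimit=$limit_setting"
```

Adding --global to the git config calls would apply the same settings to every repository of the current user.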


Note that git log <pathspec> works differently in Git: here <pathspec> is a set of path limiters, where a path can be a (sub)directory name. It filters and simplifies history before rename and copy detection come into play. If you want to follow renames and copies, use git log --follow <filename> (which is currently a bit limited and works only for a single file).
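The effect of --follow can be checked with a short sketch like the one below, which renames a file in a scratch repository and counts the commits each form of git log reports. The file names are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Scratch repository (hypothetical location).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

seq 1 50 > old-name.txt
git add old-name.txt
git commit -q -m "Add old-name.txt"

git mv old-name.txt new-name.txt
git commit -q -m "Rename old-name.txt to new-name.txt"

# Path-limited log: history is filtered by the current path only.
without_follow="$(git log --oneline -- new-name.txt | wc -l | tr -d ' ')"
# --follow continues past the rename (single file only).
with_follow="$(git log --oneline --follow -- new-name.txt | wc -l | tr -d ' ')"

echo "without --follow: $without_follow, with --follow: $with_follow"
```

In this setup the plain path-limited log should stop at the rename commit, while --follow should also list the commit that created the file under its old name.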

Stoneham answered 25/6, 2009 at 12:9 Comment(7)
@allyourcode: What are you confused about? To turn on copy detection by default you set diff.renames to copies (e.g. 'git config diff.renames copies'). I agree that it is a bit counterintuitive.Ridley
One section I can't seem to parse is "and you can request to do by default also rename detection". Are you saying there's four values that diff.renames can use (true, 1, copy, copies), and that they all do the same thing?Maus
@allyourcode: I'm sorry, I haven't noticed this. Fixed now, thanks.Ridley
Ok, so Git does not record renames or copies. Now I am also interested in the question whether Git then stores everything duplicated, or whether it uses an intelligent de-duplication algorithm based on file-part hashes or similar - so that the data that was copied is stored only once in the repository?Noon
@peschü: Git uses a content-addressed object database as its repository storage. File contents are stored in 'blob' objects under an address that is the SHA-1 hash of the contents (well, type+length+contents). This means that given contents are stored only once. Nb. this automatic deduplication was the reason behind creating the "bup" backup system, which uses the git pack format.Ridley
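The deduplication described in the comment above can be observed directly. This is a minimal sketch with made-up file names: it commits a file together with a byte-identical copy and prints the object ID that each path resolves to.

```shell
#!/usr/bin/env bash
set -eu

# Scratch repository (hypothetical location).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

seq 1 50 > original.txt
cp original.txt copy.txt
git add original.txt copy.txt
git commit -q -m "Add a file and its copy"

# Resolve each path in the commit to the blob object it points at.
id_original="$(git rev-parse HEAD:original.txt)"
id_copy="$(git rev-parse HEAD:copy.txt)"

echo "original.txt -> $id_original"
echo "copy.txt     -> $id_copy"
```

Both paths should print the same object ID, since identical contents hash to the same blob.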
Unlike the solution below, this doesn't work with change tracking in a range. Git log allows a range argument (git log -L123,456:file.xyz) that properly follows renames, but not copies, and you can't pass --follow in that case; also, AFAICT, this doesn't work with git blame.Bax
Unfortunately, the packaged gitk has no switch to activate --find-copies-harder, see https://mcmap.net/q/22409/-how-to-add-better-copy-detection-to-gitk/1389680 .Riemann
R
138

If for some reason (e.g. using gitk) you cannot turn on copy detection as in Jakub Narębski's answer, you can force Git to detect the history of the copied file in three commits:

  • Instead of copying, switch to a new branch and move the file to its new location there.
  • Re-add the original file there.
  • Merge the new branch to the original branch with the no-fast-forward option --no-ff.

Credits to Raymond Chen. What follows is his procedure. Say the file is named OriginalFileName.cpp, and you want the duplicate to be named DuplicateFileName.cpp:

fileOriginal=OriginalFileName.cpp
fileDuplicate=DuplicateFileName.cpp
branchName=duplicate-OriginalFileName

echo "$fileOriginal, $fileDuplicate, $branchName" # review of defined names

git checkout -b "$branchName" # create and switch to branch

git mv "$fileOriginal" "$fileDuplicate" # make the duplicate
git commit -m "Duplicate $fileOriginal to $fileDuplicate"

git checkout HEAD~ "$fileOriginal" # bring back the original
git commit -m "Restore duplicated $fileOriginal"

git checkout - # switch back to source branch
git merge --no-ff "$branchName" -m "Merge branch $branchName" # merge dup into source branch

Note that this can be executed on Windows in Git Bash.
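For anyone who wants to check the result in a fresh repository, here is a sketch that runs the same steps on a throwaway file and then asks git blame where the lines of both files come from. The scratch directory and the 50-line test file are assumptions made for the example; the commands themselves are the ones shown above.

```shell
#!/usr/bin/env bash
set -eu

# Scratch repository (hypothetical location).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

seq 1 50 > OriginalFileName.cpp
git add OriginalFileName.cpp
git commit -q -m "Add OriginalFileName.cpp"

# The three commits: move on a branch, restore the original, merge back.
git checkout -q -b duplicate-OriginalFileName
git mv OriginalFileName.cpp DuplicateFileName.cpp
git commit -q -m "Duplicate OriginalFileName.cpp to DuplicateFileName.cpp"
git checkout -q HEAD~ OriginalFileName.cpp
git commit -q -m "Restore duplicated OriginalFileName.cpp"
git checkout -q -
git merge -q --no-ff duplicate-OriginalFileName -m "Merge branch duplicate-OriginalFileName"

# Count how many lines of each file blame attributes to the very first commit.
first_commit="$(git rev-list --max-parents=0 HEAD)"
lines_orig="$(git blame --line-porcelain OriginalFileName.cpp | grep -c "^$first_commit ")"
lines_dup="$(git blame --line-porcelain DuplicateFileName.cpp | grep -c "^$first_commit ")"

echo "lines blamed to the first commit: original=$lines_orig duplicate=$lines_dup"
```

If the procedure works on your Git version, both counts should equal the number of lines in the file, meaning that blame traces the duplicate back through the move to the commit that introduced the original.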


2020-05-19: The above solution has the advantages of not changing the log of the original file, not creating a merge conflict, and being shorter. The previous version of this answer used four commits:

  • Instead of copying, switch to a new branch and move the file to its new location there.
  • Switch to the original branch and rename the file.
  • Merge the new branch into the original branch, resolving the trivial conflict by keeping both files.
  • Restore the original filename in a separate commit.

(Solution taken from https://mcmap.net/q/22405/-git-copy-file-preserving-history-duplicate.)

Riemann answered 29/9, 2017 at 8:33 Comment(29)
Simplicity, brevity, 100%... This answer is public service... upvoting everything in sightEuphonic
What's the difference between move and rename?Maharanee
@Maharanee Are you referring to the fact that in bash you would use mv for both operations? I was using 'move' for the case that may involve changing the file's directory, and 'rename' for the case where it doesn't.Riemann
I tried to follow this (new) recipe and it didn't work. It might help if you showed the actual commands.Visitor
@RobertPollak I have tried various versions of this but they didn't work. By "move the file", do you mean git mv orig new? By "readd the original", do you mean cp new orig && git add orig?Rennes
@GregLindahl, the linked blog entry by Raymond Chen gives the actual commands. I consider this too much detail here.Riemann
@ᆼᆺᆼ, Yes, that's what I meant by 'move' and 'readd'.Riemann
So I've tried and this doesn't work for me... one of the files will have its history begin at the point it was git mv'd, even if both were git mv'd on different branches and then those branches merged together with --no-ffRennes
@ᆼᆺᆼ, do the commands given in Raymond Chen's blog post work for you?Riemann
@RobertPollak No, and I've also tried git cp from git-extras, same result... Is it possible these methods stopped working with a certain git release?Rennes
@ᆼᆺᆼ, this could be. Which Git version do you use? The current release is 2.28.0 from 2020-07-27. I have successfully used Raymond Chen's method in one of my projects yesterday with git 2.20 (from current stable Debian release 10 "buster"). Unfortunately, I currently don't have time for more testing.Riemann
@GregLindahl, which Git version did you use?Riemann
I am using git 2.28.0.windows.1 (the most recent version today) and the commands did not work. The merge just deletes the original files, just applying the git mv command... is there any default settings of Git that could make it fail?Aaronson
Can someone with problems please reproduce them in a fresh repo, then post the corresponding command history?Riemann
@Aaronson Does the four-commit version work for you?Riemann
I did not try, the history was cut by older git-mv anyway (while they were actually simple move), so that confirmed something I read elsewhere about git not recording the mv anyway, and I dropped it.Aaronson
@ArthurTacca, feel free to use whatever you want. For me, Git also won the functionality contest. I thoroughly tested both Git and Mercurial before switching from SVN. And let me point out the alternative solution of simply using --find-copies-harder instead of crafting these commits.Riemann
@JohnK Ok, nice, well, it's not nice, but real ugly (thanx to Linus), anyway that works for a single file, so that's nice =) But what if I'm refactoring some ugly legacy code and have to split one file, that contains X classes into X different files.. what then?Empathy
@EddyShterenberg What have you tried? What was the outcome?Riemann
@RobertPollak, I've tried the exact solution, that posted in the answer, i.e. create branch "dup", rename SINGLE file+commit, restore original file+commit, merge to src branch with --no-ff. My question is: if I have to split file into, say 5 files, then following this scenario I should repeat this 5 times. So, since I'm a developer - I'm lazy and trying to find more convenient/easy way to do that. P.S. google didn't help much. Also it would be nice to have the split as a single commit in the history. In TFS I just branch the file 5 times, clean the 5 copies and check-in once.Empathy
@EddyShterenberg Have you also tried using -C (with a single split commit) instead?Riemann
@RobertPollak, isn't -C an option of log and blame commands? I saw it can multiply over and over - <https://mcmap.net/q/22405/-git-copy-file-preserving-history-duplicate> . I would like to see the history line of any random file in my repository in a regular history view (i.e. UI of some kind, be it Git GUI/TortoiseGit/Visual Studio/What Ever) without having second thoughts, like "hey, maybe that file was split from another file, so I'll just switch from my IDE to console and check it". If it's not possible (while staying sane) it's an acceptable fact too, then I'll just stop googling and accept the reality =)Empathy
@RobertPollak I was able to carry out these steps and everything worked as advertised. However, what I'm noticing now is that both resulting files not only share past commit history (which we wanted), but new commits on either of the files show up in the commit history of the other file. That is, going forward, these files will have identical histories, including new commits, even though the goal is for them to diverge. Have you encountered this and if so did you find a workaround? Thanks!Ferret
a solution that was taken from the issue that is a duplicate of this one XD, niceMusicale
:-) This was asked earlier, so the other got duplicated.Riemann
After I finished these steps I got the same result as if I had just done cp oldFile newFile: it showed the file as newly added, without any relation to the old file. Any ideas why that might be the case?Boer
@Boer you should see the original file with full history, but the 2nd one as renamed. You will not see a file history directly on renamed files (without following renames). Blame should work fine on both files by default.Hers
@ino, you forgot to rename the variables in the "Say the file" sentence.Riemann
@RobertPollak uh oh, you're right. I have just corrected it. Thanks for the review AND for sharing such an easy-to-follow solution to a tricky git task!Michaelemichaelina
S
2

This builds on the answer from Robert.

For my use case, I needed to move several directories from one implementation to another (with all that entails for file include paths, unit tests, etc), and I found it challenging & time consuming to move each individual file.

My solution includes prompts for the origin & destination paths.

My solution also deletes the temporary branch that was created for this purpose (if the script succeeds to the end).

Caveats:

  1. The script will attempt to make a new directory for the input you provide for the second prompt (the new destination).
  2. Both this and the original solution merge history into the CURRENT BRANCH. I suggest that you start with a new branch, or at least git stash save if you have any local modifications.
branchName=chore/temp/duplicate-file-history-by-script
currentBranchName="$(git branch --show-current)"

function copy_git_history() {
    targetToCopy=$1
    newDestination=$2

    echo "copying $targetToCopy to $newDestination and restoring its history"

    git mv "$targetToCopy" "$newDestination"
    git commit -m "duplicating $targetToCopy to $newDestination to retain git history"

    git checkout HEAD~ "$targetToCopy"
    git commit -m "restoring moved file $targetToCopy to its original location"
}

### USER PROMPTS ###

echo "proceeding to copy files to current branch.  Please make sure you are prepared to have the current git branch modified: $currentBranchName"
# spacing to make things easier to read
printf "\n"

echo "Please enter the path to the file(s) you wish to duplicate, relative to $PWD"
read -r originalFileLoc

echo "Please enter the new path where you wish to copy the original file(s)"
read -r newFileLoc

### END: USER PROMPTS ###

# create the new branch to store the changes
git checkout -b $branchName

# create the duplicate file(s)
if [[ -d "$originalFileLoc" ]]
then
    echo "copying files from $originalFileLoc to $newFileLoc"
    mkdir -p "$newFileLoc"

    # iterate over the glob directly so that paths containing spaces stay intact
    for file in "$originalFileLoc"/*
    do
      copy_git_history "$file" "$newFileLoc"
    done
else
  copy_git_history "$originalFileLoc" "$newFileLoc"
fi

# switch back to source branch
git checkout -
# merge the history back into the source branch to retain both copies
git merge --no-ff $branchName -m "Merging file history for copying $originalFileLoc to $newFileLoc"

# delete the branch we created for history tracking purposes
git branch -D $branchName
Sentiment answered 1/8, 2023 at 23:23 Comment(0)
