Keep git history when splitting a file
D

6

60

I want to take a function out of one file and put it into another, but keep the blame history.

cp a.php b.php

vim b.php
# delete everything but 1 function

vim a.php
# delete the 1 function

git add a.php b.php
git commit

But if I run git blame b.php I only see it blaming to this new commit.

Defend answered 8/10, 2010 at 4:38 Comment(1)
Reverse operation: Preserving Git history while merging files – Scriptwriter
S
6

Perhaps this previous SO question could be informative:

How does git track source code moved between files?

To paraphrase the accepted answer: essentially, Git doesn't actually "store" moved code; when generating things like blames for moved code, that's done ex post facto by examining the state of the entire repository from commit to commit.

Suspect answered 8/10, 2010 at 4:42 Comment(3)
This answer sounds like a "no," but really it's a "sometimes." The delete appears to be what triggers Git to look for a file's history beyond other files' birthdates. Splitting off one function but keeping the rest of a file as OP did might not work. But I just split one file in half, deleting the original and giving it two new names, and the blame is correctly assigned throughout both new files. – Glennglenna
Whoops, that was only after editing. After committing, it apparently lost the blame for one of the new files. Still possibly a maybe? – Glennglenna
@Glennglenna Try splitting each file in a separate branch, then merging the branches in. I think that should get git to recognize the "multiple copy". – Displease
B
45

The general rule for maintaining blame history is to make a separate move commit first, before any edits. In my experience, this allows git blame to work without the need for the -C option. So in the case of splitting a file into new files, this can be done in two commits:

  1. Duplicate the original to the new destinations, making sure to delete the original
  2. Remove the extra sections from the duplicated files

In the example provided, this would be:

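cp a.php b.php
git mv a.php c.php   # renames a.php and stages both the removal and c.php
git add b.php
git commit           # commit 1: pure copy and rename, no edits
vim b.php  # delete everything but 1 function
vim c.php  # delete the 1 function
git add b.php c.php
git commit           # commit 2: the actual edits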
Bailable answered 30/11, 2015 at 6:13 Comment(3)
The behavior of git add on a nonexistent file changed sometime around Git 1.9. You need either git rm or git add -A to reflect removed files. – Chowchow
Thanks for the comment @DamianYerrick. I don't believe it should matter either way if you specify the exact files to stage, though. (The change was that, as of Git 2.0, "git add <path> is the same as git add -A <path>" in that it includes removals, according to the release notes.) – Bailable
Since "a" is being renamed to "c" with mv on line two, wouldn't you get an error when trying to add "a" on line three? I did... Also, if we wanted to keep the name for "a", would we rename "c" back to "a" with mv after the last commit? – Airwoman
A
14

I've slightly modified Peter's answer to another question here to create a reusable, non-interactive shell script called git-split.sh:

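#!/bin/sh
# Copy a file so that both the original and the copy keep the same history.

if [ "$#" -ne 2 ]; then
  echo "Usage: git-split.sh original copy" >&2
  exit 1
fi

# First line of history: rename original -> copy, and remember that commit.
git mv "$1" "$2"
git commit -n -m "Split history $1 to $2"
REV=$(git rev-parse HEAD)

# Second line of history: go back and rename original -> temp instead.
git reset --hard HEAD^
git mv "$1" temp
git commit -n -m "Split history $1 to $2"

# Merge both renames. The merge is expected to report a conflict;
# commit -a then records the result.
git merge "$REV"
git commit -a -n -m "Split history $1 to $2"

# Give the original its name back.
git mv temp "$1"
git commit -n -m "Split history $1 to $2"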

It simply copies the source file into a new file, and both files have the same history. An explanation of why this works can be found in that other answer.
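
One caveat worth guarding against: the script runs git reset --hard, which discards uncommitted changes to tracked files. A minimal sketch of a check that could be called before the first git mv follows. The function name require_clean_tree is hypothetical and not part of the original script.

```shell
#!/bin/sh
# Sketch of a guard for git-split.sh (hypothetical helper, not in the original).
# Succeeds only when no tracked file has staged or unstaged changes.
require_clean_tree() {
  # --untracked-files=no: untracked files are ignored, since
  # `git reset --hard` leaves them alone.
  if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    echo "Commit or stash your changes before splitting history." >&2
    return 1
  fi
}
```

In the script it would be used as `require_clean_tree || exit 1`, placed right after the argument check.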

Animosity answered 19/12, 2018 at 10:50 Comment(2)
Thanks, this worked like a charm, including repeating it 3 times to split one large file into 4 (the last iteration can just use git mv $1 $2 to rename the original file). – Oaxaca
This creates a commit with an invalid state of the codebase. I'm surprised that the industry standard VCS tool doesn't have a clean way to factor out code into multiple files, something which happens fairly often. – Clansman
L
6

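Try git blame -C -C b.php. The -C option makes blame look for lines moved or copied from other files modified in the same commit; giving it twice also checks other files in the commit that created the file.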

Legation answered 8/10, 2010 at 4:44 Comment(3)
I want the history to actually blame to the older commits. We use git-svn so others will be on svn. – Defend
Too bad. SVN doesn't support this, at all! – Gaussmeter
Super helpful, thanks. I haven't managed to find a solution for git log yet. – Figurative
E
0

Just an FYI: I've published an npm package (which you can call directly via the npx command) that does the splitting (duplicating) of the file for you. It's a bit slower than a shell script, but if you're working with npm it's easy to run, with no need to create the script file or check it into your Git repo in order to share it with your teammates.

https://www.npmjs.com/package/swgh

You would do

npx swgh myFile.txt myDuplicatedFileWithHistory.someOtherExtensionIfYouWantTo
Eluvium answered 24/8, 2022 at 8:11 Comment(0)
A
0

This version is my iteration on Lukas' answer.

It improves on it by adding specific files by name and by using git reset --soft, so other files can be present in the working directory without being affected by git-split.sh.

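#!/bin/sh
# Copy a file with its history, touching only the named files.

if [ "$#" -ne 2 ]; then
  echo "Usage: git-split.sh original copy" >&2
  exit 1
fi

# First line of history: rename original -> copy and keep it on a temp branch.
git mv "$1" "$2"
git add "$2"
git commit -m "renamed: $1 -> $2"
git branch temp-git-split

# Second line of history: step back (keeping the working tree) and
# rename to a temporary file instead.
git reset --soft HEAD~1
git mv "$2" temp-git-split-file
git commit -m "renamed: $1 -> temp-git-split-file"

# Merge both renames and resolve by keeping both new files.
git merge temp-git-split
git add temp-git-split-file
git add "$2"
git rm "$1"
git commit -m "merging history"
git branch -d temp-git-split

# Give the original its name back.
git mv temp-git-split-file "$1"
git commit -m "renamed: temp-git-split-file -> $1"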
Admittedly answered 17/6, 2023 at 14:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.