Move parts of file to other files and keep git history

Asked 16/9, 2021 at 14:41 Answered 18/7 at 5:52

I have a large file with complex history (many commits from many authors).

Refactoring it means splitting it into multiple smaller files, but I need to keep the history.

To make this concrete, let's say I have a main file containing all my code:

function a() {}
function b() {}
function c() {}
function main() {
   a();
   b();
   c();
}

and I need to move the a and b functions into files a and b respectively, keeping the main function in the main file, while preserving history in all three files.

I found some kind of solution there, but nothing that actually works or is practical in a production environment.

Galahad answered 16/9, 2021 at 14:41 Comment(7)

What happened when you followed the method you linked to? Did the history not show up how you expected? Or did you dislike some side-effect of it? – Graecoroman 16/9, 2021 at 14:45

@Graecoroman way too complex and time-consuming. I have multiple large files that I need to split into multiple sub-files – Galahad 16/9, 2021 at 14:47

I haven't read through carefully, but it looks to me like you could mostly automate Raymond's method: create the various result files, put them in a folder outside the repo, and then have a script to loop over creating a dummy branch for each one and merge them all together. – Graecoroman 16/9, 2021 at 14:55

I would recommend to not attempt to retain this sort of history. As you're discovering, it can make simple changes complicated. You spend orders of magnitude more time changing the code than doing code archeology, it doesn't make sense to optimize for code archeology. – Orang 16/9, 2021 at 15:4

If you want to later review the history, look through the history of the original file. – Ruthven 16/9, 2021 at 15:6

I agree with @WilliamPursell. In the commit message where you do the split, perhaps add a sentence explaining which file and commit ID you split from. – Herndon 16/9, 2021 at 15:17

Note that there is no such thing as "file history" in Git: the commits are the history, and that's all there is. Programs like git log and git blame attempt to conjure up a file history by reading the actual (commit) history; the extent to which they're successful lies somewhat in the eye of the beholder. – Polyhydric 16/9, 2021 at 21:19

Move the code as normal. Git can help you read the history.

Use git blame -w -n -M -C -C -C. I like to alias this as archeology.

-w ignores trivial whitespace changes.
-n shows the line number in the original commit.
-M detects moved or copied lines within a file.
-C -C -C detects lines moved or copied from other files in any commit.

Similarly, use git log -w -M -C -C -C.
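
The alias mentioned above could be set up roughly like this. It is a minimal sketch: the throwaway repository and the --local scope are assumptions, chosen only so the example does not touch your real configuration.

```shell
#!/bin/sh
# Sketch: define the "archeology" alias in a throwaway repository.
# The temporary repository and the --local scope are assumptions.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Aliases that do not start with "!" are expanded as git subcommands,
# so "git archeology <file>" runs "git blame -w -n -M -C -C -C <file>".
git config --local alias.archeology 'blame -w -n -M -C -C -C'

# Show what the alias expands to.
git config --get alias.archeology
```

To make the alias available in every repository, swap --local for --global.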

You can also make the archeology easier by copying the code in one commit, and changing it in the next. Then when you're reading back through the blame history you'll hit a commit that says "split up file X".
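
A small sketch of that two-step approach, run in a throwaway repository. The file names (main.js, a.js), the function bodies, and the commit messages are hypothetical. The body of a() is deliberately longer than in the question, because git blame only reports moved or copied blocks above a minimum size (40 alphanumeric characters by default for -C).

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Commit 1: everything lives in main.js.
cat > main.js <<'EOF'
function a() {
  console.log("function a does a reasonable amount of work here");
  return "a value computed by function a";
}
function main() {
  a();
}
EOF
git add main.js
git commit -q -m 'initial: all code in main.js'

# Commit 2: move a() verbatim into a.js, with no other edits.
sed -n '1,4p' main.js > a.js
sed '1,4d' main.js > main.tmp
mv main.tmp main.js
git add a.js main.js
git commit -q -m 'split a() out of main.js'

# Blame the new file; -s drops author and date to keep the output short.
blame_out=$(git blame -s -w -n -M -C -C -C a.js)
printf '%s\n' "$blame_out"
```

With the move done verbatim, the blame output for a.js should point back at main.js and the first commit rather than at the split commit. Any real edits to a() would then go into a separate, later commit.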

Ultimately, you spend orders of magnitude more time changing the code than doing code archeology. It doesn't make sense to optimize your development process for code archeology. Instead, change the code as needed and use Git more effectively. And if, in the end, the archeology is a little more difficult that's fine; it's better than making development more difficult.

Sooner than you'd think, especially if you embrace change as a normal part of development, nobody will care where the original lines came from.

Orang answered 16/9, 2021 at 15:23 Comment(2)

Although it does not seem to be the priority of the OP, would your commands above make it easier to merge code when splitting up files? I ran out of time while performing a major code refactor, and I am now in the unfortunate position of merging new code into the refactored branch. Most of the renames took, so most of the merges so far have not been very challenging; however, there were some files that needed to be broken up. I still have more refactoring to do, so if your commands would help "guide" the merges when splitting up a file, that would be fantastic! – Polanco 22/6, 2023 at 18:34

@Polanco This would be better as a question with full details; reply with the link and I'll have a look. However, my advice is to try to split the refactoring up into smaller incremental refactors. For example, rename a file but don't change the contents, and merge that. Split up one file and then merge that. – Orang 23/6, 2023 at 19:56

The overall idea

The only way I have found so far to keep history (without needing extra git blame arguments) is to split the file in a way that preserves its changes, and then merge in a way that preserves them too.

Steps

Split - The goal here is to end up with an intermediate file for each target file, by creating commits with the following changes:

Rename your source/base file, and create an auxiliary file for each target (Alternate description: "Duplicate the original to the new destinations, making sure to delete the original")

git mv source source-aux # Rename the base file ("delete")
cp source-aux target1-aux # Repeat this for each target file
git add target1-aux # target2-aux target3-aux ...
                    # (all aux files per target, divided by spaces)
git commit -m 'rename original into one of the aux copies'

Clean extra sections on all aux files (Alternate description: "Remove the extra sections from the duplicated files")

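# Use your preferred editor (nano, vim, vscode, ...)
# to clean extra sections on each aux file:
nano source-aux # Edit source/base file
nano target1-aux # Edit each target aux file
# -a stages the edits to the already tracked aux files;
# without it (or a prior git add) nothing would be committed
git commit -a -m 'clean copies'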

Restore names and prepare for the next part

git mv source-aux source  # Rename the original/source file
                          # as it was initially ("restore")
git commit -m 'revert name of original file'

Merge - The goal here is to delete each intermediate/aux file, by creating two branches and then merging them:

Create a branch in which the original target file(s) will only be renamed

git checkout -b rename-targets
git mv target1 target1-ren # Do this for each target file
git commit -m 'rename original targets'

Go back to the previous branch, and rename each aux file related to each target file

git checkout -
git mv target1-aux target1-ren # Do this for each target file
git commit -m 'rename aux targets'

(Attempt to) Merge the other branch into current branch
```
git merge -m 'combine with renamed' rename-targets
```
As you might have guessed, merge conflicts are expected here. That is fine, because both branches contain files with the same names but completely different contents.

Resolve (merge) conflicts (See Note 1!)

# For the sake of this example, let's assume that each
# conflict gets resolved by just concatenating the
# changes of both branches, as follows, for each target file:
cat "target1-ren~HEAD" "target1-ren~rename-targets" > target1-ren

Finish the merge

# Add all target files to mark their conflicts as resolved:
git add target1-ren # target2-ren target3-ren ...
                    # (all aux renamed files per
                    # target, divided by spaces)

git merge --continue

Restore original target filenames

git mv target1-ren target1 # Do this for each target file
git commit -m 'restore original target filenames'
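
One way to check that history survives the rename steps above is git log --follow. The following is a minimal sketch on a throwaway repository that covers only a single pure rename, not the whole split-and-merge procedure; the file contents and commit messages are hypothetical.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'function a() {}\nfunction b() {}\n' > source
git add source
git commit -q -m 'add source'

# A pure rename, like the "git mv" steps above.
git mv source source-aux
git commit -q -m 'rename source to source-aux'

# --follow continues past the rename, so the commit that
# created the file under its old name is listed too.
follow_out=$(git log --follow --format=%s -- source-aux)
printf '%s\n' "$follow_out"
```

Running the same git log --follow command on each final target file after the last step is a quick way to see how far back Git can trace it.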

Notes

Please use your preferred merge tool to resolve merge conflicts, according to the file type you're dealing with!
- For plain text files, like source code, there are options such as meld or vscode, which give you a 3-way editor to resolve conflicts. Tip: Make as few changes as possible here. Avoid adding or removing spaces/tabs, and just put the lines in their final positions; your main goal here is to resolve conflicts!
- For other file types (images, PDF, ...) there might be specialized tools for them...

Dyadic answered 18/7 at 5:52 Comment(0)
