Move parts of file to other files and keep git history
git

2

6

I have a large file with complex history (many commits from many authors).

Refactoring it would mean splitting it into multiple smaller files, but I need to keep the history.

To make this concrete, let's say I have a main file containing all my code:

function a() {}
function b() {}
function c() {}
function main() {
   a();
   b();
   c();
}

and I need to move the a and b functions into files a and b respectively, keeping my main function in the main file, WHILE keeping history in all three files.

I found some kind of solution there, but nothing that actually works or is practical in a production environment.

Galahad answered 16/9, 2021 at 14:41 Comment(7)
What happened when you followed the method you linked to? Did the history not show up how you expected? Or did you dislike some side-effect of it? - Graecoroman
@Graecoroman way too complex and time consuming. I have multiple large files that I need to split into multiple sub-files. - Galahad
I haven't read through carefully, but it looks to me like you could mostly automate Raymond's method: create the various result files, put them in a folder outside the repo, and then have a script to loop over creating a dummy branch for each one and merge them all together. - Graecoroman
I would recommend to not attempt to retain this sort of history. As you're discovering, it can make simple changes complicated. You spend orders of magnitude more time changing the code than doing code archeology, it doesn't make sense to optimize for code archeology. - Orang
If you want to later review the history, look through the history of the original file. - Ruthven
I agree with @WilliamPursell. In the commit message where you do the split, perhaps add a sentence explaining which file and commit ID you split from. - Herndon
Note that there is no such thing as "file history" in Git: the commits are the history, and that's all there is. Programs like git log and git blame attempt to conjure up a file history by reading the actual (commit) history; the extent to which they're successful lies somewhat in the eye of the beholder. - Polyhydric
6

Move the code as normal. Git can help you read the history.

Use git blame -w -n -M -C -C -C. I like to alias this as archeology.

  • -w ignores trivial whitespace changes.
  • -n shows the line number in the original commit.
  • -M detects moved or copied lines within a file.
  • -C -C -C detects lines moved or copied from other files in any commit.
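As a minimal sketch of the alias mentioned above, the following sets it up in a throwaway repository and runs it on a file that was split out of another one. The file names main.js and a.js, the function bodies, and the author identity are assumptions made up for illustration.

```shell
set -e
# Throwaway repository so nothing outside this sketch is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Repository-local alias for the blame invocation described above.
git config alias.archeology 'blame -w -n -M -C -C -C'

# Hypothetical original file, committed in one piece.
cat > main.js <<'EOF'
function a() {
  console.log("this is the body of function a");
  return "value computed by a";
}
function main() {
  a();
}
EOF
git add main.js
git commit -q -m 'add main.js'

# A later commit moves function a() into its own file.
head -n 4 main.js > a.js
tail -n 3 main.js > main.tmp
mv main.tmp main.js
git add a.js main.js
git commit -q -m 'split a() out of main.js'

# Blame the new file; moved lines should be traced back to main.js.
blame_output=$(git archeology a.js)
printf '%s\n' "$blame_output"
```

Each reported line names the file it originally came from, so the lines of a.js should show main.js as their origin rather than the commit that did the split. To make the alias available in every repository, the same setting could be written with git config --global instead.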

Similarly, use git log -w -M -C -C -C.

You can also make the archeology easier by copying the code in one commit, and changing it in the next. Then when you're reading back through the blame history you'll hit a commit that says "split up file X".
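A rough sketch of that two-commit approach, in a scratch repository: the first refactoring commit only moves code verbatim, and the second one changes it. The file names, contents, and commit messages here are hypothetical.

```shell
set -e
# Scratch repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical file to be split.
cat > main.js <<'EOF'
function a() {
  console.log("this is the body of function a");
  return "value computed by a";
}
function main() {
  a();
}
EOF
git add main.js
git commit -q -m 'add main.js'

# Commit 1: move the code verbatim, change nothing else.
head -n 4 main.js > a.js
tail -n 3 main.js > main.tmp
mv main.tmp main.js
git add a.js main.js
git commit -q -m 'split up file main.js'

# Commit 2: only now modify the moved code.
sed 's/value computed by a/updated value/' a.js > a.tmp
mv a.tmp a.js
git commit -q -a -m 'change return value of a()'

# The history of a.js now contains a clearly labelled split commit.
history=$(git log --format=%s -- a.js)
printf '%s\n' "$history"
```

Anyone reading back through the log or blame of a.js reaches the 'split up file main.js' commit and knows to continue the search in main.js from there.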

Ultimately, you spend orders of magnitude more time changing the code than doing code archeology. It doesn't make sense to optimize your development process for code archeology. Instead, change the code as needed and use Git more effectively. And if, in the end, the archeology is a little more difficult, that's fine; it's better than making development more difficult.

Sooner than you'd think, especially if you embrace change as a normal part of development, nobody will care where the original lines came from.

Orang answered 16/9, 2021 at 15:23 Comment(2)
Although it does not seem to be the priority of the OP, would your commands above make it easier to merge code when splitting up files? I ran out of time while performing a major code refactor, and I am now in the unfortunate position of merging new code into the refactored branch. Most of the renames took, so most of the merges so far have not been very challenging; however, there were some files that needed to be broken up. I still have more refactoring to do, so if your commands would help "guide" the merges when splitting up a file, that would be fantastic! - Polanco
@Polanco This would be better as a question with full details, reply with the link and I'll have a look. However, my advice is to try and split the refactoring up into smaller incremental refactors. For example, rename a file but don't change the contents and merge that. Split up one file and then merge that. - Orang
1

The overall idea

The only way I have found so far to keep history (without needing additional git blame arguments) is to split the file in commits that preserve its changes, and then merge while preserving those changes.

Steps

  1. Split - The goal here is to end up with an intermediate file for each target file, by creating commits with the following changes:
    1. Rename your source/base file, and create an auxiliary file for each target (Alternate description: "Duplicate the original to the new destinations, making sure to delete the original")
      git mv source source-aux # Rename the base file ("delete")
      cp source-aux target1-aux # Repeat this for each target file
      git add target1-aux # target2-aux target3-aux ...
                          # (all aux files per target, divided by spaces)
      git commit -m 'rename original into one of the aux copies'
      
    2. Clean extra sections on all aux files (Alternate description: "Remove the extra sections from the duplicated files")
      # Use your preferred editor (nano, vim, vscode, ...)
      # to clean extra sections on each aux file:
      nano source-aux # Edit source/base file
      nano target1-aux # Edit each target aux file
      git add source-aux target1-aux # Stage the edits (list all aux files)
      git commit -m 'clean copies'
      
    3. Restore names and prepare for the next part
      git mv source-aux source  # Rename the original/source file
                                # as it was initially ("restore")
      git commit -m 'revert name of original file'
      
  2. Merge - The goal here is to delete each intermediate/aux file, by creating two branches and then merging them:
    1. Create a branch in which the original target file(s) will only be renamed
      git checkout -b rename-targets
      git mv target1 target1-ren # Do this for each target file
      git commit -m 'rename original targets'
      
    2. Go back to the previous branch, and rename each aux file related to each target file
      git checkout -
      git mv target1-aux target1-ren # Do this for each target file
      git commit -m 'rename aux targets'
      
    3. (Attempt to) Merge the other branch into current branch
      git merge -m 'combine with renamed' rename-targets
      
      As you might have noticed, merge conflicts are expected here, which is fine because both branches have files with the same names but completely different contents.
    4. Resolve (merge) conflicts (See Note 1!)
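      # For the sake of this example, let's assume that each
      # conflict gets resolved by just concatenating the
      # changes of both branches, as follows, for each target file.
      # Note: the ~HEAD and ~rename-targets side files may not be
      # present on every Git version; if the conflict shows up as
      # markers inside target1-ren instead, edit that file directly.
      cat "target1-ren~HEAD" "target1-ren~rename-targets" > target1-ren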
      
    5. Finish the merge
      # Add all target files to mark their conflicts as resolved:
      git add target1-ren # target2-ren target3-ren ...
                          # (all aux renamed files per
                          # target, divided by spaces)
      
      git merge --continue
      
    6. Restore original target filenames
      git mv target1-ren target1 # Do this for each target file
      git commit -m 'restore original target filenames'
      

Notes

  1. Please use your preferred merge tool to resolve merge conflicts, according to the file type you're dealing with!
    • For plain text files, like source code, there are options like meld or vscode, which give you a 3-way editor to resolve conflicts. Tip: Make as few changes as possible here. Avoid adding or removing spaces or tabs and just put the lines in their final positions; your main goal is to resolve the conflicts!
    • For other file types (images, PDF, ...) there might be specialized tools for them...
Dyadic answered 18/7 at 5:52 Comment(0)
