How do I combine several Git repositories without breaking file history?

We are trying to migrate away from TFS. Using the git-tfs tool, we were able to migrate parts of the existing repo, but it crashes at certain troublesome checkins. We have been able to make a patchwork set of Git repos that cover most of the original TFS commits.

We currently have:

  • Git repo with changes from 2009 until 2011
  • Git repo with changes from 2011 until 2016
  • Git repo with changes from 2016 until current

Desired:

  • Big Git repo that covers 2009 until current
  • Any file that existed that whole time would have a single, continuous file history

Is there any way for us to stitch these back together into a single Git repo? We don't care about retaining SHAs (they're all new anyway), but we can't break file history.

Sitology answered 21/12, 2017 at 16:18 Comment(1)
As far as I know, this is not possible. The problem you will have is that the last commit hash of the 2009-2011 repository will not be a parent of the 2011-2016 repository. It is possible to combine two git repositories into one, but normally those repositories have different files in them, so it's not important that one is the parent of another. - Phylys

Edit: recent versions of Git have extended the git replace command so that this is easier to do, with git replace --graft <commit> <parent> (see https://git-scm.com/docs/git-replace#Documentation/git-replace.txt---graftltcommitgtltparentgt82308203 )


There is an easy way to do that using the 'graft' feature of Git. It has the same goal as git replace, which @torek mentioned, but it is easier to use in your case.

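First, import all the histories into the same repository. In the most recent repository, do the following for each of the 2 others (<name> is any remote name you choose; git remote add needs one before the path):

  1. git remote add <name> c:/path/toward/other/repository
  2. git fetch <name>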
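A minimal sketch of that import step, using two throwaway repositories so it can be run anywhere. The remote name `old`, the branch name `main`, and the `make_repo` helper are assumptions made for illustration, not part of the original setup.

```shell
#!/bin/sh
set -eu
# Throwaway identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical helper: create a repo at $1 with one commit that writes $2 to file.txt.
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/main
    echo "$2" > "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2"
}

make_repo "$work/old" "2009-2011 work"
make_repo "$work/new" "2016-now work"

# In the most recent repository, register the older one as a remote and fetch it.
cd "$work/new"
git remote add old "$work/old"
git fetch -q old

# Both histories now live in one object database, but are still unconnected.
git --no-pager log --oneline --all
```

At this point nothing has been joined yet: `main` and `old/main` have no common ancestor, which is what the graft step fixes.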

Then create the Git graft file .git/info/grafts following the help: https://git.wiki.kernel.org/index.php/GraftPoint. Each line has the form <commit> <parent>, so you should have 2 lines in your file, one per splice.

If you use git log or any Git GUI, you should now see the history the way you want it.
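Here is a rough sketch of the graft itself, using git replace --graft (the newer form mentioned in the edit at the top) rather than the grafts file. The toy repositories, the `commit_line` helper, and the names `old` and `main` are all assumptions for illustration; with three repositories you would run the graft twice.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical helper: add one commit to repo $1 that appends line $2 to file.txt.
commit_line() {
    echo "$2" >> "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2"
}

for name in old new; do
    git init -q "$work/$name"
    git -C "$work/$name" symbolic-ref HEAD refs/heads/main
done
commit_line "$work/old" "change from 2009"
commit_line "$work/old" "change from 2011"
commit_line "$work/new" "change from 2016"
commit_line "$work/new" "change from 2017"

cd "$work/new"
git remote add old "$work/old"
git fetch -q old

# Graft: make the root commit of the newer history a child of the older tip.
root=$(git rev-list --max-parents=0 main)
git replace --graft "$root" "$(git rev-parse old/main)"

# file.txt now shows one continuous history across both repositories.
git --no-pager log --oneline -- file.txt
```

The graft is only a local overlay at this stage; `git replace -d "$root"` would undo it.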

If you are satisfied, rewrite the history to make the grafts permanent with:

git filter-branch

You can now push your history to a central repository or share it.
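To illustrate the last step, here is a sketch that grafts two toy histories and then runs git filter-branch so the join survives without the replacement. The repositories, the `commit_line` helper, and the names `old` and `main` are assumptions; the sketch rewrites only `main`, whereas a real migration would likely pass `-- --all`.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical helper: add one commit to repo $1 that appends line $2 to file.txt.
commit_line() {
    echo "$2" >> "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2"
}

for name in old new; do
    git init -q "$work/$name"
    git -C "$work/$name" symbolic-ref HEAD refs/heads/main
done
commit_line "$work/old" "change from 2009"
commit_line "$work/old" "change from 2011"
commit_line "$work/new" "change from 2016"
commit_line "$work/new" "change from 2017"

cd "$work/new"
git remote add old "$work/old"
git fetch -q old
root=$(git rev-list --max-parents=0 main)
git replace --graft "$root" "$(git rev-parse old/main)"

# Rewrite main; filter-branch reads commits through the replacement,
# so the rewritten commits record the grafted parent for real.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -- main

# The replacement ref is no longer needed once history is rewritten.
git replace -d "$root"

# Even with replacements disabled, main now reaches all four commits.
git --no-replace-objects --no-pager log --oneline main
```

filter-branch keeps a backup of the old branch tip under refs/original/, which you can delete once you are happy with the result. Recent Git versions print a warning before running filter-branch; the environment variable above silences it.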

PS: another article on the subject, which mixes the grafts and replace features of Git: https://legacy-developer.atlassian.com/blog/2015/08/grafting-earlier-history-with-git/

Paulo answered 22/12, 2017 at 9:6 Comment(1)
Git has everything. Thanks! - Sitology

Git doesn't have file history.

Git stores commits, and commits are history. They are the only history there is. (I say it's not file history because it's commit history.) Each commit has a parent commit, or if the commit is a merge, two parents (or potentially more than two if it's an octopus merge).

Other than having a parent, each commit is a stand-alone snapshot of all the files that are in that commit. There's no history here: it's just a snapshot. If you want to see what happened between the previous commit and the current commit, you have Git extract the previous commit (snapshot O for Old) and the current commit (snapshot N for New) and run diff O N. That's what changed: whatever is different between O and N.
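As a small illustration of "diff O N", the sketch below makes two commits in a scratch repository and compares them. The file name and contents are made up for the example.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

echo "first version" > file.txt
git add file.txt
git commit -q -m "snapshot O"

echo "second version" > file.txt
git commit -q -a -m "snapshot N"

# Each commit stores a full snapshot; the "change" is computed on demand
# by comparing the parent's snapshot (O) with this commit's snapshot (N).
O=$(git rev-parse HEAD~1)
N=$(git rev-parse HEAD)
git --no-pager diff "$O" "$N"
```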

You can ask Git to synthesize a file history, but it does so by a horrible hack: it looks for one particular changed file, in each commit, as it goes back through commit history. It prints commits where that commit changes the file when compared to that commit's parent. If the file name changes—if the commit renamed the file—and you have used --follow, Git changes which (single) file name it's looking for, so now it's looking under the previous name.
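A quick sketch of the difference --follow makes, in a scratch repository with one rename. The file names are invented for the example.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

echo "hello" > old-name.txt
git add old-name.txt
git commit -q -m "add file"

git mv old-name.txt new-name.txt
git commit -q -m "rename file"

echo "more" >> new-name.txt
git commit -q -a -m "edit file"

# Without --follow, Git only reports commits that touch the path new-name.txt.
git --no-pager log --oneline -- new-name.txt

# With --follow, Git switches to looking for old-name.txt at the rename.
git --no-pager log --oneline --follow -- new-name.txt
```

The first log lists the rename and the edit; the second also lists the commit that created the file under its old name.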

If you have a history consisting of a sequence of commits:

(history starts here, at a root commit)
  |
  v

  o--o--<branches and merges...>--o   <-- end

and a second history:

  o--o--<branches and merges...>--o   <-- end

  o--o--...--o   <-- end2
  ^
  |
(we want to replace this one)

in a single repository, you can write a "replacement" commit object (using git replace) that is just like the second root commit that we want to replace, except for one thing: it has, as its parent commit, the commit to which end points.

This replacement commit effectively splices the two histories together.
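To make the idea concrete, here is a sketch that builds the replacement commit by hand: it copies the root commit's raw text, adds a parent line, and registers the copy with git replace. The toy repositories, the `commit_line` helper, and the names `old` and `main` are assumptions for illustration. In practice `git replace --graft` does the same thing in one step.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical helper: add one commit to repo $1 that appends line $2 to file.txt.
commit_line() {
    echo "$2" >> "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2"
}

for name in old new; do
    git init -q "$work/$name"
    git -C "$work/$name" symbolic-ref HEAD refs/heads/main
done
commit_line "$work/old" "first history, commit 1"
commit_line "$work/old" "first history, commit 2"
commit_line "$work/new" "second history, commit 1"
commit_line "$work/new" "second history, commit 2"

cd "$work/new"
git remote add old "$work/old"
git fetch -q old

root=$(git rev-list --max-parents=0 main)   # the root we want to replace
end=$(git rev-parse old/main)               # tip of the earlier history

# Copy the root commit's raw text, adding a "parent" header after the "tree" line.
replacement=$(git cat-file commit "$root" |
    awk -v p="$end" 'NR == 1 { print; print "parent " p; next } { print }' |
    git hash-object -t commit -w --stdin)

# Tell Git to use the copy wherever the original root would be read.
git replace "$root" "$replacement"
git --no-pager log --oneline main
```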

Repeat this as desired for as many splices as you would like to add, for as many separate commit chains as you have in a single repository. Then you can run git filter-branch over this repository, which copies every commit, but follows the replacements. This has the effect of cementing the grafts in place. See "What does git filter-branch with no arguments do?" or "Rebase entire git branch onto orphan branch while keeping commit tree intact", for example.

Conciseness answered 21/12, 2017 at 22:13 Comment(0)

Based on Eric Lee's blog post:

  1. Create a new empty repository New.
  2. Make an initial commit because we need one before we do a merge.
  3. Add a remote to old repository OldA.
  4. Merge OldA/master to New/master.
  5. Make a subdirectory OldA.
  6. Move all files into subdirectory OldA.
  7. Commit all of the file moves.
  8. Repeat steps 3-7 for OldB.
Roop answered 21/12, 2017 at 16:35 Comment(1)
I don't think that approach will keep a contiguous history for a specific file across the different repositories. I'll end up with one file per repo in each of the subdirs OldA/, OldB/, etc. - Sitology
