Change the root commit parent to point to another commit (connecting two independent git repositories)
I have a project with more than 3 years of history in an svn repository. It was migrated to git, but the person who did the migration just took the latest version and threw away all 3 years of history.

Now the project has the last 3-4 months of history in one repository, and I've imported the other 3 years of svn history into a new git repository.

Is there some way to connect the root commit of the second repository into the last commit of the first one?

It is something like this:

  *   2017-04-21 - last commit on master
  |   
  *   2017-03-20 - merge branch Y into master
  |\  
  | * 2017-03-19 - commit on branch Y
  | | 
  * | 2017-03-18 - merge branch X into master
 /| * 2017-02-17 - commit on another new branch Y
* |/  2017-02-16 - commit on branch X
| *   2017-02-15 - commit on master branch
* |   2017-01-14 - commit on new branch X
 \|   
  *   2017-01-13 - first commit on new repository
  |   
  *   2017-01-12 - init new git project with the last version of the code in svn repository
  .   
  .   
There is no relationship between the two different repositories yet, this is what I wanna
do. I want to connect the root commit of 2nd repository with the last commit of the first
one.
  .
  .   
  *   2017-01-09 - commit
  |   
  *   2017-01-08 - commit
  |   
  *   2017-01-07 - merge
 /|   
* |   2016-01-06 - 2nd commit the other branch
| *   2016-01-05 - commit on trunk
* |   2016-01-04 - commit on new branch
 \|   
  *   2015-01-03 - first commit
  |   
  *   2015-01-02 - beginning of the project

Update:

I just learned that I need to do a git rebase, but how? Please treat the commit dates in the diagram as if they were the SHA-1 hashes... The answer was to use git filter-branch with the --parent-filter option, not a git rebase.

Update 2:

I tried the command git filter-branch --parent-filter 'test $GIT_COMMIT = 443aec8880e898710796a1c4fb4decea1ca5ff66 && echo "-p 98e2b95e07b84ad1e40c3231e66840ea910e9d66" || cat' HEAD and it didn't work:

PS D:\git\rebase-test\rep2cc> git filter-branch --parent-filter 'test $GIT_COMMIT = 443aec8880e898710796a1c4fb4decea1ca5ff66 && echo "-p 98e2b95e07b84ad1e40c3231e66840ea910e9d66" || cat' HEAD
fatal: ambiguous argument '98e2b95e07b84ad1e40c3231e66840ea910e9d66 || cat': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Update 3:

It didn't work in Windows CMD or PowerShell, but it did work in Git Bash on Windows.

Matelote answered 19/5, 2017 at 19:34 Comment(4)
Well, have you considered fetching both inside the same repository then rebasing one history on top of another? This will of course rewrite all the commits of the history you're rebasing. - Cathartic
Lasse got it right. Just set up a new repo with svn cloned correctly, add a remote with this busted repo, fetch and cherry-pick into the correctly cloned svn repo the history that was done on git after cloning from svn. - Salesmanship
I'm new to git, I just learned that the magic word I was looking for is 'rebase'. - Matelote
What do I need to do? Let's suppose that the commit dates in the example are the SHA-1s... git rebase 2017-01-09 2017-04-21 ? - Matelote

First things first: you need a single repo that has all the available history.

Make a clone of the repo with the recent history. Add the repo with the old history as a remote. I recommend this clone be a "mirror" and that you finish by replacing your origin repo with this one. But alternately you can leave --mirror off, and you'll finish by pushing (possibly force-pushing depending on which approach you use) all refs back to origin.

git clone --mirror url/of/current/repo
cd repo.git   # a --mirror clone is bare, so the directory name gets a .git suffix
git remote add history url/of/historical/repo
git fetch history

The next thing you need to do is figure out where you'll be splicing the history. The terminology to describe this is a bit fuzzy I think... what you want is to find the two commits that correspond to the most recent SVN revision for which both histories have a commit. For example, suppose your SVN repo contained versions 1, 2, 3, and 4. Now you have

Recent-History Repo

C --- D --- E --- F <--(master)

Old-History Repo

A --- B --- C' --- D'

where A represents version 1, B represents version 2, C and C' represent version 3, and D and D' represent version 4. E and F are work created after the original migration. So you want to splice the commits whose parent is D (E in this example) onto D'.
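One way to check that two candidate commits really are the same snapshot is to compare their tree IDs. The following is only a sketch in a throwaway repository; the names demo, file.txt and recent are made up for illustration, and both histories are built locally instead of being fetched from two remotes.

```shell
#!/bin/sh
# Sketch: confirm that two commits from unrelated histories carry the same tree.
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email demo@example.com
git config user.name Demo

# "Old" history: two commits on the initial branch.
echo v1 > file.txt
git add file.txt
git commit -q -m "old: version 1"
echo v2 > file.txt
git commit -q -am "old: version 2"
old_tip=$(git rev-parse HEAD)          # plays the role of D'

# "Recent" history: an unrelated root commit with the same content.
git checkout -q --orphan recent        # keeps the index, drops the ancestry
git commit -q -m "recent: import latest version"
new_root=$(git rev-parse HEAD)         # plays the role of D

# Equal tree IDs mean identical content, so re-parenting changes no files.
old_tree=$(git rev-parse "$old_tip^{tree}")
new_tree=$(git rev-parse "$new_root^{tree}")
if [ "$old_tree" = "$new_tree" ]; then
    echo "trees match"
else
    echo "trees differ"
fi
```

In a real combined repo you would run only the two git rev-parse lines, with the SHA IDs of your own D and D' candidates.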

Now, I can think of two approaches, each with pros and cons.

Rewriting The Recent History

IMO the best way, if you can coordinate a cut-over of all developers to a new repo, is to (effectively) rebase the recent history onto the old history. A cut-over means you arrange a time when everyone agrees that all outstanding work is pushed; they discard their clones, you do the conversion, and then they all re-clone.

If there is really just a single branch, then you can literally use rebase

git rebase --onto D' D master

(where D and D' are replaced with the SHA ID of the commits).

More likely you have some branches and merges in the recent history; in that case a rebase operation will start becoming a problem very quickly. On the other hand, you can take advantage of the fact that D has the same tree as D' -- so a rebase and a re-parent are more or less equivalent.

So you can use git filter-branch with a --parent-filter to do the rewrite. Based on the examples in the docs at https://git-scm.com/docs/git-filter-branch you would do something like

git filter-branch --parent-filter 'test $GIT_COMMIT = D && echo "-p D'" || cat' HEAD

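(where again D and D' are replaced with the SHA ID of the commits). Note that $GIT_COMMIT is the commit currently being rewritten, so the SHA you test against is the commit that receives D' as its new parent. If the recent history starts at D (a root commit, as in the question), testing against D is what you want. If D has a parent of its own, as in the diagram above, testing against D keeps a rewritten D as a no-change commit on top of D'; test against E, the child of D, to get exactly the result shown below.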

This creates "backup" refs (under refs/original/) that you'll need to clean up. In the end you'll get

A --- B --- C' --- D' --- E' --- F' <--(master)

It's the fact that F was replaced by F' which creates the need for a hard cut-over (more or less).
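To see the re-parenting end to end, here is a sketch that builds two unrelated histories in a throwaway repository and splices them with --parent-filter. The repository, file and branch names are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is only there to skip the deprecation notice that newer git versions print.

```shell
#!/bin/sh
# Sketch: give the root of a "recent" history a parent in the "old" history.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email demo@example.com
git config user.name Demo

# Old history (imported from svn in the real scenario).
echo v1 > file.txt
git add file.txt
git commit -q -m "old: version 1"
echo v2 > file.txt
git commit -q -am "old: version 2"
graft=$(git rev-parse HEAD)            # new parent, the role of D'

# Recent history: unrelated root with the same content, plus new work.
git checkout -q --orphan recent
git commit -q -m "recent: import latest version"
root=$(git rev-parse HEAD)             # commit that receives the new parent
echo v3 > file.txt
git commit -q -am "recent: new work"

# Double quotes let this shell fill in $root and $graft, while \$GIT_COMMIT
# is left for filter-branch to expand once per rewritten commit.
git filter-branch --parent-filter \
    "test \$GIT_COMMIT = $root && echo '-p $graft' || cat" HEAD >/dev/null

# The branch now walks through both histories.
git log --format=%s
```

Because the filter string here contains no nested single quotes around the whole expression, the same line is also easier to adapt to shells with different quoting rules, though that has not been verified on CMD or PowerShell.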

Now if you made a mirror clone back at step 1, you can consider wiping the reflog, dropping the remotes, and running gc, and then this is a new ready-to-use origin repo.

If you made a regular clone, then you'll need to push -f all the refs to the origin, and this will likely leave behind some clutter on the origin repo.
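The cleanup mentioned above could look roughly like the function below. It is a sketch, not a prescribed procedure: the function name cleanup_after_rewrite is made up, and the remote name defaults to history only because that is the name used earlier in this answer.

```shell
# Sketch: tidy a repository after git filter-branch has rewritten it.
# Usage: cleanup_after_rewrite <repo-path> [remote-name]
cleanup_after_rewrite() {
    repo=$1
    remote=${2:-history}

    # Delete the backup refs that filter-branch left under refs/original/.
    git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git -C "$repo" update-ref -d "$ref"
        done

    # Drop the helper remote if it is still configured.
    if git -C "$repo" remote | grep -qx "$remote"; then
        git -C "$repo" remote remove "$remote"
    fi

    # Expire reflog entries and prune objects that are no longer reachable.
    git -C "$repo" reflog expire --expire=now --all
    git -C "$repo" gc --quiet --prune=now
}
```

Run it only once you are sure the rewritten history is correct, since the backup refs and reflog are the easy way back.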

Using a "replacement commit"

The other option doesn't create a hard cut-over, but it leaves you with small headaches to deal with forever. You can use git replace. In your combined repo

git replace D D'

By default, when generating log output or whatever, if git finds D, it will substitute D' (and its history) in the output.

There are some known glitches. There may be unknown glitches. And by default the "replacement refs" that make this all work aren't shared, so you have to push and fetch them deliberately.
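Sharing the replacement refs explicitly might look like the sketch below, which uses a throwaway bare repository as the "origin" and two local repositories called alice and bob. All of these names are made up for illustration.

```shell
#!/bin/sh
# Sketch: publish a replacement ref and pick it up in another repository.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git

git init -q alice
cd alice
git config user.email demo@example.com
git config user.name Demo
echo v1 > file.txt
git add file.txt
git commit -q -m "old: version 1"
old_tip=$(git rev-parse HEAD)          # the role of D'
git checkout -q --orphan recent
git commit -q -m "recent: import latest version"
new_root=$(git rev-parse HEAD)         # the role of D

# Record that new_root should be read as old_tip.
git replace "$new_root" "$old_tip"

# Replacement refs live under refs/replace/ and are not pushed by default.
git remote add origin "$work/origin.git"
git push -q origin recent 'refs/replace/*'

# A second repository has to ask for them explicitly as well.
cd "$work"
git init -q bob
cd bob
git fetch -q "$work/origin.git" \
    'refs/heads/*:refs/remotes/origin/*' \
    'refs/replace/*:refs/replace/*'
git replace -l                         # lists the replaced object IDs
```

Every collaborator would need the fetch refspec (or an equivalent remote configuration) for the spliced history to show up on their side.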

Annadiane answered 19/5, 2017 at 20:5 Comment(3)
I tried git replace, and it is not what I want. And I do have a lot of branches and merges in the history, so the rebase operation does become a problem, like you said. - Matelote
This third option, I don't understand... what are P and P'? And the command, git filter-branch --parent-filter 'test $GIT_COMMIT = D && echo "-p D'" || cat' HEAD, will it work on Windows? - Matelote
Sorry, "P and P'" should've been D and D' as elsewhere; I've updated. The command will work on Windows in a Git Bash shell; just remember to replace D and D' with proper commit references (e.g. SHA IDs). - Annadiane
