Merge git repository in subdirectory
S

8

103

I'd like to merge a remote git repository into my working git repository as a subdirectory of it. The resulting repository should contain the merged history of the two repositories, and each file of the merged-in repository should retain its history as it was in the remote repository. I tried the subtree strategy as described in How to use the subtree merge strategy, but after following that procedure, although the resulting repository does contain the merged history of the two repositories, individual files coming from the remote one haven't retained their history (git log on any of them just shows a message "Merged branch...").

Also I don't want to use submodules because I do not want the two combined git repositories to be separate anymore.

Is it possible to merge a remote git repository in another one as a subdirectory with individual files coming from the remote repository retaining their history?

Thanks very much for any help.

EDIT: I'm currently trying out a solution that uses git filter-branch to rewrite the merged-in repository history. It does seem to work, but I need to test it some more. I'll return to report on my findings.

EDIT 2: To make myself clearer, here are the exact commands I used with git's subtree strategy, which result in the apparent loss of history for the files of the remote repository. Let A be the git repo I'm currently working in and B the git repo I'd like to incorporate into A as a subdirectory. I did the following:

git remote add -f B <url-of-B>
git merge -s ours --no-commit B/master
git read-tree --prefix=subdir/Iwant/to/put/B/in/ -u B/master
git commit -m "Merge B as subdirectory in subdir/Iwant/to/put/B/in."

After these commands and going into directory subdir/Iwant/to/put/B/in, I see all files of B, but git log on any one of them shows just the commit message "Merge B as subdirectory in subdir/Iwant/to/put/B/in." Their file history as it is in B is lost.

What seems to work (since I'm a beginner on git I may be wrong) is the following:

git remote add -f B <url-of-B>
git checkout -b B_branch B/master  # make a local branch following B's master
git filter-branch --index-filter \
   'git ls-files -s | sed "s-\t\"*-&subdir/Iwant/to/put/B/in/-" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                git update-index --index-info &&
        mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD 
git checkout master
git merge B_branch

The command above for filter-branch is taken from git help filter-branch, in which I only changed the subdir path.

Seasick answered 21/6, 2011 at 13:41 Comment(5)
What does gitk say about the history? I've used git subtree merge successfully in the past. Perhaps you can reveal your exact commands? I'm not sure git-filter-branch is the right approach. I might recommend trying git-fast-export and git-fast-import to synthesize a new history.Hardback
After doing the subtree procedure gitk shows the two repos merged on their tips and unrelated in their initial commits. (Would it help if I post screenshots of gitk's history view? Can I?) Unfortunately individual files of the remote repository haven't retained their history if I do in the terminal git log <file-from-remote-repo>. I look into git-fast-export and git-fast-import; I'm very new to git. I'll edit my question to show exactly what commands I used with git subtree. Thanks very much for your reply.Seasick
@christosc: your second method worked beautifully and very simply, Thank's a lot! I just had to change subdir/Iwant/to/put/B/in/ and to make it a oneliner (because msysgit on Windows seems to not support line returns in commands with ): git filter-branch --index-filter 'git ls-files -s | sed "s-\t\"*-&subdir/Iwant/to/put/B/in/-" | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEADZacek
@user1121352 Glad to have been of help to you.Seasick
I normally follow this answer: https://mcmap.net/q/13849/-how-to-import-existing-git-repository-into-anotherSearcy
H
49

After getting the fuller explanation of what is going on, I think I understand it and in any case at the bottom I have a workaround. Specifically, I believe what is happening is rename detection is being fooled by the subtree merge with --prefix. Here is my test case:

mkdir -p z/a z/b
cd z/a
git init
echo A>A
git add A
git commit -m A
echo AA>>A
git commit -a -m AA
cd ../b
git init
echo B>B
git add B
git commit -m B
echo BB>>B
git commit -a -m BB
cd ../a
git remote add -f B ../b
git merge -s ours --no-commit B/master
git read-tree --prefix=bdir -u B/master
git commit -m "subtree merge B into bdir"
cd bdir
echo BBB>>B
git commit -a -m BBB

We make git directories a and b with several commits each. We do a subtree merge, and then we do a final commit in the new subtree.

Running gitk (in z/a) shows that the history of both repositories is present, and so does running git log. However, looking at a specific file is a problem: git log bdir/B shows only the subtree merge commit and the later BBB commit, not the commits made in b.

Well, there is a trick we can play. We can look at the pre-rename history of a specific file using --follow. git log --follow -- B. This is good but isn't great since it fails to link the history of the pre-merge with the post-merge.

I tried playing with -M and -C, but I wasn't able to get it to follow one specific file.
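The observation above can be reproduced with a small throwaway script. This is only a sketch under a few assumptions of mine: the temporary directory, the "Demo" identity, pinning the branch name to master, and the --allow-unrelated-histories flag (needed on Git 2.9 and later, as the comments further down point out) are not part of the original test case.

```shell
#!/bin/sh
# Sketch: compare per-file history after a read-tree subtree merge.
set -e
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
mkdir -p "$work/a" "$work/b"

# Repository b with two commits touching file B.
cd "$work/b"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in the text
echo B >B
git add B
git commit -q -m B
echo BB >>B
git commit -q -a -m BB

# Repository a with one commit of its own.
cd "$work/a"
git init -q
git symbolic-ref HEAD refs/heads/master
echo A >A
git add A
git commit -q -m A

# Subtree merge of b into bdir/, as in the test case.
git remote add -f B ../b >/dev/null 2>&1
git merge -q -s ours --no-commit --allow-unrelated-histories B/master
git read-tree --prefix=bdir/ -u B/master
git commit -q -m "subtree merge B into bdir"

# History by the new (prefixed) path versus the old (unprefixed) path.
prefixed=$(git log --format=%s -- bdir/B)
followed=$(git log --follow --format=%s -- B)
printf 'git log -- bdir/B:\n%s\n\ngit log --follow -- B:\n%s\n' \
    "$prefixed" "$followed"
```

The first listing should stop at the merge commit, while the second one walks the commits made in b, which is the split described above.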

So, the solution, I feel, is to tell git about the rename that will be taking place as part of the subtree merge. Unfortunately git-read-tree is pretty fussy about subtree merges so we have to work through a temporary directory, but that can go away before we commit. Afterwards, we can see the full history.

First, create an "A" repository and make some commits:

mkdir -p z/a z/b
cd z/a
git init
echo A>A
git add A
git commit -m A
echo AA>>A
git commit -a -m AA

Second, create a "B" repository and make some commits:

cd ../b
git init
echo B>B
git add B
git commit -m B
echo BB>>B
git commit -a -m BB

And the trick to making this work: force Git to recognize the rename by creating a subdirectory and moving the contents into it.

mkdir bdir
git mv B bdir
git commit -a -m bdir-rename

Return to repository "A" and fetch and merge the contents of "B":

cd ../a
git remote add -f B ../b
git merge -s ours --no-commit B/master
# According to Alex Brown and pjvandehaar, newer versions of git need --allow-unrelated-histories
# git merge -s ours --allow-unrelated-histories --no-commit B/master
git read-tree --prefix= -u B/master
git commit -m "subtree merge B into bdir"

To show that they're now merged:

cd bdir
echo BBB>>B
git commit -a -m BBB

To prove the full history is preserved in a connected chain:

git log --follow B

We get the history after doing this, but the problem is that if you are actually keeping the old "b" repo around and occasionally merging from it (say it is actually a third party separately maintained repo) you are in trouble since that third party will not have done the rename. You must try to merge new changes into your version of b with the rename and I fear that will not go smoothly. But if b is going away, you win.

Hardback answered 22/6, 2011 at 15:1 Comment(7)
Indeed that works @Seth! And I didn't have to resort to history rewriting as with filter-branch, which makes for a somewhat deceptive history (e.g. while viewing git log --stat). Also I hadn't noticed the --follow switch in git log's documentation; seems very handy with renames. Thank you very much for your so detailed and informative reply!Seasick
This response would be much more helpful if the example code were broken into readable lines instead of a single semi-colon-separated one-liner. ;)Matisse
I'd like to merge "b" into "a" with keeping its full history. How could I do that?Convict
@Emerald214 Nothing to do with the original question being asked, but your answer is git checkout a; git merge b. Git always keeps full history unless you explicitly delete or rewrite that history. Next time, you should post a new question, or search. One related question is #28407520 I recommend using gitk --all to understand what is going on before and after merges.Hardback
See #37938484 for bugfixSect
As @AlexBrown mentioned, on new versions of git this produces fatal: refusing to merge unrelated histories and so you must run git merge -s ours --allow-unrelated-histories --no-commit B/master instead.Criseyde
The downside of using subtree merges is difficulties with inspecting history. They're not good if you're going to make changes to source repository's files in the target repository. If that is not your case, then git log --follow -- a is your friend (where a is an unprefixed path). More on subtree merges here. Considering that you might want to take another approach.Quartered
J
83

git-subtree is a script designed for exactly this use case of merging multiple repositories into one while preserving history (and/or splitting history of subtrees, though that seems to be irrelevant to this question). It has been distributed as part of the git tree since release 1.7.11.

To merge a repository <repo> at revision <rev> as subdirectory <prefix>, use git subtree add as follows:

git subtree add -P <prefix> <repo> <rev>

git-subtree implements the subtree merge strategy in a more user friendly manner.

The downside is that in the merged history the files are unprefixed (not in a subdirectory). Say you merge repository a into b. As a result git log a/f1 will show you all the changes (if any) except those in the merged history. You can do:

git log --follow -- f1

but that will show only the changes in the merged history, not any later ones.

In other words, if you don't change a's files in repository b, you need to specify --follow and an unprefixed path. If you change them in both repositories, you have two commands, neither of which shows all the changes.
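A minimal sketch of that limitation, assuming git-subtree is actually installed (some distributions package it separately). The repository names, the bdir prefix, the "Demo" identity and the pinned master branch are placeholders of mine, not part of the answer.

```shell
#!/bin/sh
# Sketch: git subtree add, then inspect one file's history both ways.
set -e
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
mkdir -p "$work/a" "$work/b"

# Repository b: the one being merged in.
cd "$work/b"
git init -q
git symbolic-ref HEAD refs/heads/master
echo B >B
git add B
git commit -q -m B
echo BB >>B
git commit -q -a -m BB

# Repository a: the one receiving b under bdir/.
cd "$work/a"
git init -q
git symbolic-ref HEAD refs/heads/master
echo A >A
git add A
git commit -q -m A

# git subtree add -P <prefix> <repo> <rev>
git subtree add -P bdir ../b master >/dev/null 2>&1

# A later change made to the merged-in file inside a.
echo BBB >>bdir/B
git commit -q -a -m BBB

# Prefixed path: changes made after the merge.
prefixed=$(git log --format=%s -- bdir/B)
# Unprefixed path with --follow: changes from the merged history.
followed=$(git log --follow --format=%s -- B)
printf 'git log -- bdir/B:\n%s\n\ngit log --follow -- B:\n%s\n' \
    "$prefixed" "$followed"
```

Each listing covers only one side of the merge, so getting the complete picture for a file means running both commands.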

More on it here.

Jehanna answered 20/9, 2015 at 21:32 Comment(3)
Nice! This is exactly what I needed in one line. Thanks, the future!Edelweiss
This is the perfect solution to merge another repository into my repository in a sub direction.Julianjuliana
Note that this will not work with existing subdirectories at <prefix>. E.g. in order to merge a subdirectory that has been moved manually into its own repository somewhen, and you want to merge it back in.Irruptive
C
17

I wanted to

  1. keep a linear history without explicit merge, and
  2. make it look like the files of the merged repository had always existed in the subdirectory, and as a side effect make git log -- file work without --follow.

Step 1: Rewrite history in the source repository to make it look like all files always existed below the subdirectory.

Create a temporary branch for the rewritten history.

git checkout -b tmp_subdir

Then use git filter-branch as described in How can I rewrite history so that all files, except the ones I already moved, are in a subdirectory?:

git filter-branch --prune-empty --tree-filter '
if [ ! -e foo/bar ]; then
    mkdir -p foo/bar
    git ls-tree --name-only $GIT_COMMIT | xargs -I files mv files foo/bar
fi'

Step 2: Switch to the target repository. Add the source repository as remote in the target repository and fetch its contents.

git remote add sourcerepo .../path/to/sourcerepo
git fetch sourcerepo

Step 3: Use rebase --onto to add the commits of the rewritten source repository on top of the target repository.

git rebase --preserve-merges --onto master --root sourcerepo/tmp_subdir

You can check the log to see that this really got you what you wanted.

git log --stat

Step 4: After the rebase you’re in “detached HEAD” state. You can fast-forward master to the new head.

git checkout -b tmp_merged
git checkout master
git merge tmp_merged
git branch -d tmp_merged

Step 5: Finally some cleanup: Remove the temporary remote.

git remote rm sourcerepo
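
The five steps can be tried end to end on two throwaway repositories. This is a sketch under my own assumptions rather than the exact procedure above: the repositories live in a temporary directory, the identity is a placeholder, the branch name is pinned to master, FILTER_BRANCH_SQUELCH_WARNING is set to skip filter-branch's deprecation pause on newer Git, and --preserve-merges is left out because the sample source history has no merges (the comments below note that the option is deprecated).

```shell
#!/bin/sh
# Sketch: rewrite the source into foo/bar, then rebase it onto the target.
set -e
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
mkdir -p "$work/sourcerepo" "$work/target"

# Step 1: rewrite the source history on a temporary branch.
cd "$work/sourcerepo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo B >B
git add B
git commit -q -m B
echo BB >>B
git commit -q -a -m BB
git checkout -q -b tmp_subdir
git filter-branch --prune-empty --tree-filter '
if [ ! -e foo/bar ]; then
    mkdir -p foo/bar
    git ls-tree --name-only $GIT_COMMIT | xargs -I files mv files foo/bar
fi' >/dev/null 2>&1

# Step 2: add the source as a remote of the target and fetch it.
cd "$work/target"
git init -q
git symbolic-ref HEAD refs/heads/master
echo A >A
git add A
git commit -q -m A
git remote add sourcerepo ../sourcerepo
git fetch -q sourcerepo

# Step 3: replay the rewritten commits on top of master (detached HEAD).
git rebase -q --onto master --root sourcerepo/tmp_subdir

# Step 4: fast-forward master to the rebased head.
git checkout -q -b tmp_merged
git checkout -q master
git merge -q tmp_merged
git branch -q -d tmp_merged

# Step 5: remove the temporary remote.
git remote rm sourcerepo

merges=$(git rev-list --count --merges HEAD)
total=$(git rev-list --count HEAD)
subjects=$(git log --format=%s -- foo/bar/B)
printf 'merge commits: %s, total commits: %s\n%s\n' \
    "$merges" "$total" "$subjects"
```

Because the rewritten commits already carry the foo/bar prefix, the last git log call needs no --follow to reach the source's first commit.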
Cochlea answered 23/7, 2018 at 9:52 Comment(4)
git rebase doesn't seem to allow the specified options together: "error: cannot combine interactive options (--interactive, --exec, --rebase-merges, --preserve-merges, --keep-empty, --root + --onto) with am options ( --committer-date-is-author-date)"Paediatrician
Interesting! Try to drop --committer-date-is-author-date. The check for incompatible options was added recently in git v2.19.0 (github.com/git/git/commit/…). From the description it sounds as if --committer-date-is-author-date was silently ignored before anyway.Cochlea
Rather than use the old filter-branch command, use git filter-repo --to-subdirectory-filter <dir>, it's way faster and easier.Jackofalltrades
With the above, in the git rebase step I get: "warning: git rebase --preserve-merges is deprecated. Use --rebase-merges instead." Always some updating to do with git :)Hollowell
A
5

If you are really wanting to stitch things together, look up grafting. You should also be using git rebase --preserve-merges --onto. There is also an option to keep the author date for the committer information.

Affectionate answered 22/6, 2011 at 1:5 Comment(5)
@adymitruk Thanks, for your reply. I'm really new to git, so I will look into the solution you propose. I tried git filter-branch and it seems to work, but maybe yours is better. I'll try it out.Seasick
@adymitruk Can I use rebase with two repositories that aren't inter-related as branches? I mean the two repositories I want to merge haven't common initial commits...Seasick
Thanks @adymitruk. I wasn't sure if rebasing could be done with two unrelated repositories. It certainly will be useful…Seasick
But don't be afraid of filter-branch. It's saved us many times. Just make another branch prior and you can always go back. That, or use the reflog.Affectionate
I see… In any case I better do some reading of the docs on these git concepts and commands. Having only but little experience in VCSs, namely svn, I'm kind of overwhelmed by git. Its power though seems to be worth it.Seasick
A
4

I found the following solution workable for me. First I go into project B and create a new branch in which all files are already moved to the new subdirectory. I then push this new branch to origin. Next I go to project A, add and fetch the remote of B, check out the moved branch, then go back to master and merge:

# in local copy of project B
git checkout -b prepare_move
mkdir subdir
git mv <files_to_move> subdir/
git commit -m 'move files to subdir'
git push origin prepare_move

# in local copy of project A
git remote add -f B_origin <remote-url>
git checkout -b from_B B_origin/prepare_move
git checkout master
git merge from_B

If I go to sub directory subdir, I can use git log --follow and still have the history.

I'm not a git expert, so I cannot comment whether this is a particularly good solution or if it has caveats, but so far it seems all fine.
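
For anyone who wants to check this on scratch repositories first, here is a sketch. The assumptions are mine: both projects are created in a temporary directory, B's local path stands in for the remote URL (so the push step is skipped), the identity and file names are placeholders, and --allow-unrelated-histories is added because Git 2.9 and later refuse to merge unrelated histories without it.

```shell
#!/bin/sh
# Sketch: move B's files into subdir on a branch, then merge that branch into A.
set -e
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
mkdir -p "$work/projA" "$work/projB"

# in local copy of project B
cd "$work/projB"
git init -q
git symbolic-ref HEAD refs/heads/master
echo one >file.txt
git add file.txt
git commit -q -m "B: add file"
echo two >>file.txt
git commit -q -a -m "B: edit file"
git checkout -q -b prepare_move
mkdir subdir
git mv file.txt subdir/
git commit -q -m "move files to subdir"

# in local copy of project A
cd "$work/projA"
git init -q
git symbolic-ref HEAD refs/heads/master
echo A >README
git add README
git commit -q -m "A: initial"
git remote add -f B_origin ../projB >/dev/null 2>&1
git checkout -q -b from_B B_origin/prepare_move
git checkout -q master
git merge -q --allow-unrelated-histories -m "merge B into subdir" from_B

# History of the moved file, followed across the rename.
history=$(git log --follow --format=%s -- subdir/file.txt)
printf '%s\n' "$history"
```

The listing should include the commits made in B before the move, which matches what is described above for git log --follow.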

Ascocarp answered 22/12, 2012 at 14:11 Comment(1)
People seem to be upvoting this approach here: #1684031Burnette
Q
4

Say you want to merge repository a into b (I'm assuming they're located alongside one another):

cd a
git filter-repo --to-subdirectory-filter a
cd ..
cd b
git remote add a ../a
git fetch a
git merge --allow-unrelated-histories a/master
git remote remove a

For this you need git-filter-repo installed (filter-branch is discouraged).

An example of merging 2 big repositories, putting one of them into a subdirectory: https://gist.github.com/x-yuri/9890ab1079cf4357d6f269d073fd9731

More on it here.

Quartered answered 30/5, 2020 at 15:42 Comment(0)
G
4

Similar to hfs' answer I wanted to

  • keep a linear history without explicit merge and
  • make it look like the files of the merged repository had always existed in the subdirectory, and as a side effect make git log -- file work without --follow.

However, I chose the more modern filter-repo (assuming the new repo exists and is checked out):

git clone git@host/repo/old.git
cd old
git checkout -b tmp_subdir
git filter-repo --to-subdirectory-filter old

cd ../new
git remote add old ../old
git fetch old
git rebase --rebase-merges --onto main --root old/tmp_subdir --committer-date-is-author-date

You might need to fix conflicts manually, or change the rebase command to include --merge -s recursive -X theirs if you want to try resolving them with the "theirs" version:

git rebase --rebase-merges --onto main --root old/tmp_subdir --committer-date-is-author-date --merge -s recursive -X theirs

You end up on a detached HEAD, so create a new branch and merge it into main. Note that modern repositories should not use a "master" branch but a "main" branch, for more inclusive language.

git checkout -b old_merge
git checkout main
git merge old_merge

cleanup

git branch -d old_merge
git remote rm old
Gnotobiotics answered 15/12, 2020 at 17:49 Comment(2)
This worked great, thank you.Homophile
Worked great! Thanks! I've reproduced this for several packages spread across monorepos to a different existent monorepo.Myogenic
O
2

Have you tried adding the extra repository as a git submodule? It won't merge the history with the containing repository, in fact, it will be an independent repository.

I mention it, because you haven't.

Overseas answered 21/6, 2011 at 13:46 Comment(1)
Thanks for the answer Abizern. Actually I do want the two repository histories to be merged into one; I don't want them to be separate anymore, that's why I didn't mention submodules.Seasick

© 2022 - 2024 — McMap. All rights reserved.