Split large Git repository into many smaller ones

6

96

After successfully converting an SVN repository to Git, I now have a very large Git repository that I want to break down into multiple smaller repositories and maintain history.

So, can someone help with breaking up a repo that might look like this:

MyHugeRepo/
   .git/
   DIR_A/
   DIR_B/
   DIR_1/
   DIR_2/

Into two repositories that look like this:

MyABRepo/
   .git
   DIR_A/
   DIR_B/

My12Repo/
   .git
   DIR_1/
   DIR_2/

I've tried following the directions in a previous question (Detach (move) subdirectory into separate Git repository), but they don't really fit when trying to put multiple directories into a separate repo.

Rodenhouse answered 11/10, 2010 at 22:32 Comment(2)
When you're happy with an answer, please mark it as accepted.Coprophilous
For anyone looking to split out multiple (nested) directories into a new repo (instead of looking to remove multiple directories, which might be harder on some projects), this answer was helpful for me: https://mcmap.net/q/12784/-extract-multiple-directories-using-git-filter-branch Colorant
81

This will set up MyABRepo; you can do My12Repo similarly, of course.

git clone MyHugeRepo/ MyABRepo.tmp/
cd MyABRepo.tmp
git filter-branch --prune-empty --index-filter 'git rm --cached --ignore-unmatch DIR_1/* DIR_2/*' HEAD 

A backup reference to the pre-rewrite history remains at .git/refs/original/refs/heads/master. You can clean that up by cloning the filtered repository:

cd ..
git clone MyABRepo.tmp MyABRepo

If all went well you can then remove MyABRepo.tmp.
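
If you would rather clean up in place than rely on the second clone, the checklist in the git filter-branch manual suggests deleting the backup refs, expiring the reflog, and garbage-collecting. (The same manual notes that cloning from a plain local path hardlinks objects, and suggests a file:// URL when the goal is to shrink the repository.) Here is a minimal sketch, assuming you run it inside the filtered repository and no longer need the old history; the function name drop_filter_branch_backups is made up for illustration:

```shell
# Hypothetical helper: remove the refs/original/* backups that
# git filter-branch leaves behind, then discard the objects that only
# those backups kept alive. Run it inside the filtered repository.
# This is destructive: the pre-rewrite history cannot be recovered afterwards.
drop_filter_branch_backups() {
  # Delete every backup ref, one update-ref call per ref.
  git for-each-ref --format='%(refname)' refs/original/ |
    (
      while IFS= read -r ref; do
        git update-ref -d "$ref" || exit 1
      done
    ) &&
  # Expire reflog entries that still point at the old commits.
  git reflog expire --expire=now --all &&
  # Repack and prune objects that are now unreachable.
  git gc --quiet --prune=now
}
```

Run it as `cd MyABRepo.tmp && drop_filter_branch_backups` once you are happy with the rewritten history.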


If for some reason you get an error regarding .git-rewrite, you can try this:

git clone MyHugeRepo/ MyABRepo.tmp/
cd MyABRepo.tmp
git filter-branch -d /tmp/git-rewrite.tmp --prune-empty --index-filter 'git rm --cached --ignore-unmatch DIR_1/* DIR_2/*' HEAD 
cd ..
git clone MyABRepo.tmp MyABRepo

This will create and use /tmp/git-rewrite.tmp as a temporary directory, instead of .git-rewrite. Naturally, you can substitute any path you wish instead of /tmp/git-rewrite.tmp, so long as you have write permission, and the directory does not already exist.
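
To check what either variant left behind, you can list the top-level paths that still appear anywhere in the rewritten history. This is only a rough sketch: list_top_level_paths is a name I made up, it only looks at commits reachable from HEAD, and file names that Git quotes (unusual characters) would show up in quoted form.

```shell
# Hypothetical helper: print every top-level path that appears anywhere
# in the history reachable from HEAD, one per line, sorted and de-duplicated.
list_top_level_paths() {
  # --pretty=format: suppresses the commit headers, leaving only file names.
  git log --pretty=format: --name-only HEAD |
    # Drop blank separator lines, then keep only the first path component.
    sed -e '/^$/d' -e 's|/.*||' |
    sort -u
}
```

Run inside MyABRepo, this should print only DIR_A and DIR_B (plus any files that live at the repository root) if the filter did what you wanted.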

Stickle answered 11/10, 2010 at 23:37 Comment(8)
The 'git filter-branch' manpage recommends creating a fresh clone of the rewritten repository instead of the last step mentioned above.Churchgoer
I tried this and got an error when it was trying to delete the .git-rewrite folder at the end.Rodenhouse
-d <path-on-another-physical-disk> worked for me and eliminated strange 'mv' failures within --tree-filter.Cochrane
Do you have an idea how to get the very first commit out, if it is related to an excluded path (like DIR_A, for instance)?Deannadeanne
@bitmask: Have you tried git rebase -i --root $tip (for git 1.7.12+), or this older method?Stickle
I did not realize the full ramifications of filter-branch. For those not aware, it re-writes history, so if you plan to push the repo after you have done this, the commit hashes will be different now and it will not work.Colorant
What is the file .git/refs/original/refs/heads/master? What would happen if it remained?Myrmeco
Which history will remain in the new repository with this method? The history for all the existing files in the new repository? The history for all changes (including deleted files) in the directories DIR_A and DIR_B? What happens to the history of external file moves, e.g. from DIR_1/README.md -> DIR_A/README.md? What about internal file moves such as DIR_A/MyClass.java -> DIR_1/NewNameClass.java?Epp
9

You could use git filter-branch --index-filter with git rm --cached to delete the unwanted directories from clones/copies of your original repository.

For example:

trim_repo() { : trim_repo src dst dir-to-trim-out...
  : uses printf %q: needs bash, zsh, or maybe ksh
  git clone "$1" "$2" &&
  (
    cd "$2" &&
    shift 2 &&

    : mirror original branches &&
    git checkout HEAD~0 2>/dev/null &&
    d=$(printf ' %q' "$@") &&
    git for-each-ref --shell --format='
      o=%(refname:short) b=${o#origin/} &&
      if test -n "$b" && test "$b" != HEAD; then 
        git branch --force --no-track "$b" "$o"
      fi
    ' refs/remotes/origin/ | sh -e &&
    git checkout - &&
    git remote rm origin &&

    : do the filtering &&
    git filter-branch \
      --index-filter 'git rm --ignore-unmatch --cached -r -- '"$d" \
      --tag-name-filter cat \
      --prune-empty \
      -- --all
  )
}
trim_repo MyHugeRepo MyABRepo DIR_1 DIR_2
trim_repo MyHugeRepo My12Repo DIR_A DIR_B

You will need to manually delete each repository’s unneeded branches or tags (e.g. if you had a feature-x-for-AB branch, then you probably want to delete that from the “12” repository).
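
One rough way to find candidates: with --prune-empty, a branch whose commits only touched the removed directories should end up pointing at a commit that is already contained in your main branch. The sketch below lists such branches. Note that list_merged_branches is a hypothetical name, the default of master is an assumption about your branch naming, and the output is only a list of candidates to review, since branches that were genuinely merged before the split will show up too.

```shell
# Hypothetical helper: print local branches whose tips are already
# contained in the given branch (default: master). Review the list
# before deleting anything with git branch -d.
list_merged_branches() {
  keep=${1:-master}
  git for-each-ref --merged="$keep" --format='%(refname:short)' refs/heads/ |
    # Do not report the reference branch itself.
    grep -v -x -F -- "$keep"
}
```

For example, `cd My12Repo && list_merged_branches master` prints one branch name per line, and nothing (with a non-zero exit status from grep) when there are no candidates.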

Superficies answered 12/10, 2010 at 0:1 Comment(6)
: is not a comment character in bash. You should use # instead.Bootery
@Daenyth, : is a traditional built-in command (also specified in POSIX). It is included in bash, but it is not a comment. I specifically used it in preference to # because not all shells take # as a comment introducer in all contexts (e.g. interactive zsh without the INTERACTIVE_COMMENTS option enabled). Using : makes the whole text suitable for pasting into any interactive shell as well as saving in a script file.Superficies
Brilliant! Only solution I found that keeps all the branches intactSidney
Odd, for me it stops with git remote rm origin, which always seems to return 1. Hence I replaced the && by ; for this line.Diagonal
Nice, $@ works for more than two dirs when needed. When finished I call git remote add origin $TARGET; git push origin master.Dido
File rename history is lost, but that is just how git handles renames. Anyway, if you want to keep some directory and remove the rest, there is an "official" way too, with git subtree. See stackoverflow.com/questions/359424/…Rome
6

The git_split project is a simple script that does exactly what you are looking for. https://github.com/vangorra/git_split

Turn git directories into their very own repositories in their own location. No subtree funny business. This script will take an existing directory in your git repository and turn that directory into an independent repository of its own. Along the way, it will copy over the entire change history for the directory you provided.

./git_split.sh <src_repo> <src_branch> <relative_dir_path> <dest_repo>
        src_repo  - The source repo to pull from.
        src_branch - The branch of the source repo to pull from. (usually master)
        relative_dir_path   - Relative path of the directory in the source repo to split.
        dest_repo - The repo to push to.
Eurythmic answered 6/1, 2016 at 2:45 Comment(1)
This would be the same as the filter-branch answer above, yes? If so, I assume it has a similar issue, in that it rewrites the entire history?Soda
6

Although at the time of the question utunbu's answer was the best you could get, these days even the git filter-branch documentation itself recommends https://github.com/newren/git-filter-repo

It is orders of magnitude faster and comparatively very easy to use.

For example, here you would run:

git clone MyHugeRepo/ MyABRepo.tmp/
cd MyABRepo.tmp
git filter-repo --path DIR_A/ --path DIR_B/

You can see more examples at https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES
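
For both target repositories, the same steps can be wrapped in a small function. This is a sketch under a few assumptions: git-filter-repo is installed separately (it does not ship with Git), the name split_with_filter_repo is made up, and I pass --no-local to git clone because filter-repo can refuse to run on a local clone that does not look freshly cloned.

```shell
# Hypothetical helper: split_with_filter_repo SRC DST PATH...
# Clones SRC into DST and keeps only the history of the given paths.
split_with_filter_repo() {
  if ! git filter-repo --version >/dev/null 2>&1; then
    echo "git-filter-repo is not installed" >&2
    return 127
  fi
  src=$1 dst=$2
  shift 2
  # --no-local forces a real object transfer instead of hardlinks,
  # so filter-repo sees what looks like a fresh clone.
  git clone --quiet --no-local "$src" "$dst" || return 1
  (
    cd "$dst" || exit 1
    # Rewrite the argument list in place: PATH1 PATH2 ... becomes
    # --path PATH1 --path PATH2 ...
    for p in "$@"; do
      set -- "$@" --path "$p"
      shift
    done
    git filter-repo "$@"
  )
}
```

Usage would be `split_with_filter_repo MyHugeRepo MyABRepo DIR_A/ DIR_B/` and `split_with_filter_repo MyHugeRepo My12Repo DIR_1/ DIR_2/`. If it is easier to name what to drop than what to keep, filter-repo also has --invert-paths, e.g. `git filter-repo --path DIR_1/ --path DIR_2/ --invert-paths`.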

Beuthen answered 19/1, 2021 at 14:26 Comment(0)
3

Here is a Ruby script that will do it: https://gist.github.com/4341033

Mv answered 19/12, 2012 at 22:26 Comment(0)
1

Thanks for your answers but I ended up just copying the repository twice then deleting the files I didn't want from each. I am going to use the filter-branch at a later date to strip out all the commits for the deleted files since they are already version controlled elsewhere.

cp -R MyHugeRepo MyABRepo
cp -R MyHugeRepo My12Repo

cd MyABRepo/
rm -Rf DIR_1/ DIR_2/
git add -A
git commit -a

This worked for what I needed.

EDIT: Of course, the same thing was done in the My12Repo against the A and B directory. This gave me two repos with identical history up to the point I deleted the unwanted directories.
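
With this approach the deleted directories are gone from the working tree but still present in history. A quick way to see how much of the history belongs to a removed directory is to count the commits that touched it. This is a minimal sketch; count_commits_touching is just an illustrative name.

```shell
# Hypothetical helper: count the commits reachable from HEAD that
# touched the given path, even if the path no longer exists on disk.
count_commits_touching() {
  git rev-list --count HEAD -- "$1"
}
```

For example, `cd MyABRepo && count_commits_touching DIR_1` reports how many commits in MyABRepo still carry changes to DIR_1, including the commit that deleted it.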

Rodenhouse answered 20/10, 2010 at 17:34 Comment(4)
This does not preserve commit history.Bootery
How so? I still have all the history, even for the deleted files.Rodenhouse
Since your requirement wasn't that repo A must pretend repo B never existed, I think this (leaving record of commits that only affected B) is an appropriate solution. Better to duplicate a little history than mangle it.Midrib
"I am going to use the filter-branch at a later date" means that you will then change not only all the history up to the split, but also all the history that came after it (that is, the commit hashes). So this is generally a way of handling the situation that is best avoided.Mariannamarianne
