How to extract a git subdirectory and make a submodule out of it?

I started a project some months ago and stored everything within a main directory. In my main directory "Project" there are several subdirectories containing different things: Project/paper contains a document written in LaTeX, and Project/sourcecode/RailsApp contains my Rails app.

"Project" is GITified and there have been a lot of commits in both the "paper" and "RailsApp" directories. Now, as I'd like to use cruisecontrol.rb for my "RailsApp", I wonder if there is a way to make a submodule out of "RailsApp" without losing the history.

Trudy answered 28/5, 2009 at 10:18 Comment(2)
Also a very good answer: stackoverflow.com/questions/359424/… – Luthuli
Possible duplicate of Detach (move) subdirectory into separate Git repository – Bespatter

Nowadays there's a much easier way to do it than manually using git filter-branch: git subtree

Installation

NOTE: git-subtree is part of git (if you install contrib) as of 1.7.11, so you might already have it installed. You can check by executing git subtree.
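If you would rather check from a script than by reading git's usage output, a small sketch follows. It assumes that git-subtree, when installed, lives either in git's exec path (the directory printed by git --exec-path) or somewhere on PATH; packaging differs between platforms, so treat a "no" as a hint rather than proof.

```shell
#!/usr/bin/env bash
# Assumption: git-subtree is installed in git's exec path or elsewhere on PATH.
if [ -x "$(git --exec-path)/git-subtree" ] || command -v git-subtree >/dev/null 2>&1; then
  subtree_status=yes
else
  subtree_status=no
fi
echo "git subtree available: $subtree_status"
```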


To install git-subtree from source (for older versions of git):

git clone https://github.com/apenwarr/git-subtree.git

cd git-subtree
sudo rsync -a ./git-subtree.sh /usr/local/bin/git-subtree

Or, if you want the man pages as well:

make doc
make install

Usage

Split a larger repository into smaller chunks:

# Go into the project root
cd ~/my-project

# Create a branch which only contains commits for the children of 'foo'
git subtree split --prefix=foo --branch=foo-only

# Remove 'foo' from the project
git rm -rf ./foo

# Create a git repo for 'foo' (assuming we already created it on github)
mkdir foo
pushd foo
git init
git remote add origin git@github.com:my-user/new-project.git
git pull ../ foo-only
git push origin -u master
popd

# Add 'foo' as a git submodule to `my-project`
git submodule add git@github.com:my-user/new-project.git foo

For detailed documentation (man page), please read git-subtree.txt.
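To try the split step without touching a real project, here is a sketch that builds a throwaway repository in a temporary directory and runs the same git subtree split command. All file names, contents and commit messages are invented for the demo, and it assumes git subtree is available. The resulting foo-only branch should contain only the commits that touched foo/, with foo/'s contents at its root.

```shell
#!/usr/bin/env bash
# Throwaway demo: every path lives under a fresh temporary directory.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/my-project"
cd "$work/my-project"

# Three commits: two touch foo/, one touches only README
mkdir foo
echo "lib v1" > foo/lib.txt
echo "readme" > README
git add . && git commit -qm "Add foo and README"
echo "readme v2" > README && git commit -qam "Edit README only"
echo "lib v2" > foo/lib.txt && git commit -qam "Update foo"

# Same split as in the answer; it prints the new branch tip's hash, hidden here
git subtree split --prefix=foo --branch=foo-only >/dev/null

git ls-tree --name-only foo-only   # prints: lib.txt
git log --oneline foo-only         # the README-only commit is not there
```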

Ammonic answered 20/8, 2009 at 18:22 Comment(7)
But isn't the point of git-subtree to avoid using submodules? I mean, you're indeed git-subtree's author (unless there's a nickname collision), but it looks like git-subtree has changed, even though the command you show still seems valid. Am I getting this right? – Bobo
git-subtree is now part of git (if you install contrib) as of 1.7.11 – Attainment
Well, git rm -rf ./foo removes foo from HEAD but doesn't filter my-project's full history. Then, git submodule add git@github.com:my-user/new-project.git foo only makes foo a submodule starting from HEAD. In that respect, scripting filter-branch is superior, as it lets you "do as if the subdir had been a submodule from the very beginning". – Birecree
thx for this -- the git subtree docs are just a bit baffling, and this is (for me) the most obviously useful thing I wanted to do with it... – Matney
Looks like this is Mac only :( – Checklist
Note also that the submodule might not even be a necessary step: if, for example, you want to extract it as a separate package to be used within the current project, you'd instead want to use the appropriate package manager, e.g. npm link && cd ../orig-proj && npm link extracted-module – Stonewall
What happens if I want to keep commits for a folder that exist on multiple branches, and those branches aren't yet ready to merge? For example, if I am using the Git Flow branching methodology, I want to move my subfolder into a new repository and retain the Git Flow structure with its various commits on each branch. – Motorcade

Check out git filter-branch.

The Examples section of the man page shows how to extract a sub-directory into its own project while keeping all of its history and discarding the history of other files/directories (just what you're looking for).

To rewrite the repository to look as if foodir/ had been its project root, and discard all other history:

   git filter-branch --subdirectory-filter foodir -- --all

Thus you can, e.g., turn a library subdirectory into a repository of its own.
Note the -- that separates filter-branch options from revision options, and the --all to rewrite all branches and tags.
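A minimal rehearsal of that man page example in a scratch repository might look like this. The directory and file names are invented for the demo, the rewrite runs on a full copy rather than the original, and FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation warning that newer git versions print before running filter-branch.

```shell
#!/usr/bin/env bash
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # only silences the warning newer git prints

work=$(mktemp -d)
git init -q "$work/orig"
cd "$work/orig"
mkdir foodir
echo "a v1" > foodir/a.txt
echo "paper" > paper.tex
git add . && git commit -qm "Initial"
echo "paper v2" > paper.tex && git commit -qam "Paper only"
echo "a v2" > foodir/a.txt && git commit -qam "Change foodir"

# filter-branch rewrites history in place, so work on a full copy
cp -R "$work/orig" "$work/foodir-only"
cd "$work/foodir-only"
git filter-branch --subdirectory-filter foodir -- --all >/dev/null

git ls-tree --name-only HEAD   # prints: a.txt
git log --oneline              # "Paper only" did not touch foodir, so it is gone
```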

Kleper answered 30/5, 2009 at 17:28 Comment(3)
This worked well for me. The only downside I noticed was that the result was a single master branch with all the commits. – Dana
@aceofspades: why is that a downside? – Phocomelia
For me the whole point of extracting commits from a git repo is that I want to retain the history. – Dana

One way of doing this is the inverse: remove everything but the file you want to keep.

Basically, make a copy of the repository, then use git filter-branch to remove everything but the file/folders you want to keep.

For example, I have a project from which I wish to extract the file tvnamer.py to a new repository:

git filter-branch --tree-filter 'for f in *; do if [ "$f" != "tvnamer.py" ]; then rm -rf "$f"; fi; done' HEAD

That uses git filter-branch --tree-filter to go through each commit, run the command, and recommit the resulting directory contents. This is extremely destructive (so you should only do this on a copy of your repository!) and can take a while (about 1 minute on a repository with 300 commits and about 20 files).

The above command just runs the following shell script on each revision, which you'd of course have to modify (to make it keep your sub-directory instead of tvnamer.py):

for f in *; do
    if [ "$f" != "tvnamer.py" ]; then
        rm -rf "$f"
    fi
done

The biggest obvious problem is that it keeps all commits and their messages, even those unrelated to the remaining file. The git-remove-empty-commits script fixes this:

git filter-branch --commit-filter 'if [ z$1 = z`git rev-parse $3^{tree}` ]; then skip_commit "$@"; else git commit-tree "$@"; fi'

You need to pass the -f (force) argument to run filter-branch again when refs/original/ already contains anything (it is basically a backup).

Of course this will never be perfect (for example, if your commit messages mention other files), but it's about as close as git currently allows (as far as I'm aware, anyway).

Again, only ever run this on a copy of your repository! In summary, to remove all files but "thisismyfilename.txt":

git filter-branch --tree-filter 'for f in *; do if [ "$f" != "thisismyfilename.txt" ]; then rm -rf "$f"; fi; done' HEAD
git filter-branch -f --commit-filter 'if [ z$1 = z`git rev-parse $3^{tree}` ]; then skip_commit "$@"; else git commit-tree "$@"; fi'
Harald answered 30/5, 2009 at 18:29 Comment(1)
git filter-branch has (nowadays?) a built-in option to remove empty commits, namely --prune-empty. A better guide to git filter-branch is in the answers to this question: stackoverflow.com/questions/359424/… – Bobo
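Following up on the --prune-empty comment, here is a sketch of a single-pass variant: the same tree filter as in the answer, with filter-branch's built-in --prune-empty in place of the separate git-remove-empty-commits step. This swaps in a different mechanism than the answer uses; the file names and commits are invented for the demo, and it rewrites a copy.

```shell
#!/usr/bin/env bash
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
git init -q "$work/orig"
cd "$work/orig"
echo "v1" > tvnamer.py
echo "other" > other.txt
git add . && git commit -qm "Add both files"
echo "other v2" > other.txt && git commit -qam "Edit other.txt only"
echo "v2" > tvnamer.py && git commit -qam "Edit tvnamer.py"

# Only ever rewrite a copy
cp -R "$work/orig" "$work/copy"
cd "$work/copy"

# One pass: delete everything except tvnamer.py, then drop commits whose
# rewritten tree equals their parent's ("Edit other.txt only" here)
git filter-branch --prune-empty --tree-filter '
  for f in *; do
    if [ "$f" != "tvnamer.py" ]; then rm -rf "$f"; fi
  done' HEAD >/dev/null

git log --oneline
```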

The answers by CoolAJ86 and apenwarr are very similar. I went back and forth between the two trying to understand the bits that were missing from each one. Below is a combination of them.

First, navigate Git Bash to the root of the git repo to be split. In my example here, that is ~/Documents/OriginalRepo (master).

# move the folder at prefix to a new branch
git subtree split --prefix=SubFolderName/FolderToBeNewRepo --branch=to-be-new-repo

# create a new repository out of the newly made branch
mkdir ~/Documents/NewRepo
pushd ~/Documents/NewRepo
git init
git pull ~/Documents/OriginalRepo to-be-new-repo

# upload the new repository to a place that should be referenced for submodules
git remote add origin git@github.com:myUsername/newRepo.git
git push -u origin master
popd

# replace the folder with a submodule
git rm -rf ./SubFolderName/FolderToBeNewRepo
git submodule add git@github.com:myUsername/newRepo.git SubFolderName/FolderToBeNewRepo
git branch --delete --force to-be-new-repo

Below is a copy of the above with the customizable names replaced, using https instead. The root folder is now ~/Documents/_Shawn/UnityProjects/SoProject (master).

# move the folder at prefix to a new branch
git subtree split --prefix=Assets/SoArchitecture --branch=so-package

# create a new repository out of the newly made branch
mkdir ~/Documents/_Shawn/UnityProjects/SoArchitecture
pushd ~/Documents/_Shawn/UnityProjects/SoArchitecture
git init
git pull ~/Documents/_Shawn/UnityProjects/SoProject so-package

# upload the new repository to a place that should be referenced for submodules
git remote add origin https://github.com/Feddas/SoArchitecture.git
git push -u origin master
popd

# replace the folder with a submodule
git rm -rf ./Assets/SoArchitecture
git submodule add https://github.com/Feddas/SoArchitecture.git Assets/SoArchitecture
git branch --delete --force so-package
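To rehearse the first (placeholder-named) sequence before pointing it at GitHub, the sketch below runs the same steps in temporary directories. Assumptions: a local bare repository stands in for git@github.com:myUsername/newRepo.git, the identity and file contents are placeholders, and git subtree is installed. Three deviations from the answer: it pushes HEAD rather than master (newer git may name the initial branch main), it passes -c protocol.file.allow=always because recent git refuses submodule clones over the local file transport by default, and it commits the result at the end.

```shell
#!/usr/bin/env bash
# Rehearsal in scratch directories; a local bare repo stands in for GitHub.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
orig="$work/OriginalRepo"
sub="SubFolderName/FolderToBeNewRepo"

# A minimal original repository containing the folder to extract
git init -q "$orig"
cd "$orig"
mkdir -p "$sub"
echo "code" > "$sub/code.txt"
echo "other" > keep.txt
git add . && git commit -qm "Initial import"

# move the folder at prefix to a new branch
git subtree split --prefix="$sub" --branch=to-be-new-repo >/dev/null

# create a new repository out of the newly made branch
git init -q "$work/NewRepo"
cd "$work/NewRepo"
git pull -q "$orig" to-be-new-repo

# "upload": push to the bare stand-in; HEAD avoids assuming master vs. main
git init -q --bare "$work/newRepo.git"
git remote add origin "$work/newRepo.git"
git push -q -u origin HEAD

# replace the folder with a submodule (file transport allowed for this demo only)
cd "$orig"
git rm -rq "$sub"
git -c protocol.file.allow=always submodule add "$work/newRepo.git" "$sub" >/dev/null
git commit -qm "Replace $sub with a submodule"
git branch --delete --force to-be-new-repo >/dev/null
```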
Hilaire answered 18/2, 2019 at 20:28 Comment(0)

If you want to transfer some subset of files to a new repository but keep the history, you're basically going to end up with a completely new history. The way this would work is basically as follows:

  1. Create new repository.
  2. For each revision of your old repository, merge the changes to your module into the new repository. This will create a "copy" of your existing project history.

It should be somewhat straightforward to automate this if you don't mind writing a small but hairy script. Straightforward, yes, but also painful. People have rewritten history in Git before; you can search for how they did it.
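One way such a script could look is sketched below. It is a hypothetical illustration, not a tool from the answer: $old, $new and $sub are placeholder names, and each commit that touched the subdirectory is replayed by unpacking that subdirectory's snapshot with git archive and recommitting it with the original author, author date and message. It assumes a linear history; merges are flattened, and a commit that deletes the subdirectory would make git archive fail and stop the script.

```shell
#!/usr/bin/env bash
# Hypothetical replay script: $old, $new and $sub are placeholder names.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
old="$work/Project"
new="$work/RailsApp"
sub="sourcecode/RailsApp"

# Demo input: the app and the paper change in separate commits
git init -q "$old"
mkdir -p "$old/$sub" "$old/paper"
echo "app v1" > "$old/$sub/app.rb"
echo "draft" > "$old/paper/paper.tex"
git -C "$old" add .
git -C "$old" commit -qm "Start project"
echo "final" > "$old/paper/paper.tex"
git -C "$old" commit -qam "Finish paper"
echo "app v2" > "$old/$sub/app.rb"
git -C "$old" commit -qam "Improve app"

git init -q "$new"
# Oldest first, only the commits that touched $sub
for rev in $(git -C "$old" rev-list --reverse HEAD -- "$sub"); do
  # Clear the tracked files, then unpack $sub exactly as it was at $rev
  git -C "$new" rm -rq --ignore-unmatch -- .
  git -C "$old" archive --format=tar "$rev:$sub" | tar -x -C "$new"
  git -C "$new" add -A
  # Recommit with the original author, author date and message
  GIT_AUTHOR_NAME=$(git -C "$old" log -1 --format=%an "$rev") \
  GIT_AUTHOR_EMAIL=$(git -C "$old" log -1 --format=%ae "$rev") \
  GIT_AUTHOR_DATE=$(git -C "$old" log -1 --format=%aI "$rev") \
    git -C "$new" commit -q --allow-empty -m "$(git -C "$old" log -1 --format=%B "$rev")"
done

git -C "$new" log --oneline
```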

Alternatively: clone the repository, delete the paper in the clone, and delete the app in the original. This takes a minute, it's guaranteed to work, and you can get back to more important things than trying to purify your git history. And don't worry about the hard drive space taken up by redundant copies of history.
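The clone-and-delete route is short enough to show in full. This sketch recreates the question's layout in a temporary directory (file contents and commit messages are placeholders). In the clone the app stays under sourcecode/RailsApp, just as described above; removing the origin remote is an optional extra so the new repository no longer points at the old one.

```shell
#!/usr/bin/env bash
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
# Stand-in for the question's layout
git init -q "$work/Project"
cd "$work/Project"
mkdir -p paper sourcecode/RailsApp
echo "tex" > paper/paper.tex
echo "rb" > sourcecode/RailsApp/app.rb
git add . && git commit -qm "Paper and app"

# The clone carries the full history; delete the paper there
git clone -q "$work/Project" "$work/RailsApp"
cd "$work/RailsApp"
git rm -rq paper
git commit -qm "Remove paper"
git remote remove origin   # optional: stop pointing at the old repository

# Delete the app in the original
cd "$work/Project"
git rm -rq sourcecode/RailsApp
git commit -qm "Move RailsApp to its own repository"
```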

Pukka answered 28/5, 2009 at 10:29 Comment(0)
