How to rewrite Git history so that all files are in a subdirectory?
I would like to merge multiple Git repositories (let's say repoA and repoB) into one new repository. The new repository (repoNew) should contain repoA and repoB, each in a separate subdirectory. Since I have so far only worked locally, I can do whatever I want to the repositories.

Under those circumstances, the standard approach seems to be to use git filter-branch to rewrite the history of both repoA and repoB so that it looks as if their files had always been in a subfolder, and then merge them into repoNew.
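The second step, merging the rewritten repositories into repoNew, could look roughly like the following. This is a minimal sketch under several assumptions: repoA and repoB have already been rewritten so that their files live under repoA/ and repoB/; the `work` temp directory, the demo repositories and the file names are hypothetical stand-ins created only so the script runs on its own; and a reasonably recent git is used (`git -C` needs 1.8.5 or later, and `--allow-unrelated-histories` exists only since git 2.9, whereas older versions such as 1.7.10 merge unrelated histories without that flag).

```shell
#!/usr/bin/env bash
# Sketch: merge two repositories whose files already live in repoA/ and repoB/
# into a new repository, keeping the history of each file.
set -euo pipefail

# Hypothetical identity so that commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical stand-ins for the already-rewritten repoA and repoB.
for name in repoA repoB; do
    git init -q "$work/$name"
    mkdir -p "$work/$name/$name"
    echo "content of $name" > "$work/$name/$name/file.txt"
    git -C "$work/$name" add .
    git -C "$work/$name" commit -q -m "Initial commit of $name"
done

# repoNew starts with an empty commit so that there is a HEAD to merge into.
git init -q "$work/repoNew"
cd "$work/repoNew"
git commit -q --allow-empty -m "Start repoNew"

for name in repoA repoB; do
    # Fetch the checked-out branch of the old repository into FETCH_HEAD and merge it.
    git fetch -q "$work/$name" HEAD
    git merge -q --no-edit --allow-unrelated-histories \
        -m "Merge $name into $name/" FETCH_HEAD
done

git log --oneline --graph
```

Because each merge keeps the rewritten commits reachable as a second parent, `git log -- repoA/file.txt` in repoNew should still list the commits that touched that file.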

The first step is what is bothering me. I am well aware of SO answers such as the one to How can I rewrite history so that all files, except the ones I already moved, are in a subdirectory? (Dan Moulding's answer), which describes exactly what I want.

He suggested something along the lines of the following:

git filter-branch --prune-empty --tree-filter '
if [[ ! -e repoA ]]; then
    mkdir -p repoA
    git ls-tree --name-only $GIT_COMMIT | xargs -i mv {} repoA
fi'

The result should be that the folder structure under <repoA-GIT-base> ends up in <repoA-GIT-base>/repoA. However, this is not the case: the above command fails randomly at different commits, with a message like "mv: cannot move 'src' into 'repoA/src'".

How can I avoid these failed commits when rewriting the history as described?
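For comparison, the git-filter-branch manual page documents an --index-filter recipe for moving the whole tree into a subdirectory. It rewrites only the index and never runs mv on checked-out files, which may sidestep failures caused by files being locked on disk. The sketch below adapts that recipe to repoA; it assumes GNU sed (for `\t`), the `if [ -f ... ]` guard is an addition (not part of the manual's example) meant for commits with an empty tree, and the `repo` directory and its files are a hypothetical throw-away demo so the script runs on its own.

```shell
#!/usr/bin/env bash
# Sketch: move every file of every commit into repoA/ by rewriting only the index,
# adapted from the "move the whole tree into a subdirectory" example in git-filter-branch(1).
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning git prints before filter-branch

# Hypothetical throw-away repository with a small history to rewrite.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir src
echo 'int main(void) { return 0; }' > src/main.c
echo '*.o' > .gitignore
git add . && git commit -q -m "Add sources"
echo 'first line' > README
git add README && git commit -q -m "Add README"

# Prefix every path in the index with repoA/, write it to a new index file,
# then swap that file in (only if it was created).
git filter-branch --index-filter '
    git ls-files -s |
    sed "s-\t\"*-&repoA/-" |
    GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
    if [ -f "$GIT_INDEX_FILE.new" ]; then
        mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
    fi
' HEAD

git ls-tree -r --name-only HEAD
```

Unlike the tree-filter version, .gitignore is moved as well here; like any filter-branch run, the original refs are kept under refs/original/ until you delete them.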

EDIT:

You should consider excluding the .gitignore from the move like so:

git filter-branch --prune-empty --tree-filter '
if [[ ! -e repoA ]]; then
    mkdir -p repoA
    # move every top-level entry except .gitignore into repoA
    git ls-tree --name-only $GIT_COMMIT |
    grep -ve "^\.gitignore$" |
    xargs -i mv {} repoA
fi'

The command still fails, seemingly at random. I tried it several times, and the "unable to move" failure occurred at a different commit each time. I observed that when I excluded .gitignore, the chance of making it through all commits seemed to increase: I was able to perform the move on all three of my repositories in a row without failure. When I tried it again, just for fun, on another throw-away copy of one of the repositories, it failed again.

Since I sometimes also had difficulty deleting my throw-away copies because a process was allegedly still using some files, the problem could have something to do with how Windows 7 handles file access, but I am not in a position to make serious assumptions there.

Retrying until it succeeds is of course ridiculous, and it will probably not work on repositories with many commits (mine only had about 30).

Info: I used git-bash with git version 1.7.10.msysgit.1 on Windows 7 64-Bit Enterprise.

Sitting answered 4/3, 2014 at 11:2 Comment(2)
I'm posting my answer to pretty much the same question from yesterday, just as an alternative to actually smashing the repos together. – Baden
This is the way I originally wanted to do it, but I would like to preserve the history of individual files. I also edited the question because I had forgotten something important, sorry. – Sitting

I suspect you're looking for something along the lines of git subhistory. It's a very small project and doesn't seem to be well maintained, but it's also designed to do almost exactly what you describe. Give it a try!

Whatsoever answered 4/3, 2014 at 15:10 Comment(1)
Very nice project. This one did the trick for me. The only defect I found was that it can't merge a subproject into an empty repo (i.e. merging multiple projects into a new empty repo, each in a subdirectory), but git itself is flaky in this respect. – Schwitzer

I have written a program based on libgit2 that filters git branches for another purpose, and I changed it slightly to do what you want here. You could try it.

It is in the subdir branch of git_filter on GitHub:

https://github.com/slobobaby/git_filter/tree/subdir

I just tested it on our 100,000-commit repository, and it took 43 seconds.

I wrote the program because git filter-branch based solutions took days to weeks to finish.

The example configuration filters a "test" repository and puts everything in the "test" subdirectory - you can change this to do what you want.

Flann answered 4/3, 2014 at 15:7 Comment(0)

If you want to move all Java files from the root folder into the repoA subfolder, you can use the following command:

c="mkdir -p repoA; find . -maxdepth 1  -name '*.java' -exec mv -f '{}' repoA/ \;"
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty --tree-filter "$c"

A few things to watch out for:

  1. mkdir -p repoA: with -p, this command does not fail if repoA already exists.
  2. Use find rather than mv *.java repoA. If there are no Java files, Bash leaves the unmatched glob as it is, so mv tries to move a file literally named *.java, which does not exist, and fails. shopt -s nullglob doesn't help either: with this option, *.java expands to nothing, so the command becomes mv repoA, which is invalid because it has no destination.
  3. Use the -f option with mv to permit overwriting.
  4. Save the whole tree-filter command in a variable c to avoid quoting issues.
  5. Set the environment variable FILTER_BRANCH_SQUELCH_WARNING=1 for this one command to skip the warning git prints before running filter-branch.
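The glob pitfalls in point 2 can be reproduced outside of filter-branch. This is a small bash-only sketch (shopt is a bash builtin); the `dir` scratch directory and the variable names are hypothetical and exist only for the demonstration.

```shell
#!/usr/bin/env bash
# Shows why `mv *.java repoA` is fragile in a tree filter when a commit
# has no top-level .java files, and why find is safer.
set -eu

dir=$(mktemp -d)   # hypothetical scratch directory with no .java files
cd "$dir"
mkdir repoA

# Default: an unmatched glob is passed to the command literally.
set -- *.java
without_nullglob=$#
echo "without nullglob: $without_nullglob argument(s): $*"
if ! mv *.java repoA 2>/dev/null; then
    echo "mv *.java repoA failed: no file literally named *.java"
fi

# nullglob: the unmatched glob vanishes, so the command becomes `mv repoA`.
shopt -s nullglob
set -- *.java
with_nullglob=$#
echo "with nullglob: $with_nullglob argument(s)"
if ! mv *.java repoA 2>/dev/null; then
    echo "mv repoA failed: missing destination"
fi
shopt -u nullglob

# find simply matches nothing and exits successfully.
find . -maxdepth 1 -name '*.java' -exec mv -f '{}' repoA/ \;
find_status=$?
echo "find exit status: $find_status"
```

Both mv variants fail, while find runs the -exec action zero times, which is why it is the safer choice inside a tree filter that runs once per commit.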
Glutamine answered 25/5 at 4:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.