Despite involving two subparts, I'm asking this as a combined question because how it's broken down into parts isn't what's important. I'm open to different ways of achieving what I want, as long as the end result retains all the meaningful history and the ability to check out, study, and build/test historical versions. The goal is to retire hg and the subrepo model that's been used so far and move to a unified tree in git, without sacrificing history.
What I'm starting with is a Mercurial repository that consists of some top-level code and a number of subrepositories where the bulk of interesting history lies. The subrepos have some branching/merges, but nothing too crazy. The final result I want to achieve is a single git repository, with no submodules, such that:
- For each commit in the original top-level hg repo, there is a git commit that checks out exactly the same tree as you'd get by checking out the corresponding hg commit together with all its referenced subrepo commits.
- The git commits corresponding to successive top-level hg commits are descendants of each other, with commits corresponding to all relevant subrepo commits in between.
The basic idea I have for achieving this is to iterate over all top-level hg commits and, for each top-level commit that changes .hgsubstate, also iterate over all paths from the old revision to the new revision of each changed subrepo (possibly involving branching). At each step:
- Check out the appropriate hg revisions for top-level and all subrepos.
- Delete everything from the git index.
- Stage everything checked out from hg to the git index.
- Use git-write-tree and git-commit-tree to generate a commit with the desired parents, using the authorship, date, and commit message from the corresponding hg commit.
- Record the correspondence between the new git commit and the hg commits, for use in determining future commits' parents.
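One step of the loop above could look roughly like this minimal sketch. Assumptions: the git repository is pointed at the hg checkout through the GIT_DIR and GIT_WORK_TREE environment variables; .hg/ is listed in the git repo's info/exclude so hg metadata is never staged; the date comes from hg's {date|hgdate} template; and the hg checkout contains no tracked .gitignore files that would hide content from git add. All function names are hypothetical. Note that hg records the timezone offset in seconds west of UTC, so its sign is the opposite of git's.

```python
import os
import re
import subprocess


def hg_date_to_git(hgdate: str) -> str:
    """'1157407993 25200' (hg: offset in seconds WEST of UTC) -> '1157407993 -0700'."""
    unix, offset = hgdate.split()
    east = -int(offset)
    sign = "+" if east >= 0 else "-"
    hours, minutes = divmod(abs(east) // 60, 60)
    return f"{unix} {sign}{hours:02d}{minutes:02d}"


def parse_hg_user(user: str) -> tuple:
    """'Jane Doe <jane@example.com>' -> ('Jane Doe', 'jane@example.com').
    Hypothetical fallbacks: no <...> part means an empty email, and an empty
    name falls back to the email (git refuses empty author names)."""
    m = re.match(r"\s*(.*?)\s*<([^>]*)>", user)
    if m:
        return m.group(1) or m.group(2), m.group(2)
    return user.strip(), ""


def exclude_hg_metadata(git_dir: str) -> None:
    """One-time setup: keep every .hg/ directory (at any depth) out of git add."""
    os.makedirs(os.path.join(git_dir, "info"), exist_ok=True)
    with open(os.path.join(git_dir, "info", "exclude"), "a") as f:
        f.write(".hg/\n")


def commit_snapshot(git_dir, work_tree, parents, user, hgdate, message):
    """Empty the index, stage the whole hg checkout, and create a commit with
    the given parent commit ids. Returns the new commit id."""
    name, email = parse_hg_user(user)
    date = hg_date_to_git(hgdate)
    env = dict(os.environ, GIT_DIR=git_dir, GIT_WORK_TREE=work_tree,
               GIT_AUTHOR_NAME=name, GIT_AUTHOR_EMAIL=email, GIT_AUTHOR_DATE=date,
               GIT_COMMITTER_NAME=name, GIT_COMMITTER_EMAIL=email,
               GIT_COMMITTER_DATE=date)

    def git(*args, **kw):
        return subprocess.run(["git", *args], env=env, cwd=work_tree, check=True,
                              capture_output=True, text=True, **kw).stdout.strip()

    git("read-tree", "--empty")      # delete everything from the index
    git("add", "-A", "--", ".")      # stage everything checked out from hg
    tree = git("write-tree")
    parent_args = [arg for p in parents for arg in ("-p", p)]
    # commit-tree reads the commit message from stdin
    return git("commit-tree", tree, *parent_args, input=message)
```

The returned id is what you'd store in the hg-to-git mapping. Commits made with commit-tree are not on any branch, so once the loop finishes, something like git update-ref refs/heads/master <last-commit> is needed to keep them reachable.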
Should this work? Is there a better way to achieve what I want, perhaps doing the subrepo collapse with hg first? The biggest thing I'm not clear on is how to perform the desired iteration, so practical advice for how to achieve it would be great.
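For the iteration itself, one possible approach is sketched below under stated assumptions (all subrepos are hg subrepos, and the function names are hypothetical). Parse .hgsubstate in the parent and child top-level commits, and for each subrepo whose node changed, ask hg for the changesets reachable from the new node but not from the old one. The only(new, old) revset covers branchy histories and also the case where old is not an ancestor of new. Local revision numbers always place parents before children, so sorting by rev yields an order in which every changeset's parents have already been converted.

```python
import subprocess

NULL_NODE = "0" * 40


def parse_hgsubstate(text: str) -> dict:
    """Parse .hgsubstate ('<40-hex node> <path>' per line) into {path: node}."""
    state = {}
    for line in text.splitlines():
        if line.strip():
            node, path = line.split(" ", 1)
            state[path] = node
    return state


def subrepo_revset(old, new) -> str:
    """Revset for subrepo changesets leading up to `new` that were not already
    included at `old`, oldest first (parents before children)."""
    if old is None:  # subrepo newly added in this top-level commit
        return f"sort(::{new}, rev)"
    return f"sort(only({new}, {old}), rev)"


def subrepo_steps(subrepo_dir, old, new) -> list:
    """Return [(node, [parent nodes])] for every intermediate subrepo changeset."""
    out = subprocess.run(
        ["hg", "log", "-R", subrepo_dir, "-r", subrepo_revset(old, new),
         "--template", "{node} {p1node} {p2node}\n"],
        check=True, capture_output=True, text=True).stdout
    steps = []
    for line in out.splitlines():
        node, p1, p2 = line.split()
        steps.append((node, [p for p in (p1, p2) if p != NULL_NODE]))
    return steps
```

For each (node, parents) step you would run hg update -R <subrepo> -r <node>, keep the other subrepos at their old state, commit the snapshot, and record the mapping. The hg parents of each step, looked up in that mapping, let subrepo merges turn into git merges; exactly which git parents to use at each step is a design decision this sketch leaves open.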
One additional constraint: the original repos contain content that can't be published (removing it will be an additional git-filter-branch step once the basic conversion is done), so solutions that involve uploading the repo for processing by a third party are not viable.
git fast-import was made for jobs like this. – Geum
… .hgsubs and .hgsubstate to find the subrepositories, and recursively import them into your main git repository, starting from e.g. hg manifest --debug output. Once you've got them all in one git repo, you can construct arbitrary additional histories any way you want. This is going to be much faster than the read-tree/write-tree manipulations. The only elaboration needed at that point is exactly what you want your resulting history to be. – Geum
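To illustrate the fast-import route, here is a minimal sketch of emitting one commit in git fast-import's stream format. A deleteall followed by one M line per file makes every commit a full snapshot, which mirrors the "empty the index, stage everything" idea from the question. Assumptions: the function names are hypothetical; parents are fast-import marks of commits emitted earlier in the same stream; user strings already have the "Name <email>" form; the date is already in git's "<unix> +hhmm" form; every file gets mode 100644 (executables would need 100755, symlinks 120000); and no path starts with a double quote or contains a newline.

```python
def fi_data(payload: bytes) -> bytes:
    """A fast-import 'data' command with an exact byte count."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def fast_import_commit(ref, mark, parents, user, git_date, message, files):
    """Emit one commit whose tree is exactly `files` ({path: bytes}).
    `parents` are integer marks; the first becomes 'from', the rest 'merge'."""
    out = []
    if not parents:
        # reset makes the next commit on this ref a root commit instead of
        # implicitly continuing from the ref's current tip
        out.append(b"reset %s\n" % ref.encode())
    out.append(b"commit %s\nmark :%d\n" % (ref.encode(), mark))
    ident = user.encode() + b" " + git_date.encode()
    out.append(b"author " + ident + b"\ncommitter " + ident + b"\n")
    out.append(fi_data(message.encode()))
    if parents:
        out.append(b"from :%d\n" % parents[0])
        out.extend(b"merge :%d\n" % p for p in parents[1:])
    out.append(b"deleteall\n")  # start from an empty tree
    for path in sorted(files):
        out.append(b"M 100644 inline %s\n" % path.encode())
        out.append(fi_data(files[path]))
    out.append(b"\n")
    return b"".join(out)
```

A driver would write these chunks to stdout and pipe them into git fast-import --export-marks=marks.txt; if each mark is chosen to correspond to one hg changeset, the exported marks file gives the hg-to-git correspondence directly.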