Mercurial -> Git migration while preserving commit hashes

Is there any existing tool that is able to export a mercurial repository to a git repository while preserving the commit hashes?

I'm aware of hg-git and fast-export.git, but those create new commits with new hashes (and there doesn't seem to be any option to configure this). We have hundreds of Mercurial repositories hosted on Bitbucket, with a large number of hooks, download links, etc. that depend on exact hashes. Being able to preserve the hashes would save us the considerable effort of updating all those external resources.

Loach answered 28/10, 2019 at 18:4 Comment(0)

It's not possible.

The hash ID of a Git object is a cryptographic checksum (currently SHA-1) of the underlying object data. In the case of a commit object, that's a cryptographic checksum of the string "commit", a space, the size in bytes of the rest of the data expressed in decimal, an ASCII NUL, and then the headers, log message text, and trailers.
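To make that concrete, here is a minimal Python sketch of the calculation described above. The function name `git_object_id` and the sample commit text (author, timestamp, message) are made up for illustration; only the framing (type, space, decimal size, NUL, body) comes from Git itself.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to an object.

    The hashed data is: type, a space, the body length in decimal,
    an ASCII NUL, then the body itself.
    """
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical commit body: headers, a blank line, then the log message.
# The tree ID used here is Git's well-known ID for the empty tree.
EXAMPLE_COMMIT = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1572286860 +0000\n"
    b"committer A U Thor <author@example.com> 1572286860 +0000\n"
    b"\n"
    b"Initial commit\n"
)

if __name__ == "__main__":
    print(git_object_id("commit", EXAMPLE_COMMIT))
    # Changing a single byte of the message yields an unrelated ID.
    print(git_object_id("commit", EXAMPLE_COMMIT.replace(b"Initial", b"initial")))
```

Every byte of the commit (tree, parents, author, dates, message) feeds the hash, so there is no field you can set to "choose" the resulting ID.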

The hash ID of a Mercurial commit is a cryptographic checksum of an appropriate part of the Mercurial data for that commit (Mercurial's data structures are different so some commit data do not participate in the checksum).
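For comparison, a rough sketch of how a Mercurial node ID is derived, assuming the usual revlog scheme: SHA-1 over the two parent node IDs (sorted, with 20 zero bytes standing in for a missing parent) followed by the revision text. The function name `hg_node_id` and the sample text are hypothetical; a real changelog entry has its own layout (manifest ID, user, date, files, description).

```python
import hashlib

# Mercurial's "null" node: 20 zero bytes, used when a parent is absent.
NULL_ID = b"\0" * 20


def hg_node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> str:
    """Return a revlog-style node ID for `text` with binary parents p1, p2.

    Assumption: the parents are sorted before hashing, so their order
    does not affect the result.
    """
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + text).hexdigest()


if __name__ == "__main__":
    # Placeholder text, not a real changelog entry.
    print(hg_node_id(b"example revision text\n"))
```

The input to the hash has a different shape from Git's (no "commit <size>\0" header, parents hashed as raw binary IDs), so the same history cannot produce the same IDs in both systems.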

The only known way today to construct a specific hash ID from some known data (as you would have in a Git commit) is to add a "junk" data area, then spend many CPU-years computing hashes with different contents in the junk data. The team that created SHAttered used 110 GPU-years of compute time to find one duplicate hash ID, and that was a collision between two inputs they were free to craft, which is an easier problem than matching a hash ID fixed in advance.

Hoogh answered 28/10, 2019 at 18:21 Comment(7)
But does Git rely on this fact in any way? If I forked the Git source, changed how it generates hashes (so I could inject them), and created (or imported) a repo this way, would anything break in such a repo if it were used from proper Git afterwards? - Loach
Git relies on this in every way. (Mercurial relies on the commit hashes for distribution. Git relies on hashes not just for commit transfer, but also for commit existence and contents: commits contain tree hashes, which contain more hashes, and so on.) - Hoogh
Hmm - too bad for us :-/. Before I accept: is there any way to export the list of hashes (ideally just the hashes) in a deterministic chronological order, so that we can at least build a reliable map of hg hash -> git hash? - Loach
Note, by the way, that there is an ongoing project to move Git from SHA-1 to one of the SHA-256 varieties. This is a major internal change with a lot of ramifications. Once it's done, Git might be able to support N different hashes, and perhaps you could add a "mercurial hash" that acts as an auxiliary entity. I'd bet this would be pretty difficult though. - Hoogh
And even if you could, it wouldn't be a Git repository anymore. It would be a something-else repository. GitHub would reject this, for instance, because it would try to verify the hashes that you used, and reject your push because it was corrupt. - Denitrify
Ah - as for exporting the hashes, use git rev-list to find them all. Add --parents and build a graph. Mercurial's graph should translate directly into Git's graph, in general, so that would allow you to build a map. (But note that hg tags require an extra commit, so some of this depends on whether your importer imports the extra commit.) - Hoogh
Thanks for the tip. We use tags extensively, so we will need to find an exporter that exports those as separate commits in Git (so that an easy 1:1 hash mapping is preserved). - Loach
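The `git rev-list --parents` suggestion from the comments can be sketched roughly as follows. This is only an illustration: the helper names `parse_rev_list_parents` and `read_git_graph` are made up, and matching the resulting graph against the Mercurial side is left out, since that depends on how your importer handles tag commits.

```python
import subprocess
from typing import Dict, List


def parse_rev_list_parents(output: str) -> Dict[str, List[str]]:
    """Parse `git rev-list --parents` output into {commit: [parents]}.

    Each output line is a commit ID followed by its parent IDs,
    separated by spaces; root commits have no parents.
    """
    graph: Dict[str, List[str]] = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        commit, *parents = fields
        graph[commit] = parents
    return graph


def read_git_graph(repo_path: str) -> Dict[str, List[str]]:
    """Run git in `repo_path` and return the parent graph of all refs."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--all", "--parents"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_rev_list_parents(result.stdout)


if __name__ == "__main__":
    # "." is a placeholder; point this at the converted repository.
    for commit, parents in read_git_graph(".").items():
        print(commit, "<-", ", ".join(parents) or "(root)")
```

With a similar parent listing exported from the Mercurial repository, the two graphs can be walked in step from the roots to pair each hg hash with its Git counterpart, provided the importer produced one Git commit per Mercurial changeset.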
