What are the merge semantics of git fast-import streams?

I wrote, and maintain, an open-source tool called reposurgeon that edits version-control repository histories and can be used to move project histories between VCSes. Recently I shipped full support for reading Subversion dump files and repos. But there is one thing reposurgeon doesn't do very well yet, and that is translate Subversion branch merges done by copying to git-style DAG merges.

In order to get this part right, I need to understand the semantics of merge commits in a git fast-import stream much better than I do. My questions are about which version of content is supposed to be visible after a merge commit.

Of course, file modifications attached to the merge commit make their content visible there. My questions are about paths not touched by the commit.

  1. If a path has content on only one commit chain ancestral to the merge, I assume that content is supposed to be visible. Is that correct?

  2. If a path has content in more than one commit chain ancestral to the merge, which version will be visible?

  3. If a file is deleted along some paths to the merge, what rule predicts when it will be deleted in the merge revision?

Maggs answered 5/11, 2012 at 1:53 Comment(0)

If I understand your question, you're wondering exactly what shortcuts fast-import lets you take when streaming the contents of a commit into it.

As far as I can tell from reading git/fast-import.c and the manual page, fast-import initializes the tree for a new commit from the tree of the commit named in the "from" command. "filemodify" and friends start from that state and construct the new tree that will be committed at the end.
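Here is a rough Python model of that behaviour. It is only a sketch of the semantics as I read them, not code from git: `build_tree` and the `("M", path, content)` / `("D", path)` tuples are hypothetical stand-ins for filemodify and filedelete, and a tree is simplified to a dict from path to content.

```python
def build_tree(from_tree, file_ops):
    """Model of how fast-import forms a commit's tree (a sketch, not git's code).

    from_tree: dict path -> content for the commit named by "from",
               or None when there is no "from" (the tree starts empty).
    file_ops:  list of ("M", path, content) or ("D", path) tuples, standing in
               for filemodify and filedelete.
    """
    tree = dict(from_tree or {})      # start from a copy of the "from" commit's tree
    for op in file_ops:
        if op[0] == "M":
            _, path, content = op
            tree[path] = content      # add or replace the path
        elif op[0] == "D":
            tree.pop(op[1], None)     # remove the path if present
        else:
            raise ValueError(f"unknown op: {op[0]!r}")
    return tree


trunk = {"README": "v1", "main.c": "old"}
new_tree = build_tree(trunk, [("M", "README", "v2"), ("D", "main.c")])
print(new_tree)
```

Paths the commit does not touch simply carry over from the "from" commit; nothing else feeds into the result.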

The fast-import command does not appear to change the tree at all when encountering "merge" commands; if you want to include changes from parents other than the first, you need to specify exactly which files you want to bring in. You can use marks or object hashes to name the other-branch files for "filemodify" though.
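To make that concrete, here is a small Python sketch that renders a merge commit for a fast-import stream. The helper name `merge_commit`, the marks, the committer line, and the path are all made up for illustration; the command layout (`commit`, `mark`, `committer`, `data`, `from`, `merge`, `M`) follows the fast-import manual page as I understand it.

```python
def merge_commit(ref, mark, committer, message, first_parent, other_parents, file_cmds):
    """Render one fast-import "commit" command with extra parents (a sketch)."""
    msg_bytes = message.encode("utf-8")
    lines = [
        f"commit {ref}",
        f"mark :{mark}",
        f"committer {committer}",
        f"data {len(msg_bytes)}",   # byte count of the message, not characters
        message,
        f"from {first_parent}",     # the new tree starts as this commit's tree
    ]
    # Each "merge" line only records an additional parent; no content comes with it.
    lines += [f"merge {parent}" for parent in other_parents]
    # Content taken from the merged branches has to be listed explicitly.
    lines += file_cmds
    return "\n".join(lines) + "\n"


stream = merge_commit(
    ref="refs/heads/master",
    mark=7,
    committer="A U Thor <author@example.com> 1336700000 +0000",  # hypothetical identity
    message="Merge branch 'feature'",
    first_parent=":5",
    other_parents=[":6"],
    file_cmds=["M 100644 :4 src/feature.c"],  # :4 is a hypothetical blob mark from the other branch
)
print(stream, end="")
```

Without the `M` line, the resulting commit would still have two parents, but its tree would be identical to that of `:5`.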


edit: Ah, let's go deeper into the git model.

In git, a commit points to a tree that represents the entire contents of the directory hierarchy being tracked, as it stood at the time of that commit. Commits do not carry any information about how they're different from their parents; the theory is that you can reconstruct the diff if you need it by comparing these trees.

A merge commit is distinguished from non-merges only by the fact that it has two or more parents. It still has a single tree, recording exactly what's in the version that resulted from performing the merge. It still does not record anything about how its author combined the parents into a merged version. The git "porcelain" commands like git log and git diff do magic to reconstruct a useful description of what happened.
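A toy Python model may help here. It is a simplification under my own assumptions, not git's object format: `Commit` and `diff_trees` are hypothetical names, and a tree is flattened to a dict from path to content.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    tree: dict            # complete snapshot: path -> content
    parents: tuple = ()   # zero or more parent Commits

    def is_merge(self):
        # The only thing that makes a commit a merge is its parent count.
        return len(self.parents) >= 2


def diff_trees(old, new):
    """Reconstruct a diff after the fact by comparing two snapshots."""
    changes = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes[path] = "deleted"
        elif path not in old:
            changes[path] = "added"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes


base = Commit({"a.txt": "1", "b.txt": "1"})
trunk = Commit({"a.txt": "2", "b.txt": "1"}, (base,))
feature = Commit({"a.txt": "1", "b.txt": "1", "c.txt": "1"}, (base,))
merged = Commit({"a.txt": "2", "b.txt": "1", "c.txt": "1"}, (trunk, feature))

for parent in merged.parents:
    print(diff_trees(parent.tree, merged.tree))
```

The merge commit stores only the final snapshot and the two parent links. The per-parent diffs are computed on demand, which is roughly what the porcelain does when it shows you a merge.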

Conceptually, to create a new commit object, you need to describe the complete mapping of paths to file contents that goes in that commit. (Much cleverness goes into making that efficient and simple instead of awful.)

The git fast-import command provides a shortcut for the common case: Usually the VCS you're exporting from can tell you how this commit was formed as some kind of diff from the most recent commit on the same branch. In that case, you can effectively encode the diff into fast-import's stream format for a simpler and faster import.

But you have to remember it's only a shortcut for re-constructing the entire tree from scratch.
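For an exporter, this means you can decide what the merged tree should contain and then emit it either way. The Python sketch below shows both forms. It assumes every file is a regular file (mode 100644), that paths need no quoting, and that trees are dicts from path to a blob mark such as `:3`; the function names are hypothetical. `deleteall` is the fast-import command that empties the tree under construction.

```python
def delta_commands(from_tree, target_tree):
    """File commands relative to the "from" commit's tree (the shortcut)."""
    # Delete paths that exist in the parent but not in the desired result.
    cmds = [f"D {path}" for path in sorted(from_tree) if path not in target_tree]
    # Add or replace paths whose blob differs from the parent's.
    cmds += [
        f"M 100644 {blob} {path}"
        for path, blob in sorted(target_tree.items())
        if from_tree.get(path) != blob
    ]
    return cmds


def full_commands(target_tree):
    """Same resulting tree without relying on the parent: clear, then list every file."""
    return ["deleteall"] + [
        f"M 100644 {blob} {path}" for path, blob in sorted(target_tree.items())
    ]


first_parent = {"README": ":1", "old.c": ":2"}
merged = {"README": ":3", "feature.c": ":4"}
print(delta_commands(first_parent, merged))
print(full_commands(merged))
```

Both command lists describe the same final tree. The delta form is shorter, but it is only correct relative to the commit named in "from".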

Piping answered 5/11, 2012 at 2:19 Comment(9)
Er, if fast-import doesn't change the tree in response to a merge command, then what does the merge command mean? – Maggs
In fast-import, it just adds a second (or third, fourth, ...) parent commit to the commit object you're currently constructing. – Piping
But what does adding a parent commit mean if it doesn't result in a change to the tree at the merge revision? What information is that parent link carrying? – Maggs
@Maggs It's carrying the information "this branch (actually the commit at the tip of that branch) was merged in here", so that you can see it in the history, and so that the commits on the branch are still referred to by something even if you later delete the branch ref itself. – Vinegar
@ESR, I've added a deeper explanation of git's model that I hope will help. – Piping
A merge commit is just like any other commit except for having multiple parents; like any other commit it refers to a tree that represents the state of the files in the repo after the commit; in the case of a merge it's that tree that holds the results of any automatic or manual merging of the parents. In the case of fast-import, you would feed that info in with filemodify (M) commands. If you use from + merge then you can specify changes relative to the from ancestor; if you use merge + merge then you start with an empty tree and have to provide the states of all files. – Vinegar
hobbs: OK, I think you just answered the question I was trying to ask. The model is completely different than I assumed. – Maggs
I'm off to write a patch for the git fast-import page now. This needs to be documented so nobody else goes astray the way I did. – Maggs
@Maggs thanks. I think Jamey deserves credit for coming up with the right answer at about the same time as I did (and putting it into an answer instead of a comment); if you see anything he missed, let him know so he can earn your checkmark. – Vinegar
