Is it possible to tell Github that my branch was merged into upstream master?

I used my local branch feature to create a PR for a github repo (I don't have write access to it). Later I decided I want to separate its last commit into a standalone PR, so I moved feature one commit back:

git checkout feature
git branch feature2
git reset --hard @~
git push -f

The first PR is merged upstream, so now I want to create the second PR:

git checkout master
git pull upstream master
git push origin
git checkout feature2
git rebase master

Unfortunately, it turns out that git lacks the information that feature was merged into master. Therefore, it doesn't realize that the nearest common base of feature2 and master is very close: it's just feature. Instead, rebase goes all the way back to the common base of feature and master, as if they were never merged. As a result, git rebase master becomes unnecessarily messy.

Why did Github lose the information that feature was merged into master through an upstream PR? Is there any way to provide Github with that information?

In the end, I had to resort to:

git checkout master
git checkout -b feature2_new
git cherry-pick feature2

Luckily I only needed to take care of a single commit. And even with a single commit, I think that a merge with the true base (if git knew about it) would be better than the cherry-pick because git would be able to use its knowledge of history to resolve more conflicts automatically.

Note that if I were to merge feature into master locally instead of doing a github PR, no information would have been lost. Of course, then my master would not be in sync with the upstream repo, so it would be pointless.

Demodulation answered 21/4, 2017 at 15:0 Comment(4)
Are you sure that the upstream remote has the proper URL set? Check with git remote -v – Tinderbox
@Tinderbox Yes. I see the merge commit in my local master. Even though that commit corresponds to a (squashed) version of my local commits in feature, there's no history linking them.Demodulation
Ah well maybe the maintainers of Github didn't merge your original PR as it was but used the recently introduced Github feature which does squash & rebase; then the commits that land in the repo have different SHA1s than your original commits, so rebase has to reapply every commit.Tinderbox
@Tinderbox yes, in the upstream master, the commit that was created by the PR has only one parent.Demodulation
D
7

Github now supports 3 techniques to merge pull requests:

  • Merge: creates a merge commit (no fast forward) + fetches all the original commits from the PR branch
  • Squash and merge: creates a single commit
  • Rebase and merge: creates as many commits as the PR branch, but they are rebased onto master

Only the regular merge preserves the knowledge that my local commits were part of the PR merged into master. If it had been used, I wouldn't have encountered the problem I described in the question.

The other two techniques lose that knowledge - and there's nothing I can do to create it retroactively (without modifying the upstream master). That's the price to pay for a simpler history.

Intuitively, in order for git to know that an upstream master commit U is related to my local commit L, there needs to be an arrow pointing from U to L.

Conceptually, there are two ways to achieve this.

First, U can have two parents: one connecting it to L, the other connecting it to all the previous commits on the upstream master. This is precisely what GitHub's merge technique does.

Second, U can have L as its sole parent, if L already points to all the previous commits on the upstream master. Github could have supported this by allowing fast-forward with its merge technique, but it chose not to.

If a Github PR is merged with either squash and merge or rebase and merge, all commits created on the upstream master have only one parent; there are no arrows between them and my local commits.
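To make the "arrows" concrete, here is a minimal sketch that imitates a squash merge in a throwaway local repository and then inspects the result. It is only an approximation of what GitHub does server-side: git merge --squash stands in for the "Squash and merge" button, and the file name, commit messages and demo identity are hypothetical.

```shell
#!/bin/sh
# Sketch: imitate a squash merge locally and look at the parents it leaves behind.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the question

echo base > file.txt
git add file.txt
git commit -q -m "C: base"

git checkout -q -b feature
echo one >> file.txt
git commit -q -am "D: feature work"

git checkout -q -b feature2
echo two >> file.txt
git commit -q -am "E: feature2 work"

# Local stand-in for merging the feature PR with "Squash and merge"
git checkout -q master
git merge -q --squash feature >/dev/null
git commit -q -m "U: squashed feature"

# %P prints the parent hashes of a commit, separated by spaces
parent_count=$(git log -1 --pretty=%P master | wc -w | tr -d ' ')
fork_point=$(git merge-base master feature2)
base_commit=$(git rev-parse master~1)

echo "U has $parent_count parent(s)"
echo "merge base of master and feature2: $fork_point"
echo "commit C (parent of U):            $base_commit"
```

The two hashes printed at the end are the same: with no arrow from U to D, the merge base of master and feature2 is still C rather than the tip of feature.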

Edit:

Also I now believe that the loss of history I was asking about was no big deal in the first place. IIUC, the conflicts I would encounter with git cherry-pick are actually the same as the ones with git rebase if master was connected to feature2 through a regular merge commit. And if I had more than 1 commit split into a standalone PR, cherry-pick would handle that easily too.
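For the multi-commit case, a range can be passed to cherry-pick. The following is a sketch in a scratch repository, again with git merge --squash standing in for the upstream squash merge and with hypothetical file and commit names; feature2_new is the branch name I used in the question.

```shell
#!/bin/sh
# Sketch: carry several commits from feature2 onto the squashed master.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > file.txt
git add file.txt
git commit -q -m "C: base"

git checkout -q -b feature
echo one >> file.txt
git commit -q -am "D: feature work"

# feature2 carries two extra commits on top of feature
git checkout -q -b feature2
echo two >> file.txt
git commit -q -am "E: first extra commit"
echo three >> file.txt
git commit -q -am "F: second extra commit"

# Stand-in for the upstream squash merge of feature
git checkout -q master
git merge -q --squash feature >/dev/null
git commit -q -m "U: squashed feature"

# Start a fresh branch from master and pick everything in feature..feature2
git checkout -q -b feature2_new
git cherry-pick feature..feature2 >/dev/null

new_commits=$(git rev-list --count master..feature2_new)
echo "commits on feature2_new that are not on master: $new_commits"
```

The range feature..feature2 selects exactly the commits that were never part of the first PR, so nothing from feature is replayed.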

Demodulation answered 21/4, 2017 at 16:39 Comment(4)
Any clarification as to why my self-answer is wrong would be welcome; if you make a separate answer to provide such clarification, even better.Demodulation
I think the problem is that the answer is yes, it is certainly possible to tell git that your commits were merged, but whoever did the merge took explicit steps to hide it, to erase the act from git's history and make the content changes appear completely unrelated.Maines
@Maines but whoever did the merge took explicit steps to hide it: the owners of the upstream repo followed one of the standard github workflow options for a merge (specifically, "squash and merge"). I suppose you can count that as "explicit steps to hide it", but that's precisely what my question is about: how is it possible to avoid hiding it (either through my efforts as a contributor or through the efforts of the upstream repo owners). I updated the question to say github instead of git, to emphasize that it's not about an arbitrary git repo.Demodulation
@DeepMistry you're welcome; hopefully my answer is correct (I was originally worried it might be wrong because it had 2 downvotes and no upvotes).Demodulation
M
8

To answer the questions as asked,

Is it possible to tell Github that my branch was merged into upstream master? [...] Why did Github lose the information that feature was merged into master through an upstream PR?

Yes, it's certainly possible to record the merge. That's usual.

Someone chose to tell git (and github) not to record this one, so the effects appeared on the upstream master branch without a trace of where they came from.

What you're looking at is the result of someone choosing to divorce the mainline history from the history you offered. Why they chose to do that, you'll have to find out from them. It's a common and widely used option, for reasons any number of people will be very happy to opine about. Linearizing history looks nice but involves tradeoffs: there are downsides either way, and different people and different situations will tilt the balance in their own ways.

Regardless, after fetching the result you've now got an unrecorded merge, which is fine, clean and dandy so long as you never try to merge subsequent work still based on the unrecorded parent.

What to do about it?

The option you chose, which could also and more flexibly (it handles multiple-commit feature1..feature2 histories, feature1 being the branch the question calls feature) be spelled out in full as

git rebase --onto master feature1 feature2

is generally cleanest: upstream abandoned your feature1 history, so you abandon it, rebasing your feature2 on the content they have now.
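Here is a small sketch of that rebase in a scratch repository, assuming the upstream change arrived as a squash (imitated locally with git merge --squash). The branch is named feature as in the question, and the file and commit names are made up for the illustration.

```shell
#!/bin/sh
# Sketch: replay only the commits after feature onto the squashed master.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > file.txt
git add file.txt
git commit -q -m "C: base"

# The first PR consisted of two commits
git checkout -q -b feature
echo one >> file.txt
git commit -q -am "D1: feature work"
echo two >> file.txt
git commit -q -am "D2: more feature work"

git checkout -q -b feature2
echo three >> file.txt
git commit -q -am "E: feature2 work"

# Stand-in for the upstream squash merge of feature
git checkout -q master
git merge -q --squash feature >/dev/null
git commit -q -m "U: squashed feature"

# Take the commits in feature..feature2 and replay them on top of master
git rebase -q --onto master feature feature2

replayed=$(git rev-list --count master..feature2)
echo "commits replayed onto master: $replayed"
```

Only E is replayed; D1 and D2 are left behind because the range starts at feature rather than at the old fork point.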

If for some reason you really don't want to abandon the feature1 history in your own repository -- maybe you've got more than just feature2 based off the old feature1 tip and the rebasing would start to get tedious -- you can also add a local record of the merge.

echo $(git rev-parse origin/master origin/master~ feature1) >>.git/info/grafts

This tells the local git that the current origin/master commit has as parents both its recorded first-parent commit and also the local feature1 commit. Since upstream has abandoned the feature1 commit and you haven't, all of git's machinery now works properly both here and upstream.
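Newer Git releases flag the .git/info/grafts file as deprecated and point to git replace --graft for the same job. The sketch below shows that variant in a scratch repository; it is an alternative spelling rather than the command above, the local master stands in for origin/master, the branch is called feature as in the question, and the squash is imitated with git merge --squash.

```shell
#!/bin/sh
# Sketch: record the missing merge parent locally with a replacement object.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > file.txt
git add file.txt
git commit -q -m "C: base"

git checkout -q -b feature
echo one >> file.txt
git commit -q -am "D: feature work"

git checkout -q -b feature2
echo two >> file.txt
git commit -q -am "E: feature2 work"

# Stand-in for the upstream squash merge of feature
git checkout -q master
git merge -q --squash feature >/dev/null
git commit -q -m "U: squashed feature"

# Tell the local repository that U has two parents: its recorded one and feature
git replace --graft master master~1 feature

parent_count=$(git log -1 --pretty=%P master | wc -w | tr -d ' ')
merge_base=$(git merge-base master feature2)
echo "U now appears to have $parent_count parents"
echo "merge base of master and feature2: $merge_base"
```

With the replacement in place, the merge base of master and feature2 is the tip of feature. The replacement is stored under refs/replace/ in the local repository; as far as I know a plain git push does not send that namespace, and git replace -d removes it again.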

Different projects have different rules for what pull-request histories should be based on: some want everything based on some latest tip, others want everything based off a maintenance-base tag, and others want bugfixes based on the commit that introduced the bug (I think this should be far more common). Some don't care much, because everybody's so conversant with the code base that rebase-as-desired is still simplest.

But the important part here is that rebase-before-pushing is your last opportunity to be sure that what you're pushing is exactly right. It's an excellent habit to get into, and grafts work beautifully in that workflow.

Maines answered 11/5, 2017 at 18:5 Comment(1)
Cool, I thought the parent information is hard-coded in the (immutable) commits, so there's no way to change it. But I guess grafts represent a separate layer of arrows of the git DAG, above the layer of normal arrows in commits? And I assume this "grafts layer" is not used in git push, so it remains limited to my local repo? I did see some comments about grafts being a bit hacky, but it does seem that they are a perfect (and only) way to add the extra parentage information.Demodulation
R
2

The underlying root cause of your woes is that when the pull request for feature (the feature branch with one commit rolled back) completes, it results in a merge commit going into master. Here is a diagram showing what master and feature2 look like after the feature pull request into master has completed:

master:   ... A -- B -- C -- M
                         \
feature:                  D
                           \
feature2:                   E

Here, we can see that feature branched off from master at commit C, and feature2 is simply a continuation of feature with one extra commit. Merge commit M sits on the top of master and it represents all the extra work done in feature. Note that this is a merge commit, and hence has nothing to do with the history of feature2.

Next, you ran the following rebase of feature2 on master:

git checkout feature2
git rebase master

After this rebase, feature2 will look like this:

feature2: ... A -- B -- C -- M -- D' -- E'

Note carefully that the merge commit remains a part of the history. Even though functionally speaking it might seem unnecessary because commit D contains everything needed to make that merge commit, this commit still appears.

If you are wondering what you can do to avoid this, one option would be to have kept the history of master linear. The flaw was the pull request which ended with the merge commit. If, instead, you had played the commits from feature directly on top of master, then you would not have had this problem. Consider the following commands:

git checkout feature
git rebase master

Then, do a fast forward merge of master with feature:

git checkout master
git merge feature

This would have left master and feature2 looking like this:

master:   ... A -- B -- C -- D
feature2: ... A -- B -- C -- D -- E

Now, if you were to merge feature2 into master, Git would simply play the E commit, rather than going back to the original point whence master and feature diverged.

Ravish answered 21/4, 2017 at 15:19 Comment(11)
The problem here is the "merge" commit is actually a squash of feature, so git doesn't know how to associate the N commits to the single squashed commit. git properly handles merge commits, but the OP is describing a "merge" which actually rewrites the history of feature (in a subtle way).Microfilm
@AnthonySottile This is precisely what my answer says. The OP is being bluffed by thinking that rebase can untangle a merge commit, when in fact it can't. The handling of merge commits is not the real issue here.Ravish
So out of the three options that git provides for merging PRs, which ones preserve the history information? I assume if the history is preserved, the git rebase master would apply just one commit.Demodulation
Merge commits appear, at least from what I see, as being the root cause of your problem. Merge commits and rebase are like oil and water, or maybe Democrats and Republicans. They can be mixed together, but they generally don't play friendly with each other. So, if you can let GitHub fast forward master during a pull request if possible, then this would keep the history linear, and your original rebase would have resulted in just that one commit being played on top of master.Ravish
@TimBiegeleisen A "merge commit" means a very specific thing in git, what OP is describing is not a "merge commit" but a squash.Microfilm
@AnthonySottile I want to separate its last commit into a standalone PR ... do you have any reason to believe that either this or the subsequent PRs did not result in a merge commit? Keep in mind that GitHub's default behavior is to make a merge commit during a pull request.Ravish
@TimBiegeleisen OP's comment above "Even though that commit corresponds to a (squashed) version of my local commits in feature, there's no history linking them."Microfilm
The commit created in the upstream master as a result of my PR has only one parent. In fact, as far as I can tell now, my problem wouldn't have occurred with the proper merge commits.Demodulation
@Demodulation Unless that commit has the same SHA-1 hash as the source commit in feature, you will still have the problem my answer describes. If you had multiple commits in feature going into this one commit in master, then you will certainly get the behavior you are seeing.Ravish
It can't possibly have the same SHA-1, of course (created at a different time, with different parentage); but if it had the SHA-1 of my commit as a parent, I think I wouldn't have encountered this issue.Demodulation
+1 - thank you, your answer and the discussion in the comments clarified some things for me, but after researching what github does, I disagree with you on one point. The only way to linearize the master branch in github upstream repo is to use "rebase and merge", and it will not get rid of the problem I described. OTOH, a simple "merge" would get rid of it, even though it doesn't result in a linear history. I'll summarize my understanding in my own answer.Demodulation
D
1

Yes, normally. But git answers this question by looking at the history, not the changes. If you use GitHub's squash (i.e. nuke the history), then you forfeit the ability to leverage this history, namely git's ability to detect whether part of that history already exists in the upstream.

Building on the diagram created by Tim Biegeleisen:

master:   ... A -- B -- C -- M
                         \  /
feature:                  D
                           \
feature2:                   E

When you go to rebase feature2 onto master you should actually see this history:

master:   ... A -- B -- C -- M
                         \  /  \
feature:                  D     \
                                 \
feature2:                         E'

Because rebase will never recreate a commit that already exists in the destination. It will know that D is already in master's history.
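A quick way to check this behaviour is a scratch repository with a real (non-squashed) merge commit. This is only a sketch: the file name, commit messages and demo identity are made up, and git merge --no-ff stands in for GitHub's regular merge.

```shell
#!/bin/sh
# Sketch: with a true merge commit, rebase replays only the commit that is new.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > file.txt
git add file.txt
git commit -q -m "C: base"

git checkout -q -b feature
echo one >> file.txt
git commit -q -am "D1: feature work"
echo two >> file.txt
git commit -q -am "D2: more feature work"

git checkout -q -b feature2
echo three >> file.txt
git commit -q -am "E: feature2 work"

# A regular merge: M has two parents, one of them the tip of feature
git checkout -q master
git merge -q --no-ff -m "M: merge feature" feature

# Rebase feature2 onto master; D1 and D2 are already reachable from M
git rebase -q master feature2

replayed=$(git rev-list --count master..feature2)
echo "commits replayed onto master: $replayed"
```

Only E' ends up on top of M, which matches the diagram above.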

When you rebased feature2 on master, you saw commits that logically were already present in master. This can only happen in the following scenario.

Prior to the merge, you rewrite some of the commits in feature:

                 D'   feature
               / 
... A -- B -- C 
               \  
                D 
                 \
                  E  feature2

perform the merge:

                 D'   feature
               /   \
... A -- B -- C --- M   master
               \  
                D 
                 \
                  E  feature2

then try rebasing feature2 on top of master:

                 D'   feature
               /   \
... A -- B -- C --- M   master
                     \  
                      D'' 
                       \
                        E'  feature2
Diversification answered 11/5, 2017 at 15:0 Comment(4)
Prior to the merge, you rewrite some of the commits in feature - no, I did not. The only thing that I did was run the git commands that I posted in my question. I think @TimBiegeleisen answer explains well why the problem occurred.Demodulation
If the feature was squashed and then merged, then master contains a rewritten history of feature, i.e. the shared history between feature and feature2 is never merged into master (though the changes are), so when you rebase, git doesn't know that the history of feature2 is already in master and you see the commits.Diversification
Yes, and both my own answer and @TimBiegeleisen state that. The part in which these two answers disagree is under what conditions this shared history between feature and feature2 is merged into master. IIUC TimBiegeleisen suggested to linearize the master branch; my answer claims that it's not going to work, and that only a regular github merge would achieve that goal. Do you propose any different solution?Demodulation
There's one more interesting bit. If you're seeing commits like D'', those will only be included (in the rebase) if the inclusion of D'' has any effect (or has changes). So it makes me curious, why you are seeing those commits. If you want advice: Git gives you complete control over the structure of your history, rewrite it, make it useful for yourself in a few months; rarely squash, rarely merge, rebase often, only merge when you want to show a logical union.Diversification
