How to automatically rebase all children branches onto master after squashing and merging the parent branch?

Building on this question, I have a workflow where I'm constantly making PRs on top of PRs to make it easier for others to review my work. The goal is to keep each PR small. So I often end up with situations like the following:

                  G--H--I   <-- branch3
                 /    
          D--E--F   <-- branch2
         /    
  A--B--C       <-- branch1
 /
M          <-- master

And so on for N branches after branch3. The problem is, after I squash and merge branch1, I have to manually rebase branches 2, 3...N:

                  G--H--I   <-- branch3
                 /    
          D--E--F   <-- branch2
         /    
  A--B--C 
 /
M--S       <-- master, origin/master (branch1 changes are squashed in S)

In the above case, I have to run:

git checkout branch2
git rebase --onto master (SHA-1 of C)

git checkout branch3
git rebase --onto branch2 (SHA-1 of F)

And so on...

Is there a way to automate this process by rebasing all branches automatically with a script? What I can't figure out is a way to automatically detect the correct SHA-1 to pass as parameter for each rebase.

Splint answered 12/10, 2020 at 20:1 Comment(1)
The problem seems to be squashing commits that are part of other branches IMO. Destructive operations should only be used for fully encapsulated commits. - Edify

There are a couple of fundamental problems, or maybe one fundamental problem, depending on how you look at it. That is:

  • branches do not have parent/child relationships, and/or
  • branches, in the sense you mean the word, don't exist. All that we have are branch names. The branches themselves are mirages, or something. (This doesn't really seem like the right way to look at it, but it helps shake one loose from the more rigid view of branches that most non-Git systems take.)

Let's start with a question that seems straightforward, but because Git is Git, is actually a trick question: which branch holds commits A-B-C?

Is there a way to automate this process by rebasing all branches automatically with a script? What I can't figure out is a way to automatically detect the correct SHA-1 to pass as parameter for each rebase.

There isn't a general solution to this problem. If you have exactly the situation you have drawn, however, there is a specific solution to your specific situation—but you'll have to write it yourself.

The answer to the trick question is that commits A-B-C are on every branch except master. A branch name like branch3 just identifies one particular commit, in this case commit I. That commit identifies another commit, in this case, commit H. Each commit always identifies some previous commit—or, in the case of a merge commit, two or more previous commits—and Git simply works backwards from the end. "The end" is precisely that commit whose hash ID is stored in the branch name.

Branch names lack parent/child relationships because every branch name can be moved or destroyed at any time without changing the hash ID stored in each other branch. New names can be created at any time too: the only constraint on creating a new name is that you must pick some existing commit for that name to point-to.
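To see that a branch name is only a pointer to one tip commit, here is a minimal sketch in a throwaway repository that recreates the drawing from the question. The temporary repo, the `commit` helper, and the one-file-per-commit convention are assumptions made for the demo, not part of the original setup.

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
# Hypothetical helper: one commit per call, each adding its own file.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit M
git checkout -q -b branch1; commit A; commit B; commit C
git checkout -q -b branch2; commit D; commit E; commit F
git checkout -q -b branch3; commit G; commit H; commit I

# A branch name resolves to exactly one commit: its tip.
git rev-parse branch3
tip_subject=$(git log -1 --format=%s branch3)
echo "branch3 names commit $tip_subject"

# Commit C (the tip of branch1) is reachable from every name except master.
holders=$(git branch --format='%(refname:short)' --contains branch1 | sort | xargs)
echo "names that contain C: $holders"
```

Deleting or moving `branch1` afterwards would not change what `branch2` or `branch3` resolve to, which is why the names themselves carry no parent/child information.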

The commits have parent/child relationships, but the names do not. This leads to the solution to this specific situation, though. If commit Y is a descendant of commit X, that means there's some backwards path where we start at Y and can work our way back to X. This relationship is ordered (mathematically speaking, it forms a partial order over the set of commits), so that if X ≺ Y (X precedes Y, i.e., X is an ancestor of Y), then Y ≻ X (Y succeeds X: Y is a descendant of X).

So we take our set of names, translate each name to a commit hash ID, and perform these is-ancestor tests. Git's "is-ancestor" operator actually tests for ≼ (precedes or is equal to), and the is-equal case occurs with:

...--X   <-- name1, name2

where both names select the same commit. If that could occur we would have to analyze what our code might do with that case. It turns out that this usually doesn't require any special work at all (though I won't bother proving this).
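The is-ancestor tests described above can be sketched as follows. This is only an illustration under assumptions: the throwaway repo, the `commit` and `is_ancestor` helpers, and the extra name `same-as-2` are made up for the demo, and the loop that picks the "last" name assumes the names form a single linear chain.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit M
git checkout -q -b branch1; commit A; commit B; commit C
git checkout -q -b branch2; commit D; commit E; commit F
git checkout -q -b branch3; commit G; commit H; commit I

# Exit status 0 means "first argument precedes or equals the second".
is_ancestor() { git merge-base --is-ancestor "$1" "$2"; }

if is_ancestor branch1 branch3; then r1=yes; else r1=no; fi   # C before I
if is_ancestor branch3 branch1; then r2=yes; else r2=no; fi   # I is not before C
git branch same-as-2 branch2                                  # two names, one commit
if is_ancestor branch2 same-as-2; then r3=yes; else r3=no; fi # the "is equal" case
echo "$r1 $r2 $r3"

# Pick the name whose tip every other tip precedes (linear chain assumed).
last=
for name in branch2 branch3 branch1; do
  if [ -z "$last" ] || is_ancestor "$last" "$name"; then last=$name; fi
done
echo "last name in the chain: $last"
```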

Having found the "last" commit (the one that every other commit in our set comes "before"), we now need to do our rebase operation. We have:

                  G--H--I   <-- branch3
                 /    
          D--E--F   <-- branch2
         /    
  A--B--C 
 /
M--S       <-- master, origin/master (branch1 changes are squashed in S)

just as you showed, and we know that S represents the A-B-C sequence because we picked commit C (via the name branch1) when we made S. Since the last commit is commit I, we want to copy—as rebase does—every commit from D through I, with the copies landing after S. It might be best if Git didn't move any of these branch names at all, during the copying operation, and we can get that to happen using Git's detached HEAD mode:

git checkout --detach branch3  # i.e., commit `I`

or:

git checkout <hash-of-I>       # detach and get to commit `I`

or:

git switch --detach ...        # `git switch` always requires the --detach

which gets us:

                  G--H--I   <-- branch3, HEAD
                 /    
          D--E--F   <-- branch2
         /    
  A--B--C 
 /
M--S       <-- master, origin/master

We now run git rebase --onto master branch1 if the name branch1 is still available, or git rebase --onto master <hash-of-C> if not. This copies everything as desired:

                  G--H--I   <-- branch3
                 /    
          D--E--F   <-- branch2
         /    
  A--B--C 
 /
M--S       <-- master, origin/master
    \
     D'-E'-F'
            \
             G'-H'-I'  <-- HEAD
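The detached-HEAD copy can be tried out in a scratch repository. The sketch below assumes the layout from the drawings and uses a made-up `commit` helper; it also passes `-c rebase.updateRefs=false` so that a user-level setting of that option (available in newer Git versions) cannot move the names during the copy.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit M
git checkout -q -b branch1; commit A; commit B; commit C
git checkout -q -b branch2; commit D; commit E; commit F
git checkout -q -b branch3; commit G; commit H; commit I

# Squash-merge branch1 into master as the single commit S.
git checkout -q master
git merge -q --squash branch1 && git commit -q -m S

before2=$(git rev-parse branch2); before3=$(git rev-parse branch3)

# Copy D..I onto S without any branch name attached to HEAD.
git checkout -q --detach branch3
git -c rebase.updateRefs=false rebase -q --onto master branch1

copied=$(git rev-list --count master..HEAD)
echo "copied $copied commits; HEAD~3 is $(git log -1 --format=%s HEAD~3)'"
[ "$(git rev-parse branch2)" = "$before2" ] && echo "branch2 still names the original F"
[ "$(git rev-parse branch3)" = "$before3" ] && echo "branch3 still names the original I"
```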

Now all (?) we need to do is go back through those same sets of branch names and count how far they are along the chain of original commits. Because of the way Git works—backwards—we'll do this starting from wherever they end and working backwards to commit C. For this particular drawing, that's 3 for branch2 and 6 for branch3. We count how many commits we copied as well, which is also of course 6. So we subtract 3 from 6 for branch2, and 6 from 6 for branch3. That tells us where we should move those branch names now: zero steps back from I' for branch3, and three steps back from I' for branch2. So now we make one last loop through each name and re-set each name as appropriate.

(Then we probably should pick some name to git checkout or git switch to.)

There are some challenges here:

  • Where did we get this set of names? The names are branch1, branch2, branch3, and so on, but in reality they won't be so obviously related: why do we move branch fred but not branch barney?

  • How did we know that branch1 is the one that we shouldn't use here, but should use as the "don't copy this commit" argument to our git rebase-with-detached-HEAD?

  • How exactly do we do this is-ancestor / is-descendant test?

    This question actually has an answer: git merge-base --is-ancestor is the test. You give it two commit hash IDs and it reports whether the left-hand one is an ancestor of the right-hand one: git merge-base --is-ancestor X Y tests X ≼ Y. Its result is its exit status, suitable for use in shell scripts with the if built-in.

  • How do we count commits?

    This question also has an answer: git rev-list --count stop..start starts at the start commit and works backwards. It stops working backwards when it reaches stop or any of its ancestors. It then reports a count of the number of commits visited.

  • How do we move a branch name? How do we figure out which commit to land on?

    This one is easy: git branch -f will let us move an existing branch name, as long as we do not have that name currently checked-out. As we are on a detached HEAD after the copying process, we have no name checked-out, so all names can be moved. Git itself can do the counting-back, using the tilde and numeric suffix syntax: HEAD~0 is commit I', HEAD~1 is commit H', HEAD~2 is commit G', HEAD~3 is commit F', and so on. Given a number $n we just write HEAD~$n, so git branch -f $name HEAD~$n does the job.
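Putting the answered questions together, a rough sketch of the whole procedure might look like this. It hard-codes the answers to the first two (unsolved) questions as assumptions: `names` lists the branches to move and `upstream` is the squashed branch. The repo setup and the `commit` helper are likewise invented for the demo, and the approach only holds for a strictly linear chain.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit M
git checkout -q -b branch1; commit A; commit B; commit C
git checkout -q -b branch2; commit D; commit E; commit F
git checkout -q -b branch3; commit G; commit H; commit I
git checkout -q master
git merge -q --squash branch1 && git commit -q -m S

# Assumed inputs: which names to move, the farthest one, and the squashed one.
names="branch2 branch3"; tip=branch3; upstream=branch1

# Count commits from each tip back to C, and turn that into "steps back from I'".
total=$(git rev-list --count "$upstream..$tip")
offsets=
for name in $names; do
  n=$(git rev-list --count "$upstream..$name")
  offsets="$offsets $name:$((total - n))"
done

# Copy the chain on a detached HEAD so no name moves during the rebase.
git checkout -q --detach "$tip"
git -c rebase.updateRefs=false rebase -q --onto master "$upstream"

# Re-point each name at its copy, counting back from the new tip.
for entry in $offsets; do
  git branch -f "${entry%%:*}" "HEAD~${entry##*:}"
done
git checkout -q "$tip"
echo "offsets used:$offsets"
```

For the drawing above this computes an offset of 3 for branch2 and 0 for branch3, matching the arithmetic in the text.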

You still have to solve the first two questions. The solution to that will be specific to your particular situation.

Worth pointing out, and probably the reason no one has written a proper solution for this—I wrote my own approximate solution many years ago but abandoned it many years ago as well—is that this whole process breaks down if you don't have this very specific situation. Suppose that instead of:

                  G--H--I   <-- branch3
                 /    
          D--E--F   <-- branch2
         /    
  A--B--C       <-- branch1
 /
M          <-- master

you begin with:

               G--H--I   <-- branch3
              /    
          D--E--F   <-- branch2
         /    
  A--B--C       <-- branch1
 /
M          <-- master

This time, ending at commit I and copying all commits that reach back through, but do not include, commit C fails to copy commit F. There is no F' to allow you to move branch name branch2 after copying D-E-G-H-I to D'-E'-G'-H'-I'.
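One way to detect this situation before rebasing is to check that every name's tip is an ancestor of the farthest tip. The following is a sketch only: the scratch repo, the `commit` helper, and the `missing` variable are assumptions for the demo.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

# Second drawing: branch3 forks from E, so F is only on branch2.
commit M
git checkout -q -b branch1; commit A; commit B; commit C
git checkout -q -b branch2; commit D; commit E
git checkout -q -b branch3; commit G; commit H; commit I
git checkout -q branch2; commit F

tip=branch3
missing=
for name in branch1 branch2; do
  if git merge-base --is-ancestor "$name" "$tip"; then
    echo "$name: tip lies on the path back from $tip"
  else
    echo "$name: tip would NOT be copied by rebasing $tip"
    missing="$missing $name"
  fi
done

# Only D, E, G, H and I are in range; F is not.
in_range=$(git rev-list --count "branch1..$tip")
echo "commits that a rebase of $tip would copy: $in_range"
```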

This problem was pretty major, back in the twenty-aughts and twenty-teens. But git rebase has been smartened up a bunch, with the newfangled -r (--rebase-merges) interactive rebase mode. It now has almost all the machinery for a multi-branch rebase to Just Work. There are a few missing pieces that are still kind of hard here, but if we can solve the first two problems—how do we know which branch names to multi-rebase in the first place—we could write a git multirebase command that would do the whole job.

Ferocious answered 12/10, 2020 at 21:24 Comment(2)
Thank you so much for such a detailed answer! Very helpful. This begs the question, though: What is your workflow today? In my case, I'm trying to optimize for small, easy-to-review PRs; This forces me to have many PRs built one on top of another. After the first one is merged, it's tiresome to update all the others that follow manually. Is there a better approach? - Splint
I'm not really sure there's anything better. My experience with two or three step features (that call for 2 or 3 branches like this) has been that during the review of the earlier phases, enough stuff gets changed anyway that the later ones have to be rewritten, not just rebased. For that process, I tend to have a foo.0 (original), foo.1 (rewrite #1), foo.2 (rewrite #2), etc., and I make the next branch and then use git rebase -i to do the copying. The older ones remain left behind if / as needed. - Ferocious

Intro

I know this thread is old, but there is a new option to solve this problem specifically. This was included in Git 2.38, October 2022.

The new kid on the block

It is called --update-refs, and it is specifically designed to solve this problem.

Solution

So, instead of your example:

git checkout branch2
git rebase --onto master <SHA-1 of C>
# switch branches and keep on rebasing

Now, you can do the following:

git checkout branch3 # you need to be on the 'farthest' branch
git rebase master --update-refs # this will rebase the whole tree
git push origin : --force-with-lease # this will push the updated branches

Voilà! The whole stack of branches is rebased, and each branch name is moved to its rewritten commit.

Just make sure you are using Git 2.39 or later, since version 2.38 had a bug in this option.
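For the squash-and-merge case from the question, it may be safer to combine --update-refs with --onto. After a squash, A-B-C are not ancestors of master, so a plain git rebase master would also try to replay them, which can produce conflicts or empty commits. The sketch below is an assumption-laden demo in a scratch repository (the `commit` helper and the `supported` variable are made up), and it skips the rebase when the installed Git does not list the option.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit M
git checkout -q -b branch1; commit A; commit B; commit C
git checkout -q -b branch2; commit D; commit E; commit F
git checkout -q -b branch3; commit G; commit H; commit I
git checkout -q master
git merge -q --squash branch1 && git commit -q -m S

# Only run the rebase if this Git knows --update-refs (2.38 or later).
if git rebase -h 2>&1 | grep -q 'update-refs'; then
  supported=yes
  git checkout -q branch3                      # the farthest branch
  # Replay only D..I onto S; branch2 is moved along with branch3.
  git rebase -q --update-refs --onto master branch1
  for name in branch2 branch3; do
    echo "$name: $(git rev-list --count "master..$name") commits on top of master"
  done
else
  supported=no
  echo "this Git does not support --update-refs"
fi
```

If branch1 has already been deleted, the hash of C can stand in for it, as in the original commands.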

Set this option as default too

If you want this to be the default behavior whenever you rebase, so that you don't have to add the flag each time, set the option rebase.updateRefs in your .gitconfig:

git config --global --add rebase.updateRefs true
Camel answered 27/3, 2023 at 16:45 Comment(0)