There are a couple of fundamental problems, or maybe one fundamental problem, depending on how you look at it. That is:
- branches do not have parent/child relationships, and/or
- branches, in the sense you mean the word, don't exist. All that we have are branch names. The branches themselves are mirages, or something. (This doesn't really seem like the right way to look at it, but it helps shake one loose from the more rigid view of branches that most non-Git systems take.)
You asked: "Is there a way to automate this process by rebasing all branches automatically with a script? What I can't figure out is a way to automatically detect the correct SHA-1 to pass as parameter for each rebase."
There isn't a general solution to this problem. If you have exactly the situation you have drawn, however, there is a specific solution to your specific situation, but you'll have to write it yourself.
Let's start with a question that seems straightforward but, because Git is Git, is actually a trick question: which branch holds commits A-B-C?
The answer to the trick question is that commits A-B-C are on every branch except master. A branch name like branch3 just identifies one particular commit, in this case commit I. That commit identifies another commit, in this case commit H. Each commit always identifies some previous commit (or, in the case of a merge commit, two or more previous commits), and Git simply works backwards from the end. "The end" is precisely that commit whose hash ID is stored in the branch name.
Branch names lack parent/child relationships because every branch name can be moved or destroyed at any time without changing the hash ID stored in any other branch name. New names can be created at any time too: the only constraint on creating a new name is that you must pick some existing commit for that name to point to.
The commits have parent/child relationships, but the names do not. This leads to the solution to this specific situation, though. If commit Y is a descendant of commit X, that means there's some backwards path where we start at Y and can work our way back to X. This relationship is ordered (mathematically speaking, it forms a partial order over the set of commits), so that if X ≺ Y (X precedes Y, i.e., X is an ancestor of Y), then Y ≻ X (Y succeeds X: Y is a descendant of X).
So we take our set of names, translate each name to a commit hash ID, and perform these is-ancestor tests to find the one commit that has all the others as ancestors. Git's "is-ancestor" operator actually tests for ≼ (precedes or is equal to), and the is-equal case occurs with:
...--X <-- name1, name2
where both names select the same commit. If that could occur we would have to analyze what our code might do with that case. It turns out that this usually doesn't require any special work at all (though I won't bother proving this).
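As a rough sketch of that search (not a tested tool), a small shell function can run the ≼ test pairwise and report the name that every other name precedes. The function name find_tip is hypothetical. The sketch assumes that it runs inside the repository, that every argument resolves to a commit, and that git merge-base --is-ancestor is used as the test, reporting its answer through its exit status:

```shell
#!/bin/sh
# Hypothetical helper: print the first name whose commit has the commits
# of all the given names as ancestors (or is equal to them).
# Prints nothing and returns 1 if no single name qualifies.
find_tip() {
    for candidate in "$@"; do
        ok=true
        for other in "$@"; do
            # Exit status 0 means: other ≼ candidate, i.e. "other" is an
            # ancestor of, or the same commit as, "candidate".
            if ! git merge-base --is-ancestor "$other" "$candidate"; then
                ok=false
                break
            fi
        done
        if $ok; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

With the names used in this answer, find_tip branch1 branch2 branch3 would print branch3. If two names select the same commit, whichever is listed first is printed, which is one reason the is-equal case usually needs no special handling.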
Having found the "last" commit (the one that every other commit in the set comes "before"), we now need to do our rebase operation. We have:
                  G--H--I   <-- branch3
                 /
          D--E--F   <-- branch2
         /
  A--B--C
 /
M--S   <-- master, origin/master  (branch1 changes are squashed in S)
just as you showed, and we know that S represents the A-B-C sequence because we picked commit C (via the name branch1) when we made S. Since the last commit is commit I, we want to copy, as rebase does, every commit from D through I, with the copies landing after S. It might be best if Git didn't move any of these branch names at all during the copying operation, and we can get that to happen using Git's detached HEAD mode:
git checkout --detach branch3 # i.e., commit `I`
or:
git checkout <hash-of-I> # detach and get to commit `I`
or:
git switch --detach ... # `git switch` always requires the --detach
which gets us:
                  G--H--I   <-- branch3, HEAD
                 /
          D--E--F   <-- branch2
         /
  A--B--C
 /
M--S   <-- master, origin/master
We now run git rebase --onto master branch1 if the name branch1 is still available, or git rebase --onto master <hash-of-C> if not. This copies everything as desired:
                  G--H--I   <-- branch3
                 /
          D--E--F   <-- branch2
         /
  A--B--C
 /
M--S   <-- master, origin/master
    \
     D'-E'-F'
             \
              G'-H'-I'   <-- HEAD
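For a script, the detach-then-rebase step might be wrapped up as one function, sketched below. The name rebase_detached and its argument order are invented for illustration. The sketch assumes the rebase finishes without conflicts; if it stops partway, the function simply returns Git's non-zero status and leaves the rebase in progress for you to resolve.

```shell
#!/bin/sh
# Hypothetical helper: copy the commits in upstream..tip so that they land
# after newbase, without moving any branch name.
#   $1 = newbase   (e.g. master)
#   $2 = upstream  (the "don't copy this" commit, e.g. branch1 or a hash)
#   $3 = tip       (the last commit to copy, e.g. branch3)
rebase_detached() {
    newbase=$1 upstream=$2 tip=$3
    # Detach HEAD at the tip so that rebase has no branch name to move.
    git checkout --quiet --detach "$tip" || return
    # Copy upstream..HEAD on top of newbase; HEAD ends up at the last copy.
    git rebase --quiet --onto "$newbase" "$upstream"
}
```

For the drawing above, the call would be rebase_detached master branch1 branch3, after which HEAD is detached at I' and branch2 and branch3 still select F and I.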
Now all (?) we need to do is go back through that same set of branch names and count how far they are along the chain of original commits. Because of the way Git works (backwards), we'll do this starting from wherever they end and working backwards to commit C. For this particular drawing, that's 3 for branch2 and 6 for branch3. We count how many commits we copied as well, which is also of course 6. So we subtract 3 from 6 for branch2, and 6 from 6 for branch3. That tells us where we should move those branch names now: zero steps back from I' for branch3, and three steps back from I' for branch2. So now we make one last loop through each name and re-set each name as appropriate.
(Then we probably should pick some name to git checkout or git switch to.)
There are some challenges here:
- Where did we get this set of names? The names are branch1, branch2, branch3, and so on, but in reality they won't be so obviously related: why do we move branch fred but not branch barney?
- How did we know that branch1 is the one that we shouldn't use here, but should use as the "don't copy this commit" argument to our git rebase-with-detached-HEAD?
- How exactly do we do this is-ancestor / is-descendant test?
  This question actually has an answer: git merge-base --is-ancestor is the test. You give it two commit hash IDs and it reports whether the left-hand one is an ancestor of the right-hand one: git merge-base --is-ancestor X Y tests X ≼ Y. Its result is its exit status, suitable for use in shell scripts with the if built-in.
- How do we count commits?
  This question also has an answer: git rev-list --count stop..start starts at the start commit and works backwards. It stops working backwards when it reaches stop or any of its ancestors. It then reports a count of the number of commits visited.
- How do we move a branch name? How do we figure out which commit to land on?
  This one is easy: git branch -f will let us move an existing branch name, as long as we do not have that name currently checked out. As we are on a detached HEAD after the copying process, we have no name checked out, so all names can be moved. Git itself can do the counting-back, using the tilde and numeric suffix syntax: HEAD~0 is commit I', HEAD~1 is commit H', HEAD~2 is commit G', HEAD~3 is commit F', and so on. Given a number $n we just write HEAD~$n, so git branch -f $name HEAD~$n does the job.
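Putting the last two answers together, the final loop over the names might be sketched as below. The function name move_names and its argument order are made up for illustration. The sketch assumes that HEAD is detached at the last copied commit (I' in the drawing), that the rebase copied every commit in the range without dropping or reordering any, that the original commits form a single chain, and that the "don't copy" commit is not one of the names being moved:

```shell
#!/bin/sh
# Hypothetical helper: re-point each name at the copy of its old commit.
#   $1 = old_base (the "don't copy this" commit, C in the drawing)
#   $2 = old_tip  (the original last commit, I in the drawing)
#   remaining arguments = branch names to move
move_names() {
    old_base=$1 old_tip=$2
    shift 2
    # How many commits were copied (6 in the drawing).
    total=$(git rev-list --count "$old_base..$old_tip") || return
    for name in "$@"; do
        # How far this name sits along the original chain, counted
        # backwards from the name to old_base.
        count=$(git rev-list --count "$old_base..$name") || return
        # Steps back from the last copy to the copy this name should select.
        n=$((total - count))
        git branch -f "$name" "HEAD~$n" || return
    done
}
```

For the drawing above, move_names branch1 <hash-of-I> branch2 branch3 would compute 3 for branch2 and 0 for branch3, and so run git branch -f branch2 HEAD~3 and git branch -f branch3 HEAD~0.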
You still have to solve the first two questions. The solution to that will be specific to your particular situation.
Worth pointing out, and probably the reason no one has written a proper solution for this (I wrote my own approximate solution many years ago, but abandoned it long ago as well), is that this whole process breaks down if you don't have this very specific situation. Suppose that instead of:
                  G--H--I   <-- branch3
                 /
          D--E--F   <-- branch2
         /
  A--B--C   <-- branch1
 /
M   <-- master
you begin with:
               G--H--I   <-- branch3
              /
          D--E--F   <-- branch2
         /
  A--B--C   <-- branch1
 /
M   <-- master
This time, ending at commit I and copying all commits that reach back through, but do not include, commit C fails to copy commit F. There is no F' to allow you to move branch name branch2 after copying D-E-G-H-I to D'-E'-G'-H'-I'.
This problem was pretty major, back in the twenty-aughts and twenty-teens. But git rebase has been smartened up a bunch, with the newfangled -r (--rebase-merges) interactive rebase mode. It now has almost all the machinery for a multi-branch rebase to Just Work. There are a few missing pieces that are still kind of hard here, but if we can solve the first two problems (how do we know which branch names to multi-rebase in the first place?) we could write a git multirebase command that would do the whole job.