It seems likely that you were doing fast-forward operations before. The git merge
command will do this instead of merging, provided conditions are correct:
- A fast-forward needs to be possible.
- You are required to avoid the
--no-ff
option, which would disable the fast-forward.
This all started right after I reverted commit a13adadf
, fixed it and republished it.
This must have created a branch. There's a problem with this word—"branch", that is—that will lead you astray here, but the graph snippet you show in your question indicates that this is in fact what happened.
How can I get my repo to 'ignore' (i.e. stop displaying) the merge commits?
If you just want to avoid displaying them, there may be some option to your viewer to do this.
If you want to go back to not making them—the situation you were in before—you need to eliminate the branch you made.
Long: What's going on here (and why the word "branch" is problematic)
The first thing to keep in mind is that Git is all about commits. People new to Git, or even those who have been using it for quite a while, often think that Git is about files, or branches. But it isn't, really: it's about commits.
Each commit is numbered, but the numbers are not simple counting numbers. Instead, each commit gets a random-looking—but not actually random at all—hash ID. These things are big and ugly, and Git will abbreviate them at times (as for instance your a13adadf
), but each one of these is a numeric ID for some Git object—in this case, for a Git commit.
Git has a big database of all of its objects, which it can look up by ID. If you give Git a commit number, it finds that commit's contents, by the ID.
The contents of a commit come in two parts:
First, there's a snapshot of all the files that Git knows about. This tends to be the bulk of most commits, except for one thing: the files are stored in a special, read-only, Git-only, compressed and de-duplicated format. When you make a new commit in which most of the files are mostly the same as some previous commit, the new commit doesn't actually store the files again. It just re-uses the existing files. In other words, a particular version of a particular file gets amortized across however many commits re-use it. The re-use is safe because the files are read-only.
Besides the saved snapshot, each commit stores some metadata: information about the commit itself. This includes the name and email address of the person who made the commit, and some date-and-time information, and so on. Notably, the metadata for each commit also stores, for Git's use, the commit number—the hash ID—of the commit or commits that come right before this particular commit. Git calls this the parent or, for a merge commit, parents of the commit.
What this does is allow Git to work backwards. So that is how Git does work, backwards. If we have a long string of commits, all in a row, like this:
... <-F <-G <-H
where H
stands for the actual hash ID of the last commit in the chain, Git will start with commit H
, reading it out of its object database. Inside commit H
, Git will find all the saved files, and also the hash ID of earlier commit G
. If Git needs it, Git will use this hash ID to read commit G
out of the object database. That gives Git the earlier snapshot, and also the hash ID of even-earlier commit F
.
If Git needs to, Git will use hash ID F
(as stored in G
) to read F
, and of course F
contains another parent hash ID as well. So in this manner, Git can start with the last commit and work backwards.
This leaves Git with one problem: how will it quickly find the hash ID of the last commit in the chain? This is where branch names come in.
A branch name just holds the hash ID of the last commit
Given the above—and getting a bit lazy on purpose and drawing the connection from commit to commit as a line, instead of an arrow going from child to parent—we can now draw the master
branch in like this:
...--F--G--H <-- master
The name master
simply contains the actual hash ID of existing commit H
.
Let's add another name, develop
, that also contains hash ID H
, like this:
...--F--G--H <-- develop, master
Now we have a small problem: which name are we going to use? Here, Git uses the special name HEAD
to remember which branch name to use, so let's update the drawing a bit:
...--F--G--H <-- develop, master (HEAD)
This represents the result after git checkout master
: the current branch name is now master
, and master
selects commit H
, so that's the commit we're using (and the branch name that we're using too).
If we run git checkout develop
now, Git will switch to that branch. That name still identifies commit H
, so there's nothing else to change, but now we have:
...--F--G--H <-- develop (HEAD), master
If we now make a new commit, Git will:
- package up all the files it knows about (this is where Git's index or staging area comes in, but we won't cover it at all here);
- add appropriate metadata, including your name as author and committer and "now" as the time stamps, but importantly, making commit
H
the parent of the new commit;
- use all of this to make a new commit, which we'll call
I
.
There's one more thing Git will do but let's draw this part now. The result is:
...--F--G--H
\
I
What about the two names? That's the one more thing: Git will write I
's hash ID into the current name. If that's develop
, we get this:
...--F--G--H <-- master
\
I <-- develop (HEAD)
Note that master
stayed in place, but the name develop
has moved to point to the newest commit.
When two names identify the same commit, either name selects that commit
Note that initially, when master
and develop
both selected commit H
, it didn't matter, in one sense, which one you used with git checkout
. Either way you got commit H
as the current commit. But when you make the new commit, now it matters, because Git is only going to update one branch name. No one knows what the new commit's hash ID will be (because it depends in part on the exact second at which you make the commit), but once it's made, develop
will hold that hash ID, if develop
is the current name.
Note that if you now git checkout master
and make another new commit, the name master
will be the one updated this time:
...--F--G--H--J <-- master (HEAD)
\
I <-- develop
Let's assume for the moment that you have not done this, though.
Fast-forward
With the earlier picture in mind, let's run git checkout master
now, and go back to working with commit H
:
...--F--G--H <-- master (HEAD)
\
I <-- develop
In this state, let's run git merge develop
now.
Git will do the things it does for git merge
—see below—and find that the merge base is commit H
, which is also the current commit. The other commit, I
, is ahead of commit H
. These are the conditions under which Git can do a fast-forward operation.
A fast-forward is not an actual merge. What happens is that Git says to itself: If I did a real merge, I'd get a commit whose snapshot matches commit I
. So instead, I'll take a short cut, and just check out commit I
while dragging the name master
along with me. The result looks like this:
...--F--G--H
\
I <-- develop, master (HEAD)
and there is now no reason to keep the kink in the drawing—we could make this all one straight row.
Real merges
Sometimes, the above kind of fast-forward-instead-of-merge trick just doesn't work. Suppose you start with:
...--G--H <-- develop, master (HEAD)
and make two new commits I-J
:
I--J <-- master (HEAD)
/
...--G--H <-- develop
Now you git checkout develop
and make two more commits K-L
:
I--J <-- master
/
...--G--H
\
K--L <-- develop (HEAD)
At this point, no matter which name you give to git checkout
, if you run git merge
on the other name, there's no way to go forward from J
to L
, or vice versa. From J
, you have to back up to I
, then go down to shared commit H
, before you can go forward to K
and then L
.
This kind of merge, then, cannot be a fast-forward operation. Git will instead do a real merge.
To perform a merge, Git uses:
- the current (
HEAD
) commit: let's make that J
by doing git checkout master
first;
- the other commit you name: let's use
git merge develop
to choose commit L
;
- and one more commit, that Git finds on its own.
This last—or really, first—commit is the merge base, and the merge base is defined in terms of a graph operation known as Lowest Common Ancestor, but the short and understandable version is that Git works backwards from both commits to find the best shared common ancestor. In this case, that's commit H
: the point where the two branches diverge. While commits G
and earlier are also shared, they're not as good as commit H
.
So Git will now:
- compare the merge base
H
snapshot with the HEAD
/J
snapshot, to see what we changed on master
;
- compare the merge base
H
snapshot with the other/L
snapshot, to see what they changed on develop
; and
- combine the two sets of changes, and apply those to the merge base snapshot.
This is the process of merging, or to merge as a verb. Git will do all of this on its own, if it can. If it succeeds, Git will make a new commit, which we will call M
:
I--J
/ \
...--G--H M <-- master (HEAD)
\ /
K--L <-- develop
Note that new commit M
points back to both commits J
and L
. This is in fact what makes this new commit a merge commit. Because a fast-forward is literally not possible, Git must make this commit, in order to achieve the merge.
You were initially doing fast-forwards
You started out with this kind of situation:
...--G--H <-- master, develop (HEAD)
which then produced:
...--G--H <-- master
\
I <-- develop (HEAD)
You used git checkout master; git merge develop
or similar to get:
...--G--H--I <-- master (HEAD), develop
after which you can repeat the process, making first develop
, then both develop
and master
, name new commit J
:
...--G--H--I--J <-- master (HEAD), develop
But at this point you did something different: you did a git revert
while on master
.
The git revert
command makes a new commit. The new commit's snapshot is like the previous snapshot with one commit backed-out, as it were, so now you have:
K <-- master (HEAD)
/
...--G--H--I--J <-- develop
The snapshot in K
probably matches that in I
(so it re-uses all those files), but the commit number is all-new.
From here, you did git checkout develop
and wrote a better commit than J
, which we can call L
:
K <-- master
/
...--G--H--I--J--L <-- develop (HEAD)
Then you went back to master
and ran git merge develop
. This time, Git had to make a new merge commit. So it did just that:
K--M <-- master (HEAD)
/ /
...--G--H--I--J--L <-- develop
Now, when you go back to develop
and make new commits, you get the same pattern:
K--M <-- master
/ /
...--G--H--I--J--L--N <-- develop (HEAD)
When you switch back to master
and git merge develop
, Git must once again make a new merge commit. Fast-forwarding is not possible, and instead you get:
K--M--O <-- master (HEAD)
/ / /
...--G--H--I--J--L--N <-- develop
What you can do about this
Suppose you now run git checkout develop && git merge --ff-only master
. The first step selects develop
as the current branch. The second asks to merge with master
. This extra flag, --ff-only
, tells Git: but only do that if you can do it as a fast-forward.
(We already believe that Git can do this as a fast-forward, so this --ff-only
flag is just a safety check. I think it's a good idea, though.)
Since a fast-forward is possible, you'll get this:
K--M--O <-- master, develop (HEAD)
/ / /
...--G--H--I--J--L--N
Note how the name develop
has moved forward, to point to commit O
, without adding a new merge commit. This means that the next commit you make on develop
will have O
as its parent, like this:
P <-- develop (HEAD)
/
K--M--O <-- master
/ / /
...--G--H--I--J--L--N
If you now git checkout master; git merge develop
you'll get a fast-forward, with both names identifying new commit P
, and you'll be back in that situation in which committing on develop
allows a fast-forward.
Note that by doing this, you're essentially claiming that you don't need the name develop
after all
If your work-pattern is:
- make new commit
- drag
master
forward to match
then all you need to do is make your new commits while on master
.
There's nothing inherently wrong with doing the new commits on another name, and if this is only sometimes your work pattern, that's probably a good habit: using lots of branch names will help you later, and being in the habit of making a new name before starting on work is good. You might want to consider using a name more meaningful than just develop
, though.
In any case, note that what Git cares about here are the commits. The branch names are just ways you can have Git help you find specific commits: the commit found by each name is the point at which you're doing work with that name. The actual branching, if there is any, is a function of the commits you make.
To put it another way: To make commits form into branches, you need branch names, but having branch names alone does not make commits form into branches. That is:
...--F--G--H <-- master
\
I--J <-- develop
gives you two "last" commits, but a single linear chain ending at commit J
. In one sense, there are two branches, one of which ends at H
and one of which ends at J
, but in another, there is only one branch, that ends at J
. We can add more names, pointing to existing commits:
...--F <-- old
\
G--H <-- master
\
I--J <-- develop
and now there are three names (and three "last" commits) but the actual set of commits in the repository has not changed. We just drew F
on a line by itself so as to make the name old
point to it.