Why do I suddenly have a merge commit in my pushes?

Well, I seem to have gone and mucked something up.

Until just recently, I used to be able to do a merge commit and then push to origin without that separate commit showing up. Now that it does, the merge commit is all I can see at my pipeline:

Pipeline commits after

Before this started, only the manual commit was pushed to origin (or at least showed as such):

Pipeline commits before

Here's Team Explorer (VS 2019 v16.6.5), after the behavior changed:

Team Explorer

...and here's my local branch history:

Branch history

See the change?

This all started right after I reverted commit a13adadf, fixed it and republished it. Now I've got some sort of weird branching effect going on, and I don't know how to get things back to where they were before. (I tried researching the problem, but the signal-to-noise ratio is very low when searching on anything related to merge commit.)

How can I get my repo to 'ignore' (i.e. stop displaying) the merge commits?

(Note: I'm the only dev working on this repo.)

Copepod answered 13/8, 2020 at 23:8 Comment(0)

It seems likely that you were doing fast-forward operations before. The git merge command will do this instead of merging, provided conditions are correct:

  1. A fast-forward needs to be possible.
  2. You must not pass the --no-ff option, which would disable the fast-forward.

"This all started right after I reverted commit a13adadf, fixed it and republished it."

This must have created a branch. There's a problem with this word—"branch", that is—that will lead you astray here, but the graph snippet you show in your question indicates that this is in fact what happened.

"How can I get my repo to 'ignore' (i.e. stop displaying) the merge commits?"

If you just want to avoid displaying them, there may be some option to your viewer to do this.

If you want to go back to not making them—the situation you were in before—you need to eliminate the branch you made.

Long: What's going on here (and why the word "branch" is problematic)

The first thing to keep in mind is that Git is all about commits. People new to Git, or even those who have been using it for quite a while, often think that Git is about files, or branches. But it isn't, really: it's about commits.

Each commit is numbered, but the numbers are not simple counting numbers. Instead, each commit gets a random-looking—but not actually random at all—hash ID. These things are big and ugly, and Git will abbreviate them at times (as for instance your a13adadf), but each one of these is a numeric ID for some Git object—in this case, for a Git commit.

Git has a big database of all of its objects, which it can look up by ID. If you give Git a commit number, it finds that commit's contents, by the ID.

The contents of a commit come in two parts:

  • First, there's a snapshot of all the files that Git knows about. This tends to be the bulk of most commits, except for one thing: the files are stored in a special, read-only, Git-only, compressed and de-duplicated format. When you make a new commit in which most of the files are identical to those in some previous commit, the new commit doesn't actually store those files again. It just re-uses the existing copies. In other words, the cost of storing a particular version of a particular file is amortized across however many commits re-use it. The re-use is safe because the files are read-only.

  • Besides the saved snapshot, each commit stores some metadata: information about the commit itself. This includes the name and email address of the person who made the commit, and some date-and-time information, and so on. Notably, the metadata for each commit also stores, for Git's use, the commit number—the hash ID—of the commit or commits that come right before this particular commit. Git calls this the parent or, for a merge commit, parents of the commit.
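You can look at both parts of a commit yourself. What follows is a minimal sketch, assuming a POSIX shell and a Git installation; the throwaway repository, the placeholder identity and the commit helper are made up for illustration and have nothing to do with your actual repo.

```shell
# Throwaway repository; the identity is a placeholder for this demo only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q

# Hypothetical helper: each commit adds one small file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit G
commit H

# Print commit H as stored: tree (the snapshot), parent, author, committer, message.
git cat-file -p HEAD

recorded=$(git cat-file -p HEAD | sed -n 's/^parent //p')  # parent line inside H
parent=$(git rev-parse HEAD~1)                             # hash ID of G

# G.txt is unchanged between G and H, so both snapshots name the same stored blob.
blob_in_g=$(git rev-parse HEAD~1:G.txt)
blob_in_h=$(git rev-parse HEAD:G.txt)
```

The parent line recorded inside H should be the hash ID of G, and the two blob IDs should be identical, which is the file re-use described above.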

This lets Git work backwards, and that is in fact how Git works: backwards. If we have a long string of commits, all in a row, like this:

... <-F <-G <-H

where H stands for the actual hash ID of the last commit in the chain, Git will start with commit H, reading it out of its object database. Inside commit H, Git will find all the saved files, and also the hash ID of earlier commit G. If Git needs it, Git will use this hash ID to read commit G out of the object database. That gives Git the earlier snapshot, and also the hash ID of even-earlier commit F.

If Git needs to, Git will use hash ID F (as stored in G) to read F, and of course F contains another parent hash ID as well. So in this manner, Git can start with the last commit and work backwards.

This leaves Git with one problem: how will it quickly find the hash ID of the last commit in the chain? This is where branch names come in.

A branch name just holds the hash ID of the last commit

Given the above—and getting a bit lazy on purpose and drawing the connection from commit to commit as a line, instead of an arrow going from child to parent—we can now draw the master branch in like this:

...--F--G--H   <-- master

The name master simply contains the actual hash ID of existing commit H.

Let's add another name, develop, that also contains hash ID H, like this:

...--F--G--H   <-- develop, master

Now we have a small problem: which name are we going to use? Here, Git uses the special name HEAD to remember which branch name to use, so let's update the drawing a bit:

...--F--G--H   <-- develop, master (HEAD)

This represents the result after git checkout master: the current branch name is now master, and master selects commit H, so that's the commit we're using (and the branch name that we're using too).

If we run git checkout develop now, Git will switch to that branch. That name still identifies commit H, so there's nothing else to change, but now we have:

...--F--G--H   <-- develop (HEAD), master

If we now make a new commit, Git will:

  • package up all the files it knows about (this is where Git's index or staging area comes in, but we won't cover it at all here);
  • add appropriate metadata, including your name as author and committer and "now" as the time stamps, but importantly, making commit H the parent of the new commit;
  • use all of this to make a new commit, which we'll call I.

There's one more thing Git will do but let's draw this part now. The result is:

...--F--G--H
            \
             I

What about the two names? That's the one more thing: Git will write I's hash ID into the current name. If that's develop, we get this:

...--F--G--H   <-- master
            \
             I   <-- develop (HEAD)

Note that master stayed in place, but the name develop has moved to point to the newest commit.
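Here is a small sketch of that sequence, assuming a POSIX shell and a Git installation. The throwaway repository, the placeholder identity and the commit helper are hypothetical and exist only for the demonstration.

```shell
# Throwaway repository; the identity is a placeholder for this demo only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master

# Hypothetical helper: each commit adds one small file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit H
git branch develop               # a second name for the same commit H
git checkout -q develop          # HEAD now selects the name develop
git symbolic-ref --short HEAD    # prints: develop

before=$(git rev-parse master)   # hash ID of H
commit I                         # only the current name, develop, moves
master_now=$(git rev-parse master)
develop_now=$(git rev-parse develop)
parent_of_i=$(git rev-parse develop~1)
```

After this, master still holds the hash ID of H, while develop holds the hash ID of I, whose parent is H.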

When two names identify the same commit, either name selects that commit

Note that initially, when master and develop both selected commit H, it didn't matter, in one sense, which one you used with git checkout. Either way you got commit H as the current commit. But when you make the new commit, now it matters, because Git is only going to update one branch name. No one knows what the new commit's hash ID will be (because it depends in part on the exact second at which you make the commit), but once it's made, develop will hold that hash ID, if develop is the current name.

Note that if you now git checkout master and make another new commit, the name master will be the one updated this time:

...--F--G--H--J   <-- master (HEAD)
            \
             I   <-- develop

Let's assume for the moment that you have not done this, though.

Fast-forward

With the earlier picture in mind, let's run git checkout master now, and go back to working with commit H:

...--F--G--H   <-- master (HEAD)
            \
             I   <-- develop

In this state, let's run git merge develop now.

Git will do the things it does for git merge—see below—and find that the merge base is commit H, which is also the current commit. The other commit, I, is ahead of commit H. These are the conditions under which Git can do a fast-forward operation.

A fast-forward is not an actual merge. What happens is that Git says to itself: "If I did a real merge, I'd get a commit whose snapshot matches commit I. So instead, I'll take a shortcut, and just check out commit I while dragging the name master along with me." The result looks like this:

...--F--G--H
            \
             I   <-- develop, master (HEAD)

and there is now no reason to keep the kink in the drawing—we could make this all one straight row.
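The fast-forward case can be tried out as follows. This is only a sketch, assuming a POSIX shell and a Git installation; the throwaway repository, the placeholder identity and the commit helper are invented for the demo.

```shell
# Throwaway repository; the identity is a placeholder for this demo only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master

# Hypothetical helper: each commit adds one small file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit H
git checkout -q -b develop
commit I
git checkout -q master           # back on H

# Exit status 0 means HEAD (H) is an ancestor of develop (I),
# which is the condition for a fast-forward.
git merge-base --is-ancestor HEAD develop && echo "fast-forward possible"

git merge -q --ff-only develop   # moves master to I, makes no new commit
merges=$(git rev-list --merges --count master)
```

Both names now identify commit I, and the count of merge commits reachable from master stays at zero.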

Real merges

Sometimes, the above kind of fast-forward-instead-of-merge trick just doesn't work. Suppose you start with:

...--G--H   <-- develop, master (HEAD)

and make two new commits I-J:

          I--J   <-- master (HEAD)
         /
...--G--H   <-- develop

Now you git checkout develop and make two more commits K-L:

          I--J   <-- master
         /
...--G--H
         \
          K--L   <-- develop (HEAD)

At this point, no matter which name you give to git checkout, if you run git merge on the other name, there's no way to go forward from J to L, or vice versa. From J, you have to back up to I, then go down to shared commit H, before you can go forward to K and then L.

This kind of merge, then, cannot be a fast-forward operation. Git will instead do a real merge.

To perform a merge, Git uses:

  • the current (HEAD) commit: let's make that J by doing git checkout master first;
  • the other commit you name: let's use git merge develop to choose commit L;
  • and one more commit, that Git finds on its own.

This last (or really, first) commit is the merge base. It is formally defined in terms of a graph operation known as Lowest Common Ancestor, but the short and understandable version is that Git works backwards from both commits to find the best shared ancestor. In this case, that's commit H: the point where the two branches diverge. While commits G and earlier are also shared, they're not as good as commit H.

So Git will now:

  • compare the merge base H snapshot with the HEAD/J snapshot, to see what we changed on master;
  • compare the merge base H snapshot with the other/L snapshot, to see what they changed on develop; and
  • combine the two sets of changes, and apply those to the merge base snapshot.

This is the process of merging, or to merge as a verb. Git will do all of this on its own, if it can. If it succeeds, Git will make a new commit, which we will call M:

          I--J
         /    \
...--G--H      M   <-- master (HEAD)
         \    /
          K--L   <-- develop

Note that new commit M points back to both commits J and L. This is in fact what makes this new commit a merge commit. Because a fast-forward is literally not possible, Git must make this commit, in order to achieve the merge.
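To see a real merge happen, here is a sketch under the same kind of assumptions as before: a POSIX shell, a Git installation, and a throwaway repository whose placeholder identity and commit helper are hypothetical. Each commit touches its own file, so the merge cannot conflict.

```shell
# Throwaway repository; the identity is a placeholder for this demo only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master

# Hypothetical helper: each commit adds one small file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit H
h=$(git rev-parse HEAD)
git branch develop
commit I && commit J                      # two commits on master
j=$(git rev-parse master)
git checkout -q develop
commit K && commit L                      # two commits on develop
l=$(git rev-parse develop)

git checkout -q master
base=$(git merge-base master develop)     # Git finds H on its own
git merge -q --no-edit develop            # no fast-forward possible, so this makes M
git rev-list --parents -n 1 HEAD          # prints three hash IDs: M, then J, then L
```

The merge base should come out as H, and the new commit M should list J as its first parent and L as its second.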

You were initially doing fast-forwards

You started out with this kind of situation:

...--G--H   <-- master, develop (HEAD)

which then produced:

...--G--H   <-- master
         \
          I   <-- develop (HEAD)

You used git checkout master; git merge develop or similar to get:

...--G--H--I   <-- master (HEAD), develop

after which you can repeat the process, making first develop, then both develop and master, name new commit J:

...--G--H--I--J   <-- master (HEAD), develop

But at this point you did something different: you did a git revert while on master.

The git revert command makes a new commit. The new commit's snapshot is like the previous snapshot with one commit backed-out, as it were, so now you have:

                K   <-- master (HEAD)
               /
...--G--H--I--J   <-- develop

The snapshot in K probably matches that in I (so it re-uses all those files), but the commit number is all-new.

From here, you did git checkout develop and wrote a better commit than J, which we can call L:

                K   <-- master
               /
...--G--H--I--J--L   <-- develop (HEAD)

Then you went back to master and ran git merge develop. This time, Git had to make a new merge commit. So it did just that:

                K--M   <-- master (HEAD)
               /  /
...--G--H--I--J--L   <-- develop

Now, when you go back to develop and make new commits, you get the same pattern:

                K--M   <-- master
               /  /
...--G--H--I--J--L--N   <-- develop (HEAD)

When you switch back to master and git merge develop, Git must once again make a new merge commit. Fast-forwarding is not possible, and instead you get:

                K--M--O   <-- master (HEAD)
               /  /  /
...--G--H--I--J--L--N   <-- develop
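This sequence can be reproduced in a scratch repository. The following is a sketch of what I believe happened, not a record of your actual commands; it assumes a POSIX shell and a Git installation, and the placeholder identity and commit helper are hypothetical.

```shell
# Throwaway repository; the identity is a placeholder for this demo only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master

# Hypothetical helper: each commit adds one small file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit I
git checkout -q -b develop
commit J
git checkout -q master
git merge -q --ff-only develop            # both names at J, as before the trouble

git revert --no-edit HEAD >/dev/null      # K: backs out J, on master only
git checkout -q develop
commit L                                  # the better commit, on top of J
git checkout -q master
git merge -q --no-edit develop            # M: master has K, so this is a real merge

git checkout -q develop
commit N
git checkout -q master
git merge -q --no-edit develop            # O: a real merge again

git log --oneline --graph master          # draws the ladder-shaped history
merges=$(git rev-list --merges --count master)
```

Every later merge of develop into master adds one more merge commit, because master always holds at least one commit (K, then M, then O) that develop does not.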

What you can do about this

Suppose you now run git checkout develop && git merge --ff-only master. The first step selects develop as the current branch. The second asks to merge with master. The extra flag, --ff-only, tells Git: "but only do that if you can do it as a fast-forward."

(We already believe that Git can do this as a fast-forward, so this --ff-only flag is just a safety check. I think it's a good idea, though.)

Since a fast-forward is possible, you'll get this:

                K--M--O   <-- master, develop (HEAD)
               /  /  /
...--G--H--I--J--L--N

Note how the name develop has moved forward, to point to commit O, without adding a new merge commit. This means that the next commit you make on develop will have O as its parent, like this:

                        P   <-- develop (HEAD)
                       /
                K--M--O   <-- master
               /  /  /
...--G--H--I--J--L--N

If you now git checkout master; git merge develop you'll get a fast-forward, with both names identifying new commit P, and you'll be back in that situation in which committing on develop allows a fast-forward.
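The repair can be checked in a scratch repository too. This is a sketch, assuming a POSIX shell and a Git installation; it first rebuilds a shortened version of the diverged state (one revert, one forced merge), and the placeholder identity and commit helper are hypothetical.

```shell
# Throwaway repository; the identity is a placeholder for this demo only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master

# Hypothetical helper: each commit adds one small file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

# Rebuild the problem: revert on master, fix on develop, forced merge commit.
commit I
git checkout -q -b develop && commit J
git checkout -q master && git merge -q --ff-only develop
git revert --no-edit HEAD >/dev/null
git checkout -q develop && commit L
git checkout -q master && git merge -q --no-edit develop
merges_before=$(git rev-list --merges --count master)

# The repair: let develop catch up to master, as a fast-forward only.
git checkout -q develop && git merge -q --ff-only master

# From here on, the usual cycle fast-forwards again.
commit P
git checkout -q master && git merge -q --ff-only develop
merges_after=$(git rev-list --merges --count master)
```

The number of merge commits is the same before and after the last cycle, and both names end up on P. The old merge commit is still in the history, of course; the repair only stops new ones from being made.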

Note that by doing this, you're essentially claiming that you don't need the name develop after all

If your work-pattern is:

  • make new commit
  • drag master forward to match

then all you need to do is make your new commits while on master.

There's nothing inherently wrong with doing the new commits on another name, and if this is only sometimes your work pattern, that's probably a good habit: using lots of branch names will help you later, and being in the habit of making a new name before starting on work is good. You might want to consider using a name more meaningful than just develop, though.

In any case, note that what Git cares about here are the commits. The branch names are just ways you can have Git help you find specific commits: the commit found by each name is the point at which you're doing work with that name. The actual branching, if there is any, is a function of the commits you make.

To put it another way: To make commits form into branches, you need branch names, but having branch names alone does not make commits form into branches. That is:

...--F--G--H   <-- master
            \
             I--J   <-- develop

gives you two "last" commits, but a single linear chain ending at commit J. In one sense, there are two branches, one of which ends at H and one of which ends at J, but in another, there is only one branch, that ends at J. We can add more names, pointing to existing commits:

...--F   <-- old
      \
       G--H   <-- master
           \
            I--J   <-- develop

and now there are three names (and three "last" commits) but the actual set of commits in the repository has not changed. We just drew F on a line by itself so as to make the name old point to it.
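A quick way to convince yourself that names and commits are separate things is to count commits before and after adding a name. This is a sketch, assuming a POSIX shell and a Git installation; the throwaway repository, placeholder identity and commit helper are hypothetical.

```shell
# Throwaway repository; the identity is a placeholder for this demo only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master

# Hypothetical helper: each commit adds one small file named after the commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit F && commit G && commit H          # master ends at H
git checkout -q -b develop
commit I && commit J                      # develop ends at J

before=$(git rev-list --all --count)      # commits reachable from any name
git branch old master~2                   # a new name for existing commit F
after=$(git rev-list --all --count)

git branch --list                         # three names now
```

The count is five both times: adding the name old created no commit, and the history is still one straight line from F to J.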

Noakes answered 14/8, 2020 at 4:47 Comment(7)
Holy cow! Now that's a fountain of knowledge. Excellent. Beautiful. Clear as a bell. Solved my problem exactly. Worth a bounty, so have one on me. Thank you very much. - Copepod
"It seems likely that you were doing fast-forward operations before." It turns out that's correct, although I didn't realize it at the time. Now with your help I know what to look out for if/when this happens again. But I noticed something... shouldn't this syntax git checkout master; git merge develop be git checkout master && git merge develop instead? I tried the former and received some Git error messages. The latter ran fine. - Copepod
"You might want to consider using a name more meaningful than just develop, though." You are correct. FYI I normally do so, but in this single case I'm working on code that must be tested in a "production" environment (i.e. after installation). Thus, I'm committing and pushing every few minutes or more often; multiple branches would multiply the job's complexity exponentially. In other words, I need to stay "close to the metal," as it were. That said, much more of this and I might get fed up and just switch to master for the duration (now that I have your solution as perspective). - Copepod
...or I could create a feature branch and temporarily set that branch as a build trigger in my pipeline. Hm, I'll have to give that some thought. That might be the smart thing to do. - Copepod
@InteXX: The sh / bash syntax cmd1 && cmd2 means run cmd2 if and only if cmd1 returns a successful exit status, and is generally a good idea. I have no idea how to do that in Windows shells though. :-) The cmd1; cmd2 means run cmd1, then run cmd2 even if cmd1 fails, so yes, the && is better here (because git checkout can fail and if it does fail, that will stop the git merge). - Noakes
"The sh / bash syntax..." Aha. I suspected as much, especially given the semicolon. In Windows the syntax only requires one &, but the behavior is the same as bash's ; (one command right after the other, regardless of pass or fail). I don't think it's possible in Windows to do the if-then-else you describe, but I could be mistaken. Anyway, thanks again for the detailed explanation; it helps a lot. Several things came clear to me while reading it. Fine job there. - Copepod
How long did it take you to write this? - Copepod
