Git says branch is merged, but changes are apparently not present

I've worked myself into a situation that is not making sense to me. I'll try to describe it as best I can.

I have a development branch and I've merged master into it via git checkout development && git merge master. I didn't get any merge conflicts here.

There is a specific commit that I'm interested in, let's say it's abcd123. When I run git branch --contains abcd123, it reports that both development and master contain abcd123. When I do git log, it shows abcd123 in the list of commits, both on development and on master.

git show abcd123 shows that it contains changes to two files. But I can't seem to find these changes. When I look at the files, I don't see those changes, neither on development nor on master. When I inspect git log -- path/to/file1, I don't see abcd123, same for git log -- path/to/file2.

What's going on here? How can the commit be present, but the changes are apparently not there?

It is possible that abcd123 was originally introduced in another branch (other than development) that was merged into master. I don't know if that could make a difference.

By the way, when I try git checkout master && git merge development (after merging master into development as shown above) I get a lot of merge conflicts, including file1 and file2. So that seems to show that master was not actually merged into development -- shouldn't git merge development succeed if git merge master was already executed? To cause more confusion, git branch --merged development says that master has been merged into development. I guess that is consistent with git merge master ....

Any advice at this point is much appreciated.

EDIT: At this point it appears that the problem is due to a merge that failed or was messed up in some way. If anyone is still reading, I think the direction that torek's answer is going seems most fruitful.

Slipshod answered 11/5, 2017 at 23:20 Comment(7)
Is it possible that the changes in abcd123 were overwritten in newer commits? - Settler
Maybe you can run git log --pretty=format:"%h %s" --graph to understand better how the merges were done. - Tangy
@Settler I believe that the changes have not been overwritten. - Slipshod
@FabriPautasso Thanks for the suggestion. I've looked at a commit graph and it seems to show that abcd123 is an ancestor of HEAD on both development and master. I don't know what else I might look for. - Slipshod
Did you try git log --full-history -- file_name to see if the commit appears? - Tangy
@FabriPautasso Hmm, yes, when I try full-history, the commit appears. But it does not appear if full-history is omitted. man git-log says only that full-history "does not prune some history". How does default git log decide what history to prune? EDIT: Reading further in man git-log, I see there is some discussion about history pruning, although it's not clear to me yet how it applies in this case. - Slipshod
From git-scm.com/docs/git-log: "Default mode: Simplifies the history to the simplest history explaining the final state of the tree. Simplest because it prunes some side branches if the end result is the same (i.e. merging branches with the same content)." And: "--full-history: Same as the default mode, but does not prune some history." - Tangy

This answer is long, because there is a lot going on here. The TL;DR summary, though, is probably that you want --full-history.

There are multiple separate issues here that need to be untangled:

  • The phrase "show changes", or what you see in git log -p or git show, often leads people down the wrong path in interpreting what Git stores.
  • The git log command sometimes lies to you (especially around merges), purely in the interest of not overwhelming you with useless information.
  • What git merge does can be a bit tricky. It's straightforward in principle, but most people don't get it right away.

Ordinary commits

Let's look first at Git's most common, ordinary commits. A commit, in Git, is a snapshot (of file-contents by file names). You make one commit, then you change a few things and git add a changed file or two and make a second commit, and the second commit has all the same files as the first commit, except for the ones overwritten by the git add.

It's worth drawing these as parts of Git's commit graph. Note that each commit has its own unique hash ID (one of those impossible-to-remember strings like 93aefc or badf00d or cafedad), plus the ID of a parent (or previous) commit. The parent commit hash lets Git string these things together, in a backwards fashion:

... <-E <-F <-G ...

where each uppercase letter stands in for a hash ID, and the arrows cover the idea that each commit "points back" to its parent. Normally we don't need to draw in the internal arrows (they're not very interesting in the end) so I draw these as:

...--E--F--G   <-- master

The name master, however, still deserves an arrow, because the commit to which this arrow points will change over time.

If we pick a commit like G and view it without using git log -p or git show, we will see every file in full, exactly as it is stored in the commit. In fact, that's what happens when we use git checkout to check it out: we extract all the files in full, into the work-tree, so that we can see and work on them. But when we view it with git log -p or git show, Git doesn't show us everything; it only shows us what changed.

To do this, Git extracts both the commit and its parent commit, and then runs a big git diff on the pair. Whatever is different between the parent F and the child G, that's what changed, so that's what git log -p or git show shows you.
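To make the snapshot-versus-diff distinction concrete, here is a minimal sketch in a throwaway repository. The file names, contents, and commit messages are made up for illustration; only the git commands themselves are standard.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of init.defaultBranch
git config user.name "Example"
git config user.email "example@example.invalid"

# Commit F: a snapshot holding two files.
printf 'one\n' > file1
printf 'unchanged\n' > file2
git add file1 file2
git commit -q -m "F: first snapshot"

# Commit G: only file1 is overwritten via git add; file2 is carried over as-is.
printf 'one\ntwo\n' > file1
git add file1
git commit -q -m "G: change file1"

# The snapshot stored in G still lists every file ...
snapshot=$(git ls-tree -r --name-only HEAD)
# ... but git show reports only what differs from the parent F,
# which is the same set of files that a manual diff of F and G reports.
shown=$(git show --format= --name-only HEAD)
manual=$(git diff --name-only HEAD~1 HEAD)

printf 'files stored in G:\n%s\n' "$snapshot"
printf 'git show lists: %s\n' "$shown"
printf 'git diff F G lists: %s\n' "$manual"
```

The stored snapshot contains both files, while git show and the manual parent-to-child diff both mention only file1.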

Merge commits

This is all well and good for ordinary, single-parent commits, but it doesn't work for merge commits. A merge commit is simply any commit with two (or more, but we won't worry about this case) parent commits. You get these by doing a successful git merge, and we might draw that like this. We start with the two branches (which fork off from some starting-point):

       H--I   <-- development (HEAD)
      /
...--E--F--G   <-- master

and then we run git merge master.[1] Git now tries to combine the two branches. If it succeeds, it makes one new commit that has two parents:

       H--I--J   <-- development (HEAD)
      /     /
...--E--F--G   <-- master

The name development now points to the new merge commit J. The parenthesized (HEAD) here denotes that this is our current branch. That tells us which name gets moved: we make a new commit—including any new merge commit—and development is the branch-name that changes to point to the new commit.

If we don't worry about how the contents (the various committed files) of the merge commit are determined, this is all pretty straightforward. The merge commit is like any other commit: it has a complete snapshot, a bunch of files with contents. We check out the merge commit, and those contents get in our work-tree as usual.

But when we go to view the merge commit ... well, Git normally diffs a commit against its parent. The merge has two parents, one for each branch. Which one should Git diff against, to show you changes?

Here, git log and git show take different approaches. When you view the merge commit with git log -p, it shows no diff at all by default. It won't choose I-vs-J, and it won't choose G-vs-J either! It just prints the log message and moves on.


[1] In some Git workflows, merging from master into any other branch is discouraged. It can work, though, and since you did, let's run with it.


Viewing merge commits

The git show command does something different and better. It runs two git diffs, one for I-vs-J and one for G-vs-J. It then tries to combine the two diffs, showing you only what changed in both. That is, where J is different from I but not in a particularly interesting way, Git suppresses the difference. Where J is different from G but not in a particularly interesting way, Git suppresses this difference as well. This is probably the most useful mode, so it's what git show shows. It's still quite imperfect, but nothing you can do here is perfect for all purposes.

You can make git log do this same thing by adding --cc to the git log -p options. Or, you can change how either git log or git show shows a merge commit by using -m (note one dash for -m, two for --cc, by the way).

The -m option tells Git that for viewing purposes, it should split the merge. Now Git compares I to J, to show you everything you brought in through merging G. Then Git compares G to the split-off extra version of J, to show you everything you brought in through merging I. The resulting diff is usually very large but (or because) it shows you everything.
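Here is a small sketch of the three viewing modes on a clean merge. It assumes a scratch repository with made-up files a.txt and b.txt, and simply counts the diff headers each mode prints for the merge commit J.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'base\n' > a.txt
printf 'base\n' > b.txt
git add a.txt b.txt
git commit -q -m "E: base"

git checkout -q -b development
printf 'dev\n' > a.txt
git commit -q -am "I: change a.txt on development"

git checkout -q master
printf 'master\n' > b.txt
git commit -q -am "G: change b.txt on master"

git checkout -q development
git merge -q -m "J: merge master" master   # clean merge, no conflicts

# Count the per-file diff headers each viewing mode prints for J.
plain=$(git log -1 -p --format=%s | grep -c '^diff ')
combined=$(git log -1 -p --cc --format=%s | grep -c '^diff ')
split=$(git log -1 -p -m --format=%s | grep -c '^diff ')
echo "git log -p: $plain   --cc: $combined   -m: $split"
```

With this setup, plain git log -p and --cc print no diff for J (no file differs from both parents), while -m prints one diff per parent: b.txt against I and a.txt against G.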

There are more ways to try to find what happened to some file, but we need to hold off a moment before getting to your:

git log -- path/to/file1

command. Just as we saw git log skipping merges, it may skip even more things here (but there are ways to stop Git from doing that).

Actually making merges: how Git builds the merge's contents

Let's look at that pre-merge graph again:

       H--I   <-- development (HEAD)
      /
...--E--F--G   <-- master

Here, there are two commits on branch development that are not on branch master, and two commits on master that are not on development. Commit E (along with all earlier commits) is on both branches. Commit E is special, though: it's the most recent[2] commit that's on both branches. Commit E is what Git calls the merge base.

To perform a merge, Git effectively runs two git diff commands:

git diff E I
git diff E G

The first produces a set of changes to various files, which are "what we did on branch development". It is, in effect, the sum of H and I if they are treated as patches. The second produces a—probably different—set of changes to various files, "what we did on master", and as before it's effectively the sum of F and G as patches.

Git then tries to combine these two diffs. Whatever is completely independent between them, it takes both sets of changes, applies them to the contents of commit E, and uses that as the result. Wherever the two change-sets touch the same line in the same file, Git tries to see if it can just take one copy of that change. If both fixed the spelling of a word on line 33 of file README, Git can just take one copy of the spelling fix. But wherever the two change-sets touch the same line of the same file, but make a different change, Git declares a "merge conflict", throws its metaphorical hands in the air, and makes you fix up the resulting mess.
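The two rules above (take one copy of an identical change, declare a conflict for differing changes to the same line) can be sketched like this. The repository, the README text, and colors.txt are invented for the example.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'a speling error\n' > README
printf 'beige\n' > colors.txt
git add README colors.txt
git commit -q -m "E: base"

git checkout -q -b development
printf 'a spelling error\n' > README   # spelling fix
printf 'green\n' > colors.txt
git commit -q -am "I: development changes"

git checkout -q master
printf 'a spelling error\n' > README   # the very same spelling fix
printf 'blue\n' > colors.txt           # a different change to the same line
git commit -q -am "G: master changes"

git checkout -q development
merge_status=0
git merge -m "J: merge master" master || merge_status=$?

# Paths Git could not combine on its own are left "unmerged".
conflicted=$(git diff --name-only --diff-filter=U)
echo "merge exit status: $merge_status"
echo "conflicted files: $conflicted"
echo "README in the work-tree: $(cat README)"
```

The merge stops with a non-zero status, and only colors.txt is listed as conflicted; the README fix, made identically on both sides, goes through without complaint.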

If you (or whoever does the merge) want to, you can stop Git from committing the merge result even if Git thinks it all went swimmingly: git merge --no-commit master makes Git stop after combining everything. At this point, you can open work-tree files in your editor, change them, write them back, git add the changed file, and git commit the merge to put something in the merge that did not come from any of the three inputs (base and two branch-tips).
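As a rough illustration of a merge whose contents come from none of the three inputs, the sketch below stops a clean merge with --no-commit and hand-edits a file before committing. All file names and contents are assumptions made for the demo.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'base\n' > a.txt
printf 'base\n' > b.txt
git add a.txt b.txt
git commit -q -m "E: base"

git checkout -q -b development
printf 'dev\n' > a.txt
git commit -q -am "I: change a.txt on development"

git checkout -q master
printf 'master\n' > b.txt
git commit -q -am "G: change b.txt on master"

git checkout -q development
# Combine both sides, but stop before committing.
git merge -q --no-commit master
# Put something into the merge that none of E, I, or G contains.
printf 'hand-edited during merge\n' > a.txt
git add a.txt
git commit -q -m "J: merge master, with a manual edit"

parents=$(git cat-file -p HEAD | grep -c '^parent ')
combined=$(git show --format= HEAD | grep '^diff --cc')
echo "J has $parents parents"
echo "git show reports: $combined"
```

Because a.txt in J now differs from both parents, the combined diff from git show lists it. A merge that silently drops a change instead ends up matching one parent, which is exactly why it is so easy to miss.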

In any case, the key to understanding all of this is the concept of the merge base commit. If you sit down and draw the commit graph, the merge base is usually pretty obvious unless the graph gets way out of hand (which happens a lot, actually). You can also have Git find the merge base for you—or merge bases, plural, in some cases:

git merge-base --all master development

This prints out a hash ID. In our hypothetical case here, that would be the hash ID of commit E. Once you have that, you can run git diff manually, to see what happened to every file. Or you can run an individual-file git diff:

git diff E development -- path/to/file1
git diff E master -- path/to/file1

Note that if you replace the names master and development with the hash IDs of the commits that were current before you did a git merge, this works even after the merge. That will tell you what Git thought it should combine for path/to/file1. That, in turn, will tell you whether Git did not see the change, or whether whoever made the merge overrode Git, or handled a conflicting merge incorrectly.
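After the merge has been made, the two pre-merge tips are still reachable as the merge commit's parents, so the same inspection can be sketched with HEAD^1 and HEAD^2 standing in for the old development and master. The repository and file names here are invented for the example.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'base\n' > file1
printf 'base\n' > other.txt
git add file1 other.txt
git commit -q -m "E: base"

git checkout -q -b development
printf 'dev\n' > other.txt
git commit -q -am "I: unrelated change on development"

git checkout -q master
printf 'base\nfrom master\n' > file1
git commit -q -am "G: change file1 on master"

git checkout -q development
git merge -q -m "J: merge master" master

# HEAD^1 is the old development tip (I), HEAD^2 the old master tip (G).
base=$(git merge-base HEAD^1 HEAD^2)
ours=$(git diff --name-only "$base" HEAD^1 -- file1)
theirs=$(git diff --name-only "$base" HEAD^2 -- file1)

echo "merge base: $(git log -1 --format=%s "$base")"
echo "development side touched: ${ours:-nothing}"
echo "master side touched: ${theirs:-nothing}"
# Exit status 0 here means the merge result kept master's version of file1.
git diff --quiet HEAD^2 HEAD -- file1 && echo "J kept the master-side change"
```

If the last check fails in a real repository while the master-side diff is non-empty, the change was seen by Git but did not survive into the merge commit.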

Once you have a merge, a subsequent merge will find a different merge base:

       H--I--J----K   <-- development
      /     /
...--E--F--G--L--M   <-- master

We look now at both branch tips and work our way backwards through history (in the leftward direction), following both forks of a merge like J, to find the first commit we can get to from both branch tips. Starting at K, we go back to J, then to both I and G. Starting at M, we go back to L, then to G. We find G to be on both branches, so commit G is the new merge base. Git will run:

git diff G K
git diff G M

to get the two change-sets to apply to merge-base commit G.
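The shift of the merge base from E to G can be checked with a sketch like the following, which rebuilds the graph drawn above. The snap helper is a made-up convenience that commits one new file named after the commit letter.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper: add one new file and commit it under the given label.
snap() {
  printf '%s\n' "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"
}

snap E; e=$(git rev-parse HEAD)
git checkout -q -b development
snap H; snap I
git checkout -q master
snap F; snap G; g=$(git rev-parse HEAD)

# Before any merge, the branches last met at E.
first_base=$(git merge-base master development)

git checkout -q development
git merge -q -m "J" master
snap K
git checkout -q master
snap L; snap M

# After J, the most recent commit reachable from both tips is G.
second_base=$(git merge-base master development)

echo "first merge base:  $(git log -1 --format=%s "$first_base")"
echo "second merge base: $(git log -1 --format=%s "$second_base")"
```

This is also why merging development back into master later is a fresh three-way merge from G, not a replay of the earlier merge in the opposite direction.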


[2] "Most recent" here refers to commit graph order, rather than time stamps, although it's probably also the commit with the newest time stamp that is on both branches.


Git log and simplifications that lie, again

We already saw that git log -p just skips right over merge commits. You don't see any diff at all, as if the merge were totally magic. But when you run:

git log -- path/to/file1

something else, even more insidious, happens. This is described, albeit rather opaquely, in the (long) git log documentation under the section titled History Simplification.

In our example above, suppose git log is walking from K backwards. It finds J, which is a merge. It then inspects both I and G, comparing each to J after excluding all but the one file you are looking at. That is, it's just comparing path/to/file1 in the three commits.

If one side of the merge doesn't show any change to path/to/file1, that means the merge result J was no different from that input (I or G). Git calls this "TREESAME". If the merge J, after being stripped down to this one file, matches I or G similarly stripped-down, then J is TREESAME to I or G (or perhaps both). In this case, Git picks the TREESAME parent (or any one of them, if there are several) and follows only that side of the history. Let's say it picks I (along the top row) rather than G (along the bottom).

What this means in our case is that if someone dropped the ball during a merge, losing a change that was supposed to come in to J from F, git log never shows it. The log command looks at K, then J, then looks at but drops G, and looks only at I, then H, then E, and then any earlier commits. It never looks at commit F at all! So we don't see the change to path/to/file1 from F.

The logic here is simple; I'll quote the git log documentation directly but add some emphasis:

[Default mode] Simplifies the history to the simplest history explaining the final state of the tree. Simplest because it prunes some side branches if the end result is the same ...

Since the changes in F were dropped, Git declares them to be irrelevant. You don't need to see them! We'll just ignore that side of the merge entirely!

You can defeat this completely with --full-history. That tells Git not to prune either side of a merge: it should look down both histories. This will find commit F for you. Adding -m -p should also find where the changes were dropped, since it will find all commits that touch the file:

git log -m -p --full-history -- path/to/file1
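The whole situation from the question can be reproduced in a scratch repository. The sketch below is only an approximation of what may have happened: it assumes someone restored the development version of file1 during the merge, and all names and contents are invented.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'base\n' > file1
printf 'base\n' > other.txt
git add file1 other.txt
git commit -q -m "E: base"

git checkout -q -b development
printf 'dev\n' > other.txt
git commit -q -am "I: unrelated change on development"

git checkout -q master
printf 'base\nfrom F\n' > file1
git commit -q -am "F: change file1 on master"
f=$(git rev-parse HEAD)
printf 'g\n' > g.txt
git add g.txt
git commit -q -m "G: add g.txt on master"

git checkout -q development
git merge -q --no-commit master
git checkout HEAD -- file1      # the ball gets dropped: F's change is thrown away
git commit -q -m "J: merge master (loses F's change)"

branches=$(git branch --contains "$f")
all_log=$(git log --format=%s)
default_log=$(git log --format=%s -- file1)
full_log=$(git log --full-history --format=%s -- file1)
pinpoint=$(git log -m -p --full-history --format='%h %s' -- file1)

printf 'branches containing F:\n%s\n\n' "$branches"
printf 'git log -- file1:\n%s\n\n' "$default_log"
printf 'git log --full-history -- file1:\n%s\n\n' "$full_log"
printf '%s\n' "$pinpoint"
```

F is an ancestor of both branches and shows up in plain git log, yet the path-limited log lists only E. With --full-history, F and the merge J appear, and the -m -p output shows J removing the "from F" line relative to its master-side parent.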

If the changes were there (in commit F in our example) and are no longer, there are only two ways they were lost:

  • They were reverted (either with git revert, or manually) in an ordinary commit. You would see this as a commit that touches path/to/file1 in the history even without -m; -p will show you the diff.

  • Or, they were lost by being dropped during a merge. You would see the merge even without -m, but not know for sure that whoever did the merge dropped the ball here; but -m -p will show both parent diffs, including the one against the parent whose change should have been taken (but was not).
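For contrast, the first case (a revert in an ordinary commit) is the easy one to spot, since the ordinary path-limited log already shows it. A minimal sketch, again with made-up names in a scratch repository:

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'base\n' > file1
git add file1
git commit -q -m "E: base"

printf 'base\nfrom F\n' > file1
git commit -q -am "F: add a line to file1"

# Undo F in an ordinary, single-parent commit.
git revert --no-edit HEAD > /dev/null

# No -m or --full-history needed: both F and its revert touch file1.
history=$(git log -p --format=%s -- file1)
printf '%s\n' "$history"
```

Here the log shows the line being added by F and then removed by the revert commit, so if a real history has no such commit for the file, a merge is the remaining suspect.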

Bittner answered 12/5, 2017 at 3:0 Comment(9)
Thanks for the info, this is tremendous, and I'm sure it will be helpful to others as well. What do you mean by "dropped the ball during the merge"? How does that happen exactly? What I mean is, what command or commands could be executed that have the effect which is viewed in retrospect as "dropping the ball"? It does appear that the state I'm seeing is the result of a merge which failed or was messed up in some way. - Slipshod
Well, suppose there's a conflict in file orders.txt: Alice added a line saying that widgets can be painted green as well as beige, and Bob added a line saying that widgets can be ordered made of either plastic or wood. When Carol merged the two, which conflicted, she didn't realize that both changes were true, so she just kept the thing about color. Exactly how isn't all that important (maybe it was through git checkout --theirs, maybe it was just a screwup in the editor while merging). Git itself doesn't care either. It assumes whoever manually merges got it right. - Bittner
"The git show command does something different and better." If I merge two branches, I think I would prefer the default to be the exact opposite: to show only the changes that did not occur on both branches. If both branches fix the same thing, I'm curious how that would be "interesting" to either of the development flows. Do you have some examples / clarifications in that regard that you could perhaps add to your (excellent btw!) answer? - Verbid
@levantpied: A combined diff shows files (and diff-hunks, depending on --cc vs -c) that, in Git's opinion, "differ from all parents". That's pretty close to what you just suggested. The actual implementation is a little cheesy, for performance reasons: Git can easily tell if some file is the same in two commits because it has the same hash in both, so this is based on hash comparisons. - Bittner
Got that, I was just curious why you think the default (i.e. to show only common changes) is better. Consider: I'm working on branch A and merge in branch B. I think I would always want to see all changes on B except the changes that I made as well on branch A. Contrary to your assessment, the "show identical changes only" default in my mind seems like the worst option, as it gives you no useful information ("somebody changed the same thing") and omits the useful information ("how will the other changes play together after merge"). What cases am I missing in favor of the current default? - Verbid
@levantpied: I'm not sure what you are suggesting at this point. I noted that git log simply shows nothing by default, i.e., it's a merge, this is too hard, I'm skipping on... git show shows something by default, and I say that this is better than doing nothing. Whether Git's combined diff is the best thing is another question entirely. But in practice, using git show -m (which is the other obvious possible default for it) seems too verbose. - Bittner
If you mean: it would be nice if git show had a mode that, in effect, subtracted one diff from another diff ... well, that might be true, but it doesn't, and you'd have to define precisely how this would work. Its existing combined diffs are as close as it gets to doing that. - Bittner
@Bittner That explains the problem under the hood. But do such drops happen 'only' with a merge conflict? If so, can we assume that such a problem can be avoided with a careful merge? Also, in that case, why doesn't the conflict resolution show the deletion/dropping of the changes that were there in the branch earlier? - Schuster
@KannanRamamoorthy: Every Git commit holds a full snapshot of all of the files. That's true regardless of whether the commit is a non-merge (ordinary) commit, or a merge (two or more parents) commit. Someone could discard your code in a non-merge commit, but when you use git log -p or git show, you'd see them dropping your code and using their code instead. It's more of a problem with merges because git show shows combined diffs, and git log -p shows nothing, so that you can't see it when they drop your code at a merge. - Bittner
