How to find whether a given commit is on the first-parent chain of a branch
Asked Answered
A

3

6

I'm trying to script a way to tell if a given commit is on the first-parent chain of a given branch. So, for example, merge-base won't fly, as the commit might have been merged in. I want to know whether the exact commit was ever the tip of the branch.

Note: The branch in question is subject to a no-fast-forward merge strategy.

Afghan answered 28/3, 2017 at 20:6 Comment(0)
P
3

The fancy way

A simple "is ancestor" test obviously won't do, as commits down 2nd-or-later parent chains are also ancestors:

...o--o--A--o--o--o--T
    \            /
 ...-o--*--B----o
         \
          C

Both A and B are ancestors of T, but you want to accept A while rejecting B and C. (Assume --first-parent is the top line here.)

Using git merge-base, however, will actually do part of the job. You don't need the --is-ancestor mode of git merge-base, though, and do need some additional processing.

Note that, regardless of the path between T and some ancestor, the merge base of T and that ancestor (such as A or B) is either the ancestor itself (A or B respectively here), or some ancestor of the ancestor, such as commit * if we look at T and C as a pair. (This holds even in the case of multiple merge bases, although I leave constructing a proof of that to you.)

If the, or any arbitrarily chosen one of the set of all, merge base(s) of the test commit and the branch-tip is not already the test-commit, we have a case like C and can reject it out of hand. (Or, we can use --is-ancestor to reject it, or ... well, see below.) If not, we must enumerate the commits in the ancestry path between the commit in question and the branch tip. For A this is:

         o--o--*--T

and for B this is:

               *--T
              /
             o

If any such commit is a merge commit, as is the one marked *, we need to make sure that the first parent includes one of the commit(s) listed along this path. The toughest cases are those topologically similar to:

       o--o
      /    \
...--A      o--T
      \    /
       o--o

since the --ancestry-path between these includes a merge and two ways to reach A, one of which is a first-parent path and one of which is not. (This is true if T itself is a merge as well.)

We don't actually need to find the merge base in the first place, though. We're only using the merge base in order to examine the ancestry path. If the merge base is not the test commit itself, then the test commit is not an ancestor of the tip commit, and testcommit..tipcommit will not include testcommit itself. Moreover, adding --ancestry-path—which discards all commits that are not themselves children of the left hand side here—will then discard all commits in the git rev-list output: a case like C has no descendants that are ancestors of T (if it did, C would be a merge base).

Hence, what we want is to examine the commits in git rev-list --ancestry-path testcommit..branchtip. If this list is empty, the test commit is not an ancestor of the branch tip in the first place. We have a case like commit C; so we have our answer. If the list is non-empty, reduce it to its merge components (run again with --merges, or feed the list to git rev-list --stdin --merges, to produce the shrunken list). If this list is non-empty, check each merge by finding its --first-parent ID and making sure the result is in the first list.

In actual (albeit untested) shell-script code:

TF=$(mktemp) || exit 1
trap "rm -f $TF" 0 1 2 3 15
git rev-list --ancestry-path $testcommit..$branch > $TF
test -s $TF || exit 1  # not ancestor
git rev-list --stdin --merges < $TF | while read hash; do
    parent1=$(git rev-parse ${hash}^1)
    grep "$parent1" $TF >/dev/null || exit 1 # on wrong path
done
exit 0 # on correct path

The brute-force way

The above tests as few commits as possible, but it would be a lot more practical, in a sense, to just run:

git rev-list --first-parent ${testcommit}^@..$branch

If the output includes $testcommit itself, then $testcommit is reachable, by first-parent only, from branch. (We use ^@ to exclude all parents of $testcommit so that this works even for the root commit; for other commits, ${testcommit}^ suffices since we're using --first-parent.) Moreover, if we make sure this is done in topological order, the last commit ID emitted from the git rev-list command will be $testcommit itself if and only if $testcommit is reachable from $branch. Hence:

hash=$(git rev-parse "$testcommit") || exit 1
t=$(git rev-list --first-parent --topo-order $branch --not ${hash}^@ | tail -1)
test $hash = "$t"

should do the trick. The quotes around $t are in case it expands to the empty string.

Parang answered 28/3, 2017 at 21:48 Comment(7)
The brute-force way was my first run at it as well. I really expected this to work: git rev-list --first-parent $hash --not "$hash^@" $branch (that is, if it produces a result, then it's not on the chain). But no. I'm reading about --topo-order, but I don't see how it really helps here. It looks to me like the final element will always be the hash of interest (or not). I was really hoping somebody would know a clever way to do this that was fast and understandable. But alas.Afghan
What --topo-order guarantees is that Git will not sort commits according to commit-date, but rather will keep them topologically sorted. This used to be clearer before someone fiddled with the Git source code to try to speed it up: you could post-date a commit to 2038 and it would always be printed first, regardless of where it was in the history.Parang
It's not clear why --first-parent $hash ^$hash^@ ^$branch does not work right, but I have seen this myself as well. (As you say, logically, it should emit nothing if and only if $hash is reachable via --first-parent arcs from $branch, else emit $hash.)Parang
I could not get the syntax with ${hash}^@..$branch to work with git 2.28, and when reading the git-rev-parse man page I come to the conclusion that this is not a legal revision range. I ended up using $branch --not ${hash}^@ and edited your answer, I hope this is fine with you...Larena
The ^@ trick should always work - but note that Windows cmd.exe requires doubling the ^ character.Parang
I regard the --first-parent --ancestry-path as bugged because it does not constrain the ancestry path to first parents, only the steps on that path (which excludes the base). So the base commit can be the second parent of the last first-parent commit on the path. But it's in a core command, I don't think it's likely to get fixed because that might break working scripts. ( set -- `git rev-list --first-parent --ancestry-path --parents $base..$tip | tail -1`; [[ $2 = `git rev-parse $base` ]] )Moitoso
@jthill: interesting ... yes, it does seem as though --ancestry-path --first-parent should limit the walk that way.Parang
A
2

The no-fast-forward strategy means you can probably grep in git log --first-parent. You probably want only the hashes, so you can use git rev-list instead

git rev-list --first-parent | grep <commit hash>

Otherwise use --format with git log to display the data you want.

Edit: This post could give you some ideas

How can I tell if one commit is an ancestor of another commit (or vice-versa)?

Abortive answered 28/3, 2017 at 20:55 Comment(1)
This is a viable solution, like my brute-force method, only even more brute-y (we look at all reachable first-parent commits rather than trimming off some obviously-too-early ones).Parang
D
2

This is a performance friendly one-liner:

git rev-parse HEAD~"$( git rev-list --count --first-parent --ancestry-path <commit>..HEAD )"

If the output is your <commit>, then it's a first-parent ancestor.

The idea is that we measure the shortest path between the two commits with rev-list --count --ancestry-path, then get the commit at this position in the first-parent chain. Obviously, these must be the same if the inspected commit was a first-parent ancestor. Suppressed errors (e. g. first-parent chain is too short) are irrelevant.

To make it more sophisticated, you could rather create a git alias backed by a pretty readable shell script.

First write the script file:

#!/bin/sh

ref="$1"
head="$2"
if [ -z "$head" ]; then
    head="HEAD"
fi

commit=$( git rev-parse "$ref"^{commit} )
distance="$( git rev-list --count --ancestry-path --first-parent "$commit".."$head" )"
found="$( git rev-parse HEAD~"$distance" )"

if [ "$commit" != "$found" ]; then
    echo "${ref} is not a first-parent ancestor of ${head}"
    exit 1
fi

echo "${ref} is a first-parent ancestor of ${head} at a distance of ${distance}"
exit 0

Save it to an appropriate location on your system, make it executable, then set it as a git alias:

git config --global alias.fp '!<script-path>'

Replace fp with anything what's more comfortable for you. Substitute <script-path> with your script file's location, but keep the ! character, it's necessary to use external files.

After this, you can use the new alias like a normal git command:

$ git fp 66e339c
66e339c is a first-parent ancestor of HEAD at a distance of 45
Dolph answered 22/8, 2021 at 14:13 Comment(3)
Sweet. Add --first-parent to the rev-list to constrain the path to first parents, that won't trace through gobs of merged history.Moitoso
No, you still have to check that the base is also a first parent, see my comment to torek's answer, I regard the combination as bugged in a way that'll likely be too painful to fix. But testing HEAD~$count against the base is quicker and cleaner than the way I'd come up with.Moitoso
@Moitoso Ok, I misunderstood your suggestion. Updated the answer.Sociability

© 2022 - 2024 — McMap. All rights reserved.