Find Git commits that contain multiple specific commits
Asked Answered
B

3

10

General problem: Given a set of commits, how do I find the list of commits that have all those commits as ancestors, or relatedly, the first commit(s) that contain all those commits.

I can find branches (similarly tags) that contain the commits by looking for branches that are returned by git branch --contains <commit> for all the commits in the set, but git rev-list doesn't have a --contains option. Effectively, I'm looking for a way of combining the regular --contains arguments with git rev-list, and limiting the output to commits that contain all the listed commits, not any one of them (which is how --contains works normally).

Specific example: Given commits a, b, c, how can I find the first commit that has all three commits in its ancestry?

For example, given the below tree, how do I find the commit marked X?

* (master)
|
X
|\
a *
| |
b c
|/
*
|
*

I assume there's some magic I can do with git rev-list, and possibly involving the <commit1>...<commit2> notation, but I can't work out further than that.

Barman answered 18/12, 2012 at 17:59 Comment(7)
I can't think of an easy (efficient) way to do this, short of generating a list of all merge commits, and testing each one individually to see if each of the commits in question are reachable from there. Could be scripted relatively easily, but it would be slow. I think a recent (i.e. 1.8+) version of git added a --contains option in a few places that might make this quite a bit easier.Demonstrable
Do b and c belong to different branches ?Hod
@ShadyKiller: In the specific example, yes; in general, no. All three may be in the same branch (in which case the answer would just be whichever commit is newest), or different branches. Hell, there may be more or fewer than three commits; that was a relatively arbitrary number.Barman
First: I am pretty sure that git does not contain such a functionality. But this could be scripted with a runtime somewhere around O(n), with n being the number of commits in your repo. But why do you need this? And do you realize that this might even have multiple answers?Innermost
@Chronial: The question comes from trying to find places where some of the roots of the Git repository were merged together, as I wanted to see how that was done. Plus I thought it was an interesting question generally.Barman
@Chronial: And yes, I'm aware there may be multiple "earliest" commits that have all the listed commits as parents; that's why I wrote "the first commit(s) that contain all these commits" :)Barman
I think you're looking for this: Find the merge commit that brought in a given commit: #8475948Legislator
I
2

I guess the answer to that question is that git was not made for this. Git really doesn’t like the idea of “children of a commit”, and there is a very good reason for that: it’s not very well defined. Because a commit doesn’t know a about its children it’s a very vague set. You might not actually have all the branches in your repo and so are missing some children.

Gits internal storage structure also makes finding the children of a commit a rather expensive operation, as you have to walk the revision graph of all heads to either their corresponding roots or till you saw all the commits whose children you want to know about.

The only concept of that kind that git supports is the idea of one commit containing another commit. But this feature is only supported by very few git commands (git branch being one of them). And where git supports it, it does not support it for arbitrary commits, but only branch heads.

This all might seem like a rather harsh limitation of git, but in practice it turns out that you don’t need the “children” of a commit but usually only need to know which branches contain a specific commit.


That all said: If your really want to get the answer to your question, you will have to write your own script that finds it. The easiest way to go by this is to start with the output of git rev-list --parents --reverse --all. Parsing that line by line, you would build a tree, and for each node mark whether it is a child of the commits you are looking for. You do this by marking the commits themselves once you meet them and then carrying that property on to all their children and so on.

Once you have a commit that’s marked as containing all the commits, you add it to your “solution list” and mark all its children as dead – they can’t contain any first commits any more. This property will then also be passed on to all its descendants.

You can save a bit of memory here if you don’t store any parts of the tree that don’t contain any of the commits you asked for.


edit Hacked some python code

#!/usr/bin/python -O
import os
import sys

if len(sys.argv) < 2:
    print ("USAGE: {0} <list-of-revs>".format([sys.argv[0]]))
    exit(1)

rev_list = os.popen('git rev-list --parents --reverse --all')

looking_for = os.popen('git rev-parse {0}'
                       .format(" ".join(sys.argv[1:]))).read().splitlines()
solutions = set()
commits = {}

for line in rev_list:
    line = line.strip().split(" ")
    commit = set()
    sha = line[0]
    for parent in line[1:]:
        if not parent in commits:
            continue
        commit.update(commits[parent])
        if parent in solutions:
            commit.add("dead")
    if sha in looking_for:
        commit.add(sha)
    if not "dead" in commit and commit.issuperset(looking_for):
        solutions.add(sha)
    # only keep commit if it's a child of looking_for
    if len(commit) > 0:
        commits[sha] = commit

print "\n".join(solutions)
Innermost answered 20/12, 2012 at 12:15 Comment(0)
A
1

One possible solution:

Use 'git merge-base a b c' to get the commit to use as the starting point in a call to rev-list; we'll call it $MERGE_BASE.

Use 'git rev-list $MERGE_BASE..HEAD' call to list all commits from their common ancestor to HEAD. Loop through this output (pseudocode):

if commit == a || b || c
  break
else 
  $OLDEST_DESCENDANT = commit
return $OLDEST_DESCENDANT

This will work for your example above, but will give a false positive if they have never been merged, were not merged in the commit immediately following the youngest of a,b,c, or if there were multiple merge commits to bring together a,b, and c (if they each resided on their own branch). There's a bit of work left to find that oldest descendant.

You then should follow the above with something starts with $OLDEST_DESCENDANT and proceeds backwards in the DAG from it towards HEAD (rev-list --reverse $OLDEST_DESCENDANT~..HEAD), testing to see that the output of 'rev-list $MERGE_BASE~..$OLDEST contains all desired commits a, b, and c (maybe there's a better way to test that they are reachable than rev-list, though).

As twalberg mentions, testing the commits individually like this seems less than optimal and slow, but it's a start. This approach has the advantage over his merge commit list method in that it will provide a valid response when all the input commits are on the same branch.

Performance would be affected mostly by the distances between the merge base, head, X, and the youngest of the desired commit set (a, b, and c).

Archibald answered 19/12, 2012 at 1:33 Comment(1)
This looks good, I've just not had chance to sit down, write the pseudocode properly, and see what happens.Barman
H
-1

How about :

MERGE_BASE=`git merge-base A B C`
git log $MERGE_BASE...HEAD --merges

Assuming you have only 1 merge. Even if you have more merges, the oldest one is the one containing changes from all three commits

Hod answered 19/12, 2012 at 16:35 Comment(2)
This only works in very simple scenarios, if the revision graph has a serious complexity (one that actually requires such a command), you just get a smaller list of all the possible merges that could be the one. And the commit your looking for is not even necessarily a merge, but might be one of the listed ones.Innermost
You did not need to give me a -1 still :( . I was atleast partially correctHod

© 2022 - 2024 — McMap. All rights reserved.