The git rev-list command is a very complicated, very central command in Git, as what it does is walk the graph. The word graph here refers to both the commit graph itself and, in some cases, the next level down (Git objects reachable from commits).
I figured that rev-list shows commits in reverse chronological order.
Not exactly, but close:
- The order is changeable. The default is reverse-chronological.
- The default is to walk some commits, but you can get rev-list to go deeper so as to include tree and blob objects and even tag objects. This is for programs like git fetch and git push (which invoke git pack-objects) and for git pack-objects itself. I plan to ignore this possibility entirely here, but I feel I should at least mention it. 😀
So the default is to list some commits in reverse chronological order. It is both important, and a little bit tricky, to specify exactly which parts of the graph we will have git rev-list walk: the "some" in "some commits".
But, can someone share more insight on what --not
and --all
options are meant for?
As VonC notes, the effect here is to list commits that are new to the receiving repository. This depends on the fact that this git rev-list
command is running in a pre-receive hook. It generally doesn't do anything useful outside this particular hook. Thus, as you can see, a hook's run-time environment, in Git, is often at least a little bit special. (This is true for more than just the pre-receive hook: one must think about each hook's activation context.)
More about --not --all
The --all
option does just what you quoted from the documentation:
Pretend as if all the refs in refs/
are listed on the command line ...
So this does the equivalent of a git for-each-ref refs
: it loops over each reference. That includes branch names (master
or main
, develop
, feature/tall
, and so on, all of which are really in refs/heads/
), tag names (v1.2
which is really refs/tags/v1.2
), remote-tracking names (origin/develop
which is really refs/remotes/origin/develop
), replacement refs (in refs/replace/
), the stash (refs/stash
), bisection refs, Gerrit refs if you're using Gerrit, and so on. Note that it does not loop over reflog entries.
The --not
prefix is a simple boolean operation. In the gitrevisions syntax—see the gitrevisions documentation—we can write things like develop
, meaning I tell you to start from develop
and work backwards and include these commits, but also things like ^develop
, meaning I tell you to start from develop
and work backwards and exclude these commits. So if I write:
git rev-list feature1 feature2 ^main
I am asking Git to walk commits reachable from the commits identified by the names feature1
and feature2
, but to exclude commits reachable from the commits identified by main
. For (much) more about the general idea of reachability and graph-walking, see Think Like (a) Git.
The --not
operator effectively flips the ^
on each ref:
git rev-list --not feature1 feature2 ^main
is shorthand, as it were, for:
git rev-list ^feature1 ^feature2 main
This walks the list of commits reachable from main, but excludes those reachable from either feature1 or feature2.
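The flip can be checked in a scratch repository. This is only a sketch: the temporary directory, the commit helper, and the commit messages A through D are made up for illustration, while the branch names are the ones used above.

```shell
# Scratch repository; path and helper are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: make an empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

commit A                      # shared base commit
git branch feature1
git branch feature2
commit B                      # only on main
git checkout -q feature1 && commit C
git checkout -q feature2 && commit D

# Commits on feature1 or feature2 but not on main: C and D.
included=$(git rev-list --count feature1 feature2 ^main)

# --not flips every name that follows it, so these two are the same walk.
flipped=$(git rev-list --not feature1 feature2 ^main)
explicit=$(git rev-list ^feature1 ^feature2 main)

echo "feature-only commits: $included"
[ "$flipped" = "$explicit" ] && echo "same result: $flipped"
```

In this layout the flipped form lists only commit B: A is reachable from both feature branches, so it is excluded.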
Usually all commits are findable with --all
If you are using Git in the normal everyday way, and don't have a "detached HEAD" at the moment (detached HEAD mode is not exactly abnormal, but it's not the usual way to work), the --all option to git rev-list tells it to include all commits, because every commit is reachable from at least one reference.1 So --not --all effectively excludes all commits. Adding --not --all to any git rev-list that would otherwise list some commits therefore has the effect of inhibiting the list. The output is empty: why did we bother?
If you are in detached HEAD mode and have made several new commits (this can happen when you are in the middle of an interactive or conflicted rebase, for instance), you might expect git rev-list HEAD --not --all to list those commits that are reachable from HEAD but not from any branch name. It does not: current Git documents --all as covering all the refs in refs/ along with HEAD, so HEAD itself is excluded and the output is empty again. To list the commits that are only on the detached HEAD (in that rebase, just those commits that you have copied so far), use git rev-list HEAD --not --branches instead.
So even "detached HEAD" mode is not a place where git rev-list --not --all is useful from the command line. But for the situation you're examining, a pre-receive hook, we're not really on the command line.
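A scratch-repository check of the attached and detached cases, as a sketch. It assumes a Git whose --all includes HEAD, which is what the current git rev-list documentation describes; the temporary directory, the commit helper, and the commit messages are illustrative only.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: make an empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

commit A

# On a branch: every commit is reachable from some ref, so nothing is listed.
attached=$(git rev-list --count HEAD --not --all)

# Detach and add two commits that no branch name can reach.
git checkout -q --detach
commit X
commit Y

# --all also covers HEAD, so the detached commits are excluded too.
detached_all=$(git rev-list --count HEAD --not --all)

# --branches covers only refs/heads/, so the two new commits show up.
detached_branches=$(git rev-list --count HEAD --not --branches)

echo "attached: $attached, detached --all: $detached_all," \
     "detached --branches: $detached_branches"
```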
Pre-receive hooks
When someone uses git push
to send commits to your own Git, your Git:
- sets up a quarantine area to hold any new objects (new commits and blobs and so on);1
- negotiates with the sender to decide what the sender should send;
- receives these objects; and
- takes a list of ref update requests. These update requests essentially just say make this name hold this hash ID.2
Before actually doing any of the requested updates, your Git:
1. Feeds the entire list to the pre-receive hook. That hook can say "no"; if so, the entire push, as a whole, is rejected.
2. If that says "ok", feeds the list, one request at a time, to the update hook. When that hook says "ok", does the update. If the hook says "no", your Git rejects the one update, but goes on to examine others.
3. After all updates are accepted or rejected in step 2, feeds the accepted list to the post-receive hook.
Objects that are needed, that were added to some ref in step 2, are moved from quarantine to Git's object database. Those that were rejected are not.
Now, think about a typical git push
. We get some new commit(s) and a request: create a new branch name feature/short
, or we get some new commit(s) and a request: update existing branch name develop
to include these new commits, along with the old ones.
In step 1 above, we have a single new hash ID. We ran a loop to read all the ref names, and their current and proposed-new hash IDs, and the loop ran only once, because only one name was being git push-ed. That hash ID refers to the new commit or commits, which will either be added to this existing branch, or be the tip and other commits that are exclusive to the new branch.
We'd now like to inspect these commits, and not any of the existing commits that are reachable from any existing branch. For simplicity, rather than $new_list in my other answer, let's suppose we have just the one new hash ID, $new, and the old hash ID for the branch name, $old: all-zeros if the branch is all-new, or some valid existing commit if it's an existing branch name.
If the new commits are on a completely new branch, then:
git rev-list $new ^master ^develop ^feature/short ^feature/tall
would cover them, for instance, if we knew that the only existing branches were these four (and that there are no tags etc to worry about). But what if they're being added to, say, develop
? Then we'd like to exclude the commits that are currently on develop
. We could use the $old
hash ID to do that:
git rev-list $new ^master ^$old ^feature/short ^feature/tall
That would again list only the new commits that whoever is running git push origin develop
wants to add to our develop
.
But think about $old
. This is a hash ID. Where did Git get it? Git got this hash ID from the name develop
. This is a pre-receive hook; the name develop
has not been updated yet. So the name develop
is a name for the old hash ID $old
. That means:
git rev-list $new ^master ^develop ^feature/short ^feature/tall
will also do the job.
If git rev-list $new
followed by "and not all existing" will do the job, then:
git rev-list $new --not --branches
will do the job. That's almost what we have here.
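The equivalence of these forms can be tried outside a hook by simulating the "name not yet updated" state: build the new commits on a detached HEAD so that develop still names the old hash. This is a sketch only; the temporary directory, the commit helper, and the commit messages are illustrative, and it uses just three branches.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: make an empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

commit A
git branch develop
git branch feature/tall
commit M                          # only on master
git checkout -q develop && commit D1
old=$(git rev-parse develop)      # what develop names before the "push"

# Two proposed commits on top of develop; develop itself is not moved.
git checkout -q --detach
commit N1
commit N2
new=$(git rev-parse HEAD)

by_hash=$(git rev-list "$new" ^master "^$old" ^feature/tall)
by_name=$(git rev-list "$new" ^master ^develop ^feature/tall)
by_option=$(git rev-list "$new" --not --branches)

[ "$by_hash" = "$by_name" ] && [ "$by_name" = "$by_option" ] &&
    echo "all three forms list the same commits"
```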
The bug with just using --branches is that it doesn't get any tags, or other refs. We could use --not --branches --tags but --not --all is shorter and also gets all other refs.
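A small illustration of the difference, assuming a commit that is reachable only through a tag. The temporary directory, the commit helper, and the commit messages are hypothetical; the tag name is the one used earlier.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: make an empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

commit A

# Commit T is on no branch; only the tag v1.2 reaches it.
git checkout -q --detach
commit T
git tag v1.2

# Commit N stands in for a newly pushed commit built on top of T.
commit N
new=$(git rev-parse HEAD)
git checkout -q main

# --branches does not know about the tag, so T looks "new" as well.
with_branches=$(git rev-list --count "$new" --not --branches)

# --all covers the tag too, so only N is left.
with_all=$(git rev-list --count "$new" --not --all)

echo "--not --branches: $with_branches, --not --all: $with_all"
```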
So this is where --not --all
comes from: it depends on the special case of a pre-receive hook. We list the new hash IDs, as proposed by whoever is running a git push
, that our Git has passed to us as a list of lines. We have git rev-list
walk the proposed-to-be-updated commit graph, looking at the new commits in the quarantine area, but excluding all the commits that are already in our repository. The rev-list command produces these hash IDs, one per line, which we then read in a shell loop, and do whatever we like to inspect each commit.
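Putting the pieces together, here is a minimal end-to-end sketch: a throwaway bare repository with a pre-receive hook that records each new commit, and a client that pushes to it twice. The directory layout, the new-commits.log file, and the commit helper are assumptions made for this demonstration; a real hook would inspect each commit instead of logging it.

```shell
work=$(mktemp -d)
git init -q --bare "$work/server.git"

# Install the hook. Push-time hooks run inside the receiving repository.
cat > "$work/server.git/hooks/pre-receive" <<'EOF'
#!/bin/sh
# Each stdin line is: <old-hash> <new-hash> <ref-name>
while read -r old new ref; do
    # An all-zeros new hash means "delete this ref": nothing to inspect.
    case "$new" in *[!0]*) ;; *) continue ;; esac
    git rev-list "$new" --not --all |
    while read -r commit; do
        # Inspect "$commit" here; this sketch only records it.
        echo "$ref $commit" >> new-commits.log
    done
done
exit 0
EOF
chmod +x "$work/server.git/hooks/pre-receive"

# Hypothetical helper: make an empty commit with a fixed identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

git init -q "$work/client"
cd "$work/client" || exit 1
git symbolic-ref HEAD refs/heads/main

commit A
commit B
git push -q "$work/server.git" main   # creates main: A and B are new
commit C
git push -q "$work/server.git" main   # updates main: only C is new

cat "$work/server.git/new-commits.log"
```

The first push logs two commits and the second logs one, because by then A and B are already reachable from the server's main.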
1The quarantine area was new in Git 2.11. Prior to that, new objects could remain in the repository for a while, even if the push was rejected. The quarantine area isn't really that big a deal for most people, but for big servers like GitHub, it can save them a lot of disk space.
2The request can be forced or not-forced, and if forced, could be a force-with-lease, or not. This information is not available in the pre-receive hook (nor in the update hook), which is, um, let's just say not so great, but there are compatibility issues with adding it. It's all livable, mostly, though. The hook can tell if it's a create new ref or delete existing ref request because if so, one of the two hash IDs—old or new—will be the all-zeros "null hash" (which is reserved; no hash ID is allowed to be all-zeros).