Ordering of git-rev-list
Asked Answered
U

2

7

How does git-rev-list order the commits that it returns?

I am mostly referring to commits that come in on a concurrent branch of development and then is merged into the main branch. It doesn't seem that the commits are ordered with respect to date, which makes sense because commits can be cherry-picked from various times in the past or future.

For instance, here is some history from git-log...

*   Sat, 25 Aug 2012 11:37:23 -0700 8238401
|\  
| * Thu, 23 Aug 2012 12:29:09 -0700 c9de861
* |   Fri, 24 Aug 2012 16:29:01 -0700 b7e8827
|\ \  
| * | Mon, 14 May 2012 20:46:30 +0200 0a1db74
| * | Mon, 14 May 2012 17:54:25 +0200 e03e71d
| * | Fri, 13 Jul 2012 12:01:11 +0200 bffa852
* | |   Fri, 24 Aug 2012 15:45:13 -0700 09fad50
|\ \ \  
| * | | Fri, 24 Aug 2012 12:19:22 -0700 97a17e4
| * | | Thu, 9 Aug 2012 19:43:25 -0700 5f4a61a
| * | | Fri, 3 Aug 2012 14:28:07 -0700 0c8858d
| * | | Thu, 2 Aug 2012 13:00:58 -0700 aa13bf0
| * | | Wed, 18 Jul 2012 14:30:15 -0700 decff7b
* | | |   Fri, 24 Aug 2012 15:43:19 -0700 091c742

Here is the output of the same history via rev-list.

$ git rev-list HEAD --max-count=13
8238401ccb9f7018c927866896bea583d351ad2a # 1 root
c9de8611d6a3e77757a714cdf6acf46178b1d622 # 2 descends into the second parent
b7e8827b8bbca0c69d85be34cc4a88888c1152f2 # 3 first parent of root
09fad5069636fb2e8cacf15817834e3d32ff6b8e # 4 descends into the first parent
091c742af985cc78711727ca06a24ae42b376fae
7fbca880aa5c011257ef734d0b5bfd5545dbaf6b
07c06f7a83640e11d6be13a87f02e986ecc6e4b3
1168410426293aef8ce33becb277ff225595e183
97a17e4e9fa5cafa531ff79cb88a9ee5c224a613
0a1db746fbcaf09681e446250f75581cc8f8fd05
e03e71da56608f60770eb80767dcd94e698cdcae
5f4a61aea834fe25ce1596bc9c0e0b5e563aa98b
0c8858de8c82bae3fd88513724689a07d231da7e

How does the rev-list command decide whether to list a first-parent or descend into a n-th-parent's commit graph? For instance, above after looking at (1), the rev-list descends into the second parent (2). However, after looking at (3) it descends into the first parent (4). Is this behavior well-defined?

Upanchor answered 5/11, 2012 at 0:38 Comment(0)
F
5

By default, the commits are ordered in reverse chronological order. You can get output in a different ordering depending on the options you pass. See the Commit Ordering section in the git-rev-list man page for the other options.

git log orders in reverse chronological order by default, too. However, when you run it with --graph it implies --topo-order.

Lastly, the commit ordering by date is done by commit date, but the default output of git log displays author date. With patches, cherry-picks, and rebases, those two can get out of sync.

Those last two points should explain why your two outputs are ordered differently, and why on the surface git rev-list isn't ordering by date.

Fiddlewood answered 5/11, 2012 at 1:5 Comment(2)
Thanks, that makes sense. I was not displaying the commit date, indeed.Upanchor
What does "reverse chronological order" mean? Does it use the commit timestamp or the author timestamp? Is it the same as --date-order?Musso
T
2

Is this behavior well-defined?

With Git 2.34 (Q4 2021), the revision traversal API has illustrated by two commits:

  • one by taking advantage of the commit-graph, when available, to determine if a commit is reachable from any of the existing refs.
  • one cancelling/reverting that change

Both illustrates how git-rev-list works:

See commit f559d6d, commit 809ea28, commit bf9c0cb, commit f45022d (09 Aug 2021), and commit 29ef1f2 (05 Aug 2021) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit a5619d4, 03 Sep 2021)

connected: do not sort input revisions

Signed-off-by: Patrick Steinhardt

In order to compute whether objects reachable from a set of tips are all connected, we do a revision walk with these tips as positive references and --not --all.
--not --all will cause the revision walk to load all preexisting references as uninteresting, which can be very expensive in repositories with many references.

Benchmarking the git-rev-list command highlights that by far the most expensive single phase is initial sorting of the input revisions: after all references have been loaded, we first sort commits by author date.
In a real-world repository with about 2.2 million references, it makes up about 40% of the total runtime of git-rev-list.

Ultimately, the connectivity check shouldn't really bother about the order of input revisions at all.
We only care whether we can actually walk all objects until we hit the cut-off point.
So sorting the input is a complete waste of time.

Introduce a new "--unsorted-input" flag to git-rev-list which will cause it to not sort the commits and adjust the connectivity check to always pass the flag.
This results in the following speedups, executed in a clone of gitlab-org/gitlab:

Benchmark #1: git rev-list  --objects --quiet --not --all --not $(cat newrev)
  Time (mean ± σ):      7.639 s ±  0.065 s    [User: 7.304 s, System: 0.335 s]
  Range (min … max):    7.543 s …  7.742 s    10 runs

Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev
  Time (mean ± σ):      4.995 s ±  0.044 s    [User: 4.657 s, System: 0.337 s]
  Range (min … max):    4.909 s …  5.048 s    10 runs

Summary
  'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran
    1.53 ± 0.02 times faster than 'git rev-list  --objects --quiet --not --all --not $newrev'

Note that not all refs are visible to clients.

rev-list-options now includes in its man page:

--unsorted-input

Show commits in the order they were given on the command line instead of sorting them in reverse chronological order by commit time. Cannot be combined with --no-walk or --no-walk=sorted.

rev-list-options now includes in its man page:

Cannot be combined with --graph. Cannot be combined with --unsorted-input if sorted or no argument was given.

And with the same Git 2.34 (Q4 2021), a regression fix reverts the optimization above:

See commit a7df4f5 (11 Nov 2021) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 8996d68, 12 Nov 2021)

Revert "connected: do not sort input revisions"

This reverts commit f45022d (connected: do not sort input revisions, 2021-08-09, Git v2.34.0-rc0 -- merge listed in batch #3), as this is like breakage in the traversal more likely.
In a history with 10 single strand of pearls,

1-->2-->3--...->7-->8-->9-->10

asking "rev-list --unsorted-input 1 10 --not 9 8 7 6 5 4(man) fails to paint the bottom 1 uninteresting as the traversal stops, without completing the propagation of uninteresting bit starting at 4 down through 3 and 2 to 1.


2018:

Git 2.16 (Q1 2018) will allow git describe to give an object a human readable name based on an available ref when used as git describe <blob>.
(See more at "Which commit has this blob?")

In that context, git rev-list adds a new order.
See commit ce5b6f9 by Stefan Beller (stefanbeller).

revision.h: introduce blob/tree walking in order of the commits

The functionality to list tree objects in the order they were seen while traversing the commits will be used in one of the next commits, where we teach git describe to describe not only commits, but blobs, too.

That means the git rev-list man page has a new object traversal order:

--in-commit-order::

Print tree and blob ids in order of the commits.
The tree and blob ids are printed after they are first referenced by a commit.


You have one more parameter which influences git-rev-list: GIT_COMMIT_GRAPH_PARANOIA.

Before Git 2.44 (Q1 2024), we stopped relying on commit-graph that (still) records information about commits that are lost from the object store, which has negative performance implications.
The default has been flipped to disable this pessimization.

See commit b1df3b3 (24 Nov 2023) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit 66685e8, 18 Dec 2023)

commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default

Reported-by: Jeff King
Signed-off-by: Patrick Steinhardt

In 7a5d604 ("commit: detect commits that exist in commit-graph but not in the ODB", 2023-10-31, Git v2.43.0-rc1 -- merge), we have introduced a new object existence check into repo_parse_commit_internal() so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database.
This new check of course comes with a performance penalty, which the commit put at around 30% for git rev-list --topo-order(man).

Default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit.
In order to not be inconsistent, also disable this behaviour by default in lookup_commit_in_graph(), where the object existence check has been introduced right at its inception via f559d6d ("revision: avoid hitting packfiles when commits are in commit-graph", 2021-08-09, Git v2.34.0-rc0 -- merge listed in batch #3).

git now includes in its man page:

GIT_COMMIT_GRAPH_PARANOIA:

When loading a commit object from the commit-graph, Git performs an existence check on the object in the object database. This is done to avoid issues with stale commit-graphs that contain references to already-deleted commits, but comes with a performance penalty.

The default is "false", which disables the aforementioned behavior. Setting this to "true" enables the existence check so that stale commits will never be returned from the commit-graph at the cost of performance.

Tisbee answered 29/12, 2017 at 20:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.