With Git 2.25 (Q1 2020), you can see an illustration of what "git name-rev
" does.
See commit 2866fd2, commit 49f7a2f, commit fee984b (09 Dec 2019), and commit 8c5724c, commit 3a52150, commit dd432a6, commit dd090a8, commit 766f9e3, commit d59fc83, commit bf43abc, commit e0c4da6, commit c593a26, commit d91ce88 (12 Nov 2019) by SZEDER Gábor (szeder
).
See commit c3794d4 (12 Nov 2019) by René Scharfe (``).
(Merged by Junio C Hamano -- gitster
-- in commit f3c520e, 25 Dec 2019)
t6120
: add a test to cover inner conditions in 'git name-rev's name_rev()
Signed-off-by: SZEDER Gábor
In 'builtin/name-rev.c
' in the name_rev()
function there is a loop iterating over all parents of the given commit, and the loop body looks like this:
if (parent_number > 1) {
if (generation > 0)
// branch #1
new_name = ...
else
// branch #2
new_name = ...
name_rev(parent, new_name, ...);
} else {
// branch #3
name_rev(...);
}
These conditions are not covered properly in the test suite.
As far as purely test coverage goes, they are all executed several times over in 't6120-describe.sh
'.
However, they don't directly influence the command's output, because the repository used in that test script contains several branches and tags pointing somewhere into the middle of the commit DAG, and thus result in a better name for the to-be-named commit.
This can hide bugs: e.g. by replacing the 'new_name
' parameter of the first recursive name_rev()
call with 'tip_name
' (effectively making both branch #1
and #2
a noop) 'git name-rev --all
' shows thousands of bogus names in the Git repository, but the whole test suite still passes successfully.
In an early version of a later patch in this series I managed to mess up all three branches (at once!), but the test suite still passed.
So add a new test case that operates on the following history:
A--------------master
\ /
\----------M2
\ /
\---M1-C
\ /
B
and names the commit 'B
' to make sure that all three branches are crucial to determine 'B
's name:
There is only a single ref, so all names are based on 'master', without any undesired interference from other refs.
Each time name_rev()
follows the second parent of a merge commit, it appends "^2
" to the name. Following 'master's second parent right at the start ensures that all commits on the ancestry path from 'master
' to 'B
' have a different base name from the original 'tip_name
' of the very first name_rev()
invocation.
Currently, while name_rev()
is recursive, it doesn't matter, but it will be necessary to properly cover all three branches after the recursion is eliminated later in this series.
Following 'M2
's second parent makes sure that branch #2 (i.e. when 'generation = 0
') affects 'B
's name.
Following the only parent of the non-merge commit 'C
' ensures that branch #3 affects 'B
's name, and that it increments 'generation
'.
Coming from 'C
' 'generation' is 1, thus following 'M1
's second parent makes sure that branch #1 affects 'B
's name.
With Git 2.25 (Q1 2020), the goal for "git name-rev
" is to avoid recursive calls.
name-rev
: eliminate recursion in name_rev()
Signed-off-by: SZEDER Gábor
The name_rev()
function calls itself recursively for each interesting parent of the commit it got as parameter, and, consequently, it can segfault when processing a deep history if it exhausts the available stack space.
E.g. running 'git name-rev --all' and 'git name-rev
HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories results in segfaults on my machine ('ulimit -s
' reports 8192kB of stack size limit), and nowadays the former segfaults in the Linux repo as well (it reached the necessary depth sometime between v5.3-rc4 and -rc5).
Eliminate the recursion by inserting the interesting parents into a LIFO 'prio_queue
' 1 and iterating until the queue becomes empty.
Note that the parent commits must be added in reverse order to the LIFO 'prio_queue
', so their relative order is preserved during processing, i.e. the first parent should come out first from the queue, because otherwise performance greatly suffers on mergy histories 2.
The stacksize-limited
test 'name-rev works in a deep repo
' in 't6120-describe.sh
' demonstrated this issue and expected failure.
Now the recursion is gone, so flip it to expect success.
Also gone are the dmesg
entries logging the segfault of that segfaulting 'git name-rev
' process on every execution of the test suite.
Note that this slightly changes the order of lines in the output of 'git name-rev --all
', usually swapping two lines every 35 lines in git.git
or every 150 lines in linux.git
.
This shouldn't matter in practice, because the output has always been unordered anyway.
This patch is best viewed with '--ignore-all-space
'.
Early versions of this patch used a 'commit_list
', resulting in ~15% performance penalty for 'git name-rev --all
' in 'linux.git
', presumably because of the memory allocation and release for each insertion and removal.
Using a LIFO 'prio_queue
' has basically no effect on performance.
We prefer shorter names, i.e. 'v0.1~234
' is preferred over 'v0.1^2~5
', meaning that usually following the first parent of a merge results in the best name for its ancestors.
So when later we follow the remaining parent(s) of a merge, and reach an already named commit, then we usually find that we can't give that commit a better name, and thus we don't have to visit any of its ancestors again.
OTOH, if we were to follow the N
th parent of the merge first, then the name of all its ancestors would include a corresponding '^N
'.
Those are not the best names for those commits, so when later we reach an already named commit following the first parent of that merge, then we would have to update the name of that commit and the names of all of its ancestors as well.
Consequently, we would have to visit many commits several times, resulting in a significant slowdown.
With Git 2.26 (Q1 2020), the memory footprint and performance of "git name-rev
" has been improved.
That provides an additional insight of what git name-rev
does.
See commit 079f970 (05 Feb 2020), and commit 2d53975, commit 977dc19, commit 1c56fc2, commit ddc42ec, commit f13ca7c, commit d689d6d, commit 15a4205, commit 36d2419, commit 71620ca (04 Feb 2020) by René Scharfe (rscharfe
).
See commit 3e2feb0 (05 Feb 2020) by Martin Ågren (``).
(Merged by Junio C Hamano -- gitster
-- in commit 0460c10, 17 Feb 2020)
name-rev
: sort tip names before applying
Signed-off-by: René Scharfe
name_ref()
is called for each ref and checks if its a better name for the referenced commit.
If that's the case it remembers it and checks if a name based on it is better for its ancestors as well.
This in done in the the order for_each_ref()
imposes on us.
That might not be optimal.
If bad names happen to be encountered first (as defined by is_better_name()
), names derived from them may spread to a lot of commits, only to be replaced by better names later.
Setting better names first can avoid that.
is_better_name()
prefers tags, short distances and old references.
The distance is a measure that we need to calculate for each candidate commit, but the other two properties are not dependent on the relationships of commits.
Sorting the refs by them should yield better performance than the essentially random order we currently use.
And applying older references first should also help to reduce rework due to the fact that older commits have less ancestors than newer ones.
So add all details of names to the tip table first, then sort them to prefer tags and older references and then apply them in this order. Here's the performance as measures by hyperfine for the Linux repo before:
Benchmark #1: ./git -C ../linux/ [`git name-rev --all`](https://git-scm.com/docs/git-name-rev#Documentation/git-name-rev.txt---all)
Time (mean ± σ): 851.1 ms ± 4.5 ms [User: 806.7 ms, System: 44.4 ms]
Range (min … max): 845.9 ms … 859.5 ms 10 runs
... and with this patch:
Benchmark #1: ./git -C ../linux/ [`git name-rev --all`](https://git-scm.com/docs/git-name-rev#Documentation/git-name-rev.txt---all)
Time (mean ± σ): 736.2 ms ± 8.7 ms [User: 688.4 ms, System: 47.5 ms]
Range (min … max): 726.0 ms … 755.2 ms 10 runs
With Git 2.35 (Q1 2022), "git name-rev
"(man) has been tweaked to give output that is shorter and easier to understand.
See commit 3656f84 (04 Dec 2021) by Elijah Newren (newren
).
(Merged by Junio C Hamano -- gitster
-- in commit 3f9d505, 21 Dec 2021)
name-rev
: prefer shorter names over following merges
Signed-off-by: Elijah Newren
Acked-by: Ævar Arnfjörð Bjarmason
Acked-by: Johannes Schindelin
name-rev
has a MERGE_TRAVERSAL_WEIGHT
to say that traversing a second or later parent of a merge should be 65535 times more expensive than a first-parent traversal, as per ac076c2 ("name-rev
: Fix non-shortest description", 2007-08-27, Git v1.5.3-rc7 -- merge).
The point of this weight is to prefer names like
v2.32.0~1471^2
over names like
v2.32.0~43^2~15^2~11^2~20^2~31^2
which are two equally valid names in git.git for the same commit.
Note that the first follows 1472 parent traversals compared to a mere 125 for the second.
Weighting all traversals equally would clearly prefer the second name since it has fewer parent traversals, but humans aren't going to be traversing commits and they tend to have an easier time digesting names with fewer segments.
The fact that the former only has two segments (~1471
, ^2
) makes it much simpler than the latter which has six segments (~43
, ^2
, ~15
, etc.).
Since name-rev
is meant to "find symbolic names suitable for human digestion", we prefer fewer segments.
However, the particular rule implemented in name-rev would actually prefer
v2.33.0-rc0~11^2~1
over
v2.33.0-rc0~20^2
because both have precisely one second parent traversal, and it gives the tie breaker to shortest number of total parent traversals.
Fewer segments is more important for human consumption than number of hops, so we'd rather see the latter which has one fewer segment.
Include the generation in is_better_name()
and use a new effective_distance()
calculation so that we prefer fewer segments in the printed name over fewer total parent traversals performed to get the answer.
Warning: with Git 2.40 (Q1 2023), "git name-rev
"(man) has an heuristics update.
See commit b2182a8 (09 Feb 2023) by Elijah Newren (newren
).
(Merged by Junio C Hamano -- gitster
-- in commit 5fc6d00, 22 Feb 2023)
name-rev
: fix names by dropping taggerdate workaround
Signed-off-by: Elijah Newren
Commit 7550424 ("name-rev
: include taggerdate in considering the best name", 2016-04-22, Git v2.9.0-rc0 -- merge listed in batch #9) introduced the idea of using taggerdate in the criteria for selecting the best name.
At the time, a certain commit in linux.git
-- namely, aed06b9cfcab -- was being named by name-rev as
v4.6-rc1~9^2~792
which, while correct, was very suboptimal.
Some investigation found that tweaking the MERGE_TRAVERSAL_WEIGHT
to lower it could give alternate answers such as
v3.13-rc7~9^2~14^2~42
or
v3.13~5^2~4^2~2^2~1^2~42
A manual solution involving looking at tagger dates came up with
v3.13-rc1~65^2^2~42
which is much nicer.
That workaround was then implemented in name-rev.
Unfortunately, the taggerdate heuristic is causing bugs.
I was pointed to a case in a private repository where name-rev reports a name of the form
v2022.10.02~86
when users expected to see one of the form
v2022.10.01~2
(I've modified the names and numbers a bit from the real testcase.)
As you can probably guess, v2022.10.01 was created after v2022.10.02 (by a few hours), even though it pointed to an older commit.
While the condition is unusual even in the repository in question, it is not the only problematic set of tags in that repository.
The taggerdate logic is causing problems.
Further, it turns out that this taggerdate heuristic isn't even helping anymore.
Due to the fix to naming logic in 3656f84 ("name-rev
: prefer shorter names over following merges", 2021-12-04, Git v2.35.0-rc0 -- merge listed in batch #4), we get improved names without the taggerdate heuristic.
For the original commit of interest in linux.git, a modern Git without the taggerdate heuristic still provides the same optimal answer of interest, namely:
v3.13-rc1~65^2^2~42
So, the taggerdate is no longer providing benefit, and it is causing problems.
Simply get rid of it.
However, note that "taggerdate" as a variable is used to store things besides a taggerdate these days.
Ever since commit ef1e740 ("name-rev
: favor describing with tags and use committer date to tiebreak", 2017-03-29, Git v2.14.0-rc0 -- merge listed in batch #4), this has been used to store committer dates and there it is used as a fallback tiebreaker (as opposed to a primary criteria overriding effective distance calculations).
We do not want to remove that fallback tiebreaker, so not all instances of "taggerdate
" are removed in this change.
With Git 2.45 (Q2 2024), batch 4, many small allocations "git name-rev
"(man) makes have been updated to allocate from a mem-pool.
See commit f39addd, commit 8d25663 (25 Feb 2024) by René Scharfe (rscharfe
).
(Merged by Junio C Hamano -- gitster
-- in commit b511164, 05 Mar 2024)
name-rev
: use mem_pool_strfmt()
Signed-off-by: René Scharfe
1c56fc2 ("name-rev
: pre-size buffer in get_parent_name()
", 2020-02-04, Git v2.26.0-rc0 -- merge listed in batch #6) got a big performance boost in an unusual repository by calculating the name length in advance.
This is a bit awkward, as it references the name components twice.
Use a memory pool to store the strings for the struct rev_name
member tip_name
.
Using mem_pool_strfmt()
allows efficient allocation without explicit size calculation.
This simplifies the formatting part of the code without giving up performance:
Benchmark 1: ./git_2.44.0 -C ../chromium/src name-rev `--all`
Time (mean ± σ): 1.231 s ± 0.013 s [User: 1.082 s, System: 0.136 s]
Range (min … max): 1.214 s … 1.252 s 10 runs
Benchmark 2: ./[`git -C`](https://github.com/git/git/blob/f39addd0d9d75a073847ed4311079a499dd33f35/Documentation/git.txt#L61)<sup>([man](https://git-scm.com/docs/git#Documentation/git.txt--Cltpathgt))</sup> ../chromium/src name-rev `--all`
Time (mean ± σ): 1.220 s ± 0.020 s [User: 1.083 s, System: 0.130 s]
Range (min … max): 1.197 s … 1.254 s 10 runs
Don't bother discarding the memory pool just before exiting.
The effort for that would be very low, but actually measurable in the above example, with no benefit to users.
At least UNLEAK it to calm down leak checkers.
This addresses the leaks that 45a14f5 (Revert "name-rev
: release unused name strings", 2022-04-22, Git v2.37.0-rc0 -- merge) brought back.
git describe
and thengit name-rev
which both works similarly ? Please note thatgit describe
is not limited to tags. – Jobi