The performance of git status should improve with Git 2.13 (Q2 2017).
See commit 950a234 (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 8b6bba6, 24 Apr 2017)
> string-list: use ALLOC_GROW macro when reallocing string_list
Use ALLOC_GROW() macro when reallocing a string_list array rather than simply increasing it by 32.
This is a performance optimization.
During status on a very large repo with many changes, a significant percentage of the total run time is spent reallocing the wt_status.changes array.
This change decreases the time in wt_status_collect_changes_worktree() from 125 seconds to 45 seconds on my very large repository.
Plus, Git 2.17 (Q2 2018) will introduce a new trace, for measuring where the time is spent in index-heavy operations.
See commit ca54d9b (27 Jan 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 090dbea, 15 Feb 2018)
trace: measure where the time is spent in the index-heavy operations
All the known heavy code blocks are measured (except object database access). This should help identify whether an optimization is effective or not.
An unoptimized git status would give something like the following:
0.001791141 s: read cache ...
0.004011363 s: preload index
0.000516161 s: refresh index
0.003139257 s: git command: ... 'status' '--porcelain=2'
0.006788129 s: diff-files
0.002090267 s: diff-index
0.001885735 s: initialize name hash
0.032013138 s: read directory
0.051781209 s: git command: './git' 'status'
The same Git 2.17 (Q2 2018) improves git status with:
revision.c: reduce object database queries
In mark_parents_uninteresting(), we check for the existence of an object file to see if we should treat a commit as parsed. The result is to set the "parsed" bit on the commit.
Modify the condition to only check has_object_file() if the result would change the parsed bit.
When a local branch is different from its upstream ref, "git status" will compute ahead/behind counts.
This uses paint_down_to_common() and hits mark_parents_uninteresting().
On a copy of the Linux repo with a local instance of "master" behind the remote branch "origin/master" by ~60,000 commits, we find the performance of "git status" went from 1.42 seconds to 1.32 seconds, for a relative difference of -7.0%.
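The quoted figure is the relative change with respect to the original timing:

```latex
\frac{1.32\,\mathrm{s} - 1.42\,\mathrm{s}}{1.42\,\mathrm{s}} = \frac{-0.10}{1.42} \approx -0.0704 \approx -7.0\%
```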
Git 2.24 (Q4 2019) proposes another setting to improve git status performance:
See commit aaf633c, commit c6cc4c5, commit ad0fb65, commit 31b1de6, commit b068d9a, commit 7211b9e (13 Aug 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4f8dfe, 09 Sep 2019)
repo-settings: create feature.manyFiles setting
The feature.manyFiles setting is suitable for repos with many files in the working directory.
By setting index.version=4 and core.untrackedCache=true, commands such as 'git status' should improve.
But:
With Git 2.24 (Q4 2019), the codepath that reads the index.version configuration was broken with a recent update, which has been corrected.
See commit c11e996 (23 Oct 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 4d6fb2b, 24 Oct 2019)
repo-settings: read an int for index.version
Signed-off-by: Derrick Stolee
Several config options were combined into a repo_settings struct in ds/feature-macros, including a move of the "index.version" config setting in 7211b9e ("repo-settings: consolidate some config settings", 2019-08-13, Git v2.24.0-rc1 -- merge listed in batch #0).
Unfortunately, that file contained a lot of boilerplate and, clearly as a result of copy-paste overload, the config setting is parsed with repo_config_get_bool() instead of repo_config_get_int(). This means that a setting "index.version=4" would not register correctly and would revert to the default version of 3.
I caught this while incorporating v2.24.0-rc0 into the VFS for Git codebase, where we really care that the index is in version 4.
This was not caught by the test suite because the version checks placed in t1600-index.sh did not test the "basic" scenario enough. Here, we modify the test to include these normal settings, so that they are not overridden by feature.manyFiles or GIT_INDEX_VERSION.
While the "default" version is 3, it is demoted to version 2 in do_write_index() when version 3 is not necessary.
git status will also checksum the index file faster with Git 2.33 (Q3 2021), thanks to an optimized hashfile API in the codepath that writes the index file.
See commit f6e2cd0, commit 410334e, commit 2ca245f (18 May 2021), and commit 68142e1 (17 May 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 0dd2fd1, 14 Jun 2021)
csum-file.h: increase hashfile buffer size
Signed-off-by: Derrick Stolee
The hashfile API uses a hard-coded buffer size of 8KB and has done so ever since it was introduced in c38138c ("git-pack-objects: write the pack files with a SHA1 csum", 2005-06-26, Git v0.99 -- merge).
It performs a similar function to the hashing buffers in read-cache.c, but that code was updated from 8KB to 128KB in f279894 ("read-cache: make the index write buffer size 128K", 2021-02-18, Git v2.31.0-rc1 -- merge).
The justification there was that do_write_index() improves from 1.02s to 0.72s.
Since our end goal is to have the index writing code use the hashfile API, we need to unify this buffer size to avoid a performance regression.
Since these buffers are now on the heap, we can adjust their size based on the needs of the consumer.
In particular, callers to hashfd_throughput() are expecting to report progress indicators as the buffer flushes.
These callers would prefer the smaller 8KB buffer to avoid large delays between updates, especially for users with slower networks.
When the progress indicator is not used, the larger buffer is preferable.
By adding a new trace2 region in the chunk-format API, we can see that the writing portion of 'git multi-pack-index write' lowers from ~1.49s to ~1.47s on a Linux machine.
These effects may be more pronounced or diminished on other filesystems.