For files you do not version, see also "UNTRACKED FILES AND PERFORMANCE" with git status.
git status should be quicker in Git 2.13 (Q2 2017), because of:
On that last point, see commit a33fc72 (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit cdfe138, 24 Apr 2017)
read-cache: force_verify_index_checksum
Teach git to skip verification of the SHA-1 checksum at the end of the index file in verify_hdr(), which is called from read_index(), unless the "force_verify_index_checksum" global variable is set.
Teach fsck to force this verification.
The checksum verification is for detecting disk corruption, and for small projects, the time it takes to compute SHA-1 is not that significant, but for gigantic repositories this calculation adds significant time to every command.
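To make the cost concrete, here is a minimal sketch of what such a trailing-checksum verification amounts to. It assumes a SHA-1 repository in which the last 20 bytes of the index file are the SHA-1 of everything before them; the helper names `index_checksum_ok` and `fake_index` are hypothetical and are not Git functions.

```python
import hashlib

def index_checksum_ok(data: bytes) -> bool:
    """Return True if the last 20 bytes are the SHA-1 of all preceding bytes."""
    if len(data) < 20:
        return False
    body, trailer = data[:-20], data[-20:]
    # Hashing touches every byte of the file, so the cost grows with index size.
    return hashlib.sha1(body).digest() == trailer

def fake_index(body: bytes) -> bytes:
    """Build an index-like blob: body followed by its SHA-1 (illustration only)."""
    return body + hashlib.sha1(body).digest()

good = fake_index(b"DIRC" + b"\x00" * 100)
corrupted = bytearray(good)
corrupted[10] ^= 0xFF  # flip one byte to simulate disk corruption
corrupted = bytes(corrupted)
```

Because the hash has to cover the whole file, skipping it on ordinary reads (and leaving the check to fsck) removes work proportional to the index size from every command.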
Git 2.14 again improves git status performance by making better use of the "untracked cache", which allows Git to skip reading untracked directories if their stat data have not changed, using the mtime field of the stat structure.
See Documentation/technical/index-format.txt for more on the untracked cache.
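The idea behind the untracked cache can be sketched as follows. This is only an illustration of "skip re-reading a directory whose mtime has not changed": the class `UntrackedCacheSketch` and its fields are hypothetical, and the real extension stores more data and has to deal with timestamp races that this sketch ignores.

```python
import os

class UntrackedCacheSketch:
    """Remember a directory listing and reuse it while the directory mtime is unchanged."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, sorted listing)
        self.scans = 0      # how many times the directory was actually read

    def list_dir(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # stat data unchanged: skip reading the directory
        self.scans += 1
        names = sorted(os.listdir(path))
        self._entries[path] = (mtime, names)
        return names
```

Calling `list_dir()` twice on an unchanged directory reads it only once; adding or removing an entry updates the directory mtime, which invalidates the cached listing.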
See commit edf3b90 (08 May 2017) by David Turner (dturner-tw).
(Merged by Junio C Hamano -- gitster -- in commit fa0624f, 30 May 2017)
When "git checkout", "git merge", etc. manipulates the in-core index, various pieces of information in the index extensions are discarded from the original state, as it is usually not the case that they are kept up-to-date and in-sync with the operation on the main index.
The untracked cache extension is copied across these operations now, which would speed up "git status" (as long as the cache is properly invalidated).
More generally, writing to the cache will also be quicker with Git 2.14.x/2.15.
See commit ce012de, commit b50386c, commit 3921a0b (21 Aug 2017) by Kevin Willford.
(Merged by Junio C Hamano -- gitster -- in commit 030faf2, 27 Aug 2017)
We used to spend more than necessary cycles allocating and freeing piece of memory while writing each index entry out.
This has been optimized.
[That] would save anywhere between 3-7% when the index had over a million entries with no performance degradation on small repos.
Update Dec. 2017: Git 2.16 (Q1 2018) adds a further enhancement, this time for git log, since the code to iterate over loose object files just got optimized.
See commit 163ee5e (04 Dec 2017) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 97e1f85, 13 Dec 2017)
sha1_file: use strbuf_add() instead of strbuf_addf()
Replace use of strbuf_addf() with strbuf_add() when enumerating loose objects in for_each_file_in_obj_subdir(). Since we already check the length and hex-values of the string before consuming the path, we can prevent extra computation by using the lower-level method.
One consumer of for_each_file_in_obj_subdir() is the abbreviation code. OID (object identifiers) abbreviations use a cached list of loose objects (per object subdirectory) to make repeated queries fast, but there is significant cache load time when there are many loose objects.
Most repositories do not have many loose objects before repacking, but in the GVFS case (see "Announcing GVFS (Git Virtual File System)") the repos can grow to have millions of loose objects.
Profiling 'git log' performance in Git For Windows on a GVFS-enabled repo with ~2.5 million loose objects revealed 12% of the CPU time was spent in strbuf_addf().
Add a new performance test to p4211-line-log.sh that is more sensitive to this cache-loading.
By limiting to 1000 commits, we more closely resemble user wait time when reading history into a pager.
For a copy of the Linux repo with two ~512 MB packfiles and ~572K loose objects, running 'git log --oneline --parents --raw -1000' had the following performance:
HEAD~1            HEAD
----------------------------------------
7.70(7.15+0.54)   7.44(7.09+0.29) -3.4%
Update March 2018: Git 2.17 will improve git status some more: see this answer.
Update: Git 2.20 (Q4 2018) adds the Index Entry Offset Table (IEOT), which allows git status to load the index faster.
See commit 77ff112, commit 3255089, commit abb4bb8, commit c780b9c, commit 3b1d9e0, commit 371ed0d (10 Oct 2018) by Ben Peart (benpeart).
See commit 252d079 (26 Sep 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit e27bfaa, 19 Oct 2018)
read-cache: load cache entries on worker threads
This patch helps address the CPU cost of loading the index by utilizing
the Index Entry Offset Table (IEOT) to divide loading and conversion of
the cache entries across multiple threads in parallel.
I used p0002-read-cache.sh to generate some performance data:
Test w/100,000 files reduced the time by 32.24%
Test w/1,000,000 files reduced the time by -4.77%
Note that on the 1,000,000 files case, multi-threading the cache entry parsing
does not yield a performance win. This is because the cost to parse the
index extensions in this repo, far outweigh the cost of loading the cache
entries.
That allows for:
config: add new index.threads config setting
Add support for a new index.threads config setting which will be used to control the threading code in do_read_index().
- A value of 0 will tell the index code to automatically determine the correct number of threads to use.
- A value of 1 will make the code single threaded.
- A value greater than 1 will set the maximum number of threads to use.
For testing purposes, this setting can be overwritten by setting the GIT_TEST_INDEX_THREADS=<n> environment variable to a value greater than 0.
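The rules quoted above for index.threads can be modeled with a small function. This is a sketch of the described semantics only, not Git's code: `resolve_index_threads` is a hypothetical name, and "automatic" is modeled here as one thread per CPU, which is an assumption (Git's own heuristic may also take the index size into account).

```python
import os

def resolve_index_threads(config_value, env_value=None, cpu_count=None):
    """Return the thread limit implied by index.threads and GIT_TEST_INDEX_THREADS."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # The test environment variable overrides the config, but only when > 0.
    if env_value is not None and int(env_value) > 0:
        config_value = int(env_value)
    if config_value == 0:
        return cpu_count          # assumption: "automatic" means one thread per CPU
    return max(1, config_value)   # 1 = single threaded, >1 = maximum thread count
```

For example, `resolve_index_threads(0, cpu_count=8)` picks the automatic value, while `resolve_index_threads(1)` forces single-threaded loading, which is what the `git -c index.threads=1` benchmark invocations rely on.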
Git 2.21 (Q1 2019) introduces a further improvement: the loose object cache, used to optimize existence look-up, has been updated.
See commit 8be88db (07 Jan 2019), and commit 4cea1ce, commit d4e19e5, commit 0000d65 (06 Jan 2019) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit eb8638a, 18 Jan 2019)
object-store: use one oid_array per subdirectory for loose cache
The loose objects cache is filled one subdirectory at a time as needed.
It is stored in an oid_array, which has to be resorted after each add operation.
So when querying a wide range of objects, the partially filled array needs to be resorted up to 255 times, which takes over 100 times longer than sorting once.
Use one oid_array for each subdirectory.
This ensures that entries have to only be sorted a single time.
It also avoids eight binary search steps for each cache lookup as a small bonus.
The cache is used for collision checks for the log placeholders %h, %t and %p, and we can see the change speeding them up in a repository with ca. 100 objects per subdirectory:
$ git count-objects
26733 objects, 68808 kilobytes
Test                        HEAD^             HEAD
--------------------------------------------------------------------
4205.1: log with %H         0.51(0.47+0.04)   0.51(0.49+0.02) +0.0%
4205.2: log with %h         0.84(0.82+0.02)   0.60(0.57+0.03) -28.6%
4205.3: log with %T         0.53(0.49+0.04)   0.52(0.48+0.03) -1.9%
4205.4: log with %t         0.84(0.80+0.04)   0.60(0.59+0.01) -28.6%
4205.5: log with %P         0.52(0.48+0.03)   0.51(0.50+0.01) -1.9%
4205.6: log with %p         0.85(0.78+0.06)   0.61(0.56+0.05) -28.2%
4205.7: log with %h-%h-%h   0.96(0.92+0.03)   0.69(0.64+0.04) -28.1%
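The difference between one shared array and one array per subdirectory can be illustrated with a toy model. This is a sketch, not Git's oid_array: the classes `SingleArrayCache` and `PerSubdirCache`, the helper `make_oid`, and the `elements_sorted` counter are all hypothetical and exist only to count how much data is handed to the sort in each design.

```python
import bisect

def make_oid(subdir, n):
    """Build a fake 40-hex-digit object id whose first byte is `subdir`."""
    return f"{subdir:02x}{n:038x}"

class SingleArrayCache:
    """One shared list, re-sorted every time another subdirectory is loaded."""
    def __init__(self, loose_objects):
        self.loose_objects = loose_objects  # dict: "ab" -> unsorted list of oids
        self.loaded = set()
        self.oids = []
        self.elements_sorted = 0

    def has(self, oid):
        subdir = oid[:2]
        if subdir not in self.loaded:
            self.loaded.add(subdir)
            self.oids.extend(self.loose_objects.get(subdir, []))
            self.oids.sort()  # sorts everything loaded so far, again
            self.elements_sorted += len(self.oids)
        i = bisect.bisect_left(self.oids, oid)
        return i < len(self.oids) and self.oids[i] == oid

class PerSubdirCache:
    """One list per subdirectory, each sorted exactly once."""
    def __init__(self, loose_objects):
        self.loose_objects = loose_objects
        self.arrays = {}
        self.elements_sorted = 0

    def has(self, oid):
        subdir = oid[:2]
        if subdir not in self.arrays:
            self.arrays[subdir] = sorted(self.loose_objects.get(subdir, []))
            self.elements_sorted += len(self.arrays[subdir])
        arr = self.arrays[subdir]  # 1/256th of the data: a shorter binary search
        i = bisect.bisect_left(arr, oid)
        return i < len(arr) and arr[i] == oid

# 256 subdirectories with 3 loose objects each, queried across the whole range.
loose = {f"{s:02x}": [make_oid(s, n) for n in (3, 1, 2)] for s in range(256)}
single, split = SingleArrayCache(loose), PerSubdirCache(loose)
for s in range(256):
    single.has(make_oid(s, 1))
    split.has(make_oid(s, 1))
```

In this toy run the shared array hands the accumulated data to the sort 256 times, while the per-subdirectory version sorts each entry once. Since 256 is 2^8, searching one subdirectory's array instead of the combined one also saves about eight binary search steps, matching the "small bonus" mentioned in the commit message.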
Before Git 2.26 (Q1 2020), the object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, whereas the bitmap machinery is an optimization to bypass that object traversal.
There are, however, some cases where they can work together, and with Git 2.26 they were taught about them.
See commit 20a5fd8 (18 Feb 2020) by Junio C Hamano (gitster).
See commit 3ab3185, commit 84243da, commit 4f3bd56, commit cc4aa28, commit 2aaeb9a, commit 6663ae0, commit 4eb707e, commit ea047a8, commit 608d9c9, commit 55cb10f, commit 792f811, commit d90fe06 (14 Feb 2020), and commit e03f928, commit acac50d, commit 551cf8b (13 Feb 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 0df82d9, 02 Mar 2020)
pack-bitmap: implement BLOB_NONE filtering
Signed-off-by: Jeff King
We can easily support BLOB_NONE filters with bitmaps.
Since we know the types of all of the objects, we just need to clear the result bits of any blobs.
Note two subtleties in the implementation (which I also called out in comments):
- we have to include any blobs that were specifically asked for (and not reached through graph traversal) to match the non-bitmap version
- we have to handle in-pack and "ext_index" objects separately.
Arguably prepare_bitmap_walk() could be adding these ext_index objects to the type bitmaps.
But it doesn't for now, so let's match the rest of the bitmap code here (it probably wouldn't be an efficiency improvement to do so since the cost of extending those bitmaps is about the same as our loop here, but it might make the code a bit simpler).
Here are perf results for the new test on git.git:
Test                                    HEAD^             HEAD
--------------------------------------------------------------------------------
5310.9: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%
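The core of the blob:none filter, "clear the result bits of any blobs, except the ones explicitly asked for", can be sketched with Python integers used as bitmaps (bit i stands for object i). The function name `filter_blob_none` and the bit layout are hypothetical; the separate handling of in-pack and ext_index objects mentioned in the commit message is deliberately left out.

```python
def filter_blob_none(result, blob_type_bits, explicitly_requested):
    """Drop blobs from `result`, keeping blobs the caller named directly."""
    # Only blobs that were reached through traversal may be removed.
    removable = blob_type_bits & ~explicitly_requested
    return result & ~removable

# Objects: bit 0 = commit, bit 1 = tree, bits 2 and 3 = blobs.
reachable = 0b1111
blobs = 0b1100
asked_for = 0b1000  # the blob at bit 3 was requested by name

filtered = filter_blob_none(reachable, blobs, asked_for)
```

No graph traversal is needed: because the type of every object is already known from the type bitmaps, the filter reduces to a couple of bitwise operations, which is consistent with the large speed-up in the perf results above.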
To know more about oid_array, consider Git 2.27 (Q2 2020):
See commit 0740d0a, commit c79eddf, commit 7383b25, commit ed4b804, commit fe299ec, commit eccce52, commit 600bee4 (30 Mar 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit a768f86, 22 Apr 2020)
oid_array: use size_t for count and allocation
Signed-off-by: Jeff King
The oid_array object uses an "int" to store the number of items and the allocated size.
It's rather unlikely for somebody to have more than 2^31 objects in a repository (the sha1's alone would be 40GB!), but if they do, we'd overflow our alloc variable.
You can reproduce this case with something like:
git init repo
cd repo
# make a pack with 2^24 objects
perl -e '
  my $nr = 2**24;
  for (my $i = 0; $i < $nr; $i++) {
    print "blob\n";
    print "data 4\n";
    print pack("N", $i);
  }
' | git fast-import
# now make 256 copies of it; most of these objects will be duplicates,
# but oid_array doesn't de-dup until all values are read and it can
# sort the result.
cd .git/objects/pack/
pack=$(echo *.pack)
idx=$(echo *.idx)
for i in $(seq 0 255); do
  # no need to waste disk space
  ln "$pack" "pack-extra-$i.pack"
  ln "$idx" "pack-extra-$i.idx"
done
# and now force an oid_array to store all of it
git cat-file --batch-all-objects --batch-check
which results in:
fatal: size_t overflow: 32 * 18446744071562067968
So the good news is that st_mult() sees the problem (the large number is because our int wraps negative, and then that gets cast to a size_t), doing the job it was meant to: bailing in crazy situations rather than causing an undersized buffer.
But we should avoid hitting this case at all, and instead limit ourselves based on what malloc() is willing to give us.
We can easily do that by switching to size_t.
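The large number in the error message can be reproduced with plain integer arithmetic. This sketch assumes a 32-bit int and a 64-bit size_t, and assumes the int had wrapped to -2^31, which is the value consistent with the figure in the message; the helpers `wrap_int32`, `as_size_t` and `st_mult_overflows` are hypothetical stand-ins, not Git functions.

```python
def wrap_int32(value):
    """Reduce `value` to a signed 32-bit int using two's complement wraparound."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value

def as_size_t(value, bits=64):
    """Reinterpret a (possibly negative) integer as an unsigned size_t."""
    return value % 2**bits

def st_mult_overflows(a, b, bits=64):
    """True if a * b does not fit in an unsigned integer of `bits` bits."""
    return a * b >= 2**bits

wrapped = wrap_int32(2**31)        # first count that no longer fits: wraps negative
as_unsigned = as_size_t(wrapped)   # the negative int cast to size_t
overflow = st_mult_overflows(32, as_unsigned)  # 32 bytes per stored hash
```

Here `as_unsigned` is 2^64 - 2^31, the second operand shown in the fatal message, and multiplying it by the 32-byte entry size is what the overflow check rejects. With a size_t counter the count never goes negative, so the limit becomes whatever malloc() can provide.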
The cat-file process above made it to ~120GB virtual set size before the integer overflow (our internal hash storage is 32-bytes now in preparation for sha256, so we'd expect ~128GB total needed, plus potentially more to copy from one realloc'd block to another).
After this patch (and about 130GB of RAM+swap), it does eventually read in the whole set. No test for obvious reasons.
Note that this object was defined in sha1-array.c, which has been renamed oid-array.c: a more neutral name, considering Git will eventually transition from SHA-1 to SHA-2.
Another optimization:
With Git 2.31 (Q1 2021), the code around the cache-tree extension in the index has been optimized.
See commit a4b6d20, commit 4bdde33, commit 22ad860, commit 845d15d (07 Jan 2021), and commit 0e5c950, commit 4c3e187, commit fa7ca5d, commit c338898, commit da8be8c (04 Jan 2021) by Derrick Stolee (derrickstolee).
See commit 0b72536 (07 Jan 2021) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit a0a2d75, 05 Feb 2021)
cache-tree: speed up consecutive path comparisons
Signed-off-by: Derrick Stolee
The previous change reduced time spent in strlen() while comparing consecutive paths in verify_cache(), but we can do better.
The conditional checks the existence of a directory separator at the correct location, but only after doing a string comparison.
Swap the order to be logically equivalent but perform fewer string comparisons.
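The effect of swapping the two checks can be sketched as follows. This is an illustration of the reordering idea, not the code from verify_cache(): the function names, the `stats` counter and the sample paths are all hypothetical.

```python
def prefix_then_separator(this_name, next_name, stats):
    """Old order: compare the prefix first, then look for the '/' separator."""
    if len(this_name) < len(next_name):
        stats["string_compares"] += 1
        if next_name[:len(this_name)] == this_name:
            return next_name[len(this_name)] == "/"
    return False

def separator_then_prefix(this_name, next_name, stats):
    """New order: check the single separator character before comparing strings."""
    if len(this_name) < len(next_name) and next_name[len(this_name)] == "/":
        stats["string_compares"] += 1
        return next_name[:len(this_name)] == this_name
    return False

def scan(paths, check):
    """Run `check` over consecutive path pairs; return results and compare count."""
    stats = {"string_compares": 0}
    results = [check(a, b, stats) for a, b in zip(paths, paths[1:])]
    return results, stats["string_compares"]

# Sorted index-like paths; the last pair is a file/directory conflict.
paths = ["Makefile", "README.md", "src/a.c", "src/a.c.orig", "src/lib", "src/lib/x.c"]
old_results, old_compares = scan(paths, prefix_then_separator)
new_results, new_compares = scan(paths, separator_then_prefix)
```

Both orderings flag the same pairs, but the cheap one-character test filters out most pairs before any string comparison happens, which is where the small gain in the measurements below would come from.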
To test the effect on performance, I used a repository with over three million paths in the index.
I then ran the following command on repeat:
git -c index.threads=1 commit --amend --allow-empty --no-edit
Here are the measurements over 10 runs after a 5-run warmup:
Benchmark #1: v2.30.0
Time (mean ± σ): 854.5 ms ± 18.2 ms
Range (min … max): 825.0 ms … 892.8 ms
Benchmark #2: Previous change
Time (mean ± σ): 833.2 ms ± 10.3 ms
Range (min … max): 815.8 ms … 849.7 ms
Benchmark #3: This change
Time (mean ± σ): 815.5 ms ± 18.1 ms
Range (min … max): 795.4 ms … 849.5 ms
This change is 2% faster than the previous change and 5% faster than v2.30.0.