Ways to improve git status performance
Asked Answered

14

101

I have a 10 GB repo on a Linux machine that sits on NFS. The first git status takes 36 minutes and subsequent ones take 8 minutes, so Git seems to depend on the OS to cache files. Only the first git commands, like commit and status, that scan or repack the whole repo take a very long time. I am not sure if you have used git status on such a large repo, but has anyone come across this issue?

I have tried git gc, git clean, and git repack, but the time taken is still almost the same.

Would submodules, or any other approach like breaking the repo into smaller ones, help? If so, which is best for splitting a large repo? Is there any other way to improve the time taken by git commands on a large repo?

Juliettajuliette answered 14/2, 2011 at 16:49 Comment(2)
NFS is pretty much the bottleneck here. lstat is quite a synchronous operation.Lichi
Possible duplicate of Git Status Takes a Long Time to CompleteWoodworking
52

To be more precise, git depends on the efficiency of the lstat(2) system call, so tweaking your client’s “attribute cache timeout” might do the trick.

The manual for git-update-index — essentially a manual mode for git-status — describes what you can do to alleviate this, by using the --assume-unchanged flag to suppress its normal behavior and manually update the paths that you have changed. You might even program your editor to unset this flag every time you save a file.
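
For illustration, a minimal sketch of that workflow (the path is a placeholder):

git update-index --assume-unchanged path/to/rarely-touched-file
# ...and when you do edit that file, clear the bit again before committing:
git update-index --no-assume-unchanged path/to/rarely-touched-file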

The alternative, as you suggest, is to reduce the size of your checkout (the size of the packfiles doesn’t really come into play here). The options are a sparse checkout, submodules, or Google’s repo tool.
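
As a rough sketch of the sparse-checkout route (the dedicated command requires Git 2.25 or later; "src/" is just an example directory):

git sparse-checkout init --cone
git sparse-checkout set src/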

(There’s a mailing list thread about using Git with NFS, but it doesn’t answer many questions.)

Halfprice answered 14/2, 2011 at 17:27 Comment(3)
The thing you missed: Linus' patch there did actually get merged, and it can be enabled by setting core.preloadindex to true - see the git-config docs for a little more of a description. (My workplace uses NFS, and I've run into exactly this problem - but never noticed the preloadindex setting. Thanks for pointing me the right way!)Dreamworld
'git config core.preloadindex true' should be added to the accepted answer here. possibly with the -uno flag from user1077329These
core.preloadindex flag is set to true by default as of Git 2.1.0: git.kernel.org/pub/scm/git/git.git/tree/Documentation/RelNotes/…Arne
39

I'm also seeing this problem on a large project shared over NFS.

It took me some time to discover the flag -uno that can be given to both git commit and git status.

What this flag does is disable looking for untracked files, which reduces the number of NFS operations significantly. The reason is that in order to discover untracked files git has to look in every subdirectory, so if you have many subdirectories this will hurt you. By stopping git from looking for untracked files you eliminate all of these NFS operations.

Combine this with the core.preloadindex flag and you can get reasonable performance even on NFS.
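
A minimal sketch of that combination (status.showUntrackedFiles, mentioned in the comment below, makes -uno the default):

git config core.preloadindex true
git status -uno
# or make it the default for this repository:
git config status.showUntrackedFiles no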

Headmistress answered 30/5, 2012 at 20:48 Comment(1)
As is mentioned in git-status(1) it can be set as default by setting the status.showUntrackedFiles config.Erebus
37

Try git gc. Also, git clean may help.

The git manual states:

Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance) and removing unreachable objects which may have been created from prior invocations of git add.

Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.

I always notice a difference after running git gc when git status is slow!
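
If you want to see whether gc actually did anything, a quick before-and-after check could look like this:

git count-objects -v   # note the loose object count and size
git gc
git count-objects -v   # compare after repacking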

UPDATE II - Not sure how I missed this, but the OP already tried git gc and git clean. I swear that wasn't originally there, but I don't see any changes in the edits. Sorry for that!

Pash answered 24/5, 2016 at 20:58 Comment(5)
I don't understand the down vote either; this is really helpful. git gc cut down the time for git log to run from 15 seconds to 0 on one of my repos.Debut
@NicolasC Ah! Not sure how I missed that, but I'd down vote my answer for that as well. :-/Pash
git gc is good, git clean maybe could delete some unwanted file?Phenomena
WARNING: avoid git clean, it will delete all files not under version control! This is very bad, as these are the ones that are impossible to retrieve.Osmometer
git gc --aggressive is even betterDynamiter
26

If your git repo makes heavy use of submodules, you can greatly speed up the performance of git status by editing the config file in the .git directory and setting ignore = dirty on any particularly large/heavy submodules. For example:

[submodule "mysubmodule"]
url = ssh://mysubmoduleURL
ignore = dirty

You'll lose the convenience of a reminder that there are unstaged changes in any of the submodules that you may have forgotten about, but you'll still retain the main convenience of knowing when the submodules are out of sync with the main repo. Plus, you can still change your working directory to the submodule itself and use git status within it as per usual to see more information. See this question for more details about what "dirty" means.
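
The same setting can be applied from the command line instead of editing the config file by hand (using the example submodule name from above):

git config submodule.mysubmodule.ignore dirty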

Sruti answered 24/8, 2012 at 14:40 Comment(1)
I added boost as a submodule to some c++ project and your answer was exactly what I wanted. Thanks!! Is there a way for this config setting to propagate to all repos on other machines for that project? It seems like just pushing won't do that.Challah
11

The performance of git status should improve with Git 2.13 (Q2 2017).

See commit 950a234 (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 8b6bba6, 24 Apr 2017)

> string-list: use ALLOC_GROW macro when reallocing string_list

Use ALLOC_GROW() macro when reallocing a string_list array rather than simply increasing it by 32.
This is a performance optimization.

During status on a very large repo and there are many changes, a significant percentage of the total run time is spent reallocing the wt_status.changes array.

This change decreases the time in wt_status_collect_changes_worktree() from 125 seconds to 45 seconds on my very large repository.


Plus, Git 2.17 (Q2 2018) will introduce a new trace, for measuring where the time is spent in the index-heavy operations.

See commit ca54d9b (27 Jan 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 090dbea, 15 Feb 2018)

trace: measure where the time is spent in the index-heavy operations

All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not.
An unoptimized git-status would give something like below:

0.001791141 s: read cache ...
0.004011363 s: preload index
0.000516161 s: refresh index
0.003139257 s: git command: ... 'status' '--porcelain=2'
0.006788129 s: diff-files
0.002090267 s: diff-index
0.001885735 s: initialize name hash
0.032013138 s: read directory
0.051781209 s: git command: './git' 'status'
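
Presumably these timings come from the performance trace; you can collect similar numbers yourself with something like:

GIT_TRACE_PERFORMANCE=1 git status                    # print timings to stderr
GIT_TRACE_PERFORMANCE=/tmp/git-perf.log git status    # or write them to a file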

The same Git 2.17 (Q2 2018) improves git status with:

revision.c: reduce object database queries

In mark_parents_uninteresting(), we check for the existence of an object file to see if we should treat a commit as parsed. The result is to set the "parsed" bit on the commit.

Modify the condition to only check has_object_file() if the result would change the parsed bit.

When a local branch is different from its upstream ref, "git status" will compute ahead/behind counts.
This uses paint_down_to_common() and hits mark_parents_uninteresting().

On a copy of the Linux repo with a local instance of "master" behind the remote branch "origin/master" by ~60,000 commits, we find the performance of "git status" went from 1.42 seconds to 1.32 seconds, for a relative difference of -7.0%.


Git 2.24 (Q3 2019) proposes another setting to improve git status performance:

See commit aaf633c, commit c6cc4c5, commit ad0fb65, commit 31b1de6, commit b068d9a, commit 7211b9e (13 Aug 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4f8dfe, 09 Sep 2019)

repo-settings: create feature.manyFiles setting

The feature.manyFiles setting is suitable for repos with many files in the working directory.
By setting index.version=4 and core.untrackedCache=true, commands such as 'git status' should improve.
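
Opting in is a single config command (a sketch, for Git 2.24 or later):

git config feature.manyFiles true
# roughly equivalent to setting index.version=4 and core.untrackedCache=true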

But:

With Git 2.24 (Q4 2019), the codepath that reads the index.version configuration was broken with a recent update, which has been corrected.

See commit c11e996 (23 Oct 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 4d6fb2b, 24 Oct 2019)

repo-settings: read an int for index.version

Signed-off-by: Derrick Stolee

Several config options were combined into a repo_settings struct in ds/feature-macros, including a move of the "index.version" config setting in 7211b9e ("repo-settings: consolidate some config settings", 2019-08-13, Git v2.24.0-rc1 -- merge listed in batch #0).

Unfortunately, that file looked like a lot of boilerplate and, in what is clearly a result of copy-paste overload, the config setting is parsed with repo_config_get_bool() instead of repo_config_get_int(). This means that a setting "index.version=4" would not register correctly and would revert to the default version of 3.

I caught this while incorporating v2.24.0-rc0 into the VFS for Git codebase, where we really care that the index is in version 4.

This was not caught by the codebase because the version checks placed in t1600-index.sh did not test the "basic" scenario enough. Here, we modify the test to include these normal settings to not be overridden by features.manyFiles or GIT_INDEX_VERSION.
While the "default" version is 3, this is demoted to version 2 in do_write_index() when not necessary.


git status will also compare SHA1 faster, due to Git 2.33 (Q3 2021), using an optimized hashfile API in the codepath that writes the index file.

See commit f6e2cd0, commit 410334e, commit 2ca245f (18 May 2021), and commit 68142e1 (17 May 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 0dd2fd1, 14 Jun 2021)

csum-file.h: increase hashfile buffer size

Signed-off-by: Derrick Stolee

The hashfile API uses a hard-coded buffer size of 8KB and has ever since it was introduced in c38138c ("git-pack-objects: write the pack files with a SHA1 csum", 2005-06-26, Git v0.99 -- merge).
It performs a similar function to the hashing buffers in read-cache.c, but that code was updated from 8KB to 128KB in f279894 ("read-cache: make the index write buffer size 128K", 2021-02-18, Git v2.31.0-rc1 -- merge).
The justification there was that do_write_index() improves from 1.02s to 0.72s.
Since our end goal is to have the index writing code use the hashfile API, we need to unify this buffer size to avoid a performance regression.

Since these buffers are now on the heap, we can adjust their size based on the needs of the consumer.
In particular, callers to hashfd_throughput() are expecting to report progress indicators as the buffer flushes.
These callers would prefer the smaller 8k buffer to avoid large delays between updates, especially for users with slower networks.
When the progress indicator is not used, the larger buffer is preferable.

By adding a new trace2 region in the chunk-format API, we can see that the writing portion of 'git multi-pack-index write'(man) lowers from ~1.49s to ~1.47s on a Linux machine.
These effects may be more pronounced or diminished on other filesystems.

Hanway answered 26/4, 2017 at 20:55 Comment(3)
See also https://mcmap.net/q/20310/-git-is-really-slow-for-100-000-objects-any-fixes and the new index.threads config settingHanway
GIT_TRACE=true git log This is how you run trace and find bottleneckBadge
@Badge Actually, since Git .22, you have also trace2: https://mcmap.net/q/12679/-how-can-i-debug-git-git-shell-related-problemsHanway
7

git config --global core.preloadIndex true

Did the job for me. Check the official documentation here.

Migratory answered 23/1, 2018 at 13:20 Comment(5)
What version of Git are you using?Hanway
2.7.4. I use Linux Subsystem For Windows and even updated apt-get seems to have references to quite old packages.Migratory
Ok make sense. I don't think it is needed with more recent version.Hanway
This even helped me out with git version 2.17.1Uncomfortable
This is now enabled by default already.Attaway
6

In our codebase where we have somewhere in the range of 20 - 30 submodules,
git status --ignore-submodules
sped things up for me drastically. Do note that this will not report on the status of submodules.

Caesura answered 16/4, 2019 at 0:0 Comment(1)
set for all the future git status: git config diff.ignoreSubmodules dirtyLewie
5

Something that hasn't been mentioned yet is to activate the filesystem cache on Windows machines (Linux filesystems are completely different and Git was optimized for them, so this probably only helps on Windows).

git config core.fscache true


As a last resort, if git is still slow, you can turn off the modification-time inspection that git needs in order to find out which files have changed.
git config core.ignoreStat true

BUT: changed files then have to be staged explicitly by the developer with git add; git won't detect the changes itself.

source

Lustreware answered 3/7, 2019 at 20:21 Comment(1)
This helped me on Windows 10, even though I had a rather recent version of Git for Windows. Thank you. My repo was ~100 Gb in .git folder (git lfs)Shellishellie
5

With Git 2.40 (Q1 2023), the advice message given by "git status"(man) when it takes a long time to enumerate untracked paths has been updated.

It better illustrates all the configuration settings you can apply to get a snappier/faster git status.

See commit ecbc23e (30 Nov 2022) by Rudy Rigot (rudyrigot).
(Merged by Junio C Hamano -- gitster -- in commit f3d9bc8, 19 Dec 2022)

status: modernize git-status "slow untracked files" advice

Signed-off-by: Rudy Rigot

git status(man) can be slow when there are a large number of untracked files and directories since Git must search the entire worktree to enumerate them.
When it is too slow, Git prints advice with the elapsed search time and a suggestion to disable the search using the -uno option.
This suggestion also carries a warning that might scare off some users.

However, these days, -uno isn't the only option.
Git can reduce the time taken to enumerate untracked files by caching results from previous git status invocations, when the core.untrackedCache and core.fsmonitor features are enabled.

Update the git status man page to explain these configuration options, and update the advice to provide more detail about the current configuration and to refer to the updated documentation.

git status now includes in its man page:

UNTRACKED FILES AND PERFORMANCE

git status can be very slow in large worktrees if/when it needs to search for untracked files and directories.

There are many configuration options available to speed this up by either avoiding the work or making use of cached results from previous Git commands.
There is no single optimum set of settings right for everyone.

We'll list a summary of the relevant options to help you, but before going into the list, you may want to run git status again, because your configuration may already be caching git status results, so it could be faster on subsequent runs.

  • The --untracked-files=no flag or the status.showUntrackedFiles=no config (see above for both): indicate that git status should not report untracked files. This is the fastest option. git status will not list the untracked files, so you need to be careful to remember if you create any new files and manually git add them.

  • advice.statusUoption=false (see git config): setting this variable to false disables the warning message given when enumerating untracked files takes more than 2 seconds. In a large project, it may take longer and the user may have already accepted the trade off (e.g. using "-uno" may not be an acceptable option for the user), in which case, there is no point issuing the warning message, and in such a case, disabling the warning may be the best.

  • core.untrackedCache=true (see git update-index): enable the untracked cache feature and only search directories that have been modified since the previous git status command.
    Git remembers the set of untracked files within each directory and assumes that if a directory has not been modified, then the set of untracked files within has not changed.

    This is much faster than enumerating the contents of every directory, but still not without cost, because Git still has to search for the set of modified directories. The untracked cache is stored in the .git/index file. The reduced cost of searching for untracked files is offset slightly by the increased size of the index and the cost of keeping it up-to-date. That reduced search time is usually worth the additional size.

  • core.untrackedCache=true and core.fsmonitor=true or core.fsmonitor=<hook_command_pathname> (see git update-index): enable both the untracked cache and FSMonitor features and only search directories that have been modified since the previous git status command.

    This is faster than using just the untracked cache alone because Git can also avoid searching for modified directories.
    Git only has to enumerate the exact set of directories that have changed recently. While the FSMonitor feature can be enabled without the untracked cache, the benefits are greatly reduced in that case.

Note that after you turn on the untracked cache and/or FSMonitor features it may take a few git status commands for the various caches to warm up before you see improved command times.
This is normal.
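
A sketch of how those options translate into commands (the built-in FSMonitor daemon behind core.fsmonitor=true is only available on some platforms and Git versions; otherwise point core.fsmonitor at a hook command):

git config core.untrackedCache true
git config core.fsmonitor true
git config advice.statusUoption false   # silence the slow-untracked-files hint if you've accepted the trade-off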

Hanway answered 22/12, 2022 at 13:48 Comment(1)
See also commit 71ccda7, Git 2.45, Q2 2024.Hanway
2

Ok, this would be quite hard to believe if I hadn't seen it with my own eyes... I had very BAD performance on my brand new work laptop: git status took from 5 to 10 seconds to complete even for the most trivial repository. I tried all the advice in this thread, then I noticed that git log was also slow, so I broadened my search to general slowness of fresh git installations and found this https://github.com/gitextensions/gitextensions/issues/5314#issuecomment-416081823

In a desperate move I tried updating my laptop's graphics driver and...

Holy Santa Claus sh*t... that did the trick!

...for me too!

So apparently the graphics card driver has some relation here... hard to understand why, but now the performance is "as expected"!

Incantatory answered 2/6, 2021 at 12:3 Comment(0)
1

Leftover index.lock files

git status can be pathologically slow when you have leftover index.lock files.

This happens especially when you have git submodules, because then you often don't notice such leftover files.

Summary: Run find .git/ -name index.lock, and delete the leftover files after checking that they are indeed not used by any currently running program.


Details

I found that my shell git status was extremely slow in my repo, with git 2.19 on Ubuntu 16.04.

Dug in and found that /usr/bin/time git status in my assets git submodule took 1.7 seconds.

Found with strace that git read all my big files in there with mmap. It doesn't usually do that, usually stat is enough.

I googled the problem and found the Use of index and Racy Git problem.

Tried git update-index somefile (in my case gitignore in the submodule checkout) shown here but it failed with

fatal: Unable to create '/home/niklas/src/myproject/.git/modules/assets/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.

This is a classical error. Usually you notice it at any git operation, but for submodules that you don't often commit to, you may not notice it for months, because it only appears when adding something to the index; the warning is not raised on read-only git status.

After removing the index.lock file, git status became fast immediately, the mmaps disappeared, and it's now over 1000x faster.

So if your git status is unnaturally slow, run find .git/ -name index.lock and delete the leftovers.
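
A cautious version of that cleanup, assuming fuser is available to verify that no running process still holds the lock:

find .git/ -name index.lock
fuser .git/index.lock || rm .git/index.lock   # fuser succeeds (and prints PIDs) if the file is still in use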

Quita answered 10/10, 2019 at 16:21 Comment(0)
1

A frequent cause of slowness for big repos is the status command's ahead/behind check against the upstream branch; set this repository-level configuration to disable it:

git config status.aheadBehind false
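
The same effect is available per invocation (Git 2.17+), without changing the config:

git status --no-ahead-behind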
Darlington answered 22/12, 2022 at 11:18 Comment(0)
0

As a test, try temporarily disabling realtime protection for antivirus software. If that's the issue, swap your antivirus.

Case in point: I had Webroot running, and it was taking 30 to 60 seconds to do anything with Git. Paused the realtime protection, and suddenly my original performance was back, with sub-second updates and a fast, snappy system.

I chose Webroot as it is famed for minimal impact on system performance, but in this case it was pouring metaphorical molasses into my CPU.

Osmometer answered 22/2, 2023 at 10:35 Comment(0)
-1

It is a pretty old question; still, I am surprised that no one has commented about binary files, given the repository size.

You mentioned that your git repo is ~10 GB. Apart from the NFS issue and other git issues (addressable with git gc and the git configuration changes outlined in other answers), git commands (git status, git diff, git add) might be slow because of a large number of binary files in the repository; git is not good at handling binary files. You can remove unnecessary binary files with the following command (the example is for NetCDF files; make a backup of the git repository first):

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch "*.nc"' \
  --prune-empty --tag-name-filter cat -- --all

Do not forget to add '*.nc' to your .gitignore file to stop git from re-committing such files.
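
For example (a sketch):

echo '*.nc' >> .gitignore
git add .gitignore
git commit -m "Ignore NetCDF files"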

Indignation answered 22/11, 2019 at 5:16 Comment(1)
Size of files has nothing to do with git status performance.Fiddling
