With Git 2.25 (Q1 2020), management of sparsely checked-out working trees has gained a dedicated "sparse-checkout" command.
Git 2.37 (Q3 2022) makes the cone mode the default. See last section of this answer.
First, here is an extended example, starting with a fast clone using a --filter option:
git clone --filter=blob:none --no-checkout https://github.com/git/git
cd git
git sparse-checkout init --cone
# that sets git config core.sparseCheckoutCone true
git read-tree -mu HEAD
Using the cone option (detailed/documented below) means your .git\info\sparse-checkout will include patterns starting with:
/*
!/*/
Meaning: only top-level files, no subfolders.
If you do not want the top-level files, you need to avoid the cone mode:
# Disable cone mode in .git/config.worktree
git config core.sparseCheckoutCone false
# remove .git\info\sparse-checkout
git sparse-checkout disable
# Add the expected pattern, to include just a subfolder without top files:
git sparse-checkout set /mySubFolder/
# populate working-tree with only the right files:
git read-tree -mu HEAD
I'm just trying to figure out what to pass to git sparse-checkout set to exclude something.
That is trickier.
A workaround might involve including everything else explicitly.
This should result in the presentations directory being included in your sparse checkout, but without the heavy_presentation subdirectory.
That would be:
# Initialize the sparse-checkout feature
git sparse-checkout init --cone
# Set the directories you want to include and exclude
git sparse-checkout set presentations/*
git sparse-checkout add '!presentations/heavy_presentation'
In detail:
(See more at "Bring your monorepo down to size with sparse-checkout" from Derrick Stolee)
So not only does excluding a subfolder work, but it also works faster with the "cone" mode of a sparse checkout (with Git 2.25).
See the commits merged by Junio C Hamano -- gitster -- in commit bd72a08, 25 Dec 2019:
Signed-off-by: Derrick Stolee
The sparse-checkout feature can have quadratic performance as the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can be very significant.
Create a new Boolean config option, core.sparseCheckoutCone, to indicate that we expect the sparse-checkout file to contain a more limited set of patterns.
This is a separate config setting from core.sparseCheckout to avoid breaking older clients by introducing a tri-state option.
The git config man page includes:
`core.sparseCheckoutCone`:
Enables the "cone mode" of the sparse checkout feature.
When the sparse-checkout file contains a limited set of patterns, then this mode provides significant performance advantages.
The git sparse-checkout man page details:
CONE PATTERN SET
The full pattern set allows for arbitrary pattern matches and complicated inclusion/exclusion rules.
These can result in O(N*M) pattern matches when updating the index, where N is the number of patterns and M is the number of paths in the index. To combat this performance issue, a more restricted pattern set is allowed when core.sparseCheckoutCone is enabled.
The accepted patterns in the cone pattern set are:
- Recursive: All paths inside a directory are included.
- Parent: All files immediately inside a directory are included.
In addition to the above two patterns, we also expect that all files in the root directory are included. If a recursive pattern is added, then all leading directories are added as parent patterns.
By default, when running git sparse-checkout init, the root directory is added as a parent pattern.
At this point, the sparse-checkout file contains the following patterns:
/*
!/*/
This says "include everything in root, but nothing two levels below root."
If we then add the folder A/B/C as a recursive pattern, the folders A and A/B are added as parent patterns.
The resulting sparse-checkout file is now:
/*
!/*/
/A/
!/A/*/
/A/B/
!/A/B/*/
/A/B/C/
Here, order matters, so the negative patterns are overridden by the positive
patterns that appear lower in the file.
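To make the expansion above concrete, here is a minimal shell sketch that prints the cone-mode patterns for a single recursive directory. The cone_patterns function is a hypothetical helper written for this answer, not something Git provides, and it assumes a plain relative path with no characters that would need escaping.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): print the cone-mode pattern list
# for ONE recursive directory given as a relative path such as A/B/C.
cone_patterns() {
    rest=$1
    prefix=
    # The root is always a parent pattern: root files yes, subfolders no.
    printf '%s\n' '/*' '!/*/'
    while [ -n "$rest" ]; do
        case $rest in
            */*) part=${rest%%/*}; rest=${rest#*/} ;;
            *)   part=$rest; rest= ;;
        esac
        prefix="$prefix/$part"
        if [ -n "$rest" ]; then
            # Leading directory: include it, but exclude its subfolders.
            printf '%s/\n!%s/*/\n' "$prefix" "$prefix"
        else
            # Final directory: recursive pattern, everything below it.
            printf '%s/\n' "$prefix"
        fi
    done
}

cone_patterns A/B/C
```

Running it with A/B/C prints the same seven lines as the sparse-checkout file shown above. Git itself handles several directories at once and merges their parent patterns, which this sketch does not attempt.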
If core.sparseCheckoutCone=true, then Git will parse the sparse-checkout file expecting patterns of these types.
Git will warn if the patterns do not match.
If the patterns do match the expected format, then Git will use faster hash-based algorithms to compute inclusion in the sparse-checkout.
So:
Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee
To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init|set)'.
Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'.
When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories.
Note: the --cone option is only documented starting with Git 2.26 (Q1 2020).
(Merged by Junio C Hamano -- gitster -- in commit ea46d90, 05 Feb 2020)
doc: sparse-checkout: mention --cone option
Signed-off-by: Matheus Tavares
Acked-by: Derrick Stolee
In af09ce2 ("sparse-checkout: init and set in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge), the '--cone' option was added to 'git sparse-checkout init'.
Document it in git sparse-checkout:
That includes:
When --cone is provided, the core.sparseCheckoutCone setting is also set, allowing for better performance with a limited set of patterns.
("set of patterns" presented above, in the "CONE PATTERN SET" section of this answer)
How much faster is this new "cone" mode?
sparse-checkout: use hashmaps for cone patterns
Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee
The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match.
As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here:
/$folder/
!/$folder/*/
This resulted in a sparse-checkout file with 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following performance:
core.sparseCheckout=false: 0.21 s (0.00 s)
core.sparseCheckout=true : 3.75 s (3.50 s)
core.sparseCheckoutCone=true : 0.23 s (0.01 s)
The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces.
While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature.
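The prefix-matching idea from that commit message can be sketched in plain shell. This is only an illustration of the lookup logic: exact-match searches over newline-separated lists stand in for Git's hash sets, and the in_set and path_in_cone names, as well as the "." convention for the root directory, are assumptions made for this example.

```shell
#!/bin/sh
# Exact-match membership test against a newline-separated list
# (a stand-in for a hash set lookup; hypothetical helper).
in_set() {
    printf '%s\n' "$2" | grep -Fxq -e "$1"
}

# path_in_cone <relative/path> <recursive dirs> <parent dirs>
# Returns 0 when the path belongs to the cone, 1 otherwise.
path_in_cone() {
    dir=$(dirname "$1")          # "." stands for the repository root
    # Parent pattern: files immediately inside the directory are included.
    if in_set "$dir" "$3"; then return 0; fi
    # Recursive pattern: strip path entries and look for an exact match.
    while [ "$dir" != "." ] && [ "$dir" != "/" ]; do
        if in_set "$dir" "$2"; then return 0; fi
        dir=$(dirname "$dir")
    done
    return 1
}

# Cone built from the single recursive directory A/B/C.
recursive='A/B/C'
parents='.
A
A/B'

for p in top.txt A/a.txt A/B/C/deep/x.txt A/X/x.txt; do
    if path_in_cone "$p" "$recursive" "$parents"; then
        echo "$p: included"
    else
        echo "$p: excluded"
    fi
done
```

Each path costs a number of lookups proportional to its depth, independent of how many patterns exist, which is where the gain over matching every pattern against every index entry comes from.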
And:
sparse-checkout: respect core.ignoreCase in cone mode
Signed-off-by: Derrick Stolee
When a user uses the sparse-checkout feature in cone mode, they add patterns using "git sparse-checkout set <dir1> <dir2> ..." or by using "--stdin" to provide the directories line-by-line over stdin.
This behaviour naturally looks a lot like the way a user would type "git add <dir1> <dir2> ...".
If core.ignoreCase is enabled, then "git add" will match the input using a case-insensitive match.
Do the same for the sparse-checkout feature.
Perform case-insensitive checks while updating the skip-worktree bits during unpack_trees(). This is done by changing the hash algorithm and hashmap comparison methods to optionally use case-insensitive methods.
When this is enabled, there is a small performance cost in the hashing algorithm.
To tease out the worst possible case, the following was run on a repo with a deep directory structure:
git ls-tree -d -r --name-only HEAD |
git sparse-checkout set --stdin
The 'set' command was timed with core.ignoreCase disabled or enabled.
For the repo with a deep history, the numbers were
core.ignoreCase=false: 62s
core.ignoreCase=true: 74s (+19.3%)
For reproducibility, the equivalent test on the Linux kernel repository had these numbers:
core.ignoreCase=false: 3.1s
core.ignoreCase=true: 3.6s (+16%)
Now, this is not an entirely fair comparison, as most users will define their sparse cone using more shallow directories, and the performance improvement from eb42feca97 ("unpack-trees: hash less in cone mode" 2019-11-21, Git 2.25-rc0) can remove most of the hash cost. For a more realistic test, drop the "-r" from the ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in each case, and the deep repository takes one second, plus or minus 0.05s, in each case.
Thus, we can demonstrate a cost to this change, but it is unlikely to matter to any reasonable sparse-checkout cone.
With Git 2.25 (Q1 2020), the "git sparse-checkout list" subcommand learned to give its output in a more concise form when the "cone" mode is in effect.
See commit 4fd683b, commit de11951 (30 Dec 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit c20d4fd, 06 Jan 2020)
sparse-checkout: list directories in cone mode
Signed-off-by: Derrick Stolee
When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set' command takes a list of directories as input, then creates an ordered list of sparse-checkout patterns such that those directories are recursively included and all sibling entries along the parent directories are also included.
Listing the patterns is less user-friendly than the directories themselves.
In cone mode, and as long as the patterns match the expected cone-mode pattern types, change the output of 'git sparse-checkout list' to only show the directories that created the patterns.
With this change, the following piped commands would not change the working directory:
git sparse-checkout list | git sparse-checkout set --stdin
The only time this would not work is if core.sparseCheckoutCone is true, but the sparse-checkout file contains patterns that do not match the expected pattern types for cone mode.
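As a rough illustration of how directories can be read back from a well-formed cone-mode pattern file, here is a small awk-based sketch. It is a heuristic written for this answer, not Git's implementation: the cone_dirs name is hypothetical, and it simply treats any positive pattern /dir/ that lacks a matching !/dir/*/ line as a recursive directory.

```shell
#!/bin/sh
# Hypothetical helper: read cone-mode patterns on stdin and print the
# directories that were requested recursively, one per line.
cone_dirs() {
    awk '
        /^!/       { neg[$0] = 1; next }     # remember negative patterns
        $0 == "/*" { next }                  # root parent pattern
        $0 != ""   { pos[++n] = $0 }         # positive patterns, in order
        END {
            for (i = 1; i <= n; i++) {
                # A parent pattern /dir/ is paired with !/dir/*/ ;
                # a recursive pattern has no such companion line.
                if (("!" pos[i] "*/") in neg) continue
                d = pos[i]
                sub("^/", "", d)
                sub("/$", "", d)
                print d
            }
        }'
}

printf '%s\n' '/*' '!/*/' '/A/' '!/A/*/' '/A/B/' '!/A/B/*/' '/A/B/C/' | cone_dirs
```

For the pattern file used earlier in this answer, the only directory printed is A/B/C, which is the kind of input 'git sparse-checkout set' expects in cone mode.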
The code recently added in this release to move to the entry beyond the ones in the same directory in the index in the sparse-cone mode counted the number of entries to skip over incorrectly, which has been corrected with Git 2.25.1 (Feb. 2020).
See commit 7210ca4 (27 Jan 2020) by Junio C Hamano (gitster).
See commit 4c6c797 (10 Jan 2020) by Derrick Stolee via GitGitGadget.
(Merged by Junio C Hamano -- gitster -- in commit 043426c, 30 Jan 2020)
unpack-trees: correctly compute result count
Reported-by: Johannes Schindelin
Signed-off-by: Derrick Stolee
The clear_ce_flags_dir() method processes the cache entries within a common directory. The returned int is the number of cache entries processed by that directory.
When using the sparse-checkout feature in cone mode, we can skip the pattern matching for entries in the directories that are entirely included or entirely excluded.
eb42feca ("unpack-trees: hash less in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge listed in batch #0) introduced this performance feature. The old mechanism relied on the counts returned by calling clear_ce_flags_1(), but the new mechanism calculated the number of rows by subtracting "cache_end" from "cache" to find the size of the range.
However, the equation is wrong because it divides by sizeof(struct cache_entry *). This is not how pointer arithmetic works!
A coverity build of Git for Windows in preparation for the 2.25.0 release found this issue with the warning:
Pointer differences, such as cache_end - cache, are automatically
scaled down by the size (8 bytes) of the pointed-to type (struct cache_entry *).
Most likely, the division by sizeof(struct cache_entry *) is extraneous
and should be eliminated.
This warning is correct.
With Git 2.26 (Q1 2020), some rough edges in the sparse-checkout feature, especially around the cone mode, have been cleaned up.
See commit f998a3f, commit d2e65f4, commit e53ffe2, commit e55682e, commit bd64de4, commit d585f0e, commit 4f52c2c, commit 9abc60f (31 Jan 2020), and commit 9e6d3e6, commit 41de0c6, commit 47dbf10, commit 3c75406, commit d622c34, commit 522e641 (24 Jan 2020) by Derrick Stolee (derrickstolee).
See commit 7aa9ef2 (24 Jan 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 433b8aa, 14 Feb 2020)
sparse-checkout: fix cone mode behavior mismatch
Reported-by: Finn Bryant
Signed-off-by: Derrick Stolee
The intention of the special "cone mode" in the sparse-checkout feature is to always match the same patterns that are matched by the same sparse-checkout file as when cone mode is disabled.
When a file path is given to "git sparse-checkout set" in cone mode, then the cone mode improperly matches the file as a recursive path.
When setting the skip-worktree bits, files were not expecting the MATCHED_RECURSIVE response, and hence these were left out of the matched cone.
Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED and add a test that prevents regression.
The documentation now includes:
When core.sparseCheckoutCone is enabled, the input list is considered a list of directories instead of sparse-checkout patterns.
The command writes patterns to the sparse-checkout file to include all files contained in those directories (recursively) as well as files that are siblings of ancestor directories.
The input format matches the output of git ls-tree --name-only. This includes interpreting pathnames that begin with a double quote (") as C-style quoted strings.
With Git 2.26 (Q1 2020), "git sparse-checkout" learned a new "add" subcommand.
See commit 6c11c6a (20 Feb 2020), and commit ef07659, commit 2631dc8, commit 4bf0c06, commit 6fb705a (11 Feb 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4d7dfc, 05 Mar 2020)
Signed-off-by: Derrick Stolee
When using the sparse-checkout feature, a user may want to incrementally grow their sparse-checkout pattern set.
Allow adding patterns using a new 'add' subcommand.
This is not much different from the 'set' subcommand, because we still want to allow the '--stdin' option and interpret inputs as directories when in cone mode and patterns otherwise.
When in cone mode, we are growing the cone.
This may actually reduce the set of patterns when adding directory A when A/B is already a directory in the cone. Test the different cases: siblings, parents, ancestors.
When not in cone mode, we can only assume the patterns should be appended to the sparse-checkout file.
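The remark about the pattern set shrinking when A is added on top of A/B can be illustrated with a short sketch: once an ancestor is recursive, its descendants no longer need their own entries. The minimal_cone helper below is hypothetical and only models that reduction on a list of directory names, not Git's actual bookkeeping.

```shell
#!/bin/sh
# Hypothetical helper: read directory names on stdin (one per line) and
# drop every directory that is already covered by a listed ancestor.
minimal_cone() {
    awk '
        { if (!($0 in dirs)) { dirs[$0] = 1; order[++n] = $0 } }
        END {
            for (i = 1; i <= n; i++) {
                p = order[i]
                covered = 0
                # Strip one trailing path component at a time.
                while (sub("/[^/]*$", "", p)) {
                    if (p in dirs) { covered = 1; break }
                }
                if (!covered) print order[i]
            }
        }'
}

# The cone already contains A/B and C; the user then adds A.
printf '%s\n' A/B C A | minimal_cone
```

Here the result is C and A: the explicit A/B entry disappears because A now includes it recursively, so the number of generated patterns goes down even though the cone grew.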
And:
Signed-off-by: Derrick Stolee
When using Windows, a user may run 'git sparse-checkout set A\B\C' to add the Unix-style path A/B/C to their sparse-checkout patterns.
Normalizing the input path converts the backslashes to slashes before we add the string 'A/B/C' to the recursive hashset.
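The effect of that conversion can be mimicked in a couple of lines of shell. This is only a sketch of the backslash-to-slash step: normalize_dir is a hypothetical name, and Git's own path normalization does more than this single substitution.

```shell
#!/bin/sh
# Hypothetical helper: convert Windows-style separators to forward slashes.
normalize_dir() {
    printf '%s\n' "$1" | tr '\\' '/'
}

normalize_dir 'A\B\C'
```

With the input A\B\C this prints A/B/C, which is the form stored in the sparse-checkout file.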
The sparse-checkout patterns have been forbidden from excluding all paths, leaving an empty working tree, for a long time.
With Git 2.27 (Q2 2020), this limitation has been lifted.
See commit ace224a (04 May 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit e9acbd6, 08 May 2020)
Reported-by: Lars Schneider
Signed-off-by: Derrick Stolee
Remove the error condition when updating the sparse-checkout leaves an empty working directory.
This behavior was added in 9e1afb167 ("sparse checkout: inhibit empty worktree", 2009-08-20, Git v1.7.0-rc0 -- merge).
The comment was added in a7bc906f2 ("Add explanation why we do not allow to sparse checkout to empty working tree", 2011-09-22, Git v1.7.8-rc0 -- merge) in response to a "dubious" comment in 84563a624 ("[unpack-trees.c](https://github.com/git/git/blob/ace224ac5fb120e9cae894e31713ab60e91f141f/unpack-trees.c): cosmetic fix", 2010-12-22, Git v1.7.5-rc0 -- merge).
With the recent "cone mode" and "git sparse-checkout init [--cone]" command, it is common to set a reasonable sparse-checkout pattern set of
/*
!/*/
which matches only files at root. If the repository has no such files, then their "git sparse-checkout init" command will fail.
Now that we expect this to be a common pattern, we should not have the commands fail on an empty working directory.
If it is a confusing result, then the user can recover with "git sparse-checkout disable" or "git sparse-checkout set". This is especially simple when using cone mode.
With Git 2.37 (Q3 2022), the non-cone mode of the sparse-checkout feature is deprecated.
See commit 5d4b293, commit a8defed, commit 72fa58e, commit 5d295dc, commit 0d86f59, commit 71ceb81, commit f69dfef, commit 2d95707, commit dde1358 (22 Apr 2022) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 377d347, 03 Jun 2022)
Signed-off-by: Elijah Newren
Make cone mode the default, and update the documentation accordingly.
git config now includes in its man page:
The "non-cone mode" can be requested to allow specifying more flexible patterns by setting this variable to 'false'.
git sparse-checkout now includes in its man page:
Unless core.sparseCheckoutCone is explicitly set to false, Git will parse the sparse-checkout file expecting patterns of these types. Git will warn if the patterns do not match. If the patterns do match the expected format, then Git will use faster hash-based algorithms to compute inclusion in the sparse-checkout.
And:
Signed-off-by: Elijah Newren
Now that cone mode is the default, we'd like to focus on the arguments to set/add being directories rather than patterns, and it probably makes sense to provide an earlier heads up that files from leading directories get included as well.
git sparse-checkout now includes in its man page:
By default, the input list is considered a list of directories, matching the output of git ls-tree -d --name-only.
This includes interpreting pathnames that begin with a double quote (") as C-style quoted strings.
Note that all files under the specified directories (at any depth) will be included in the sparse checkout, as well as files that are siblings of either the given directory or any of its ancestors (see 'CONE PATTERN SET' below for more details).
In the past, this was not the default, and --cone needed to be specified or core.sparseCheckoutCone needed to be enabled.
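To see that "siblings of ancestors" rule in practice, here is a small self-contained experiment on a throwaway local repository. It assumes Git 2.25 or later is installed; the directory layout, file names, and commit identity are made up for the demonstration, and --cone is passed explicitly so the result does not depend on the Git 2.37 default.

```shell
#!/bin/sh
set -e
# Build a throwaway repository with a small, made-up directory layout.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir -p A/B/C A/X D
echo root > top.txt
echo a > A/a.txt
echo b > A/B/b.txt
echo c > A/B/C/c.txt
echo x > A/X/x.txt
echo d > D/d.txt
git add .
git commit -q -m "demo layout"

# Restrict the working tree to the cone rooted at A/B/C.
git sparse-checkout init --cone
git sparse-checkout set A/B/C

# List the files left in the working tree (skipping .git itself).
files=$(find . -name .git -prune -o -type f -print | LC_ALL=C sort)
printf '%s\n' "$files"
```

The working tree should keep A/B/C/c.txt plus the files sitting next to each leading directory (top.txt, A/a.txt, A/B/b.txt), while A/X and D disappear. Running git sparse-checkout disable in that repository restores everything.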