Git sparse checkout with exclusion

According to this thread, exclusion in Git's sparse-checkout feature is supposed to be implemented. Is it?

Assume that I have the following structure:

papers/
papers/...
presentations/
presentations/heavy_presentation
presentations/...

Now I want to exclude presentations/heavy_presentation from the checkout, while leaving the rest in the checkout. I haven't managed to get this running. What's the right syntax for this?

Recess answered 5/3, 2012 at 19:7 Comment(0)
A
9

With Git 2.25 (Q1 2020), management of sparsely checked-out working trees has gained a dedicated "sparse-checkout" command.

Git 2.37 (Q3 2022) makes the cone mode the default. See last section of this answer.


First, here is an extended example, starting with a fast clone using a --filter option:

git clone --filter=blob:none --no-checkout https://github.com/git/git
cd git
git sparse-checkout init --cone
# that sets git config core.sparseCheckoutCone true
git read-tree -mu HEAD

Using the cone option (detailed/documented below) means your .git\info\sparse-checkout will include patterns starting with:

/*
!/*/

Meaning: only top-level files, no subfolders.
If you do not want the top-level files, you need to avoid cone mode:

# Disable cone mode in .git/config.worktree
git config core.sparseCheckoutCone false

# remove .git\info\sparse-checkout
git sparse-checkout disable

# Add the expected pattern, to include just a subfolder without top files:
git sparse-checkout set /mySubFolder/

# populate working-tree with only the right files:
git read-tree -mu HEAD

I'm just trying to figure out what to pass to git sparse-checkout set to exclude something.

That is trickier.

A workaround is to include everything and then negate the unwanted directory. Negated patterns are only accepted in non-cone mode (cone mode expects a list of directories), so cone mode must be off for this.

This should result in the presentations directory being included in your sparse checkout, but without the heavy_presentation subdirectory.

That would be:

# Initialize the sparse-checkout feature in non-cone mode
git sparse-checkout init --no-cone

# Include everything, then exclude the directory you do not want
git sparse-checkout set '/*' '!/presentations/heavy_presentation/'

In detail:

(See more at "Bring your monorepo down to size with sparse-checkout" from Derrick Stolee)

So excluding a subfolder does work with the full pattern set. When you only need to include whole directories, the "cone" mode of a sparse checkout (with Git 2.25) is the faster option.
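To try the exclusion without touching a real project, here is a minimal sketch that builds a throwaway repository shaped like the one in the question and applies non-cone patterns to it. The file names (README, notes.txt, slides.txt, big.bin) and the committer identity are made up for the demo, and it assumes Git 2.25 or later.

```shell
#!/bin/sh
set -e

# Throwaway repository mirroring the layout from the question.
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/papers" "$repo/presentations/light" \
         "$repo/presentations/heavy_presentation"
echo readme > "$repo/README"
echo paper  > "$repo/papers/notes.txt"
echo slides > "$repo/presentations/light/slides.txt"
echo heavy  > "$repo/presentations/heavy_presentation/big.bin"
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial layout"

# Non-cone mode: include everything, then negate the unwanted directory.
git -C "$repo" sparse-checkout init --no-cone
git -C "$repo" sparse-checkout set '/*' '!/presentations/heavy_presentation/'

# List the files that are left in the working tree.
(cd "$repo" && find . -path ./.git -prune -o -type f -print | sort)
```

If it behaves as intended, the listing should contain README, papers/notes.txt and presentations/light/slides.txt, and nothing under presentations/heavy_presentation.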

See commits Merged by Junio C Hamano -- gitster -- in commit bd72a08, 25 Dec 2019:

sparse-checkout: add 'cone' mode

Signed-off-by: Derrick Stolee

The sparse-checkout feature can have quadratic performance as the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can be very significant.

Create a new Boolean config option, core.sparseCheckoutCone, to indicate that we expect the sparse-checkout file to contain a more limited set of patterns.
This is a separate config setting from core.sparseCheckout to avoid breaking older clients by introducing a tri-state option.

The config man page includes:

`core.sparseCheckoutCone`:

Enables the "cone mode" of the sparse checkout feature.
When the sparse-checkout file contains a limited set of patterns, then this mode provides significant performance advantages.

The git sparse-checkout man page details:

CONE PATTERN SET

The full pattern set allows for arbitrary pattern matches and complicated inclusion/exclusion rules.
These can result in O(N*M) pattern matches when updating the index, where N is the number of patterns and M is the number of paths in the index. To combat this performance issue, a more restricted pattern set is allowed when core.sparseCheckoutCone is enabled.

The accepted patterns in the cone pattern set are:

  1. Recursive: All paths inside a directory are included.
  2. Parent: All files immediately inside a directory are included.

In addition to the above two patterns, we also expect that all files in the root directory are included. If a recursive pattern is added, then all leading directories are added as parent patterns.

By default, when running git sparse-checkout init, the root directory is added as a parent pattern. At this point, the sparse-checkout file contains the following patterns:

/*
!/*/

This says "include everything in root, but nothing two levels below root."
If we then add the folder A/B/C as a recursive pattern, the folders A and A/B are added as parent patterns.
The resulting sparse-checkout file is now

/*
!/*/
/A/
!/A/*/
/A/B/
!/A/B/*/
/A/B/C/

Here, order matters, so the negative patterns are overridden by the positive patterns that appear lower in the file.

If core.sparseCheckoutCone=true, then Git will parse the sparse-checkout file expecting patterns of these types.
Git will warn if the patterns do not match.
If the patterns do match the expected format, then Git will use faster hash-based algorithms to compute inclusion in the sparse-checkout.
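The expansion of A/B/C into parent and recursive patterns described above can be sketched in plain shell. `cone_patterns` is a hypothetical helper written for this illustration, not a Git command; Git generates the real file itself and also handles escaping of special characters, which this sketch ignores.

```shell
#!/bin/sh
# cone_patterns DIR: print the cone-mode pattern file for one recursive
# directory, following the parent/recursive rules quoted above.
# Hypothetical helper for illustration only.
cone_patterns() {
    dir=${1#/}        # drop a leading slash, if any
    dir=${dir%/}      # drop a trailing slash, if any

    # Root is always a parent pattern: root files, no subdirectories.
    printf '%s\n' '/*' '!/*/'

    prefix=
    rest=$dir
    while [ "${rest#*/}" != "$rest" ]; do
        # Each leading directory becomes a parent pattern pair.
        prefix="$prefix/${rest%%/*}"
        rest=${rest#*/}
        printf '%s\n' "$prefix/" "!$prefix/*/"
    done

    # The requested directory itself is included recursively.
    printf '%s\n' "$prefix/$rest/"
}

cone_patterns A/B/C
```

Called with A/B/C, this prints the same seven lines as the sparse-checkout file shown above.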

So:

sparse-checkout: init and set in cone mode

Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init|set)'.

Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'.

When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories.


Note, the --cone option is only documented in Git 2.26 (Q1 2020)
(Merged by Junio C Hamano -- gitster -- in commit ea46d90, 05 Feb 2020)

doc: sparse-checkout: mention --cone option

Signed-off-by: Matheus Tavares
Acked-by: Derrick Stolee

In af09ce2 ("sparse-checkout: init and set in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge), the '--cone' option was added to 'git sparse-checkout init'.

Document it in git sparse-checkout:

That includes:

When --cone is provided, the core.sparseCheckoutCone setting is also set, allowing for better performance with a limited set of patterns.

("set of patterns" presented above, in the "CONE PATTERN SET" section of this answer)


How much faster would this new "cone" mode be?

sparse-checkout: use hashmaps for cone patterns

Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match.

As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here:

/$folder/
!/$folder/*/

This resulted in a sparse-checkout file with 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following performance:

    core.sparseCheckout=false: 0.21 s (0.00 s)
    core.sparseCheckout=true : 3.75 s (3.50 s)
core.sparseCheckoutCone=true : 0.23 s (0.01 s)

The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces.

While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature.
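The prefix lookup described in that commit message can be sketched as follows. This is an illustration of the idea only, not Git's implementation: two newline-separated lists stand in for the hashsets, `grep -Fx` provides the exact whole-line comparison, and the directory names are made up.

```shell
#!/bin/sh
# Sketch of cone-mode matching by exact lookups instead of pattern matching.
# "." stands for the repository root in the parent list.
recursive_dirs='A/B/C'
parent_dirs='.
A
A/B'

# in_set NAME LIST: succeed if NAME is exactly one of the lines in LIST.
in_set() {
    printf '%s\n' "$2" | grep -Fxq -e "$1"
}

# in_cone PATH: succeed if PATH (relative to the repository root) is
# inside the sparse cone.
in_cone() {
    case $1 in
        */*) dir=${1%/*} ;;
        *)   dir=. ;;          # file at the repository root
    esac

    # Files directly inside a parent directory are included.
    in_set "$dir" "$parent_dirs" && return 0

    # Otherwise strip trailing components until a recursive directory
    # is found or nothing is left to strip.
    while :; do
        in_set "$dir" "$recursive_dirs" && return 0
        case $dir in
            */*) dir=${dir%/*} ;;
            *)   return 1 ;;
        esac
    done
}

for p in README A/a.txt A/B/C/D/d.txt A/X/x.txt Z/z.txt; do
    if in_cone "$p"; then echo "in   $p"; else echo "out  $p"; fi
done
```

Each path costs at most one lookup per path component, regardless of how many directories are in the cone, which is where the speed-up over pattern matching comes from.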

And:

sparse-checkout: respect core.ignoreCase in cone mode

Signed-off-by: Derrick Stolee

When a user uses the sparse-checkout feature in cone mode, they add patterns using "git sparse-checkout set <dir1> <dir2> ..." or by using "--stdin" to provide the directories line-by-line over stdin.
This behaviour naturally looks a lot like the way a user would type "git add <dir1> <dir2> ..."

If core.ignoreCase is enabled, then "git add" will match the input using a case-insensitive match.
Do the same for the sparse-checkout feature.

Perform case-insensitive checks while updating the skip-worktree bits during unpack_trees(). This is done by changing the hash algorithm and hashmap comparison methods to optionally use case-insensitive methods.

When this is enabled, there is a small performance cost in the hashing algorithm.
To tease out the worst possible case, the following was run on a repo with a deep directory structure:

git ls-tree -d -r --name-only HEAD |
git sparse-checkout set --stdin

The 'set' command was timed with core.ignoreCase disabled or enabled.
For the repo with a deep history, the numbers were

core.ignoreCase=false: 62s
core.ignoreCase=true:  74s (+19.3%)

For reproducibility, the equivalent test on the Linux kernel repository had these numbers:

core.ignoreCase=false: 3.1s
core.ignoreCase=true:  3.6s (+16%)

Now, this is not an entirely fair comparison, as most users will define their sparse cone using more shallow directories, and the performance improvement from eb42feca97 ("unpack-trees: hash less in cone mode" 2019-11-21, Git 2.25-rc0) can remove most of the hash cost. For a more realistic test, drop the "-r" from the ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in each case, and the deep repository takes one second, plus or minus 0.05s, in each case.

Thus, we can demonstrate a cost to this change, but it is unlikely to matter to any reasonable sparse-checkout cone.


With Git 2.25 (Q1 2020), "git sparse-checkout list" subcommand learned to give its output in a more concise form when the "cone" mode is in effect.

See commit 4fd683b, commit de11951 (30 Dec 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit c20d4fd, 06 Jan 2020)

sparse-checkout: list directories in cone mode

Signed-off-by: Derrick Stolee

When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set' command takes a list of directories as input, then creates an ordered list of sparse-checkout patterns such that those directories are recursively included and all sibling entries along the parent directories are also included.
Listing the patterns is less user-friendly than the directories themselves.

In cone mode, and as long as the patterns match the expected cone-mode pattern types, change the output of 'git sparse-checkout list' to only show the directories that created the patterns.

With this change, the following piped commands would not change the working directory:

git sparse-checkout list | git sparse-checkout set --stdin

The only time this would not work is if core.sparseCheckoutCone is true, but the sparse-checkout file contains patterns that do not match the expected pattern types for cone mode.
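The round trip can be tried on a throwaway repository. The sketch below is a demo under assumptions: the directory layout and committer identity are invented, and Git 2.25 or later is required. It also shows which files a cone on A/B/C keeps in the working tree.

```shell
#!/bin/sh
set -e

# Throwaway repository with a few nested directories.
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/A/B/C/D" "$repo/A/X" "$repo/Z"
for f in README A/a.txt A/B/b.txt A/B/C/c.txt A/B/C/D/d.txt A/X/x.txt Z/z.txt
do
    echo "$f" > "$repo/$f"
done
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial layout"

# Cone mode: name directories, not patterns.
git -C "$repo" sparse-checkout init --cone
git -C "$repo" sparse-checkout set A/B/C

# In cone mode, "list" prints the directories rather than the raw patterns.
git -C "$repo" sparse-checkout list

# Feeding the list back in should leave the working tree unchanged.
before=$(cd "$repo" && find . -path ./.git -prune -o -type f -print | sort)
git -C "$repo" sparse-checkout list | git -C "$repo" sparse-checkout set --stdin
after=$(cd "$repo" && find . -path ./.git -prune -o -type f -print | sort)
[ "$before" = "$after" ] && echo "working tree unchanged"
```

The working tree should keep everything under A/B/C plus the files sitting directly in the root, A and A/B, while A/X and Z are left out.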


The code recently added in this release to move to the entry beyond the ones in the same directory in the index in the sparse-cone mode counted the number of entries to skip over incorrectly, which has been corrected with Git 2.25.1 (Feb. 2020).

See commit 7210ca4 (27 Jan 2020) by Junio C Hamano (gitster).
See commit 4c6c797 (10 Jan 2020) by Derrick Stolee via GitGitGadget.
(Merged by Junio C Hamano -- gitster -- in commit 043426c, 30 Jan 2020)

unpack-trees: correctly compute result count

Reported-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

The clear_ce_flags_dir() method processes the cache entries within a common directory. The returned int is the number of cache entries processed by that directory.
When using the sparse-checkout feature in cone mode, we can skip the pattern matching for entries in the directories that are entirely included or entirely excluded.

eb42feca ("unpack-trees: hash less in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge listed in batch #0) introduced this performance feature. The old mechanism relied on the counts returned by calling clear_ce_flags_1(), but the new mechanism calculated the number of rows by subtracting "cache_end" from "cache" to find the size of the range.
However, the equation is wrong because it divides by sizeof(struct cache_entry *). This is not how pointer arithmetic works!

A coverity build of Git for Windows in preparation for the 2.25.0 release found this issue with the warning:

Pointer differences, such as cache_end - cache, are automatically
scaled down by the size (8 bytes) of the pointed-to type (struct cache_entry *).
Most likely, the division by sizeof(struct cache_entry *) is extraneous
and should be eliminated.

This warning is correct.


With Git 2.26 (Q1 2020), some rough edges in the sparse-checkout feature, especially around the cone mode, have been cleaned up.

See commit f998a3f, commit d2e65f4, commit e53ffe2, commit e55682e, commit bd64de4, commit d585f0e, commit 4f52c2c, commit 9abc60f (31 Jan 2020), and commit 9e6d3e6, commit 41de0c6, commit 47dbf10, commit 3c75406, commit d622c34, commit 522e641 (24 Jan 2020) by Derrick Stolee (derrickstolee).
See commit 7aa9ef2 (24 Jan 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 433b8aa, 14 Feb 2020)

sparse-checkout: fix cone mode behavior mismatch

Reported-by: Finn Bryant
Signed-off-by: Derrick Stolee

The intention of the special "cone mode" in the sparse-checkout feature is to always match the same patterns that are matched by the same sparse-checkout file as when cone mode is disabled.

When a file path is given to "git sparse-checkout set" in cone mode, then the cone mode improperly matches the file as a recursive path.
When setting the skip-worktree bits, files were not expecting the MATCHED_RECURSIVE response, and hence these were left out of the matched cone.

Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED and add a test that prevents regression.

The documentation now includes:

When core.sparseCheckoutCone is enabled, the input list is considered a list of directories instead of sparse-checkout patterns.
The command writes patterns to the sparse-checkout file to include all files contained in those directories (recursively) as well as files that are siblings of ancestor directories.
The input format matches the output of git ls-tree --name-only. This includes interpreting pathnames that begin with a double quote (") as C-style quoted strings.


With Git 2.26 (Q1 2020), "git sparse-checkout" learned a new "add" subcommand.

See commit 6c11c6a (20 Feb 2020), and commit ef07659, commit 2631dc8, commit 4bf0c06, commit 6fb705a (11 Feb 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4d7dfc, 05 Mar 2020)

sparse-checkout: create 'add' subcommand

Signed-off-by: Derrick Stolee

When using the sparse-checkout feature, a user may want to incrementally grow their sparse-checkout pattern set.
Allow adding patterns using a new 'add' subcommand.

This is not much different from the 'set' subcommand, because we still want to allow the '--stdin' option and interpret inputs as directories when in cone mode and patterns otherwise.

When in cone mode, we are growing the cone.
This may actually reduce the set of patterns when adding directory A when A/B is already a directory in the cone. Test the different cases: siblings, parents, ancestors.

When not in cone mode, we can only assume the patterns should be appended to the sparse-checkout file.
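Why growing the cone can shrink the pattern set is easy to see in a small sketch: once a directory is included recursively, every listed directory below it is redundant. `minimal_cone` is a hypothetical helper written for this illustration, not part of Git, and it assumes directory names without embedded newlines.

```shell
#!/bin/sh
# minimal_cone: read directories (one per line) on stdin and print the
# smallest list of recursive directories that covers them, dropping any
# directory whose ancestor is also listed.
# Hypothetical helper for illustration only.
minimal_cone() {
    # Byte-order sort puts every ancestor before its descendants.
    LC_ALL=C sort -u | awk '
        {
            for (i = 1; i <= n; i++)
                if (index($0, kept[i] "/") == 1)
                    next            # already covered by an ancestor
            kept[++n] = $0
            print
        }'
}

# The cone held A/B and C/D; adding A makes A/B redundant.
printf '%s\n' A/B C/D A | minimal_cone
```

Here the three input directories collapse to two, A and C/D, because A already covers A/B.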

And:

sparse-checkout: work with Windows paths

Signed-off-by: Derrick Stolee

When using Windows, a user may run 'git sparse-checkout set A\B\C' to add the Unix-style path 'A/B/C' to their sparse-checkout patterns.

Normalizing the input path converts the backslashes to slashes before we add the string 'A/B/C' to the recursive hashset.


The sparse-checkout patterns have been forbidden from excluding all paths, leaving an empty working tree, for a long time.

With Git 2.27 (Q2 2020), this limitation has been lifted.

See commit ace224a (04 May 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit e9acbd6, 08 May 2020)

sparse-checkout: stop blocking empty workdirs

Reported-by: Lars Schneider
Signed-off-by: Derrick Stolee

Remove the error condition when updating the sparse-checkout leaves an empty working directory.

This behavior was added in 9e1afb167 ("sparse checkout: inhibit empty worktree", 2009-08-20, Git v1.7.0-rc0 -- merge).

The comment was added in a7bc906f2 ("Add explanation why we do not allow to sparse checkout to empty working tree", 2011-09-22, Git v1.7.8-rc0 -- merge) in response to a "dubious" comment in 84563a624 ("[unpack-trees.c](https://github.com/git/git/blob/ace224ac5fb120e9cae894e31713ab60e91f141f/unpack-trees.c): cosmetic fix", 2010-12-22, Git v1.7.5-rc0 -- merge).

With the recent "cone mode" and "git sparse-checkout init [--cone]" command, it is common to set a reasonable sparse-checkout pattern set of

/*
!/*/

which matches only files at root. If the repository has no such files, then their "git sparse-checkout init" command will fail.

Now that we expect this to be a common pattern, we should not have the commands fail on an empty working directory.
If it is a confusing result, then the user can recover with "git sparse-checkout disable" or "git sparse-checkout set". This is especially simple when using cone mode.


With Git 2.37 (Q3 2022), deprecate non-cone mode of the sparse-checkout feature.

See commit 5d4b293, commit a8defed, commit 72fa58e, commit 5d295dc, commit 0d86f59, commit 71ceb81, commit f69dfef, commit 2d95707, commit dde1358 (22 Apr 2022) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 377d347, 03 Jun 2022)

sparse-checkout: make --cone the default

Signed-off-by: Elijah Newren

Make cone mode the default, and update the documentation accordingly.

git config now includes in its man page:

The "non-cone mode" can be requested to allow specifying more flexible patterns by setting this variable to 'false'.

git sparse-checkout now includes in its man page:

Unless core.sparseCheckoutCone is explicitly set to false, Git will parse the sparse-checkout file expecting patterns of these types. Git will warn if the patterns do not match. If the patterns do match the expected format, then Git will use faster hash-based algorithms to compute inclusion in the sparse-checkout.

And:

git-sparse-checkout.txt: wording updates for the cone mode default

Signed-off-by: Elijah Newren

Now that cone mode is the default, we'd like to focus on the arguments to set/add being directories rather than patterns, and it probably makes sense to provide an earlier heads up that files from leading directories get included as well.

git sparse-checkout now includes in its man page:

By default, the input list is considered a list of directories, matching the output of git ls-tree -d --name-only.
This includes interpreting pathnames that begin with a double quote (") as C-style quoted strings.

Note that all files under the specified directories (at any depth) will be included in the sparse checkout, as well as files that are siblings of either the given directory or any of its ancestors (see 'CONE PATTERN SET' below for more details).

In the past, this was not the default, and --cone needed to be specified or core.sparseCheckoutCone needed to be enabled.

Austria answered 28/12, 2019 at 22:32 Comment(4)
you paste the whole doc in there as if this whole thing was needed to answer the question.... - Sixfold
Remove your copy/paste of the docs, and link to it instead. That's an unnecessary bifurcation of information. We're in the business of organizing information, not forking it. - Lycanthropy
I dedicated a good portion of my life to reading this answer twice and I swear it ignores the question and just pastes every part of the Git changelog that mentions sparse-checkout. I'm just trying to figure out what to pass to git sparse-checkout set to exclude something. I swear it doesn't support it and you need to edit the info file manually. - Toy
@MichaelMrozek Good question. I have edited the answer to include the known workaround. - Austria
I
10

Sadly, none of the above worked for me, so I spent a very long time trying different combinations in the sparse-checkout file.

In my case I wanted to skip folders with IntelliJ IDEA configs.

Here is what I did:


Run git clone https://github.com/myaccount/myrepo.git --no-checkout

Run git config core.sparsecheckout true

Create .git\info\sparse-checkout with the following content:

!.idea/*
!.idea_modules/*
/*

Run 'git checkout --' to get all files.


The critical thing to make it work was to add /* after the folder's name.

I have Git 1.9.
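The same steps can be tried on a throwaway local repository instead of a clone. This is a sketch, not the exact commands from above: the file names and committer identity are invented, `git read-tree -mu HEAD` stands in for the checkout step, and core.sparseCheckoutCone is set to false on the assumption that a recent Git should treat the file as plain patterns.

```shell
#!/bin/sh
set -e

# Throwaway repository containing IDE folders next to real sources.
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/.idea" "$repo/.idea_modules" "$repo/src"
echo xml  > "$repo/.idea/workspace.xml"
echo iml  > "$repo/.idea_modules/app.iml"
echo code > "$repo/src/main.c"
echo doc  > "$repo/README"
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial layout"

# Enable sparse checkout by hand, without the sparse-checkout command.
git -C "$repo" config core.sparseCheckout true
git -C "$repo" config core.sparseCheckoutCone false

# Same patterns as listed above.
mkdir -p "$repo/.git/info"
cat > "$repo/.git/info/sparse-checkout" <<'EOF'
!.idea/*
!.idea_modules/*
/*
EOF

# Re-apply HEAD to the working tree under the new patterns.
git -C "$repo" read-tree -mu HEAD

# List the files that are left in the working tree.
(cd "$repo" && find . -path ./.git -prune -o -type f -print | sort)
```

The files under .idea and .idea_modules should disappear from the working tree while README and src/main.c stay.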

Iconoduly answered 30/9, 2014 at 20:41 Comment(0)
A
9

With Git 2.25 (Q1 2020), Management of sparsely checked-out working tree has gained a dedicated "sparse-checkout" command.

Git 2.37 (Q3 2022) makes the cone mode the default. See last section of this answer.


First, here is an extended example, starting with a fast clone using a --filter option:

git clone --filter=blob:none --no-checkout https://github.com/git/git
cd git
git sparse-checkout init --cone
# that sets git config core.sparseCheckoutCone true
git read-tree -mu HEAD

Using the cone option (detailed/documented below) means your .git\info\sparse-checkout will include patterns starting with:

/*
!/*/

Meaning: only top files, no subfolder.
If you do not want top file, you need to avoid the cone mode:

# Disablecone mode in .git/config.worktree
git config core.sparseCheckoutCone false

# remove .git\info\sparse-checkout
git sparse-checkout disable

# Add the expected pattern, to include just a subfolder without top files:
git sparse-checkout set /mySubFolder/

# populate working-tree with only the right files:
git read-tree -mu HEAD

I'm just trying to figure out what to pass to git sparse-checkout set to exclude something.

That is trickier.

A workaround might involve including everything else explicitly.

This should result in the presentations directory being included in your sparse checkout, but without the heavy_presentation subdirectory

That would be:

# Initialize the sparse-checkout feature
git sparse-checkout init --cone

# Set the directories you want to include and exclude
git sparse-checkout set presentations/*
git sparse-checkout add '!presentations/heavy_presentation'

In details:

(See more at "Bring your monorepo down to size with sparse-checkout" from Derrick Stolee)

So not only excluding a subfolder does work, but it will work faster with the "cone" mode of a sparse checkout (with Git 2.25).

See commits Merged by Junio C Hamano -- gitster -- in commit bd72a08, 25 Dec 2019:

sparse-checkout: add 'cone' mode

Signed-off-by: Derrick Stolee

The sparse-checkout feature can have quadratic performance as the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can be very significant.

Create a new Boolean config option, core.sparseCheckoutCone, to indicate that we expect the sparse-checkout file to contain a more limited set of patterns.
This is a separate config setting from core.sparseCheckout to avoid breaking older clients by introducing a tri-state option.

The config man page includes:

`core.sparseCheckoutCone`:

Enables the "cone mode" of the sparse checkout feature.
When the sparse-checkout file contains a limited set of patterns, then this mode provides significant performance advantages.

The git sparse-checkout man page details:

CONE PATTERN SET

The full pattern set allows for arbitrary pattern matches and complicated inclusion/exclusion rules.
These can result in O(N*M) pattern matches when updating the index, where N is the number of patterns and M is the number of paths in the index. To combat this performance issue, a more restricted pattern set is allowed when core.spareCheckoutCone is enabled.

The accepted patterns in the cone pattern set are:

  1. Recursive: All paths inside a directory are included.
  2. Parent: All files immediately inside a directory are included.

In addition to the above two patterns, we also expect that all files in the root directory are included. If a recursive pattern is added, then all leading directories are added as parent patterns.

By default, when running git sparse-checkout init, the root directory is added as a parent pattern. At this point, the sparse-checkout file contains the following patterns:

/*
!/*/

This says "include everything in root, but nothing two levels below root."
If we then add the folder A/B/C as a recursive pattern, the folders A and A/B are added as parent patterns.
The resulting sparse-checkout file is now

/*
!/*/
/A/
!/A/*/
/A/B/
!/A/B/*/
/A/B/C/

Here, order matters, so the negative patterns are overridden by the positive patterns that appear lower in the file.

If core.sparseCheckoutCone=true, then Git will parse the sparse-checkout file expecting patterns of these types.
Git will warn if the patterns do not match.
If the patterns do match the expected format, then Git will use faster hash- based algorithms to compute inclusion in the sparse-checkout.

So:

sparse-checkout: init and set in cone mode

Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init|set)'.

Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'.

When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories.


Note, the --cone option is only documented in Git 2.26 (Q1 2020)
(Merged by Junio C Hamano -- gitster -- in commit ea46d90, 05 Feb 2020)

doc: sparse-checkout: mention --cone option

Signed-off-by: Matheus Tavares
Acked-by: Derrick Stolee

In af09ce2 ("sparse-checkout: init and set in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge), the '--cone' option was added to 'git sparse-checkout init'.

Document it in git sparse-checkout:

That includes:

When --cone is provided, the core.sparseCheckoutCone setting is also set, allowing for better performance with a limited set of patterns.

("set of patterns" presented above, in the "CONE PATTERN SET" section of this answer)


How much faster this new "cone" mode would be?

sparse-checkout: use hashmaps for cone patterns

Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match.

As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here:

/$folder/
!/$folder/*/

This resulted in a sparse-checkout file sith 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following performance:

    core.sparseCheckout=false: 0.21 s (0.00 s)
    core.sparseCheckout=true : 3.75 s (3.50 s)
core.sparseCheckoutCone=true : 0.23 s (0.01 s)

The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces.

While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature.

And:

sparse-checkout: respect core.ignoreCase in cone mode

Signed-off-by: Derrick Stolee

When a user uses the sparse-checkout feature in cone mode, they add patterns using "git sparse-checkout set <dir1> <dir2> ..." or by using "--stdin" to provide the directories line-by-line over stdin.
This behaviour naturally looks a lot like the way a user would type "git add <dir1> <dir2> ..."

If core.ignoreCase is enabled, then "git add" will match the input using a case-insensitive match.
Do the same for the sparse-checkout feature.

Perform case-insensitive checks while updating the skip-worktree bits during unpack_trees(). This is done by changing the hash algorithm and hashmap comparison methods to optionally use case- insensitive methods.

When this is enabled, there is a small performance cost in the hashing algorithm.
To tease out the worst possible case, the following was run on a repo with a deep directory structure:

git ls-tree -d -r --name-only HEAD |
git sparse-checkout set --stdin

The 'set' command was timed with core.ignoreCase disabled or enabled.
For the repo with a deep history, the numbers were

core.ignoreCase=false: 62s
core.ignoreCase=true:  74s (+19.3%)

For reproducibility, the equivalent test on the Linux kernel repository had these numbers:

core.ignoreCase=false: 3.1s
core.ignoreCase=true:  3.6s (+16%)

Now, this is not an entirely fair comparison, as most users will define their sparse cone using more shallow directories, and the performance improvement from eb42feca97 ("unpack-trees: hash less in cone mode" 2019-11-21, Git 2.25-rc0) can remove most of the hash cost. For a more realistic test, drop the "-r" from the ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in each case, and the deep repository takes one second, plus or minus 0.05s, in each case.

Thus, we can demonstrate a cost to this change, but it is unlikely to matter to any reasonable sparse-checkout cone.


With Git 2.25 (Q1 2020), "git sparse-checkout list" subcommand learned to give its output in a more concise form when the "cone" mode is in effect.

See commit 4fd683b, commit de11951 (30 Dec 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit c20d4fd, 06 Jan 2020)

sparse-checkout: list directories in cone mode

Signed-off-by: Derrick Stolee

When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set' command takes a list of directories as input, then creates an ordered list of sparse-checkout patterns such that those directories are recursively included and all sibling entries along the parent directories are also included.
Listing the patterns is less user-friendly than the directories themselves.

In cone mode, and as long as the patterns match the expected cone-mode pattern types, change the output of 'git sparse-checkout list' to only show the directories that created the patterns.

With this change, the following piped commands would not change the working directory:

git sparse-checkout list | git sparse-checkout set --stdin

The only time this would not work is if core.sparseCheckoutCone is true, but the sparse-checkout file contains patterns that do not match the expected pattern types for cone mode.


The code recently added in this release to move to the entry beyond the ones in the same directory in the index in the sparse-cone mode did not count the number of entries to skip over incorrectly, which has been corrected, with Git 2.25.1 (Feb. 2020).

See commit 7210ca4 (27 Jan 2020) by Junio C Hamano (gitster).
See commit 4c6c797 (10 Jan 2020) by Derrick Stolee via GitGitGadget (``).
(Merged by Junio C Hamano -- gitster -- in commit 043426c, 30 Jan 2020)

unpack-trees: correctly compute result count

Reported-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

The clear_ce_flags_dir() method processes the cache entries within a common directory. The returned int is the number of cache entries processed by that directory.
When using the sparse-checkout feature in cone mode, we can skip the pattern matching for entries in the directories that are entirely included or entirely excluded.

eb42feca ("unpack-trees: hash less in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge listed in batch #0) introduced this performance feature. The old mechanism relied on the counts returned by calling clear_ce_flags_1(), but the new mechanism calculated the number of rows by subtracting "cache_end" from "cache" to find the size of the range.
However, the equation is wrong because it divides by sizeof(struct cache_entry *). This is not how pointer arithmetic works!

A coverity build of Git for Windows in preparation for the 2.25.0 release found this issue with the warning:

Pointer differences, such as cache_end - cache, are automatically
scaled down by the size (8 bytes) of the pointed-to type (struct cache_entry *).
Most likely, the division by sizeof(struct cache_entry *) is extraneous
and should be eliminated.

This warning is correct.


With Git 2.26 (Q1 2020), some rough edges in the sparse-checkout feature, especially around the cone mode, have been cleaned up.

See commit f998a3f, commit d2e65f4, commit e53ffe2, commit e55682e, commit bd64de4, commit d585f0e, commit 4f52c2c, commit 9abc60f (31 Jan 2020), and commit 9e6d3e6, commit 41de0c6, commit 47dbf10, commit 3c75406, commit d622c34, commit 522e641 (24 Jan 2020) by Derrick Stolee (derrickstolee).
See commit 7aa9ef2 (24 Jan 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 433b8aa, 14 Feb 2020)

sparse-checkout: fix cone mode behavior mismatch

Reported-by: Finn Bryant
Signed-off-by: Derrick Stolee

The intention of the special "cone mode" in the sparse-checkout feature is to always match the same patterns that are matched by the same sparse-checkout file as when cone mode is disabled.

When a file path is given to "git sparse-checkout set" in cone mode, then the cone mode improperly matches the file as a recursive path.
When setting the skip-worktree bits, files were not expecting the MATCHED_RECURSIVE response, and hence these were left out of the matched cone.

Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED and add a test that prevents regression.

The documentation now includes:

When core.sparseCheckoutCone is enabled, the input list is considered a list of directories instead of sparse-checkout patterns.
The command writes patterns to the sparse-checkout file to include all files contained in those directories (recursively) as well as files that are siblings of ancestor directories.
The input format matches the output of git ls-tree --name-only. This includes interpreting pathnames that begin with a double quote (") as C-style quoted strings.


With Git 2.26 (Q1 2020), "git sparse-checkout" learned a new "add" subcommand.

See commit 6c11c6a (20 Feb 2020), and commit ef07659, commit 2631dc8, commit 4bf0c06, commit 6fb705a (11 Feb 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4d7dfc, 05 Mar 2020)

sparse-checkout: create 'add' subcommand

Signed-off-by: Derrick Stolee

When using the sparse-checkout feature, a user may want to incrementally grow their sparse-checkout pattern set.
Allow adding patterns using a new 'add' subcommand.

This is not much different from the 'set' subcommand, because we still want to allow the '--stdin' option and interpret inputs as directories when in cone mode and patterns otherwise.

When in cone mode, we are growing the cone.
This may actually reduce the set of patterns when adding directory A when A/B is already a directory in the cone. Test the different cases: siblings, parents, ancestors.

When not in cone mode, we can only assume the patterns should be appended to the sparse-checkout file.
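A rough sketch of growing the cone with `add`, assuming Git 2.26 or newer and a made-up layout (A/B, A/C, D). It illustrates the case mentioned above where adding the parent A after A/B shrinks the stored pattern set:

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository; the directory names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p A/B A/C D
echo 1 > A/top.txt
echo 2 > A/B/b.txt
echo 3 > A/C/c.txt
echo 4 > D/d.txt
git add .
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "demo"

git sparse-checkout init --cone
git sparse-checkout set A/B
first=$(git sparse-checkout list)

# Adding the parent directory grows the cone to everything under A.
git sparse-checkout add A
grown=$(git sparse-checkout list)

printf 'after set A/B:\n%s\n' "$first"
printf 'after add A:\n%s\n' "$grown"
```

Comparing the two listings shows whether the narrower A/B entry was folded into its parent.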

And:

sparse-checkout: work with Windows paths

Signed-off-by: Derrick Stolee

When using Windows, a user may run 'git sparse-checkout set A\B\C' to add the Unix-style path A/B/C to their sparse-checkout patterns.

Normalizing the input path converts the backslashes to slashes before we add the string 'A/B/C' to the recursive hashset.
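The effect of that normalization can be imitated in the shell. This is only an illustration of the transformation, not Git's implementation (which is internal C code); the variable names are made up:

```shell
#!/bin/sh
# Illustration only: turn a Windows-style path into the slash-separated form
# that ends up in the sparse-checkout patterns.
windows_path='A\B\C'
normalized=$(printf '%s\n' "$windows_path" | tr '\\' '/')
printf '%s\n' "$normalized"
```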


The sparse-checkout patterns have been forbidden from excluding all paths, leaving an empty working tree, for a long time.

With Git 2.27 (Q2 2020), this limitation has been lifted.

See commit ace224a (04 May 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit e9acbd6, 08 May 2020)

sparse-checkout: stop blocking empty workdirs

Reported-by: Lars Schneider
Signed-off-by: Derrick Stolee

Remove the error condition when updating the sparse-checkout leaves an empty working directory.

This behavior was added in 9e1afb167 ("sparse checkout: inhibit empty worktree", 2009-08-20, Git v1.7.0-rc0 -- merge).

The comment was added in a7bc906f2 ("Add explanation why we do not allow to sparse checkout to empty working tree", 2011-09-22, Git v1.7.8-rc0 -- merge) in response to a "dubious" comment in 84563a624 ("unpack-trees.c: cosmetic fix", 2010-12-22, Git v1.7.5-rc0 -- merge).

With the recent "cone mode" and "git sparse-checkout init [--cone]" command, it is common to set a reasonable sparse-checkout pattern set of

/*
!/*/

which matches only files at root. If the repository has no such files, then their "git sparse-checkout init" command will fail.

Now that we expect this to be a common pattern, we should not have the commands fail on an empty working directory.
If it is a confusing result, then the user can recover with "git sparse-checkout disable" or "git sparse-checkout set". This is especially simple when using cone mode.
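A small sketch of that situation, assuming Git 2.27 or newer and a hypothetical repository that has no files at its root. The working tree becomes empty after `init --cone`, and `disable` brings everything back:

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository with no root-level files.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p papers presentations
echo paper > papers/a.txt
echo slides > presentations/b.txt
git add .
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "demo"

# The initial cone patterns (/* and !/*/) match root files only,
# and this repository has none.
git sparse-checkout init --cone
remaining=$(ls)
echo "entries left in the working tree: [$remaining]"

# Recover the full working tree.
git sparse-checkout disable
ls
```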


With Git 2.37 (Q3 2022), the non-cone mode of the sparse-checkout feature is deprecated.

See commit 5d4b293, commit a8defed, commit 72fa58e, commit 5d295dc, commit 0d86f59, commit 71ceb81, commit f69dfef, commit 2d95707, commit dde1358 (22 Apr 2022) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 377d347, 03 Jun 2022)

sparse-checkout: make --cone the default

Signed-off-by: Elijah Newren

Make cone mode the default, and update the documentation accordingly.

git config now includes in its man page:

The "non-cone mode" can be requested to allow specifying more flexible patterns by setting this variable to 'false'.

git sparse-checkout now includes in its man page:

Unless core.sparseCheckoutCone is explicitly set to false, Git will parse the sparse-checkout file expecting patterns of these types. Git will warn if the patterns do not match. If the patterns do match the expected format, then Git will use faster hash-based algorithms to compute inclusion in the sparse-checkout.

And:

git-sparse-checkout.txt: wording updates for the cone mode default

Signed-off-by: Elijah Newren

Now that cone mode is the default, we'd like to focus on the arguments to set/add being directories rather than patterns, and it probably makes sense to provide an earlier heads up that files from leading directories get included as well.

git sparse-checkout now includes in its man page:

By default, the input list is considered a list of directories, matching the output of git ls-tree -d --name-only.
This includes interpreting pathnames that begin with a double quote (") as C-style quoted strings.

Note that all files under the specified directories (at any depth) will be included in the sparse checkout, as well as files that are siblings of either the given directory or any of its ancestors (see 'CONE PATTERN SET' below for more details).

In the past, this was not the default, and --cone needed to be specified or core.sparseCheckoutCone needed to be enabled.
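Applied to the question, cone mode cannot name a directory to exclude, but the "include everything else" workaround can be scripted from the output of git ls-tree -d --name-only. The following is a sketch under assumptions: the repository layout (including the light directory and the README) is made up, and the explicit init --cone is only there so that it also runs on Git older than 2.37:

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository mirroring the question's layout.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p papers presentations/heavy_presentation presentations/light
echo readme > README
echo paper > papers/p.txt
echo slides > presentations/slides.txt
echo big > presentations/heavy_presentation/big.bin
echo talk > presentations/light/talk.txt
git add .
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "demo"

git sparse-checkout init --cone

# List every directory except the unwanted one, then hand the list to `set`.
{
  # Top-level directories, minus the parent of the directory to exclude.
  git ls-tree -d --name-only HEAD | grep -vxF 'presentations'
  # Directories inside presentations/, minus the heavy one.
  git ls-tree -d --name-only HEAD presentations/ |
    grep -vxF 'presentations/heavy_presentation'
} | git sparse-checkout set --stdin

git sparse-checkout list
```

Files that sit directly in presentations/ stay checked out because cone mode keeps the siblings of every listed directory. New sibling directories added upstream later are not picked up automatically, so the list has to be regenerated.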

Austria answered 28/12, 2019 at 22:32 Comment(4)
you paste the whole doc in there as if this whole thing was needed to answer the question....Sixfold
Remove your copy/paste of the docs, and link to it instead. That's an unnecessary bifurcation of information. We're in the business of organizing information, not forking it.Lycanthropy
I dedicated a good portion of my life to reading this answer twice and I swear it ignores the question and just pastes every part of the Git changelog that mentions sparse-checkout. I'm just trying to figure out what to pass to git sparse-checkout set to exclude something. I swear it doesn't support it and you need to edit the info file manuallyToy
@MichaelMrozek Good question. I have edited the answer to include the known workaround.Austria
P
8

I would have expected something like the below to work:

/*
!presentations/heavy_presentation

But it doesn't, and I tried many other combinations. I think exclusion is not implemented properly and there are (still) bugs around it.

Something like:

presentations/*
!presentations/heavy_presentation

does work though and you will get the presentations folder without the heavy_presentation folder.

So the workaround would be to include everything else explicitly.
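For reference, a sketch of how the working pattern pair can be applied with the older, manual workflow (editing the info file and running git read-tree). The repository layout is hypothetical, and core.sparseCheckoutCone is set to false explicitly as an assumption so that recent Git versions do not try to read the file as cone patterns:

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository mirroring the question's layout.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p papers presentations/heavy_presentation
echo paper > papers/p.txt
echo slides > presentations/slides.txt
echo big > presentations/heavy_presentation/big.bin
git add .
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "demo"

# Manual workflow: enable the feature, write the patterns, refresh the tree.
git config core.sparseCheckout true
git config core.sparseCheckoutCone false
mkdir -p .git/info
printf '%s\n' 'presentations/*' '!presentations/heavy_presentation' \
  > .git/info/sparse-checkout
git read-tree -mu HEAD

# Show what is left in the working tree.
find . -path ./.git -prune -o -type f -print
```

With these two patterns nothing outside presentations/ is checked out, which is why every other wanted directory has to be listed explicitly.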

Poetize answered 5/3, 2012 at 23:42 Comment(2)
Thanks, confirmed. I have edited your post to add another example that was not working.Recess
Your first solution worked for me in git 2.21.0 on windowsThreewheeler
R
6

With Git 2.37 (released in June 2022) it is much easier. To exclude one folder and a few files matching a mask (to give a more general example than the question asks for), I did this:

git sparse-checkout set --no-cone "/*" "!/folder/" "!/path/to/dist/*.map"

This worked quite intuitively (well, after a few hours spent finding this formula). The folder disappears completely, and so do all the *.map files from the path/to/dist folder. Nothing else was touched.

A few important bits:

  1. I strongly suggest backing up your local repo before starting if it has any unstaged or ignored files. My first try (without "/*" etc.) was scary: it looked as if most of my data had disappeared. #5 below seemed to restore everything, but you never know for sure with a big repo...

  2. "/*" was the magic piece. It asks Git to include everything that is not excluded later on. Without it, Git removes most of the repo contents from the working tree. It must come first in the list!

  3. You may need set +H for the command to get through (interactive bash treats ! as the history-expansion character), and set -H afterwards to restore the default bash behaviour.

  4. I recommend checking how Git interpreted the paths you used by typing:

    cat .git/info/sparse-checkout

    Before finding the "formula" for my case I was surprised by the results quite a few times (e.g. see #6).

  5. Run ls on a few repo paths after running the command. If things go wrong, git sparse-checkout disable should restore all the missing files. At least this worked very well in my case.

  6. Better to quote all your paths. This is especially important for "/*"! Here is what I got in .git/info/sparse-checkout when I used it without quotes:

    /bin
    /dev
    /etc
    /home
    /lib
    /lib64
    /opt
    /proc
    /root
    /run
    /sbin
    /tmp
    /usr
    /var
    !folder/
    !path/to/dist/*.map

    You can imagine that these patterns weren't what I wanted to say...

  7. Mind the leading slashes everywhere ("!/folder/"). If omitted ("!folder/"), folders with that name are removed from the working tree everywhere in the hierarchy, not just at the top level.

  8. --no-cone is now important. Non-cone mode was the default in the past, which can cause a lot of confusion when reading older advice on the internet! The Git docs elaborate on this if you want to understand things better.
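Putting the points above together for the layout from the question, here is a sketch that assumes a Git version recent enough to accept --no-cone on the set subcommand (the answer used 2.37) and a made-up throwaway repository. Single quotes are used so that neither globbing nor history expansion interferes:

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository mirroring the question's layout.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p papers presentations/heavy_presentation
echo paper > papers/p.txt
echo slides > presentations/slides.txt
echo big > presentations/heavy_presentation/big.bin
git add .
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "demo"

# Include everything first, then exclude the one heavy directory.
git sparse-checkout set --no-cone '/*' '!/presentations/heavy_presentation/'

# Check how Git recorded the patterns (point 4 above).
cat .git/info/sparse-checkout

# Check what is left in the working tree (point 5 above).
find . -path ./.git -prune -o -type f -print
```

If the result is not what was intended, git sparse-checkout disable restores the full working tree.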

Hope this helps someone.


Update: Added leading slashes to the excluded paths, explained in #7 above.

Rachellerachis answered 28/1, 2023 at 0:51 Comment(5)
Good feedback, that seems easier than the git sparse-checkout command as I initially reported back in 2019.Austria
Exclamation marks are allowed with the single quote ', e.g., '!/folder/', even without set +H.Recess
What pattern would you use to ignore files in top level of repository using --no-cone mode?Ensilage
Try "!/file.txt". I think it should work.Rachellerachis
I also spent an hour reading up on this, but one question is still unanswered: Will sparse checkout only remove unwanted files/folders from checkout, or also the objects in the .git folder?Coaxial
M
4

I had the same problem. I fixed it with something like:

!presentations/heavy_presentation
presentations/*

How I understand it: Git reads the file rule by rule. If something is included, it includes all paths that contain that word, and it doesn't change their status anymore until the end of the sparse checkout. If you add the exclude rule before the include rule, in my opinion it will delete the files first and then mark all as included.

I am not completely sure, this is what I have supposed based on my experience and has been working for me. I hope it will help someone.

Mihe answered 8/3, 2013 at 14:11 Comment(0)
G
1

Short Answer:

git sparse-checkout set '/*' '!/presentations/heavy_presentation/'
git sparse-checkout init [--cone]

--cone option: not needed for only a few patterns or a small repo, but it speeds things up in general. It requires a certain canonical order of the patterns, as explained in the CONE PATTERN SET section of the sparse-checkout documentation. It can also be enabled later with:

git config core.sparseCheckoutCone true
Gomphosis answered 22/2, 2021 at 18:40 Comment(1)
This won't work currently (I'm on Git 2.37 now), at least without quotes. /* is now expanded into /bin /dev /etc /home and so on. Also, the init command is now deprecated. I added a more detailed answer with what worked for me currently: https://mcmap.net/q/12961/-git-sparse-checkout-with-exclusionRachellerachis
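The expansion mentioned in this comment is done by the shell before Git ever sees the argument. A tiny sketch, where count_args is a hypothetical helper that just reports how many arguments it received:

```shell
#!/bin/sh
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

# Unquoted, the shell replaces /* with every entry of the root directory.
unquoted=$(count_args /*)

# Quoted, the two characters are passed through as a single argument.
quoted=$(count_args '/*')

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```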
