The new git sparse-checkout command (introduced in Git 2.25, Q1 2020) comes from a Microsoft contribution based on its Scalar project:
At Microsoft, we support the Windows OS repository using VFS for Git (formerly GVFS). VFS for Git uses a virtualized filesystem to bypass many assumptions about repository size, enabling the Windows developers to use Git at a scale previously thought impossible.
While supporting VFS for Git, we identified performance bottlenecks using a custom trace system and collecting user feedback.
We made several contributions to the Git client, including the commit-graph file and improvements to git push and sparse-checkout.
Building on these contributions and many other recent improvements to Git, we began a project to support very large repositories without needing a virtualized filesystem.
Hence the Scalar project, which has transitioned (mid 2021) from a modified version of VFS for Git into a thin shell around core Git features.
The Scalar executable has now been ported to be included in the microsoft/git fork.
It is integrated with Git for Windows 2.38 (Oct. 2022).
The 2020 article "Bring your monorepo down to size with sparse-checkout" from Derrick Stolee explains how sparse checkout is managed nowadays (2020+).
Using sparse-checkout with an existing repository
To restrict your working directory to a set of directories, run the following commands:
git sparse-checkout init --cone
git sparse-checkout set <dir1> <dir2> ...
If you get stuck, run git sparse-checkout disable to return to a full working directory.
The init subcommand sets the necessary Git config options and fills the sparse-checkout file with patterns that mean "only match files in the root directory".
The set subcommand modifies the sparse-checkout file with patterns to match the files in the given directories.
Further, any files that are immediately in a directory that’s a parent to a specified directory are also included.
For example, if you ran git sparse-checkout set A/B, then Git would include files with names A/B/C.txt (immediate child of A/B) and A/D.txt (immediate sibling of A/B) as well as E.txt (immediate sibling of A).
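The A/B example can be tried in a throwaway repository. The following is a minimal sketch, assuming Git 2.25 or later is installed; the extra paths A/X/Y.txt and F/G.txt are hypothetical additions, included only to show what falls outside the cone.

```shell
#!/bin/sh
# Sketch: observe which files cone mode keeps for `git sparse-checkout set A/B`.
# All file and directory names here are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p A/B A/X F
echo c > A/B/C.txt   # inside the cone
echo d > A/D.txt     # immediate sibling of A/B
echo y > A/X/Y.txt   # outside the cone
echo e > E.txt       # root file, immediate sibling of A
echo g > F/G.txt     # outside the cone
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"

git sparse-checkout init --cone
git sparse-checkout set A/B

# Patterns Git wrote for this cone
cat .git/info/sparse-checkout

# Files left in the working directory (the .git directory is skipped)
sparse_files=$(find . -path ./.git -prune -o -type f -print | LC_ALL=C sort)
printf '%s\n' "$sparse_files"

# Return to a full working directory
git sparse-checkout disable
full_count=$(find . -path ./.git -prune -o -type f -print | wc -l)
echo "files after disable: $full_count"
```

After the set step, only A/B/C.txt, A/D.txt and E.txt should remain on disk; after disable, all five files come back.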
For instance:
The team building the Android app can usually get away with only the files in client/android and run all integration testing with the currently-deployed services.
The Android team needs a much smaller set of files as they work.
This means they can use the git sparse-checkout set command to restrict to that directory:
$ git sparse-checkout set client/android
$ ls
bootstrap.sh* client/ LICENSE.md README.md
$ ls client/
android/
$ find . -type f | wc -l
62
git sparse-checkout can use a sparse index since Git 2.32 (Q2 2021).
See the article "Make your monorepo feel small with Git’s sparse index" from Derrick Stolee.
The sparse index differs from a normal “full” index in one aspect: it can store directory paths with the object ID for its tree object.
This is in addition to the file paths which are paired with blob objects.
Since the cone mode sparse-checkout patterns match on a directory level, we can determine that an entire directory is out of the sparse-checkout cone and replace all of its contained file paths with a single directory path.
The sparse directory entries correspond to directories that are just outside of the sparse-checkout definition.
These directories also have a cache-tree node whose range is only one entry: that sparse directory entry.
With Git 2.36 (Q2 2022), git update-index, git checkout-index, and git clean are taught to work better with the sparse checkout feature.
See commit b9ca5e2, commit c35e9f5, commit e015d4d, commit 35682ad, commit 88078f5, commit b553ef6, commit 1e9e10e, commit 1624333, commit bb01b26 (11 Jan 2022) by Victoria Dye (vdye).
(Merged by Junio C Hamano -- gitster -- in commit 2f45f3e, 17 Feb 2022)
update-index: integrate with sparse index
Signed-off-by: Victoria Dye
Reviewed-by: Elijah Newren
Enable use of the sparse index with update-index.
Most variations of update-index work without explicitly expanding the index or making any other updates in or outside of update-index.c.
The one usage requiring additional changes is --cacheinfo; if a file inside a sparse directory was specified, the index would not be expanded until after the cache tree is invalidated, leading to a mismatch between the index and cache tree.
This scenario is handled by rearranging add_index_entry_with_check, allowing index_name_stage_pos to expand the index before attempting to invalidate the relevant cache tree path, avoiding cache tree/index corruption.
With Git 2.36 (Q2 2022), the git sparse-checkout cone patterns are better controlled.
See commit 8dd7c47, commit 4ce5043, commit bb8b5e9, commit d526b4d, commit f748012 (19 Feb 2022) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 9671764, 06 Mar 2022)
sparse-checkout: reject arguments in cone-mode that look like patterns
Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
In sparse-checkout add/set under cone mode, the arguments passed are supposed to be directories rather than gitignore-style patterns.
However, given the amount of effort spent in the manual discussing patterns, it is easy for users to assume they need to pass patterns such as /foo/* or !/bar/*/ or perhaps they really do ignore the directory rule and specify a random gitignore-style pattern like *.c
To help catch such mistakes, throw an error if any of the positional arguments:
* starts with any of '/!'
* contains any of '*?[]'
Inform users they can pass --skip-checks if they have a directory that really does have such special characters in its name.
(We exclude '\' because of sparse-checkout's special handling of backslashes; see the MINGW test in t1091.46.)
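The rule from that commit message can be approximated in a few lines of shell. This is only a sketch of the check as described above, not Git's actual C implementation, and looks_like_pattern is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch of the cone-mode argument check described in the commit message.
# looks_like_pattern is a hypothetical helper, not part of Git.
looks_like_pattern() {
  case "$1" in
    /*|\!*) return 0 ;;                # starts with '/' or '!'
    *\**|*\?*|*\[*|*\]*) return 0 ;;   # contains any of '*', '?', '[', ']'
  esac
  return 1
}

for arg in 'client/android' '/foo/*' '!/bar/*/' '*.c'; do
  if looks_like_pattern "$arg"; then
    echo "rejected: $arg"
  else
    echo "accepted: $arg"
  fi
done
```

Only client/android passes the check; the other three arguments would need --skip-checks, or should be rewritten as plain directory names.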
Still with Git 2.36 (Q2 2022), git sparse-checkout received further polishing.
See commit 8dd7c47, commit 4ce5043, commit bb8b5e9, commit d526b4d, commit f748012 (19 Feb 2022) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 9671764, 06 Mar 2022)
sparse-checkout: pay attention to prefix for {set, add}
Helped-by: Junio Hamano
Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren
In cone mode, non-option arguments to set & add are clearly paths, and as such, we should pay attention to prefix.
In non-cone mode, it is not clear that folks intend to provide paths since the inputs are gitignore-style patterns.
Paying attention to prefix would prevent folks from doing things like
git sparse-checkout add /.gitattributes
git sparse-checkout add '/toplevel-dir/*'
In fact, the former will result in
fatal: '/.gitattributes' is outside repository...
while the latter will result in:
fatal: Invalid path '/toplevel-dir': No such file or directory
despite the fact that both are valid gitignore-style patterns that would select real files if added to the sparse-checkout file.
This might lead people to just use the path without the leading slash, potentially resulting in them grabbing files with the same name throughout the directory hierarchy contrary to their expectations.
See also this thread and this one.
Adding prefix seems to just be fraught with error; so for now simply throw an error in non-cone mode when sparse-checkout set/add are run from a subdirectory.