As explained at "How do I clone a subdirectory only of a Git repository?", the best way I've found so far to download only the files in a Git subdirectory is:
git clone --depth 1 --filter=blob:none --sparse \
https://github.com/cirosantilli/test-git-partial-clone-big-small
cd test-git-partial-clone-big-small
git sparse-checkout set small
This is my best attempt so far at downloading only the small/ directory.
However, as soon as I run:
git clone --depth 1 --filter=blob:none --sparse \
https://github.com/cirosantilli/test-git-partial-clone-big-small
any files (but not directories) present in the root directory are downloaded and appear in the repository. In the case of that test repo, I get the unwanted file:
generate.sh
How can I prevent that from happening, so that I obtain only the subdirectories I'm interested in, without any root directory files?
I've checked other repositories, e.g. https://github.com/torvalds/linux, and having a large number of small files at the top level does not slow down the download significantly (as downloading them one by one would), so this would only be a problem if there are large files at the top level.
Tested on Git 2.37.2, Ubuntu 22.10, February 2023.
git clone --sparse does exactly that: "Employ a sparse-checkout, with only files in the toplevel directory initially being present." (Emphasis mine - phd). That is, you cannot do what you want with git clone. You can set up sparse checkout and then use git fetch, but fetch doesn't allow filters AFAIK. So you have to choose between one way or the other. – Relativity
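One possible workaround, offered as a sketch rather than a confirmed answer: skip the initial checkout with --no-checkout, then select the directory with a non-cone sparse-checkout pattern before checking out. The example below runs offline against a hypothetical throwaway repository that mimics the layout of the test repo (a root-level generate.sh plus small/ and big/). The temporary paths and file contents are assumptions for illustration only. It also assumes a Git version that supports `git sparse-checkout set --no-cone`, and that the --depth 1 --filter=blob:none flags from the question can be added back to the clone command when cloning from a real server.

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repo mimicking the layout from the question:
# one root file (generate.sh) plus the small/ and big/ directories.
work=$(mktemp -d)
git init --quiet "$work/src"
mkdir "$work/src/small" "$work/src/big"
echo 'echo hi' > "$work/src/generate.sh"
echo small > "$work/src/small/0"
echo big > "$work/src/big/0"
git -C "$work/src" add .
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m init

# --no-checkout leaves the working tree empty, so no root files are
# written at clone time (unlike --sparse, which populates the root files).
git clone --quiet --no-checkout "$work/src" "$work/dst"
cd "$work/dst"

# Non-cone mode takes gitignore-style patterns; '/small/' selects only
# that directory and does not implicitly include root-level files.
git sparse-checkout set --no-cone '/small/'
git checkout --quiet

# List what ended up in the working tree.
ls -A
```

The difference from the commands in the question is that cone mode (the default for --sparse) always includes files in the root directory, while a non-cone pattern does not.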