How to prevent git clone --filter=blob:none --sparse from downloading files in the root directory?

As explained at "How do I clone a subdirectory only of a Git repository?", the best way I've found so far to download only the files in one subdirectory of a Git repository is:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/cirosantilli/test-git-partial-clone-big-small
cd test-git-partial-clone-big-small
git sparse-checkout set small

which is my best attempt so far at downloading only the small/ directory.

However, as soon as I run:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/cirosantilli/test-git-partial-clone-big-small

any files (but not directories) in the root directory are downloaded and appear in the working tree. In that test repository, I get this unwanted file:

generate.sh

How can I prevent that and obtain only the subdirectories I'm interested in, without any root directory files?

I've checked other repositories, e.g. https://github.com/torvalds/linux, and having a large number of small files at the top level does not slow down the download significantly (for example, by downloading them one at a time), so this would only be a problem if there are large files at the top level.

Tested on Git 2.37.2, Ubuntu 22.10, February 2023.

Vyborg answered 1/2, 2023 at 14:7 Comment(1)
git clone --sparse does exactly that: "Employ a sparse-checkout, with only files in the toplevel directory initially being present." (Emphasis mine, phd.) That is, you cannot do what you want with git clone. You can set up sparse checkout and then use git fetch, but fetch doesn't allow filters AFAIK. So you have to choose one way or the other. - Relativity

Do your clone --no-checkout aka -n, then set up your sparsity rules exactly as you want. To get really minimal clone traffic, don't use blob:none, use tree:0. Smoketest:

git clone -n --depth=1 --filter=tree:0 \
        https://github.com/cirosantilli/test-git-partial-clone-big-small
cd test-git-partial-clone-big-small
git sparse-checkout set --no-cone '*/'
git checkout
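
If you only want one directory, the same steps can be wrapped up roughly as below. This is a minimal sketch, not a definitive recipe: clone_only_dir is a hypothetical helper name, the anchored pattern "/DIR/" is my choice (a bare name such as small would also match at deeper levels), and it assumes a Git recent enough to accept --no-cone on sparse-checkout set (the question used 2.37.2).

```shell
# Hypothetical helper (not part of Git): clone URL and check out only DIR.
# Usage: clone_only_dir <url> <dir> [dest]
clone_only_dir() {
    local url=$1 dir=$2
    local dest=${3:-$(basename "$url" .git)}

    # -n skips the initial checkout, so nothing is written to the work tree yet.
    git clone -n --depth=1 --filter=tree:0 "$url" "$dest" || return

    # Non-cone pattern: the leading slash anchors it at the repository root,
    # the trailing slash restricts it to a directory and its contents.
    git -C "$dest" sparse-checkout set --no-cone "/$dir/" || return

    # Populate the work tree according to the sparsity rules.
    git -C "$dest" checkout
}
```

For the test repository this would be called as clone_only_dir https://github.com/cirosantilli/test-git-partial-clone-big-small small.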
Astronomer answered 8/4, 2023 at 18:7 Comment(4)
OK, doing git sparse-checkout set --no-cone small achieves my use case of downloading only the small directory, thanks. I wonder if there's a way without --no-cone, which man git says should be avoided. - Vyborg
It's the definition of cone mode that you can't. Get a directory, get all its toplevel contents. That simplification makes working with really large checkouts much more efficient, but you have to give up some selectivity. I think the doc using "deprecate" overstates the case; there are downsides that might bite you. See if any of the listed downsides are painful in your use. If not, you can painlessly use --no-cone. - Astronomer
I've also added a big tree to the test now: github.com/cirosantilli/test-git-partial-clone-big-small/blob/… and this approach unfortunately appears to fetch its entries. E.g. they show up in git ls-files, and ncdu says .git is several megabytes. - Vyborg
Seems you're right. To get that level of selectivity in the fetch you're going to have to go full manual transmission on it, or switch back to a blob:none filter and read all the trees, since GitHub (like all others afaik) doesn't support combining filters. - Astronomer
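
When comparing the tree:0 and blob:none variants discussed in these comments, it may help to look at which objects actually reached .git, since git ls-files reads the index and lists paths whether or not their blobs were downloaded. A rough sketch, assuming a partial clone with a valid HEAD; count_missing_objects is a hypothetical helper name, not a Git command:

```shell
# Hypothetical helper (not part of Git): count objects reachable from HEAD
# that are absent from the local object store of the clone in REPO.
# Usage: count_missing_objects <repo>
count_missing_objects() {
    # --missing=print lists absent objects with a leading "?" instead of
    # fetching them on demand; awk counts those lines (printing 0 if none).
    git -C "$1" rev-list --objects --missing=print HEAD |
        awk '/^\?/ { n++ } END { print n + 0 }'
}
```

After a --filter=blob:none clone restricted to small/, a non-zero count suggests that blobs outside small/ were never downloaded; a full clone should report 0.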
