How to make shallow git submodules?
D

10

201

Is it possible to have shallow submodules? I have a superproject with several submodules, each with a long history, so it gets unnecessarily big dragging all that history.

All I have found is this unanswered thread.

Should I just hack git-submodule to implement this?

Dodecagon answered 27/1, 2010 at 3:34 Comment(1)
"git submodule add/update" can now clone the submodule repositories shallowly! See my answer belowBk
B
209

TLDR;

git clone --recurse-submodules --shallow-submodules

(But see the caveat in Ciro Santilli's answer.)
Or: record that a submodule should be shallow cloned:

git config -f .gitmodules submodule.<name>.shallow true

Which means the next git clone --recurse-submodules will shallow clone the submodule '<name>' (depth 1), even without the --shallow-submodules.
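
To make the effect of that setting concrete, here is a minimal sketch that writes the flag into a scratch .gitmodules file and reads it back. The submodule name "libfoo", its path and the example.com URL are made up for illustration; nothing is cloned.

```shell
# Sketch only: "libfoo", vendor/libfoo and the URL are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
gitmodules="$tmp/.gitmodules"

# "git config -f <file>" edits any config-format file, no repository needed.
git config -f "$gitmodules" submodule.libfoo.path vendor/libfoo
git config -f "$gitmodules" submodule.libfoo.url https://example.com/libfoo.git
git config -f "$gitmodules" submodule.libfoo.shallow true

# Show the resulting section and read the flag back.
cat "$gitmodules"
shallow_flag=$(git config -f "$gitmodules" --get submodule.libfoo.shallow)
echo "shallow flag: $shallow_flag"
```

The file ends up with a [submodule "libfoo"] section holding path, url and shallow = true. In a real superproject you would commit .gitmodules so that other people's recursive clones pick the setting up.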


What follows is the evolution of git submodule/git clone when it comes to shallow clones, starting (in 2013) with Git 1.8.4, and going from there.


New in the upcoming Git 1.8.4 (July 2013):

"git submodule update" can optionally clone the submodule repositories shallowly.

(And Git 2.10, Q3 2016, allows recording that with git config -f .gitmodules submodule.<name>.shallow true.
See the end of this answer.)

See commit 275cd184d52b5b81cb89e4ec33e540fb2ae61c1f:

Add the --depth option to the add and update commands of "git submodule", which is then passed on to the clone command. This is useful when the submodule(s) are huge and you're not really interested in anything but the latest commit.

Tests are added and some indention adjustments were made to conform to the rest of the testfile on "submodule update can handle symbolic links in pwd".

Signed-off-by: Fredrik Gustafsson <[email protected]>
Acked-by: Jens Lehmann <[email protected]>

That means this works:

# add shallow submodule
git submodule add --depth 1 <repo-url> <path>
git config -f .gitmodules submodule.<path>.shallow true

# later unshallow
git config -f .gitmodules submodule.<path>.shallow false
git submodule update <path>

The commands can be run in any order. The git submodule command performs the actual clone (using depth 1 this time), and the git config commands make the option permanent for other people who will clone the repo recursively later.

As an example, suppose you have the repo https://github.com/foo/bar and you want to add https://github.com/lorem/ipsum as a submodule, in your repo at path/to/submodule. The commands may look like the following:

git submodule add --depth 1 git@github.com:lorem/ipsum.git path/to/submodule
git config -f .gitmodules submodule.path/to/submodule.shallow true

The following results in the same thing too (opposite order):

git config -f .gitmodules submodule.path/to/submodule.shallow true
git submodule add --depth 1 git@github.com:lorem/ipsum.git path/to/submodule

The next time someone runs git clone --recursive git@github.com:foo/bar.git, it will pull in the whole history of https://github.com/foo/bar, but it will only shallow-clone the submodule as expected.
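
If you want to try this without touching GitHub, the following is a rough local sketch. The throwaway repositories "ipsum" and "bar" under a temp directory stand in for the real ones, and the file:// URL plus protocol.file.allow=always are assumptions needed only because the demo uses a local remote (recent Git versions block the file transport for submodules by default).

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the remote submodule repository, with three commits.
git init -q "$tmp/ipsum"
for n in 1 2 3; do
    git -C "$tmp/ipsum" commit -q --allow-empty -m "commit $n"
done

# Stand-in for the superproject.
git init -q "$tmp/bar"
git -C "$tmp/bar" commit -q --allow-empty -m "initial"

# A file:// URL goes through the regular transport, so --depth is honoured.
git -C "$tmp/bar" -c protocol.file.allow=always \
    submodule --quiet add --depth 1 "file://$tmp/ipsum" path/to/submodule
# Record the shallow preference for future recursive clones.
git -C "$tmp/bar" config -f .gitmodules submodule.path/to/submodule.shallow true

commit_count=$(git -C "$tmp/bar/path/to/submodule" rev-list --count HEAD)
shallow_flag=$(git -C "$tmp/bar" config -f .gitmodules --get submodule.path/to/submodule.shallow)
echo "commits visible in submodule: $commit_count, shallow flag: $shallow_flag"
```

Only one of the three commits should be visible inside the submodule, while .gitmodules carries the shallow flag for everyone else.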

With:

--depth

This option is valid for add and update commands.
Create a 'shallow' clone with a history truncated to the specified number of revisions.
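
To see what "truncated to the specified number of revisions" means in practice, and one way to get the full history back later, here is a small sketch on a plain repository. The repository names are made up, and using git fetch --unshallow inside a submodule's working tree is my suggestion rather than something the commit above adds.

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway "remote" with four commits.
git init -q "$tmp/origin"
for n in 1 2 3 4; do
    git -C "$tmp/origin" commit -q --allow-empty -m "commit $n"
done

# Depth 2 keeps only the two most recent commits.
git clone -q --depth 2 "file://$tmp/origin" "$tmp/shallow"
before=$(git -C "$tmp/shallow" rev-list --count HEAD)

# Later, when space allows, fetch the rest of the history.
git -C "$tmp/shallow" fetch -q --unshallow
after=$(git -C "$tmp/shallow" rev-list --count HEAD)

echo "commits before: $before, after: $after"
```

For a submodule, the same fetch would be run inside the submodule directory, e.g. git -C <path> fetch --unshallow.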


atwyman adds in the comments:

As far as I can tell this option isn't usable for submodules which don't track master very closely. If you set depth 1, then submodule update can only ever succeed if the submodule commit you want is the latest master. Otherwise you get "fatal: reference is not a tree".

That is true.
That was the case until Git 2.8 (March 2016). With 2.8, submodule update --depth has one more chance to succeed: if the recorded SHA1 was not part of the default fetch, Git tries to fetch that SHA1 directly.

See commit fb43e31 (24 Feb 2016) by Stefan Beller (stefanbeller).
Helped-by: Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 9671a76, 26 Feb 2016)

submodule: try harder to fetch needed sha1 by direct fetching sha1

When reviewing a change that also updates a submodule in Gerrit, a common review practice is to download and cherry-pick the patch locally to test it.
However when testing it locally, the 'git submodule update' may fail fetching the correct submodule sha1 as the corresponding commit in the submodule is not yet part of the project history, but also just a proposed change.

If $sha1 was not part of the default fetch, we try to fetch the $sha1 directly. Some servers however do not support direct fetch by sha1, which leads git-fetch to fail quickly.
We can fail ourselves here as the still missing sha1 would lead to a failure later in the checkout stage anyway, so failing here is as good as we can get.


MVG points out in the comments, about commit fb43e31 (Git 2.8, Feb 2016):

It would seem to me that commit fb43e31 requests the missing commit by SHA1 id, so the uploadpack.allowReachableSHA1InWant and uploadpack.allowTipSHA1InWant settings on the server will probably affect whether this works.
I wrote a post to the git list today, pointing out how the use of shallow submodules could be made to work better for some scenarios, namely if the commit is also a tag.
Let's wait and see.

I guess this is a reason why fb43e31 made the fetch for a specific SHA1 a fallback after the fetch for the default branch.
Nevertheless, in case of “--depth 1” I think it would make sense to abort early: if none of the listed refs matches the requested one, and asking by SHA1 isn't supported by the server, then there is no point in fetching anything, since we won't be able to satisfy the submodule requirement either way.
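
The direct fetch by SHA1 discussed here can be tried by hand. The sketch below uses two throwaway local repositories ("server" and "client", names made up) and a file:// URL; it sets uploadpack.allowReachableSHA1InWant on the "server" side. Whether a hosted server accepts such a request depends on its own configuration, so treat this as an illustration only.

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "Server" repository: the commit we want is NOT the branch tip.
git init -q "$tmp/server"
git -C "$tmp/server" commit -q --allow-empty -m "one"
wanted=$(git -C "$tmp/server" rev-parse HEAD)
git -C "$tmp/server" commit -q --allow-empty -m "two"
# Server-side opt-in to requests for reachable, non-tip commits.
git -C "$tmp/server" config uploadpack.allowReachableSHA1InWant true

# A depth-1 clone only has the tip, so the wanted commit is missing.
git clone -q --depth 1 "file://$tmp/server" "$tmp/client"
git -C "$tmp/client" cat-file -e "$wanted^{commit}" 2>/dev/null \
    && have_before=yes || have_before=no

# Ask for exactly that commit, still with depth 1.
git -C "$tmp/client" fetch -q --depth 1 origin "$wanted"
git -C "$tmp/client" cat-file -e "$wanted^{commit}" 2>/dev/null \
    && have_after=yes || have_after=no

echo "present before fetch: $have_before, after fetch: $have_after"
```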


Update August 2016 (3 years later)

With Git 2.10 (Q3 2016), you will be able to do

 git config -f .gitmodules submodule.<name>.shallow true

See "Git submodule without extra weight" for more.


Git 2.13 (Q2 2017) adds the following clarification in commit 8d3047c (19 Apr 2017) by Sebastian Schuberth (sschuberth).
(Merged by Sebastian Schuberth -- sschuberth -- in commit 8d3047c, 20 Apr 2017)

a clone of this submodule will be performed as a shallow clone (with a history depth of 1)

However, Ciro Santilli adds in the comments (and details in his answer)

shallow = true on .gitmodules only affects the reference tracked by the HEAD of the remote when using --recurse-submodules, even if the target commit is pointed to by a branch, and even if you put branch = mybranch on the .gitmodules as well.


Git 2.20 (Q4 2018) improves on the submodule support, which has been updated to read from the blob at HEAD:.gitmodules when the .gitmodules file is missing from the working tree.

See commit 2b1257e, commit 76e9bdc (25 Oct 2018), and commit b5c259f, commit 23dd8f5, commit b2faad4, commit 2502ffc, commit 996df4d, commit d1b13df, commit 45f5ef3, commit bcbc780 (05 Oct 2018) by Antonio Ospite (ao2).
(Merged by Junio C Hamano -- gitster -- in commit abb4824, 13 Nov 2018)

submodule: support reading .gitmodules when it's not in the working tree

When the .gitmodules file is not available in the working tree, try using the content from the index and from the current branch.
This covers the case when the file is part of the repository but for some reason it is not checked out, for example because of a sparse checkout.

This makes it possible to use at least the 'git submodule' commands which read the gitmodules configuration file without fully populating the working tree.

Writing to .gitmodules will still require that the file is checked out, so check for that before calling config_set_in_gitmodules_file_gently.

Add a similar check also in git-submodule.sh::cmd_add() to anticipate the eventual failure of the "git submodule add" command when .gitmodules is not safely writeable; this prevents the command from leaving the repository in a spurious state (e.g. the submodule repository was cloned but .gitmodules was not updated because config_set_in_gitmodules_file_gently failed).

Moreover, since config_from_gitmodules() now accesses the global object store, it is necessary to protect all code paths which call the function against concurrent access to the global object store.
Currently this only happens in builtin/grep.c::grep_submodules(), so call grep_read_lock() before invoking code involving config_from_gitmodules().

NOTE: there is one rare case where this new feature does not work properly yet: nested submodules without .gitmodules in their working tree.
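
This is not what git submodule runs internally, but git config --blob gives a feel for reading .gitmodules from HEAD when the file is absent from the working tree. The repository and the submodule name "lib" with its example.com URL are made up for the sketch.

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$tmp/super"
# Hypothetical submodule entry; only the config file matters here.
git -C "$tmp/super" config -f .gitmodules submodule.lib.path lib
git -C "$tmp/super" config -f .gitmodules submodule.lib.url https://example.com/lib.git
git -C "$tmp/super" config -f .gitmodules submodule.lib.shallow true
git -C "$tmp/super" add .gitmodules
git -C "$tmp/super" commit -q -m "add .gitmodules"

# Simulate a checkout that does not contain the file (e.g. sparse checkout).
rm "$tmp/super/.gitmodules"

# Read the setting from the committed blob instead of the working tree.
from_head=$(git -C "$tmp/super" config --blob HEAD:.gitmodules --get submodule.lib.shallow)
echo "shallow flag read from HEAD: $from_head"
```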


Note: Git 2.24 (Q4 2019) fixes a possible segfault when cloning a submodule shallow.

See commit ddb3c85 (30 Sep 2019) by Ali Utku Selen (auselen).
(Merged by Junio C Hamano -- gitster -- in commit 678a9ca, 09 Oct 2019)


Git 2.25 (Q1 2020) clarifies the git submodule update documentation.

See commit f0e58b3 (24 Nov 2019) by Philippe Blain (phil-blain).
(Merged by Junio C Hamano -- gitster -- in commit ef61045, 05 Dec 2019)

doc: mention that 'git submodule update' fetches missing commits

Helped-by: Junio C Hamano
Helped-by: Johannes Schindelin
Signed-off-by: Philippe Blain

'git submodule update' will fetch new commits from the submodule remote if the SHA-1 recorded in the superproject is not found. This was not mentioned in the documentation.


Warning: With Git 2.25 (Q1 2020), the interaction between "git clone --recurse-submodules" and alternate object store was ill-designed.

The documentation and code have been taught to make more clear recommendations when the users see failures.

See commit 4f3e57e, commit 10c64a0 (02 Dec 2019) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 5dd1d59, 10 Dec 2019)

submodule--helper: advise on fatal alternate error

Signed-off-by: Jonathan Tan
Acked-by: Jeff King

When recursively cloning a superproject with some shallow modules defined in its .gitmodules, then recloning with "--reference=<path>", an error occurs. For example:

git clone --recurse-submodules --branch=master -j8 \
  https://android.googlesource.com/platform/superproject \
  master
git clone --recurse-submodules --branch=master -j8 \
  https://android.googlesource.com/platform/superproject \
  --reference master master2

fails with:

fatal: submodule '<snip>' cannot add alternate: reference repository
'<snip>' is shallow

When a alternate computed from the superproject's alternate cannot be added, whether in this case or another, advise about configuring the "submodule.alternateErrorStrategy" configuration option and using "--reference-if-able" instead of "--reference" when cloning.

That is detailed in:

Doc: explain submodule.alternateErrorStrategy

Signed-off-by: Jonathan Tan
Acked-by: Jeff King

Commit 31224cbdc7 ("clone: recursive and reference option triggers submodule alternates", 2016-08-17, Git v2.11.0-rc0 -- merge listed in batch #1) taught Git to support the configuration options "submodule.alternateLocation" and "submodule.alternateErrorStrategy" on a superproject.

If "submodule.alternateLocation" is configured to "superproject" on a superproject, whenever a submodule of that superproject is cloned, it instead computes the analogous alternate path for that submodule from $GIT_DIR/objects/info/alternates of the superproject, and references it.

The "submodule.alternateErrorStrategy" option determines what happens if that alternate cannot be referenced.
However, it is not clear that the clone proceeds as if no alternate was specified when that option is not set to "die" (as can be seen in the tests in 31224cbdc7).
Therefore, document it accordingly.

The config submodule documentation now includes:

submodule.alternateErrorStrategy::

Specifies how to treat errors with the alternates for a submodule as computed via submodule.alternateLocation.
Possible values are ignore, info, die.
Default is die.
Note that if set to ignore or info, and if there is an error with the computed alternate, the clone proceeds as if no alternate was specified.
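
As a small illustration of those two options, the sketch below sets them in a scratch superproject (the repository is a throwaway). Choosing "info" here is just an example value; the commit message above also suggests --reference-if-able instead of --reference when cloning.

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/super"
# Derive submodule alternates from the superproject's own alternates.
git -C "$tmp/super" config submodule.alternateLocation superproject
# On an unusable alternate, report it and carry on as if none was given.
git -C "$tmp/super" config submodule.alternateErrorStrategy info

location=$(git -C "$tmp/super" config --get submodule.alternateLocation)
strategy=$(git -C "$tmp/super" config --get submodule.alternateErrorStrategy)
echo "alternateLocation=$location alternateErrorStrategy=$strategy"
```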


Note: "git submodule update --quiet" did not propagate the quiet option down to the underlying git fetch, which has been corrected with Git 2.32 (Q2 2021).

See commit 62af4bd (30 Apr 2021) by Nicholas Clark (nwc10).
(Merged by Junio C Hamano -- gitster -- in commit 74339f8, 11 May 2021)

submodule update: silence underlying fetch with "--quiet"

Signed-off-by: Nicholas Clark

Commands such as

$ git submodule update --quiet --init --depth=1

involving shallow clones, call the shell function fetch_in_submodule, which in turn invokes git fetch.
Pass the --quiet option onward there.

Bk answered 17/7, 2013 at 6:32 Comment(31)
Wow that was fast! Thx for the answer by the way. Oh and --depth should take an argument too ;)Pitfall
@VonC, but still the depth value is hard to determine especially for existing submodules. (This question gets no answer at all.)Juggle
As far as I can tell this option isn't usable for submodules which don't track master very closely. If you set depth 1, then submodule update can only ever succeed if the submodule commit you want is the latest master. Otherwise you get "fatal: reference is not a tree". Presumably with a larger depth you might successfully update to something older than master, but never to something from a different unmerged branch. I feel like this is a git feature nobody's used in real scenarios.Lauder
@atwyman what version of git are you using? A submodule never "track" a branch. It can be made to update to the latest of a branch (https://mcmap.net/q/13005/-git-submodule-tracking-latest). But it always checkout a fix commit (depth or not)Bk
@Bk I'm testing on git v2.4.3 on Mac. Submodules always point to a specific commit, but developers who update them are usually pulling from some branch, so I was using that as shorthand. The example which failed for me is a pre-existing submodule whose commit is the tip of a branch newer than master. What appears to happen when I run "git submodule update --init --depth 1" in a newly-cloned repo is the submodule gets fetched at the commit which is the tip of its master, then the attempt to checkout the real commit fails because it's unknown. Manually fetching doesn't change anything.Lauder
@atwyman it seems sensible that a submodule fetch with a depth 1 can no longer switch to its pre-recorded gitlink SHA1 because said SHA1 is not in the history fetched by the submodule.Bk
@atwyman It seems git 2.8 should improve the case you mention. I'll try and test it out.Bk
@Bk thanks. The release notes look like there's a change aimed at this use case. Do you know if the extra fetch mentioned in the release notes is something I can do manually with older versions, or if I'll need to get the new one? I tried "git fetch --update-shallow origin <hash>" in the submodule and it didn't seem to have any effect at all.Lauder
@atwyman As I documented before, you can fetch manually one single commit (https://mcmap.net/q/13621/-retrieve-specific-commit-from-a-remote-git-repository)... provided you are in git 2.5+, and the git repo server supports that feature.Bk
Good to know, and it explains why this didn't work for me using 2.4.3 (which is what gets installed by Xcode). I can try out upgrading to 2.5 and see if I can make that work, but it might be hard to get all the relevant servers upgraded (I think they're mostly Ubuntu 14.0.4 which may not have that version), not to mention all our developers. That's my problem to figure out, I guess. Thanks for the info. I didn't realize our submodules were pushing the bleeding edge of git features so much.Lauder
Let's say I have a sha1hash. I know this is the hash of some commit of some repo. I want to use that repo as submodule, but only for that hash, and I don't care about history, so I want the depth = 1. Is this currently possible with git submodule or git clone?Uncovenanted
It would seem to me that commit fb43e31 requests the missing commit by SHA1 id, so the uploadpack.allowReachableSHA1InWant and uploadpack.allowTipSHA1InWant settings on the server will probably affect whether this works. I wrote a post to the git list today, pointing out how the use of shallow submodules could be made to work better for some scenarios, namely if the commit is also a tag. Let's wait and see.Chat
With the recent addition of shallow option in .gitmodules, does the --depth 1 option work for branches that aren't tracking master closely?Uncovenanted
@Uncovenanted Not sure: you can ask a new question for others to test it out.Bk
@Uncovenanted shallow = true on .gitmodules only affects the reference tracked by the HEAD of the remote when using --recurse-submodules according to my tests: https://mcmap.net/q/13854/-how-to-make-shallow-git-submodulesLorenzetti
@Bk can we tell git to fetch only a particular folder as submodule instead of the whole project dir? MY submodule proj looks like src/foo/bar. Can we add only the bar folder and all it's contents as submodule to another project?Fadein
@AvinashRaj Not that I know of, beside a sparse checkout on the submodule: https://mcmap.net/q/13599/-set-git-submodule-to-shallow-clone-amp-sparse-checkoutBk
It's not clear from the answer what's the current way to do it. Also, it's not clear if all of that is needed each time somebody clones a new copy or if these sparse submodule settings become part of the repo that references these submodules (e.g. each new clone and submodule update results in sparse submodule checkouts)Promontory
Compared to the average StackOverflow answers, the details in this one are superb. One thing isn't clear though: what are the exact steps to make a shallow submodule? For example, do we need to run git config -f .gitmodules submodule.<name>.shallow true before making the module? After making the module with its whole history? F.e. run git submodule add --depth 1 -- path first? After? Do we always need to use --depth even though we added the module with --depth? etc?Leong
@Leong As I mention in https://mcmap.net/q/13855/-git-submodule-without-extra-weight, this is to record the submodule shallowness in an existing submodule: anyone cloning that repository would automatically get a shallow submodule as a result.Bk
I figured that much, just not sure on the actual steps. I mean, I could probably fiddle with it (I am about to start), but knowing the exact steps would be useful. This article I just found makes some things more clear and concise: fluentreports.com/blog/?p=195Leong
Or for example, what exactly is <name> in submodule.<name>.shallow?Leong
@Leong It is the name used when you create (git submodule add --name xxx) the submodule: git-scm.com/docs/git-submodule#Documentation/…Bk
Awesome. Thanks. Turns out, the order of the commands doesn't matter much. Either way, the git submodule and git config commands will merge their modifications into the same place in the .gitmodules file. git submodule add --depth 1 applies the depth only when cloning that time (not stored as a setting), while git config -f .gitmodules ...shallow true adds the option to the same place in the config, to make it permanent. It can be called before or after.Leong
I added some examples and description.Leong
There's so much activity here. It's hard to find the relevant information. Is there a way to --unshallow submodules when more space is available?Zeb
@Zeb Not sure, but try and update after setting shallow to false (setting seen in https://mcmap.net/q/13855/-git-submodule-without-extra-weight)Bk
@Marcono1234 Thank you for the feedback, good point. I have edited the answer accordingly, and added at the top a "TLDR;".Bk
I am not sure the caveat on --shallow-submodules is still relevant today. I tried the example of @CiroSantilliOurBigBook.com and it actually succeeded using git 2.40.Yasminyasmine
@Yasminyasmine That is great news. I am curious as to which recent evolution in Git would allow that caveat to disappear.Bk
@Bk Same. I looked at the release notes and couldn't find anything related to --shallow-submodules recently.Yasminyasmine
E
42

Git 2.9.0 supports shallow-cloning submodules directly, so now you can just call:

git clone url://to/source/repository --recursive --shallow-submodules
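
A quick local check of that command, as a sketch: the repositories "sub" and "super" are throwaways, the file:// URLs and protocol.file.allow=always are only there because the demo has no real server, and the submodule is pinned at the tip of its default branch (the case where this option is known to work).

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Submodule "remote" with three commits.
git init -q "$tmp/sub"
for n in 1 2 3; do
    git -C "$tmp/sub" commit -q --allow-empty -m "commit $n"
done

# Superproject that records the submodule at its current tip.
git init -q "$tmp/super"
git -C "$tmp/super" commit -q --allow-empty -m "initial"
git -C "$tmp/super" -c protocol.file.allow=always \
    submodule --quiet add "file://$tmp/sub" mod
git -C "$tmp/super" commit -q -m "add submodule"

# Recursive clone: full superproject history, depth-1 submodule.
git -c protocol.file.allow=always clone -q \
    --recurse-submodules --shallow-submodules \
    "file://$tmp/super" "$tmp/work"

super_commits=$(git -C "$tmp/work" rev-list --count HEAD)
sub_commits=$(git -C "$tmp/work/mod" rev-list --count HEAD)
echo "superproject commits: $super_commits, submodule commits: $sub_commits"
```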
Expatiate answered 15/8, 2016 at 10:50 Comment(4)
This option is the most promising, but it fails on git 2.14.1 if the submodule commit is not tracked by either a branch or tag: https://mcmap.net/q/13854/-how-to-make-shallow-git-submodulesLorenzetti
@CiroSantilli刘晓波死六四事件法轮功 Make sure your git server is also updatedExpatiate
Thanks, I've tested locally, without a server, and on GitHub, which I can't update :-)Lorenzetti
I have the same issue using git 2.20, it doesn't work when the submodule is not on the tip of the branch.Selfassurance
D
17

Following Ryan's answer I was able to come up with this simple script which iterates through all submodules and shallow clones them:

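#!/bin/bash
# Register the submodules listed in .gitmodules in .git/config.
git submodule init
# Before anything is cloned, each status line reads "-<sha1> <path>"; sed keeps the path.
# This assumes each submodule's name equals its path and contains no spaces.
for i in $(git submodule | sed -e 's/.* //'); do
    spath=$(git config -f .gitmodules --get "submodule.$i.path")
    surl=$(git config -f .gitmodules --get "submodule.$i.url")
    # Shallow-clone the tip of the default branch into the submodule path.
    git clone --depth 1 "$surl" "$spath"
done
# Check out the recorded commits (fails if a commit is not in the shallow history).
git submodule update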
Dodecagon answered 30/1, 2010 at 23:26 Comment(8)
I'm getting fatal: reference is not a tree: 88fb67b07621dfed054d8d75fd50672fb26349df for each submoduleTalyah
@Talyah have you seen https://mcmap.net/q/13854/-how-to-make-shallow-git-submodules/…Dodecagon
oh shit, well, if you rewrite your script to use git submodule --depth 1 update instead of git clone, then I'll upvote :)Talyah
@Talyah : I wrote my answer in 2010. Things have changed. You can't expect everyone to maintain all of their answers. I did mark the current valid answer as accepted.Dodecagon
well, stackoverflow is meant to be a tool that works for future generations too, so then don't complain if your outdated answers get downvoted :) I do update my answers from time to timeTalyah
@Talyah This is one of the reasons why I stopped contributing to Stackoverflow. People have these unrealistic expectations. It would be a full-time job to maintain every one of my 1637 answers. And then there are also the comments, I suppose I'd have to maintain those as well? Take a look at the dates, that is what they're for. If you read some .NET blog from 2002 with code using ArrayList instead of List, would you use that? Would you demand that the author updated his post? Same principle applies here.Dodecagon
but you cannot go against the status quo, if I go ahead and convert your script into a new answer, people will start upvoting my answer and downvoting yours, because it's what it works for them; also a stackoverflow question is not outdated just because of the date, so long as there is no other stackoverflow thread that talks about the same and is not newerTalyah
s/statusquo/progress/Talyah
L
9

Summary of buggy / unexpected / annoying behaviour as of Git 2.14.1

  1. shallow = true in .gitmodules only affects git clone --recurse-submodules if the HEAD of the remote submodule points to the required commit, even if the target commit is pointed to by a branch, and even if you put branch = mybranch on the .gitmodules as well.

    Local test script. Same behaviour on GitHub 2017-11, where HEAD is controlled by the default branch repo setting:

    git clone --recurse-submodules https://github.com/cirosantilli/test-shallow-submodule-top-branch-shallow
    cd test-shallow-submodule-top-branch-shallow/mod
    git log
    # Multiple commits, not shallow.
    
  2. git clone --recurse-submodules --shallow-submodules fails if the commit is referenced by neither a branch nor a tag, with the message: error: Server does not allow request for unadvertised object.

    Local test script. Same behaviour on GitHub:

    git clone --recurse-submodules --shallow-submodules https://github.com/cirosantilli/test-shallow-submodule-top-sha
    # error
    

    I also asked on the mailing list: https://marc.info/?l=git&m=151863590026582&w=2 and the reply was:

    In theory this should be easy. :)

    In practice not so much, unfortunately. This is because cloning will just obtain the latest tip of a branch (usually master). There is no mechanism in clone to specify the exact sha1 that is wanted.

    The wire protocol supports for asking exact sha1s, so that should be covered. (Caveat: it only works if the server operator enables uploadpack.allowReachableSHA1InWant which github has not AFAICT)

    git-fetch allows to fetch arbitrary sha1, so as a workaround you can run a fetch after the recursive clone by using "git submodule update" as that will use fetches after the initial clone.

TODO test: allowReachableSHA1InWant.
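
A possible local version of that test, as a sketch only: it pins the submodule to a commit that is not the tip of any branch, enables uploadpack.allowReachableSHA1InWant on the throwaway "sub" repository, and then uses the "git submodule update" workaround from the reply above. Repository names, the file:// URLs and protocol.file.allow=always are assumptions for the demo; a hosted server may behave differently.

```shell
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Submodule "remote": the commit to pin is one behind the tip.
git init -q "$tmp/sub"
git -C "$tmp/sub" commit -q --allow-empty -m "one"
pinned=$(git -C "$tmp/sub" rev-parse HEAD)
git -C "$tmp/sub" commit -q --allow-empty -m "two"
git -C "$tmp/sub" config uploadpack.allowReachableSHA1InWant true

# Superproject recording the older commit.
git init -q "$tmp/super"
git -C "$tmp/super" commit -q --allow-empty -m "initial"
git -C "$tmp/super" -c protocol.file.allow=always \
    submodule --quiet add "file://$tmp/sub" mod
git -C "$tmp/super/mod" checkout -q "$pinned"
git -C "$tmp/super" add mod
git -C "$tmp/super" commit -q -m "pin submodule to an older commit"

# Plain clone first, then let "submodule update" fetch what is missing.
git clone -q "file://$tmp/super" "$tmp/work"
git -C "$tmp/work" -c protocol.file.allow=always \
    submodule --quiet update --init --depth 1

sub_head=$(git -C "$tmp/work/mod" rev-parse HEAD)
echo "pinned:   $pinned"
echo "sub HEAD: $sub_head"
```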

Lorenzetti answered 19/11, 2017 at 7:43 Comment(1)
It seems like there is just no simple way to checkout a detached HEAD commit hash for the submodule, and have downstream users git clone --recursive that fetches only that specific commit.Uncovenanted
I
8

Reading through the git-submodule "source", it looks like git submodule add can handle submodules that already have their repositories present. In that case...

$ git clone $remote1 $repo
$ cd $repo
$ git clone --depth 5 $remotesub1 $sub1
$ git submodule add $remotesub1 $sub1
#repeat as necessary...

You'll want to make sure the required commit is in the submodule repo, so make sure you set an appropriate --depth.

Edit: You may be able to get away with multiple manual submodule clones followed by a single update:

$ git clone $remote1 $repo
$ cd $repo
$ git clone --depth 5 $remotesub1 $sub1
#repeat as necessary...
$ git submodule update
Inquiry answered 30/1, 2010 at 3:32 Comment(1)
Now for git 1.8.0, you can't clone a repository inside a repository anymore. So this solution doesn't work anymore.Juggle
U
2

Are the canonical locations for your submodules remote? If so, are you OK with cloning them once? In other words, do you want the shallow clones just because you are suffering the wasted bandwidth of frequent submodule (re)clones?

If you want shallow clones to save local diskspace, then Ryan Graham's answer seems like a good way to go. Manually clone the repositories so that they are shallow. If you think it would be useful, adapt git submodule to support it. Send an email to the list asking about it (advice for implementing it, suggestions on the interface, etc.). In my opinion, the folks there are quite supportive of potential contributors that earnestly want to enhance Git in constructive ways.

If you are OK with doing one full clone of each submodule (plus later fetches to keep them up to date), you might try using the --reference option of git submodule update (it is in Git 1.6.4 and later) to refer to local object stores (e.g. make --mirror clones of the canonical submodule repositories, then use --reference in your submodules to point to these local clones). Just be sure to read about git clone --reference/git clone --shared before using --reference. The only likely problem with referencing mirrors would be if they ever end up fetching non-fast-forward updates (though you could enable reflogs and expand their expiration windows to help retain any abandoned commits that might cause a problem). You should not have any problems as long as

  • you do not make any local submodule commits, or
  • any commits that are left dangling by non-fast-forwards that the canonical repositories might publish are not ancestors to your local submodule commits, or
  • you are diligent about keeping your local submodule commits rebased on top of whatever non-fast-forwards might be published in the canonical submodule repositories.

If you go with something like this and there is any chance that you might carry local submodule commits in your working trees, it would probably be a good idea to create an automated system that makes sure critical objects referenced by the checked-out submodules are not left dangling in the mirror repositories (and if any are found, copies them to the repositories that need them).

And, like the git clone manpage says, do not use --reference if you do not understand these implications.

# Full clone (mirror), done once.
git clone --mirror $sub1_url $path_to_mirrors/$sub1_name.git
git clone --mirror $sub2_url $path_to_mirrors/$sub2_name.git

# Reference the full clones any time you initialize a submodule
git clone $super_url super
cd super
git submodule update --init --reference $path_to_mirrors/$sub1_name.git $sub1_path_in_super
git submodule update --init --reference $path_to_mirrors/$sub2_name.git $sub2_path_in_super

# To avoid extra packs in each of the superprojects' submodules,
#   update the mirror clones before any pull/merge in super-projects.
for p in $path_to_mirrors/*.git; do GIT_DIR="$p" git fetch; done

cd super
git pull             # merges in new versions of submodules
git submodule update # update sub refs, checkout new versions,
                     #   but no download since they reference the updated mirrors

Alternatively, instead of --reference, you could use the mirror clones in combination with the default hardlinking functionality of git clone by using local mirrors as the source for your submodules. In new super-project clones, do git submodule init, edit the submodule URLs in .git/config to point to the local mirrors, then do git submodule update. You would need to reclone any existing checked-out submodules to get the hardlinks. You would save bandwidth by only downloading once into the mirrors, then fetching locally from those into your checked-out submodules. The hard linking would save disk space (although fetches would tend to accumulate and be duplicated across multiple instances of the checked-out submodules' object stores; you could periodically reclone the checked-out submodules from the mirrors to regain the disk space saving provided by hardlinking).

Uneasy answered 30/1, 2010 at 15:8 Comment(0)
M
2

Reference to How to clone git repository with specific revision/changeset?

I have written a simple script which has no problem when your submodule reference is away from the tip of master.

git submodule foreach --recursive 'git rev-parse HEAD | xargs -I {} git fetch origin {} && git reset --hard FETCH_HEAD'

This statement will fetch the referenced version of submodule.

It is fast, but you cannot commit your edits on the submodule (you have to unshallow it first: https://mcmap.net/q/13343/-how-to-convert-a-git-shallow-clone-to-a-full-clone)

in full:

#!/bin/bash
git submodule init
git submodule foreach --recursive 'git rev-parse HEAD | xargs -I {} git fetch origin {} && git reset --hard FETCH_HEAD'
git submodule update --recursive
Mumble answered 1/8, 2016 at 0:29 Comment(0)
R
1

I created a slightly different version, for when the submodule is not at the bleeding edge, which not all projects are. The standard submodule additions didn't work, nor did the script above. So I added a hash lookup for the tag ref, and if it doesn't have one, it falls back to a full clone.

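#!/bin/bash
git submodule init
# Before cloning, each status line reads "-<sha1> <path>"; ${hash:1} drops the leading "-".
git submodule | while read -r hash name junk; do
    spath=$(git config -f .gitmodules --get "submodule.$name.path")
    surl=$(git config -f .gitmodules --get "submodule.$name.url")
    # Look for a tag on the remote that points at the recorded commit (first match wins).
    sbr=$(git ls-remote --tags "$surl" | sed -r "/${hash:1}/ s|^.*tags/([^^]+).*\$|\1|p;d" | head -n 1)
    if [ -z "$sbr" ]; then
        # No matching tag: fall back to a full clone.
        git clone "$surl" "$spath"
    else
        # Shallow-clone just that tag.
        git clone -b "$sbr" --depth 1 --single-branch "$surl" "$spath"
    fi
done
git submodule update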
Recurvate answered 15/12, 2015 at 18:15 Comment(0)
P
0

A shallow clone of a submodule makes sense because a submodule is a snapshot at a particular revision/changeset. It's easy to download a zip from the website, so I tried writing a script.

#!/bin/bash
git submodule deinit --all -f
for value in $(git submodule | perl -pe 's/.*(\w{40})\s([^\s]+).*/\1:\2/'); do
  mysha=${value%:*}
  mysub=${value#*:}
  myurl=$(grep -A2 -Pi "path = $mysub" .gitmodules | grep -Pio '(?<=url =).*/[^.]+')
  mydir=$(dirname $mysub)
  wget $myurl/archive/$mysha.zip
  unzip $mysha.zip -d $mydir
  test -d $mysub && rm -rf $mysub
  mv $mydir/*-$mysha $mysub
  rm $mysha.zip
done
git submodule init

git submodule deinit --all -f clears the submodule tree which allows the script to be reusable.

git submodule retrieves the 40 char sha1 followed by a path that corresponds to the same in .gitmodules. I use perl to concatenate this information, delimited by a colon, then employ variable transformation to separate the values into mysha and mysub.

These are the critical keys because we need the sha1 to download and the path to correlate the url in .gitmodules.

Given a typical submodule entry:

[submodule "label"]
    path = localpath
    url = https://github.com/repository.git

myurl keys on path = then looks 2 lines after to get the value. This method may not work consistently and may require refinement. The url grep strips any remaining .git type references by matching to the last / and anything up to a '.'.

mydir is mysub minus a final /name, which would be the directory leading up to the submodule name.

Next is a wget with the format of downloadable zip archive url. This may change in future.

Unzip the file to mydir which would be the subdirectory specified in the submodule path. The resultant folder will be the last element of the url-sha1.

Check to see if the subdirectory specified in the submodule path exists and remove it to allow renaming of the extracted folder.

mv renames the extracted folder containing our sha1 to its correct submodule path.

Delete downloaded zip file.

Submodule init

This is more a WIP proof of concept rather than a solution. When it works, the result is a shallow clone of a submodule at a specified changeset.

Should the repository re-home a submodule to a different commit, re-run the script to update.

The only time a script like this would be useful is for non-collaborative local building of a source project.

Peroxidize answered 6/6, 2019 at 15:8 Comment(0)
C
-1

I needed a solution to shallow clone submodules when I cannot influence how the main repo is cloned. Based on one of the solutions above:

#!/bin/bash
git submodule init
# Before cloning, each status line reads "-<sha1> <path>"; sed keeps the path.
for i in $(git submodule | sed -e 's/.* //'); do
    git submodule update --init --depth 1 -- "$i"
done
Chrischrism answered 14/10, 2020 at 20:20 Comment(0)
