Git submodule without extra weight

I'm not a Git master yet, and I've run into a problem I can't figure out how to fix. I have a repo with my custom WordPress skeleton, and I've added WordPress as a submodule from its original repo with git submodule add wp_repo_url. When I clone my repo to my local machine with:

git clone --recursive https://github.com/user/repo local_dir

it downloads the WP submodule as expected, but here's the problem: the actual files are only 20.7 MB, while .git/modules/core/objects/pack contains a huge 124 MB .pack file, which I suppose is something like the commit history / revisions of that submodule.

How can I re-add the submodule, or change how I clone, to avoid downloading this extra weight?

UPDATE:

With the help of @iclmam I've come up with the following setup:

  • my skeleton repo will have WordPress as a submodule, the whole original repo with history
  • when creating a new project from skeleton, I'll clone it without --recursive option to get only the main files and empty folder for submodule
  • IF I need WordPress with full history - for example, if I need to switch between different WP branches/tags to test my plugin/theme backward compatibility - then I'll get this submodule with full history
  • if I just need a plain clean install of recent WP version, I'll change into wp directory and go the old way:

    curl -L -O http://wordpress.org/latest.zip
    unzip latest.zip 
    mv wordpress/* .
    rm latest.zip  
    rm -rf wordpress
    

Not a perfect solution (I wanted to automate everything as much as possible), but it works for now.

Any advice on the original question is appreciated.

Duff answered 8/5, 2015 at 17:52 Comment(2)
Note: with Git 2.10 (Q3 2016), you will be able to do git config -f .gitmodules submodule.<name>.shallow true. See my answer below. - Camus
Great news, thanks for pointing this out! - Duff

Since Git 2.10 (Q3 2016), you can do a regular clone and still benefit from shallow clones for submodules.

All you need to do is record that configuration in your .gitmodules:

git config -f .gitmodules submodule.<name>.shallow true

Add, commit, and push: anyone cloning your repo (regular clone, full history) will get only a depth of 1 for the submodule <name>.
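The whole sequence can be sketched against throwaway local repositories, so it needs no network. The repository names (upstream, skeleton, copy), the submodule name core, the demo identity, and the file:// URLs are all assumptions for illustration; protocol.file.allow=always is only there because recent Git versions restrict the local file transport for submodules, and it should not be needed with a normal https remote.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"

# Identity for the throwaway commits (assumed values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the large upstream repo: three commits of history
git init -q upstream
for i in 1 2 3; do
  echo "rev $i" > upstream/file.txt
  git -C upstream add file.txt
  git -C upstream commit -q -m "rev $i"
done

# Superproject with a submodule named "core"
git init -q skeleton
git -C skeleton -c protocol.file.allow=always \
  submodule --quiet add "file://$demo/upstream" core

# Record the shallow recommendation in .gitmodules, then add and commit
git -C skeleton config -f .gitmodules submodule.core.shallow true
git -C skeleton add .gitmodules core
git -C skeleton commit -q -m "Add core as a shallow submodule"

# A regular recursive clone of the superproject
git -c protocol.file.allow=always \
  clone -q --recursive "file://$demo/skeleton" copy

# Count how many commits of the submodule's history were fetched
depth=$(git -C copy/core rev-list --count HEAD)
echo "submodule commits fetched: $depth"
```

The superproject itself is still cloned with full history; only the submodule is truncated.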

See commit f6fb30a, commit abed000 and commit 37f52e9 (03 Aug 2016) by Stefan Beller (stefanbeller).
(Merged by Junio C Hamano -- gitster -- in commit dc7e09a, 08 Aug 2016)

> submodule update: learn --[no-]recommend-shallow option

> Sometimes the history of a submodule is not considered important by the projects upstream. To make it easier for downstream users, allow a boolean field 'submodule.<name>.shallow' in .gitmodules, which can be used to recommend whether upstream considers the history important.
>
> This field is honored in the initial clone by default, it can be ignored by giving the --no-recommend-shallow option.

Camus answered 11/8, 2016 at 11:41 Comment(6)
Somehow, this doesn't work with branches. I have a submodule, and not only do I not care about its history, I also don't care about most of its contents. For that reason I cloned the repo, branched master to master-src, and in my master-src branch deleted all the stuff I don't care about. When I add a shallow submodule with the master-src branch, I do get only the files that are part of master-src; however, the sparse checkout history still gets fetched anyway. - Viviparous
In short, the original submodule with history was 100 MB (size of a basic clone), a shallow checkout of this submodule's master branch is 30 MB, and the size of master-src without .git is 800 KB. If I add the master-src branch as a shallow submodule, the total size grows by 30 MB, yet I expected it to grow by roughly the size of master-src (800 KB). - Viviparous
This problem doesn't exist if I don't create master-src, but directly remove the irrelevant code in the master branch and add the master branch as a shallow submodule. - Viviparous
@PavelP That would be better illustrated in a new question, with your version of Git and OS, and a link back to this answer from your new question. That way, I or others can help. - Camus
How do you do it with submodules in nested folders? Is the syntax like git config -f .gitmodules submodule.import/github/Sefaria/Sefaria-Export.shallow true? - Raab
@Raab It should be, yes. Usually, the middle part can be wrapped in quotes: git config -f .gitmodules submodule."import/github/Sefaria/Sefaria-Export".shallow true - Camus

If you are using WP as a submodule, you probably need access to the history inside that submodule, and that means you need this pack file.

Git stores data in pack files for efficiency and to save disk space. See Git Internals - Packfiles. If you wonder what is in a packfile, you can use the git verify-pack command. With the -v option, you might find out that a huge file has been put in your repository.
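As a rough sketch of that inspection, the script below builds a throwaway repository (the file names, sizes, and demo identity are made up for illustration), packs it with git gc, and asks git verify-pack -v for the largest object. For the submodule in the question, the pack would be under .git/modules/core/objects/pack instead.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"

# Identity for the throwaway commit (assumed values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo

# One small file and one comparatively large file
echo "small" > small.txt
head -c 200000 /dev/urandom > big.bin
git add small.txt big.bin
git commit -q -m "Add files"

# Move the loose objects into a pack file
git gc -q

# verify-pack -v prints one line per object: id, type, size, size in pack, offset.
# Keep only the object lines, sort by size (third column), take the largest.
largest=$(git verify-pack -v .git/objects/pack/pack-*.idx \
  | grep -E ' (blob|tree|commit|tag) ' \
  | sort -k 3 -n \
  | tail -n 1)
echo "$largest"

# Map the object id back to a path in the current commit
big_id=$(git rev-parse HEAD:big.bin)
```

The first column of the printed line is an object id; comparing it with git rev-parse HEAD:<path>, as done for big.bin here, is one way to find out which file it belongs to.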

If for some reason you want to 'clean' the submodule, I would suggest you read Why is my repo so big?

If you do not want the full history in the submodule, you can try to clone it with the --depth option (see the git submodule command), so that it is a shallow clone with truncated history. This might decrease the size of the pack folder.

1) Clone the main repo without the --recursive option.

2) Inside the main repo, initialize the submodule using git submodule update --init with the --depth option.
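The two steps can be sketched with throwaway local repositories standing in for the real ones. The names upstream, skeleton, project, and core, the demo identity, and the file:// URLs are assumptions for illustration; protocol.file.allow=always is only needed because the demo uses the local file transport for the submodule.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"

# Identity for the throwaway commits (assumed values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the upstream repo: three commits of history
git init -q upstream
for i in 1 2 3; do
  echo "rev $i" > upstream/file.txt
  git -C upstream add file.txt
  git -C upstream commit -q -m "rev $i"
done

# Skeleton repo that records the submodule (no shallow setting here)
git init -q skeleton
git -C skeleton -c protocol.file.allow=always \
  submodule --quiet add "file://$demo/upstream" core
git -C skeleton commit -q -m "Add core submodule"

# 1) Clone the main repo without --recursive: core/ stays empty
git clone -q "file://$demo/skeleton" project

# 2) Initialize and fetch the submodule with truncated history
git -C project -c protocol.file.allow=always \
  submodule --quiet update --init --depth 1

# Count how many commits of the submodule's history were fetched
manual_depth=$(git -C project/core rev-list --count HEAD)
echo "submodule commits fetched: $manual_depth"
```

This works cleanly here because the commit recorded in the skeleton is the tip of the upstream branch. If the recorded commit is older than the tip, a depth of 1 may not contain it and the update can fail, which is consistent with the trouble reported in the comments when pinning to a tag.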

Mammy answered 8/5, 2015 at 18:39 Comment(6)
I don't need the history of this submodule; I can even stick with the current stable branch or release (tag) instead of master. The purpose is to clone my custom skeleton to my local machine any time I start a new project and have it install a fresh, current stable version of WP instead of manually downloading it from wordpress.org. I understand that when WP has a new stable release I would have to update my submodule branch/tag as well, but that's not a problem if I get a submodule without large packs. Or will cloning a particular branch/tag also fetch these packs? - Duff
I am not sure what you mean by cloning a particular branch/tag. When you clone using the --branch option, it just sets your HEAD to that specific branch instead of the branch pointed to by the cloned repository's HEAD. However, it is a clone and, by definition, it would therefore also contain the whole history. Try using the --depth option. You will then get a truncated history. - Mammy
Yes, thanks for your help. I guess I'm getting the picture. Let me clarify my goal and how I see it can be reached. I've created my own repo on GitHub and pushed my custom data to this remote. Now I need to add WordPress as a submodule and I need it to be at the current stable tag and have truncated history. 1. Add submodule: git submodule add https://github.com/WordPress/WordPress.git core 2. Init it: git submodule init -depth 1 3. Change into submodule's dir: cd core 4. Move submodule pointer to the tag I need: git checkout 4.2.2 Is it correct? - Duff
Nope, I've got it wrong. You were pointing me in the right direction from your first reply. I'll have the full submodule history in my skeleton repo, that's fine. The key is to init and clone a specific submodule branch/tag/revision in a new project started from the skeleton repo. And that's where the --depth option will help. Testing it right now. - Duff
The --depth option is available only with the git submodule add and git submodule update commands. I do not think it is the best way, as you will then have to work out the depth you need. In your case, since you are not interested in the history, why don't you just get the zip of the specific commit of WordPress and unzip it in core? You can then put "core" in a .gitignore file so that git does not try to manage the "core" directory. - Mammy
Yep, still can't make it work. The 4.2 branch already has some commits ahead of the 4.2.2 tag. I'm getting a lot of mess here and --depth 1 is not working as expected, you are right. I was trying to automate everything and make it possible to just run 1-2 commands and get everything ready. Looks like I'll leave it as a submodule, as it was originally, but instead of using --recursive I won't fetch the submodule at all until I need the full history to be able to switch branches/tags and test my plugin with different WP versions. Otherwise I'll just cd core and curl the latest WP zip. - Duff
