Git repo where each submodule is a branch of the same repo. How to avoid double/triple... download with git clone --recursive?

Suppose I have the following project tree:

src
data
doc

I'd like to keep all the folders in a Git repository, published to GitLab. But I don't want to track data and doc together with src.

So I use the following strategy:

git remote add origin ADDRESS
git submodule add -b data ADDRESS data
git submodule add -b doc ADDRESS doc

It actually works fine, except when I try to replicate the repository with:

git clone --recursive ADDRESS

all objects get transmitted 3 times, because the root, data, and doc clones each contain:

  • origin/master
  • origin/data
  • origin/doc

Is there an easy way to avoid this? Just to clarify what I'd like:

  • the master repository should only fetch origin/master, not the other two
  • the data submodule should only fetch origin/data.
  • the doc submodule should only fetch origin/doc.

This would be easy to achieve with 3 separate repositories, but that's too cumbersome, since I apply this approach to multiple projects.

UPDATE

git worktree from this answer allows me to achieve what I want manually.

But now, instead of the automatic approach (which consumes 4x bandwidth):

git clone --recursive git@foo:foo/bar.git

I have to do:

git clone git@foo:foo/bar.git
cd bar
git worktree add data origin/data
git worktree add src/notebooks origin/notebooks
git worktree add doc origin/doc
git worktree add reports origin/reports

I could automate this process with a script, since the .gitmodules file already contains all the required information:

[submodule "data"]
    path = data
    url = git@foo:foo/bar.git
    branch = data
[submodule "src/notebooks"]
    path = src/notebooks
    url = git@foo:foo/bar.git
    branch = notebooks
[submodule "doc"]
    path = doc
    url = git@foo:foo/bar.git
    branch = doc
[submodule "reports"]
    path = reports
    url = git@foo:foo/bar.git
    branch = reports

Is there already a standard Git script or flag that handles this?
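The automation mentioned above could look roughly like the sketch below. It reads each path/branch pair from .gitmodules with git config and runs the same git worktree add commands as the manual steps. The function names list_submodule_branches and add_worktrees_from_gitmodules are made up for illustration, and the sketch assumes that submodule names, paths, and branch names contain no whitespace and that every branch exists on origin.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical, not part of Git.

# Print one "<path> <branch>" line for each submodule entry that has a branch.
# Assumes submodule names, paths, and branch names contain no whitespace.
list_submodule_branches() {
    gitmodules="${1:-.gitmodules}"
    git config -f "$gitmodules" --get-regexp '^submodule\..*\.path$' |
    while read -r key path; do
        name="${key#submodule.}"
        name="${name%.path}"
        # Skip entries that do not record a branch.
        branch="$(git config -f "$gitmodules" --get "submodule.$name.branch")" || continue
        printf '%s %s\n' "$path" "$branch"
    done
}

# Create one worktree per entry. Run from the top of a plain
# (non-recursive) clone.
add_worktrees_from_gitmodules() {
    list_submodule_branches "${1:-.gitmodules}" |
    while read -r path branch; do
        # Mirrors the manual command: git worktree add data origin/data
        git worktree add "$path" "origin/$branch" </dev/null
    done
}
```

After git clone git@foo:foo/bar.git and cd bar, calling add_worktrees_from_gitmodules would replace the four manual git worktree add lines. Running list_submodule_branches on its own first shows what would be created without touching the clone.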

Cavatina asked 11/4, 2017 at 20:58 Comment(3)
You can tell Git to do single-branch clones. I would not recommend this in general, but it should work for this particular case. - Fiona
"But I don't want to track data and doc together with src." Is there any sound reason for that? - Plague
Data and source code are separate concepts. Source code consists of text files that you can grep and review. Data are usually binary blobs. I wonder if there is any sound reason to mix them. - Cavatina
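For reference, the single-branch clone suggested in the first comment could be sketched as follows. clone_single_branch is a hypothetical helper name; the --single-branch and --branch options are standard git clone options. The sketch covers the top-level clone only.

```shell
#!/bin/sh
# Sketch only: clone_single_branch is a hypothetical helper name.
# Usage: clone_single_branch <url> <branch> <target-dir>
clone_single_branch() {
    # --single-branch limits the clone to the history of the one branch
    # named by --branch instead of fetching every branch of the remote.
    git clone --quiet --single-branch --branch "$2" "$1" "$3"
}
```

For example, clone_single_branch ADDRESS master bar would fetch only master into the directory bar.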

Git is designed to be distributed, which means every user is expected to have the whole history and all branches. If you want a single repository with several working trees, so that the objects are downloaded only once, you can use the git worktree command:

So in your case, assuming src is the main folder with the src branch checked out, creating the other two from it should be as simple as:

git worktree add ../data data
git worktree add ../doc doc
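One way to check that the extra working trees really share a single repository, and therefore a single copy of the objects, is to compare their common Git directories. This is only a sketch: shared_git_dir is a made-up helper name, and it relies on git rev-parse --git-common-dir, which needs a Git version with worktree support.

```shell
#!/bin/sh
# Sketch only: shared_git_dir is a hypothetical helper name.
# Print the absolute path of the repository directory that backs the
# working tree given as $1.
shared_git_dir() {
    # --git-common-dir may print a relative path, so resolve it from
    # inside the working tree; the subshell keeps the caller's directory.
    ( cd "$1" && cd "$(git rev-parse --git-common-dir)" && pwd -P )
}
```

If shared_git_dir . and shared_git_dir ../data print the same path, both directories are backed by one object store.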

See this awesome answer https://mcmap.net/q/11286/-how-can-i-have-multiple-working-directories-with-git for more information about this command. If you have an older Git without worktree support (the command was added in Git 2.5), you can use the git-new-workdir script instead:

git-new-workdir project-dir new-workdir branch

This is also described in Multiple working directories with Git?

Gesner answered 14/4, 2017 at 17:0 Comment(1)
Thanks. I'll award the bounty before it disappears. Maybe someone can still answer whether it's possible to automate the clone using git worktree. - Cavatina

Warning: "git worktree add" internally calls "reset --hard", which should not descend into submodules even when the submodule.recurse configuration is set, but in older Git versions it did.

This has been corrected with Git 2.25 (Q1 2020).

See commit 4782cf2 (27 Oct 2019) by Philippe Blain (phil-blain).
(Merged by Junio C Hamano -- gitster -- in commit 05fc647, 01 Dec 2019)

worktree: teach "add" to ignore submodule.recurse config

Signed-off-by: Philippe Blain

"git worktree add" internally calls "reset --hard", but if submodule.recurse is set, reset tries to recurse into initialized submodules, which makes start_command try to cd into non-existing submodule paths and die.

Fix that by making sure that the call to reset in "worktree add" does not recurse.

Contractive answered 5/12, 2019 at 19:38 Comment(0)
