What goes wrong when using git worktree with git submodules?

I recently discovered the git worktree command:

The new working directory is linked to the current repository, sharing everything except working directory specific files such as HEAD, index, etc.

But the docs also indicate

… the support for submodules is incomplete. It is NOT recommended to make multiple checkouts of a superproject.

without further explanation as to what goes wrong.

Can someone enlighten me about the problems to expect? For example, will I be fine if I use the separate worktrees generated this way only for changes that do not affect the submodules?

Kenyon answered 7/8, 2015 at 7:25 Comment(0)

Commit a83a66a is quite clear about that:

git-submodule.sh expects $GIT_DIR/config to be per-worktree, at least for the submodule.* part.
Here I think we have two options:

  • either update config.c to also read $GIT_DIR/config.worktree (which is per worktree) in addition to $GIT_DIR/config (shared) and store worktree-specific vars in the new place,
  • or update git-submodule.sh to read/write submodule.* directly from $GIT_DIR/config.submodule (per worktree).

These take time to address properly. Meanwhile, make a note to the user that they should not use multiple worktrees in submodule context.

More generally, where to put those submodules?

There are a couple options:

  • You may want to keep $SUB repos elsewhere (perhaps in a central place) outside $SUPER. This is also true for nested submodules where a superproject may be a submodule of another superproject.
  • You may want to keep all $SUB repos in $SUPER/modules (or some other place in $SUPER)
  • We could even push it further and merge all $SUB repos into $SUPER instead of storing them separately. But that would at least require ref namespace enabled.

This commit was an answer to commit df56607.
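The first option quoted above (a per-worktree config file read in addition to the shared one) is close to what later became available as the extensions.worktreeConfig setting together with git config --worktree, in Git 2.20 and later. The following is a minimal sketch of how a value can be stored per worktree, not something taken from the commit itself. The function name set_worktree_config and the submodule name foo are placeholders, and whether every submodule command honours values stored this way is not something this sketch demonstrates.

```shell
# Sketch: store a setting per worktree instead of in the shared config.
# Assumes Git 2.20 or later. The function name and the key/value passed
# to it are placeholders chosen for this example.
set_worktree_config() {
    key=$1
    value=$2
    # Enable the extension once per repository; Git then reads
    # $GIT_DIR/config.worktree in addition to the shared config file.
    git config extensions.worktreeConfig true || return 1
    # --worktree writes to the config.worktree file of the current worktree.
    git config --worktree "$key" "$value"
}

# Usage, from inside a worktree ("foo" is a hypothetical submodule name):
#   set_worktree_config submodule.foo.active false
```

The value ends up in config.worktree rather than in the shared config, so other worktrees of the same repository do not see it.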


From a Git user's point of view, this means that git submodule update --init --recursive does not know exactly where to check out the submodules.
Are they duplicated across all worktrees, or centralized somewhere? At the time of that commit, this was not formally specified.


A year later (and with git 2.9), clacke adds in the comments

the confusion has been resolved, but not in an optimal manner.
Submodules work fine now as far as I can see, but each worktree has its own set of submodule repos (under motherrepo.git/worktrees/<worktreename>/modules/<submodule>), so if you have a submodule that's big, you are facing some serious disk usage.
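To see how much space those per-worktree submodule copies take, something along these lines may help. It is only a sketch: the function name worktree_module_usage is made up, it assumes it is run from the top level of a worktree of the superproject, and it relies on the worktrees/<name>/modules layout described in this answer.

```shell
# Sketch: print the size (in KiB) of each linked worktree's private
# submodule store. Run from the top level of a worktree of the superproject.
# The function name is made up for this example.
worktree_module_usage() {
    common_dir=$(git rev-parse --git-common-dir) || return 1
    for modules_dir in "$common_dir"/worktrees/*/modules; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$modules_dir" ] || continue
        du -sk "$modules_dir"
    done
}
```

Each output line is the usual du format: a size followed by the path of one worktree's modules directory. Worktrees without initialized submodules produce no line.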


Git aliases to handle submodules in subtrees:

The alias git wtas expects that git wta is defined globally, or at least for all the repos involved. No warranty included. Your favorite pet may catch a painful infection if your path names have spaces in them.

It expects a structure in your repo like the one in a non-bare repo with submodules initiated, so if you have a bare repo, you'll have to mimic that setup. A submodule with the name (not path) foo goes in <your-.git-directory>/modules/foo (not .../foo.git). It will not crash if some module is not present in the repo, it just skips it.

There is room for improvement. It does not handle submodules within submodules, it only goes one level down. It may work to just change the submodule git wta call to a git wtas call, but I haven't verified this yet.

-- clacke


See also git worktree move (with Git 2.17+, Q2 2018).


Before Git 2.21 (Q1 2019), however, "git worktree remove" and "git worktree move" refused to work when a submodule was involved.
This has been loosened so that uninitialized submodules are ignored.

See commit 00a6d4d (05 Jan 2019) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 726f89c, 18 Jan 2019)

worktree: allow to (re)move worktrees with uninitialized submodules

Uninitialized submodules have nothing valuable for us to be worried about. They are just SHA-1.
Let "worktree remove" and "worktree move" continue in this case so that people can still use multiple worktrees on repos with optional submodules that are never populated, like sha1collisiondetection in git.git when checked out by doc-diff script.

Note that for "worktree remove", it is possible that a user initializes a submodule (*), makes some commits (but not push), then deinitializes it.
At that point, the submodule is unpopulated, but the precious new commits are still in:

$GIT_COMMON_DIR/worktrees/<worktree>/modules/<submodule>

directory and we should not allow removing the worktree or we lose those commits forever.
The new directory check is added to prevent this.

(*) yes they are screwed anyway by doing this since "git submodule" would add submodule.* in $GIT_COMMON_DIR/config, which is shared across multiple worktrees.
But it does not mean we let them be screwed even more.
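The directory check described in that commit message can be approximated by hand before removing a worktree on older Git versions. This is a rough sketch, not Git's own implementation: the function name worktree_has_module_repos is made up, and the argument is the worktree's name as it appears under $GIT_COMMON_DIR/worktrees/, which is usually (but not always) the basename of the worktree path.

```shell
# Sketch: succeed (exit 0) when a linked worktree still has submodule
# repositories under $GIT_COMMON_DIR/worktrees/<worktree>/modules.
# Run from the top level of a worktree of the superproject.
# The function name is made up for this example.
worktree_has_module_repos() {
    common_dir=$(git rev-parse --git-common-dir) || return 2
    modules_dir="$common_dir/worktrees/$1/modules"
    # A non-empty modules directory may hold commits that exist nowhere else.
    [ -d "$modules_dir" ] && [ -n "$(ls -A "$modules_dir")" ]
}

# Usage ("mywt" is a hypothetical worktree name):
#   if worktree_has_module_repos mywt; then
#       echo "mywt still has submodule repositories; check for unpushed commits" >&2
#   fi
```

A non-zero exit only says that nothing was found under that directory; it does not replace checking the submodules for unpushed work.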


Before Git 2.25 (Q1 2020), if you had the submodule.recurse config option set, issuing git worktree add in a superproject with initialized submodules would fail.

See commit 4782cf2ab6 (27 Oct 2019) by Philippe Blain (phil-blain).
(Merged by Junio C Hamano -- gitster -- in commit 726f89c, 1 Dec 2019)

worktree: teach "add" to ignore submodule.recurse config

"worktree add" internally calls "reset --hard", but if submodule.recurse is set, reset tries to recurse into initialized submodules, which makes start_command try to cd into non-existing submodule paths and die.

Fix that by making sure that the call to reset in "worktree add" does not recurse.

A workaround for earlier versions is to momentarily deactivate the config:

git -c submodule.recurse=0 worktree add ...

Before Git 2.26 (Q2 2020), issuing git checkout --recurse-submodules (or reset or read-tree) in a linked worktree of a repo with initialized submodules would incorrectly move the submodules' HEADs in the Git repository of the main worktree, and clobber the .git gitfile in the working directory of the submodules in the linked worktree.

See commit a9472afb63 (21 Jan 2020) by Philippe Blain (phil-blain).
(Merged by Junio C Hamano -- gitster -- in commit ff5134b2, 5 Feb 2020)

submodule.c: use get_git_dir() instead of get_git_common_dir()

Ever since df56607 (git-common-dir: make "modules/" per-working-directory directory, 2014-11-30), submodules in linked worktrees are cloned to $GIT_DIR/modules, i.e. $GIT_COMMON_DIR/worktrees/<name>/modules.

However, this convention was not followed when the worktree updater commands checkout, reset and read-tree learned to recurse into submodules. Specifically, submodule.c::submodule_move_head, introduced in 6e3c159 (update submodules: add submodule_move_head, 2017-03-14) and submodule.c::submodule_unset_core_worktree, (re)introduced in 898c2e6 (submodule: unset core.worktree if no working tree is present, 2018-12-14) use get_git_common_dir() instead of get_git_dir() to get the path of the submodule repository.

This means that, for example, 'git checkout --recurse-submodules <branch>' in a linked worktree will correctly checkout <branch>, detach the submodule's HEAD at the commit recorded in <branch> and update the submodule working tree, but the submodule HEAD that will be moved is the one in $GIT_COMMON_DIR/modules/<name>/, i.e. the submodule repository of the main superproject working tree. It will also rewrite the gitfile in the submodule working tree of the linked worktree to point to $GIT_COMMON_DIR/modules/<name>/. This leads to an incorrect (and confusing!) state in the submodule working tree of the main superproject worktree.

Additionally, if switching to a commit where the submodule is not present, submodule_unset_core_worktree will be called and will incorrectly remove 'core.worktree' from the config file of the submodule in the main superproject worktree, $GIT_COMMON_DIR/modules/<name>/config.

Fix this by constructing the path to the submodule repository using get_git_dir() in both submodule_move_head and submodule_unset_core_worktree.
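One way to spot the clobbered gitfile described above is to compare where a submodule's gitfile resolves to with the git dir of the worktree that contains it. The following is a sketch under assumptions: the function name submodule_uses_own_gitdir is made up, it must be run from the top level of the worktree, and it needs Git 2.13 or later for rev-parse --absolute-git-dir.

```shell
# Sketch: check that a submodule's gitfile resolves to a repository inside
# the current worktree's own git dir, i.e. somewhere under $GIT_DIR/modules/.
# Run from the top level of the (linked) worktree.
# The function name is made up for this example.
submodule_uses_own_gitdir() {
    super_git_dir=$(git rev-parse --absolute-git-dir) || return 2
    sub_git_dir=$(git -C "$1" rev-parse --absolute-git-dir) || return 2
    case "$sub_git_dir" in
        "$super_git_dir"/modules/*) return 0 ;;
        *) echo "$1 -> $sub_git_dir" >&2; return 1 ;;
    esac
}

# Usage ("path/to/sub" is a hypothetical submodule path):
#   submodule_uses_own_gitdir path/to/sub || echo "gitfile points elsewhere"
```

In a linked worktree affected by the bug, the submodule would resolve to $GIT_COMMON_DIR/modules/<name>/ instead, so the check fails and prints the path it found.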

Bloated answered 7/8, 2015 at 7:33 Comment(13)
Thanks for the research! But I’m still not sure about what actually breaks, and what actions would work nevertheless. Can you try to describe that from a git user point of view? -- Kenyon
@JoachimBreitner from a git user point of view, you don't know where the submodules are when you do a git submodule update --init --recursive. Are they duplicated across all worktrees, or are they centralized somewhere? -- Bloated
As of today, I'm on this question because the confusion has been resolved, but not in an optimal manner. Submodules work fine now as far as I can see, but each worktree has its own set of submodule repos (under motherrepo.git/worktrees/<worktreename>/modules/<submodule>), so if you have a submodule that's big, you are facing some serious disk usage. -- Stephi
@Stephi Thank you. I have included your comment in the answer for more visibility. Are you using git 2.9? -- Bloated
I'm not including it yet, because it's a bit specific for several quirks with our layout, but I now have a bare motherrepo.git/modules/<submodule> for each submodule, and I have an alias wta <name> that creates a worktree for motherrepo, then goes to each bare submodule and runs git wta <name> there as well, and the aliases there know where each submodule is supposed to end up in the mother worktree. Seems to not do insane things so far. -- Stephi
To help with the worktree naming issue, I actually create a blah/<name>_<worktree> worktree, move it to blah/<name>, then put a symlink in its place (for worktree prune to work properly). .git/info/excludes in motherrepo knows to ignore these. -- Stephi
@Bloated Nine months later a git alias baby is born. :-) -- Stephi
@Bloated After I learned about git worktree --detach it became much easier, and I could make it generic for arbitrary submodules and for Windows. The part that took the most time was the correct levels of quoting and other indirect details, like the path handling. :-) -- Stephi
As far as I understood, the only problem is that the “administrative files” (HEAD etc.) of the subrepo(s) might be different for different workdirs, because they (workdirs) may have different associated subrepo commits. But why can’t we solve this problem in exactly the same way as with the workdirs themselves, i.e. store the “meat” of the subrepo in e.g. $GIT_COMMON_DIR/modules/<submodule>, and store administrative files in $GIT_COMMON_DIR/worktrees/<worktree>/modules/<submodule>? This way the workdirs would be independent without duplication of the objects of their submodules. -- Carrollcarronade
@Carrollcarronade Interesting suggestion. I suppose you would get a more informed answer on the Git mailing list though: public-inbox.org/git -- Bloated
@philb Thank you for your edit, and for your contribution to Git itself! I mentioned it already, but in https://mcmap.net/q/11514/-git-submodule-update-is-slow-how-can-i-debug-why-it-39-s-slow. -- Bloated
"git submodule update --init --recursive does not know exactly where to checkout the submodules" okay, what am I missing? Because at a glance, the obvious solution literally enabled by having multiple worktrees is to give each submodule a linked worktree in each of the parent repo's linked worktrees. -- Cavanaugh
@Cavanaugh True, but... not so easy! -- Bloated
