Why do I need to add `--remote` to git submodule update when I specify the branch in the .gitmodules file?

I want to pull/update the submodules at the right branch. Running git submodule update pulls/updates the submodules, but it switches to the wrong branch even though the branch I ALWAYS want to use is specified in the .gitmodules file.

Only when I do --remote does it work (but then I don't know what other unintended consequences it might have in the rest of my submodules).

I want to update my submodules exactly as specified in my .gitmodules file. How do I do this?

e.g.

[submodule "pytorch-meta-dataset"]
    path = pytorch-meta-dataset
    url = [email protected]:brando90/pytorch-meta-dataset.git
    branch = hdb
[submodule "meta-dataset"]
    path = meta-dataset
    url = [email protected]:brando90/meta-dataset.git

Which of these should I be running?

git submodule update
git submodule update --remote
git submodule init
git submodule status 

I did read the documentation for --remote:

--remote
           This option is only valid for the update command. Instead of using the superproject’s recorded SHA-1 to update the submodule, use the status of the submodule’s
           remote-tracking branch. The remote used is branch’s remote (branch.<name>.remote), defaulting to origin. The remote branch used defaults to the remote HEAD, but the branch
           name may be overridden by setting the submodule.<name>.branch option in either .gitmodules or .git/config (with .git/config taking precedence).

           This works for any of the supported update procedures (--checkout, --rebase, etc.). The only change is the source of the target SHA-1. For example, submodule update
           --remote --merge will merge upstream submodule changes into the submodules, while submodule update --merge will merge superproject gitlink changes into the submodules.

           In order to ensure a current tracking branch state, update --remote fetches the submodule’s remote repository before calculating the SHA-1. If you don’t want to fetch, you
           should use submodule update --remote --no-fetch.

           Use this option to integrate changes from the upstream subproject with your submodule’s current HEAD. Alternatively, you can run git pull from the submodule, which is
           equivalent except for the remote branch name: update --remote uses the default upstream repository and submodule.<name>.branch, while git pull uses the submodule’s
           branch.<name>.merge. Prefer submodule.<name>.branch if you want to distribute the default upstream branch with the superproject and branch.<name>.merge if you want a more
           native feel while working in the submodule itself.

My install script ends up looking ridiculous:

# -- gitsubmodules
# - set up pytorch-meta-dataset git submodule
cd ~/diversity-for-predictive-success-of-meta-learning/
# adds the submodule to the .gitmodules file & pull the project
git submodule add -f -b hdb --name pytorch-meta-dataset [email protected]:brando90/pytorch-meta-dataset.git pytorch-meta-dataset/
git submodule update --init --recursive --remote pytorch-meta-dataset

# - set up meta-dataset git submodule
# adds the submodule to the .gitmodules file & pull the project
git submodule add -f -b master --name meta-dataset [email protected]:brando90/meta-dataset.git meta-dataset/
# - git submodule update to fetch all the data from that project
git submodule update --init --recursive --remote meta-dataset

# - initialize your local configuration file
git submodule init
# - check the submodules
git submodule status

Why do I need to specify the same thing so many times? What is the point of the .gitmodules file at all, then? It can't even update things properly without screwing up the rest of the submodules.

Look at the branches:

(meta_learning) brandomiranda~/diversity-for-predictive-success-of-meta-learning ❯ git submodule status                                         
 ca81edbf5093ec5ea1a1f5a4b31ec4078825f44b meta-dataset (arxiv_v1-200-gca81edb)
 6e60161962ae3fa309335da7aa1c675c75ecca54 pytorch-meta-dataset (heads/hdb)

They don't even match my .gitmodules:

[submodule "pytorch-meta-dataset"]
    path = pytorch-meta-dataset
    url = [email protected]:brando90/pytorch-meta-dataset.git
    branch = hdb
[submodule "meta-dataset"]
    path = meta-dataset
    url = [email protected]:brando90/meta-dataset.git
    branch = master


Extra: Why does git submodule status not match the output of git branch of my submodule?

Why does it still not work even if I specified the --remote?

(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule add -f -b hdb --name meta-dataset [email protected]:brando90/meta-dataset.git meta-dataset/

Cloning into '/Users/brandomiranda/ultimate-utils/tutorials_for_myself/my_git/meta-dataset'...
remote: Enumerating objects: 2947, done.
remote: Counting objects: 100% (740/740), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 2947 (delta 689), reused 675 (delta 675), pack-reused 2207
Receiving objects: 100% (2947/2947), 3.17 MiB | 4.51 MiB/s, done.
Resolving deltas: 100% (2248/2248), done.
(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule init

(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule update --init

(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule update --init --remote

(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule status
 ca81edbf5093ec5ea1a1f5a4b31ec4078825f44b meta-dataset (arxiv_v1-200-gca81edb)
(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule update --init --recursive --remote meta-dataset
(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule status                                         
 ca81edbf5093ec5ea1a1f5a4b31ec4078825f44b meta-dataset (arxiv_v1-200-gca81edb)
(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ cd meta-dataset 
(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git/meta-dataset ❯ git branch
* hdb
Multipurpose answered 3/1, 2023 at 0:49 Comment(4)
@VonC git version 2.37.0 (Apple Git-136) it seems. - Multipurpose
OK, that should be recent enough, although if you can update to a more recent version (2.39), you can check whether the issue persists. - Urticaria
By the way, this is a highly helpful related question: stackoverflow.com/questions/3796927/… on how to git clone & pull all submodules all at once. - Multipurpose
True, I wrote it in 2010 but kept it up-to-date with the Git 2.23 (Q3 2019) new feature. - Urticaria

It seems I can specify the branch in the .gitmodules file, but when I do git submodule update and variants (e.g. --all, --recursive, etc) it doesn't pull the Git submodule to the right branch.
This is obvious from the git submodule status.
How do I pull and make sure it's in the right branch?
Otherwise what is the point of specifying the branch then?

By default, a submodule does not switch to the branch it would pull from. It only checks out a SHA-1, which is one of the following:

  • the one registered in the index of its parent repository. That is what a submodule is: a remote repository URL plus a gitlink, that is, a SHA-1 recorded as a special entry in the index.
  • the one at the tip of a remote-tracking branch, meaning the HEAD SHA-1 of a remote repository branch, specified as submodule.<name>.branch in .gitmodules.

(That is the main source of the unintuitiveness.)

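With --remote, git submodule update fetches from the remote and sets the submodule's HEAD to the tip of the specified remote-tracking branch. With the default checkout update mode, HEAD is still detached: the commit is checked out, not the local branch.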
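To make the difference concrete, here is a minimal throwaway sketch. Every name in it (lib, super, sub, dev) is hypothetical, and the repositories are created in a temporary directory. It assumes a Git version recent enough that local-path submodule clones need protocol.file.allow=always.

```shell
#!/bin/sh
# Throwaway demo: compare "submodule update" with "submodule update --remote".
set -e
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A "remote" library with one commit and a dev branch to track.
git init -q lib
git -C lib commit -q --allow-empty -m "first"
git -C lib branch dev

# A superproject that records lib as a submodule tracking dev.
git init -q super
cd super
git -c protocol.file.allow=always submodule add -q -b dev "$demo/lib" sub
git commit -q -m "add sub"

# Upstream moves on: dev gets a second commit.
git -C "$demo/lib" checkout -q dev
git -C "$demo/lib" commit -q --allow-empty -m "second"

recorded=$(git rev-parse HEAD:sub)            # SHA-1 stored in the gitlink
git -c protocol.file.allow=always submodule update -q
after_plain=$(git -C sub rev-parse HEAD)      # plain update follows the gitlink
git -c protocol.file.allow=always submodule update -q --remote
after_remote=$(git -C sub rev-parse HEAD)     # --remote follows origin/dev
tip=$(git -C "$demo/lib" rev-parse dev)
state=$(git -C sub symbolic-ref -q --short HEAD || echo detached)

echo "recorded gitlink : $recorded"
echo "after update     : $after_plain"
echo "after --remote   : $after_remote"
echo "tip of lib/dev   : $tip"
echo "submodule HEAD   : $state"
```

In this setup the plain update stays on the recorded commit, while --remote moves to the tip of dev but leaves HEAD detached, which is why an explicit git switch is still needed afterwards.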

To quickly set all your submodules to an actual branch:

git submodule foreach -q --recursive \
  'git switch \
  $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master)'

The $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master) part:

  • executes a command in a subshell: $(command) is known as command substitution. It allows the output of a command to be used as an argument to another command.
  • gets the submodule.$name.branch of the current ($name) submodule, as visited by the git submodule foreach command.
  • or returns "master" if submodule.$name.branch is not set for that submodule: cmd1 || cmd2 executes cmd2 only if cmd1 fails.
  • note: $toplevel is the path to the superproject, so $toplevel/.gitmodules is the path to its .gitmodules file.

Replace master with main, depending on the default branch naming convention of your remote repositories.
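Before switching anything, it can help to see which branch the lookup would resolve to for each submodule. The following is a small sketch of such a dry run; sm_branch and list_sm_branches are hypothetical helper names, not Git commands, and the fallback defaults to master as in the command above.

```shell
# Hypothetical helper: print the branch configured for one submodule,
# or a fallback when .gitmodules has no "branch" entry for it.
# $1 = path to .gitmodules, $2 = submodule name, $3 = fallback (default: master)
sm_branch() {
  git config -f "$1" "submodule.$2.branch" || echo "${3:-master}"
}

# Hypothetical dry run: list "name -> branch" for every submodule that has
# a url entry in the given .gitmodules file. Nothing is checked out.
# $1 = path to .gitmodules
list_sm_branches() {
  git config -f "$1" --name-only --get-regexp '^submodule\..*\.url$' |
    sed -e 's/^submodule\.//' -e 's/\.url$//' |
    while IFS= read -r name; do
      printf '%s -> %s\n' "$name" "$(sm_branch "$1" "$name")"
    done
}
```

Run list_sm_branches .gitmodules from the root of the superproject; pass a third argument to sm_branch (for example main) to try a different fallback.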

That won't scale if I have hundreds of submodules. I specified it in my .gitmodules file.

That is what git submodule foreach is for: scaling.


See also "Git: track branch in submodule but commit in other submodule (possibly nested)".
A script like the one below can reliably update/pull all the submodules where a branch is specified.

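# $toplevel and $name are set by git submodule foreach; .gitmodules keys use the
# submodule name, and git switch needs --detach to check out a bare commit.
git submodule foreach --recursive \
  'b=$(git config -f "$toplevel/.gitmodules" "submodule.$name.branch"); \
   case "$b" in \
     "") git switch --detach "$sha1";; \
      *) git switch "$b" && git pull origin "$b";; \
   esac'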

Make sure to use a recent Git version (2.39+): submodule issues have been fixed over time.


When is --init needed for git submodule update?

I almost always use --init with git submodule update, simply because I then do not have to think about the corner case where the submodule was not yet initialized.
If it was, --init does nothing anyway.
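The corner case is easiest to see on a fresh clone of a superproject whose .gitmodules is already committed. Below is a minimal throwaway sketch; the names lib, super, sub and fresh are hypothetical, and it again assumes protocol.file.allow=always is needed for local-path submodule clones.

```shell
#!/bin/sh
# Throwaway demo: a plain clone leaves submodules uninitialized until --init.
set -e
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m "first"
git init -q super
git -C super -c protocol.file.allow=always submodule add -q "$demo/lib" sub
git -C super commit -q -m "add sub"

# A plain clone copies .gitmodules but does not clone the submodule.
git clone -q super fresh
before=$(git -C fresh submodule status)   # status line starts with "-"

# --init registers the submodule from .gitmodules, then update clones it.
git -C fresh -c protocol.file.allow=always submodule update -q --init --recursive
after=$(git -C fresh submodule status)    # status line no longer starts with "-"

echo "before: $before"
echo "after : $after"
```

For a real repository the same effect is usually obtained with git clone --recurse-submodules, or with git submodule update --init --recursive after a plain clone, so git submodule add does not have to be repeated.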

Urticaria answered 3/1, 2023 at 13:53 Comment(13)
This is extremely unintuitive to me. I have a .gitmodules file that specifies exactly where the repo will be, the URL I'm pulling from, and the branch I want to use. Why would the default behaviour of git submodule update be to ignore my .gitmodules? What is it for then? - Multipurpose
I must admit I am in shock that this is the right way to do things, especially that I need to supply arbitrary bash code to git (but I appreciate your help). Or I must have a very wrong common sense for how software tools ought to be built. - Multipurpose
What does "either the one registered in the index of its parent repository" mean? - Multipurpose
One quick question: when is --init needed for git submodule update? - Multipurpose
Also, if I already have the .gitmodules file written but the submodules are not in the repo, how do I pull them without doing git submodule add -f -b hdb --name meta-dataset [email protected]:brando90/meta-dataset.git meta-dataset/? - Multipurpose
May I request that the answer be made self-contained and explain what $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master) does? Happy to reward a bounty once SO lets me. - Multipurpose
@CharlieParker I have edited the answer to include an explanation regarding $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master). - Urticaria
What about $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master || main) to handle the weird renaming of GitHub branches? Would that work? - Multipurpose
By the way, what is this $toplevel env var? - Multipurpose
@CharlieParker It should work, yes. And $toplevel is one of the predefined variables when you are using git submodule foreach. - Urticaria
By the way, I worry the answer suggested above might break git submodule status. I wonder if that is why I have this issue: stackoverflow.com/questions/74998463/… because the branch is correct if one cd's into the submodule dir, but the submodule status command doesn't think so. - Multipurpose
@CharlieParker Let us wait and see if others can bring an answer to your question. - Urticaria
Can't award it until 24 hours have passed :/ - Multipurpose

My full tested end-to-end example with comments:

# https://mcmap.net/q/13387/-why-do-i-need-to-add-the-remote-to-git-39-s-submodule-when-i-specify-the-branch-in-the-gitmodule-file

# -- pretend you've added the submodules so far
git submodule add -f -b hdb --name meta-dataset [email protected]:brando90/meta-dataset.git meta-dataset/
git submodule add -f -b hdb --name pytorch-meta-dataset [email protected]:brando90/pytorch-meta-dataset.git pytorch-meta-dataset/

# - init local config & clone the submodules (git submodule init initializes your local configuration file; git submodule update clones the submodules, using the commit recorded in the main repository)
#   ref: https://youtu.be/wTGIDDg0tK8?t=119, https://mcmap.net/q/13653/-what-is-the-point-of-39-git-submodule-init-39
git submodule init
git submodule update --init
#git submodule update --init --recursive --remote

git submodule status

# - for each submodule, switch to the branch given in the .gitmodules file
# ref: doc for "foreach" cmd: https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-foreach--recursiveltcommandgt
# ref: https://mcmap.net/q/13387/-why-do-i-need-to-add-the-remote-to-git-39-s-submodule-when-i-specify-the-branch-in-the-gitmodule-file#74994315
# note: The command has access to the variables $name, $sm_path, $displaypath, $sha1 and $toplevel...
# note: $toplevel is the absolute path to the top-level of the immediate superproject.
# note: $(command) is command substitution: the command runs in a subshell and its output is used as an argument to another command.
# note: git config reads submodule.$name.branch for the current ($name) submodule, as visited by the git submodule foreach command.
# note: "|| echo master" is the fallback when no branch is set; a further "|| echo main" would never run, because echo master succeeds.
git submodule foreach -q --recursive \
  'git switch \
  $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master)'

# - check the status of one of the submodules to confirm the step above worked: https://mcmap.net/q/13650/-why-does-git-submodule-status-not-match-the-output-of-git-branch-of-my-submodule
# note: in case the output below mentions origin: "origin" typically refers to a remote repository that is associated with your local repository.
git submodule status
cd meta-dataset
git branch  # should show hdb
cd ..
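A note on the fallback: in a chain such as cmd || echo master || echo main, the last echo can never run, because echo master succeeds whenever it is reached. One possible alternative, sketched below under the assumption that every submodule's remote is named origin, is to fall back to the default branch that the clone recorded in refs/remotes/origin/HEAD. The function name switch_submodules_to_branches is hypothetical.

```shell
# Hypothetical helper; run it from the root of the superproject.
# Assumes each submodule remote is named "origin".
switch_submodules_to_branches() {
  git submodule foreach -q --recursive '
    # First choice: the branch from .gitmodules.
    # Second choice: the default branch recorded when the submodule was cloned.
    b=$(git config -f "$toplevel/.gitmodules" "submodule.$name.branch") ||
      b=$(git symbolic-ref -q --short refs/remotes/origin/HEAD | sed "s|^origin/||")
    if [ -n "$b" ]; then
      git switch -q "$b"
    else
      echo "no branch found for $name, leaving HEAD as is" >&2
    fi
  '
}
```

Calling switch_submodules_to_branches after git submodule update --init takes the place of the hard-coded master fallback; submodules for which neither lookup yields a name are left untouched.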

credit to VonC!


one last lingering issue: Why does git submodule status not match the output of git branch of my submodule?

Multipurpose answered 3/1, 2023 at 0:49 Comment(0)
