Your mistake here lies in thinking that submodules should work. 😀
OK, to be fair to submodules and Git, let's make that: should work automatically. Submodules can be made to work, but it's painful. (This is why some call them sob-modules.)
The root of the problem is that a submodule is some other Git repository. Moreover, it's usually a clone of a third Git repository over which you may have little or no control. Each Git repository—each clone of a Git repository—is an island unto itself. ("No man is an island", but every Git repository is one.)
For a Git repository to be a submodule, it must, by definition, be controlled by some other Git repository. Yet the two Git repositories involved in this insist that they shall never be controlled. So we have a problem.
Git's solution to this problem goes like this:
In the superproject repository R, which would like to control submodule repository S, we place two things:
- There is a file called .gitmodules (in every commit, as files always are in Git, so that it's in the current commit no matter which commit you check out in R). This file lists what the superproject Git will need to run git clone to create S.
- In each commit in R that uses some commit from S, there is an entity that Git calls a gitlink. Git will copy this entity out of a commit into Git's index / staging-area.
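To make those two pieces concrete, here is a minimal sketch that builds a throwaway superproject and prints both of them. The directory names (S-origin, R, S), the demo identity, and the temporary directory are all made up for illustration. It assumes Git 2.28 or later (for git init -b), and it allows the file protocol explicitly because recent Git versions refuse local-path submodule clones by default.

```shell
set -eu
# Throwaway identity so the demo commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# S-origin: the (hypothetical) repository that S will be cloned from.
git init -q -b main S-origin
git -C S-origin commit -q --allow-empty -m "first S commit"

# R: the superproject.
git init -q -b main R
cd R
git -c protocol.file.allow=always submodule --quiet add -b main "$demo/S-origin" S
git commit -q -m "add S as a submodule"

# Piece 1: the instructions R needs in order to clone S.
cat .gitmodules

# Piece 2: the gitlink. Mode 160000 with type "commit" marks it as one.
git ls-tree HEAD S

gitlink_mode=$(git ls-tree HEAD S | cut -d' ' -f1)
gitlink_hash=$(git rev-parse HEAD:S)      # hash recorded in R's commit
s_head=$(git -C S rev-parse HEAD)         # commit actually checked out in S
```

The gitlink holds nothing but a hash ID: R never stores S's files, only the name of the S commit it wants.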
Once the submodule S exists (whether you made it yourself, or let a Git command run in R create it), we'll have the Git commands that you run in R run git switch --detach <hash> in S.
What this means is that R is in charge of which commit is to be used in S. Every commit you make in R lists the exact commit hash ID in S that will go with that commit in R.
Running:
git submodule update
(with no other options) is a directive to the Git commands controlling R that they should:
- read the hash ID for S from R's index / staging-area;
- run git switch --detach <hash> in S using that hash ID.
Until you change the hash ID there, git submodule update will keep checking out that particular commit.
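Here is a small sketch of that behavior, under the same kind of assumptions: a made-up S-origin with two commits, a throwaway R, a demo identity, Git 2.28 or later, and the file protocol allowed explicitly for the local-path clone. It moves S to a different commit by hand and then lets git submodule update put it back.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

git init -q -b main S-origin
git -C S-origin commit -q --allow-empty -m "S commit 1"
git -C S-origin commit -q --allow-empty -m "S commit 2"

git init -q -b main R
cd R
git -c protocol.file.allow=always submodule --quiet add -b main "$demo/S-origin" S
git commit -q -m "add S, pinned at S commit 2"

recorded=$(git rev-parse :S)          # hash ID stored in R's index

# Move S somewhere else by hand...
git -C S switch -q --detach HEAD~1
git submodule status                  # a leading '+' means S differs from the recorded hash

# ...and let R put it back on the recorded commit.
git -c protocol.file.allow=always submodule --quiet update
after=$(git -C S rev-parse HEAD)
```

No matter how many times you repeat the last command, S ends up on the same commit, because the hash in R's index has not changed.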
On the other hand, running:
git submodule update --remote
means something very different. Here, the Git operating in R enters S and runs:
git fetch
This causes the Git operating in S to reach out to the Git from which S was first cloned (S's origin) and see what new commits they have that S doesn't. Those new commits go into the S clone you have locally. They aren't being used yet, but now they exist. The git fetch operation also updates the various remote-tracking names such as origin/main and origin/xyzbranch within your clone S.
Now that this is done, the Git running on behalf of R executes:
git rev-parse origin/main
or whatever other name you've chosen, to find out what commit S's origin's main identifies, by hash ID. That hash ID, whatever it is, is now used with the usual:
git switch --detach <hash>
so that S's current commit is now the commit found by their origin/main or whatever.
That commit is checked out in S, but it's not listed in R anywhere. Running git submodule status or git status in R will show that S is out of sync with the hash ID that the index/staging-area for R says that S should have.
To update the Git index in R, you must now run:
git add path/to/submodule
which records the hash ID that's actually checked out in S, in the index that the R Git is using for R. This is not yet committed: like anything in Git's index / staging-area, it's simply ready to go into the next commit you make. You can now update any other files in R as well if necessary, and git add those, and then run git commit to make a new commit with a new gitlink.
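The whole --remote sequence can be sketched end to end like this. As before, S-origin, R, S, the demo identity, and the temporary directory are invented for the example. The submodule is added with -b main so that --remote follows origin/main, and the file protocol is allowed explicitly because the "remote" is just a local path.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

git init -q -b main S-origin
git -C S-origin commit -q --allow-empty -m "S commit 1"

git init -q -b main R
cd R
git -c protocol.file.allow=always submodule --quiet add -b main "$demo/S-origin" S
git commit -q -m "add S"
old=$(git rev-parse HEAD:S)

# Someone adds a commit to S's origin.
git -C "$demo/S-origin" commit -q --allow-empty -m "S commit 2"

# Fetch in S, resolve origin/main, and detach S's HEAD at that commit.
git -c protocol.file.allow=always submodule --quiet update --remote
new=$(git -C S rev-parse HEAD)

# R's index still holds the old hash, so S shows up as out of sync ('+').
status_before=$(git submodule status)
echo "$status_before"

# Record the new hash in R's index, then commit the new gitlink.
git add S
git commit -q -m "move S to the latest origin/main"
recorded=$(git rev-parse HEAD:S)
```

Only the last two commands touch R's history. Everything before them changes S alone.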
The new R commit will now call for the commit in S that you obtained when you ran git submodule update --remote from R to update your S from S's origin. Note that none of these have anything to do with R itself, and you don't have to pick out an S commit by doing git submodule update --remote. Since S is a repository, you can enter the submodule:
cd path/to/submodule
and you're now operating in a Git for S instead of a Git for R. You can now do everything you'd do in any ordinary repository, because you're in any ordinary repository. It's just that this ordinary repository is acting as a submodule too. So once you get S onto a commit you like (even if you have to make this commit), you can pop back over to repository R and git add path/to/submodule to get the new hash ID recorded.
Remember, though, that if and when you make a new commit in R and git push that commit to R's origin, someone else can grab the new commit from that (fourth) Git repository to their (fifth) Git repository that's a clone of R. There's no problem so far, but if they now check out your new commit, that commit you just made says that they should control their S clone by checking out the commit you made in your S clone. If you have not yet sent this commit to someplace that they can find it, they will now get an error if they run git submodule update in their R clone.
(By this point we're up to six or eight or maybe even 42 clones depending on how many submodules you're using, and it's pretty confusing. The key is to remember that superprojects (the Rs in the above notation) call for commits in their submodules by raw hash ID, which means that anyone who clones the submodule needs to get a commit with that hash ID, which means that you typically need to git push in the submodules before you git push in the superprojects. Since all we ever do with any repository is add new commits (we never run git reset or git push --force or git rebase, right?), this always works. Well, until we start using reset, rebase, and forced pushing, or forget about the restrictions.)
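To see why the push order matters, here is a rough sketch of the failure and the fix. Everything is local and hypothetical: seed and S-origin.git stand in for S's origin, R is your superproject clone, and R-other plays the other person's clone (cloned straight from R to keep the example short). The usual assumptions apply: a demo identity, Git 2.28 or later, and the file protocol allowed explicitly.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# A bare S-origin.git, so that pushing to it is allowed.
git init -q -b main seed
git -C seed commit -q --allow-empty -m "S commit 1"
git clone -q --bare seed S-origin.git

# Your superproject R, with S as a submodule.
git init -q -b main R
cd R
git -c protocol.file.allow=always submodule --quiet add -b main "$demo/S-origin.git" S
git commit -q -m "add S"

# Make a new commit in S and record it in R, but do NOT push S yet.
git -C S commit -q --allow-empty -m "local S commit"
git add S
git commit -q -m "use the new S commit"

# The other person clones R and tries to populate S.
cd "$demo"
git clone -q R R-other
if git -C R-other -c protocol.file.allow=always submodule --quiet update --init 2>/dev/null
then first_try=ok
else first_try=failed      # S-origin.git does not have the commit R asks for
fi
echo "first attempt: $first_try"

# Push the S commit where others can find it, then retry.
git -C R/S push -q origin main
git -C R-other -c protocol.file.allow=always submodule --quiet update

wanted=$(git -C R rev-parse HEAD:S)
got=$(git -C R-other/S rev-parse HEAD)
```

If you'd rather have Git check this for you, git push --recurse-submodules=check in the superproject refuses to push when a recorded submodule commit is not available on the submodule's remote.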