Git submodule update gets stuck at old commit

Ok this one is driving me crazy. I tried committing/pushing/updating a parent repository while a file from its submodule (an XLS spreadsheet) was open in another program. The operation "succeeded" with only a "Couldn't unlink old somefile.xls" warning.

Now I'm trying to git submodule update and it keeps pointing to an old commit several steps back. Git log on the submodule's main branch shows that HEAD is the latest commit both locally and remotely, but whenever I cd back to the parent repository and update, the submodule ends up detached on this old commit.

I tried manually updating the reference in .git/modules/mysubmodule/HEAD (which is pointing to this old commit) but apparently that's not how things work. How can I get out of this frustrating loop? I suppose making some insignificant change to the submodule and making a new commit could fix it (I tried an empty commit without luck though), but I want to better understand what happened so I can avoid this situation in the future.

Here's my submodule git log:

commit 713a39e531463eb9a9a608344ca39acbe520c7c4 (HEAD -> main, origin/main, origin/HEAD)

Here's what git submodule update outputs:

Submodule path 'data': checked out '7e4dc2354f5e60a8efb101a5d8a03466a911d86f'

Cleveland answered 19/11, 2021 at 18:59 Comment(2)
Ok so apparently I fixed it by adding /mybranch to .gitmodules path and running git submodule update --remote. Although things look right, I'd still like to know if this is a valid way to go, and what exactly happened here. - Cleveland
On the parent GitHub repo, the submodule was no longer a link to the child repo but was instead a 'static' line. I made yet another commit removing /mybranch and now it points to the latest commit, but I'm sure there's an easier way to fix this. - Cleveland

Your mistake here lies in thinking that submodules should work. 😀

OK, to be fair to submodules and Git, let's make that: should work automatically. Submodules can be made to work, but it's painful. (This is why some call them sob-modules.)

The root of the problem is that a submodule is some other Git repository. Moreover, it's usually a clone of a third Git repository over which you may have little or no control. Each Git repository—each clone of a Git repository—is an island unto itself. ("No man is an island", but every Git repository is one.)

For a Git repository to be a submodule, it must, by definition, be controlled by some other Git repository. Yet the two Git repositories involved in this insist that they shall never be controlled. So we have a problem.

Git's solution to this problem goes like this:

  • In the superproject repository R, which would like to control submodule repository S, we place two things:

    1. There is a file called .gitmodules (in every commit, as files always are in Git, so that it's in the current commit no matter which commit you check out in R). This file lists what the superproject Git will need to run git clone to create S.
    2. In each commit in R that uses some commit from S, there is an entity that Git calls a gitlink. Git will copy this entity out of a commit into Git's index / staging-area.
  • Once the submodule S exists—whether you made it yourself, or let a Git command run in R create it—we'll have the Git commands that you run in R run git switch --detach hash in S.

What this means is that R is in charge of which commit is to be used in S. Every commit you make in R lists the exact commit hash ID in S that will go with that commit in R.
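To see both pieces for yourself, here is a minimal sketch that builds a throwaway S and R in a temporary directory. The directory names (s-origin, r), the submodule path data, and the demo identity are all made up for illustration. The -c protocol.file.allow=always setting is only there because some recent Git versions refuse to clone a submodule from a local path by default.

```shell
set -eu
# Throwaway identity so the commits below work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# A stand-in for S's origin, with a single commit on main.
git init -q "$work/s-origin"
git -C "$work/s-origin" symbolic-ref HEAD refs/heads/main
git -C "$work/s-origin" commit -q --allow-empty -m "S: first commit"

# The superproject R, with S added at the (hypothetical) path "data".
git init -q "$work/r"
git -C "$work/r" symbolic-ref HEAD refs/heads/main
git -C "$work/r" -c protocol.file.allow=always submodule --quiet add "$work/s-origin" data
git -C "$work/r" commit -q -m "R: add submodule"

# Piece 1: .gitmodules records where to clone S from.
cat "$work/r/.gitmodules"

# Piece 2: the gitlink, a tree entry with mode 160000 and type "commit".
git -C "$work/r" ls-tree HEAD data

gitlink_mode=$(git -C "$work/r" ls-tree HEAD data | cut -d' ' -f1)
pinned=$(git -C "$work/r" rev-parse HEAD:data)       # hash R asks for
s_head=$(git -C "$work/r/data" rev-parse HEAD)       # hash S is actually on
```

Right after the submodule is added, the hash recorded in R and the commit checked out in S are the same; everything that follows is about what happens when they drift apart.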

Running:

git submodule update

(with no other options) is a directive to the Git commands controlling R that they should:

  • read the hash ID for S from R's index / staging-area;
  • run git switch --detach hash in S using that hash ID.

Until you change the hash ID there, git submodule update will keep checking out that particular commit.
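This is probably the loop described in the question. The sketch below reproduces it under the same made-up setup (s-origin, r, and data are illustrative names, and protocol.file.allow=always is only needed because the demo clones from a local path): S gets moved forward to a newer commit, and a plain git submodule update puts it right back on the commit R has recorded.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Set up S's origin and the superproject R (hypothetical names).
git init -q "$work/s-origin"
git -C "$work/s-origin" symbolic-ref HEAD refs/heads/main
git -C "$work/s-origin" commit -q --allow-empty -m "S: first commit"
git init -q "$work/r"
git -C "$work/r" symbolic-ref HEAD refs/heads/main
git -C "$work/r" -c protocol.file.allow=always submodule --quiet add "$work/s-origin" data
git -C "$work/r" commit -q -m "R: add submodule"
pinned=$(git -C "$work/r" rev-parse HEAD:data)

# S's origin gains a newer commit, and the local S is brought up to date.
git -C "$work/s-origin" commit -q --allow-empty -m "S: second commit"
git -C "$work/r/data" pull -q --ff-only origin main
latest=$(git -C "$work/r/data" rev-parse HEAD)

# A plain update reads the hash from R's index and checks it out in S.
git -C "$work/r" -c protocol.file.allow=always submodule update
after_update=$(git -C "$work/r/data" rev-parse HEAD)

# S is now on a detached HEAD, so symbolic-ref has no branch to report.
if git -C "$work/r/data" symbolic-ref -q HEAD >/dev/null; then
    detached=no
else
    detached=yes
fi
```

Nothing is broken here: S ends up detached on the older commit because that is the hash R's index still names.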

On the other hand, running:

git submodule update --remote

means something very different. Here, the Git operating in R enters S and runs:

git fetch

This causes the Git operating in S to reach out to the Git from which S was first cloned (S's origin) and see what new commits they have that S doesn't. Those new commits go into the S clone you have locally. They aren't being used yet, but now they exist. The git fetch operation also updates the various remote-tracking names such as origin/main and origin/xyzbranch within your clone S.

Now that this is done, the Git running on behalf of R executes:

git rev-parse origin/main

or whatever other name you've chosen, to find out what commit S's origin's main identifies, by hash ID. That hash ID, whatever it is, is now used with the usual:

git switch --detach <hash>

so that S's current commit is now the commit found by their origin/main or whatever.

That commit is checked out in S, but it's not listed in R anywhere. Running git submodule status or git status in R will show that S is out of sync with the hash ID that the index/staging-area for R says that S should have.
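A small sketch of that out-of-sync state, again with made-up names (s-origin, r, data) and with the submodule added using -b main so that --remote has an explicit branch to follow. The protocol.file.allow=always setting is an accommodation for cloning and fetching from a local path, not something a normal setup needs.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# S's origin and the superproject R (hypothetical names).
git init -q "$work/s-origin"
git -C "$work/s-origin" symbolic-ref HEAD refs/heads/main
git -C "$work/s-origin" commit -q --allow-empty -m "S: first commit"
git init -q "$work/r"
git -C "$work/r" symbolic-ref HEAD refs/heads/main
git -C "$work/r" -c protocol.file.allow=always submodule --quiet add -b main "$work/s-origin" data
git -C "$work/r" commit -q -m "R: add submodule"

# S's origin moves ahead.
git -C "$work/s-origin" commit -q --allow-empty -m "S: second commit"
latest=$(git -C "$work/s-origin" rev-parse HEAD)

# Fetch in S, then check out whatever origin/main now names.
git -C "$work/r" -c protocol.file.allow=always submodule --quiet update --remote
checked_out=$(git -C "$work/r/data" rev-parse HEAD)

# R's index still holds the old hash (second field of ls-files -s).
indexed=$(git -C "$work/r" ls-files -s data | cut -d' ' -f2)

# A leading "+" in the status line marks the mismatch.
git -C "$work/r" submodule status data
flag=$(git -C "$work/r" submodule status data | cut -c1)
```

The + prefix in git submodule status is Git's way of saying that the commit checked out in S differs from the one recorded in R's index.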

To update the Git index in R, you must now run:

git add path/to/submodule

which records the hash ID that's actually checked out in S, in the index that the R Git is using for R. This is not yet committed: like anything in Git's index / staging-area, it's simply ready to go into the next commit you make. You can now update any other files in R as well if necessary, and git add those, and then run git commit to make a new commit with a new gitlink.

The new R commit will now call for the commit in S that you obtained when you ran git submodule update --remote from R to update your S from S's origin. Note that none of these have anything to do with R itself, and you don't have to pick out an S commit by doing git submodule update --remote. Since S is a repository, you can enter the submodule:

cd path/to/submodule

and you're now operating in a Git for S instead of a Git for R. You can now do everything you'd do in any ordinary repository, because you're in any ordinary repository. It's just that this ordinary repository is acting as a submodule too. So once you get S onto a commit you like—even if you have to make this commit—you can pop back over to repository R and git add path/to/submodule to get the new hash ID recorded.
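As an illustration of that workflow, here is a sketch that makes a commit inside S by hand and then records it in R. The repository names and the data path are hypothetical, and the empty commit stands in for real work.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# S's origin and the superproject R (hypothetical names).
git init -q "$work/s-origin"
git -C "$work/s-origin" symbolic-ref HEAD refs/heads/main
git -C "$work/s-origin" commit -q --allow-empty -m "S: first commit"
git init -q "$work/r"
git -C "$work/r" symbolic-ref HEAD refs/heads/main
git -C "$work/r" -c protocol.file.allow=always submodule --quiet add "$work/s-origin" data
git -C "$work/r" commit -q -m "R: add submodule"

# Work inside S exactly as in any ordinary repository.
cd "$work/r/data"
git commit -q --allow-empty -m "S: local change"
new_s=$(git rev-parse HEAD)

# Back in R: the submodule shows up as modified until its new hash is staged.
cd "$work/r"
git status --short
git add data
git commit -q -m "R: record new commit for data"

recorded=$(git rev-parse HEAD:data)   # the gitlink in R's newest commit
```

Note that the commit in S has not been pushed anywhere in this sketch, so it exists only in this one clone of S.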

Remember, though, that if and when you make a new commit in R and git push that commit to R's origin, someone else can grab the new commit from that (fourth) Git repository to their (fifth) Git repository that's a clone of R. There's no problem so far, but if they now check out your new commit, that commit you just made says that they should control their S clone by checking out the commit you made in your S clone. If you have not yet sent this commit to someplace that they can find it, they will now get an error if they run git submodule update in their R clone.

(By this point we're up to six or eight or maybe even 42 clones depending on how many submodules you're using, and it's pretty confusing. The key is to remember that superprojects (the Rs in the above notation) call for commits in their submodules by raw hash ID, which means that anyone who clones the submodule needs to get a commit with that hash ID, which means that you typically need to git push in the submodules before you git push in the superprojects. Since all we ever do with any repository is add new commits (we never run git reset or git push --force or git rebase, right?), this always works. Well, until we start using reset, rebase, and forced pushing, or forget about the restrictions.)
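One way to have Git enforce the push order is git push --recurse-submodules=check, which refuses to push the superproject while a submodule commit it names is not on any of the submodule's remotes. The sketch below is one possible demonstration, not the only approach; the bare repositories s.git and r.git, the seed repository, and the data path are all invented for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Bare repositories standing in for the origins of S and R.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
git -C "$work/seed" commit -q --allow-empty -m "S: first commit"
git clone -q --bare "$work/seed" "$work/s.git"
git init -q --bare "$work/r.git"

# The superproject R, with S at path "data", pushed once.
git init -q "$work/r"
git -C "$work/r" symbolic-ref HEAD refs/heads/main
git -C "$work/r" remote add origin "$work/r.git"
git -C "$work/r" -c protocol.file.allow=always submodule --quiet add "$work/s.git" data
git -C "$work/r" commit -q -m "R: add submodule"
git -C "$work/r" push -q origin main

# A new commit in S, recorded in R, but not yet pushed from S.
git -C "$work/r/data" commit -q --allow-empty -m "S: local change"
git -C "$work/r" add data
git -C "$work/r" commit -q -m "R: record new commit for data"

# With "check", pushing R is refused while S's commit is local-only.
if git -C "$work/r" push -q --recurse-submodules=check origin main 2>/dev/null; then
    early_push=accepted
else
    early_push=refused
fi

# Push S first, then R; the same command now goes through.
git -C "$work/r/data" push -q origin main
git -C "$work/r" push -q --recurse-submodules=check origin main
late_push=accepted
```

The on-demand value of the same option pushes the submodules for you instead of refusing, which may or may not be what you want.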

Amylene answered 20/11, 2021 at 2:27 Comment(8)
hi @Amylene thanks for the rather detailed explanation. I'm somewhat used to the basic or "ideal" submodule workflow (commit/push changes in S, cd back to R, add/commit/push the submodule changes). Long story short, if I understand this correctly, whenever R stops pointing at S's HEAD and gets stuck at some old commit (I still don't understand why that happened with the "couldn't unlink" error, if you could kindly expand on that), I should run update --remote to force the change in the <hash> that will be used by switch, then run add and commit as usual to update the submodule reference. - Cleveland
did I get it right? - Cleveland
For the unlink error: you're probably on Windows, which is (in)famous for that sort of thing. It just means that whatever file got that complaint, that one file is now "stale". To fix that, make whatever it is that has the file open let it go (close the file / exit the app) and then Git can update the file. - Amylene
For the submodule: a submodule S is always "stuck on some old commit". It's just that when you update --remote from R and then add and commit, the "old" commit it's now "stuck on" is the latest one (i.e., the "age" of this "old" commit is "super-fresh", not old at all). So, yes, that's right. - Amylene
right, but when I don't mess up I'm normally able to make changes to S, commit/push them, then cd back to R and run 'submodule update' (without --remote) successfully, which updates the commit hash before I commit/push the updated submodule link. This time around this workflow wasn't working no matter what, in my mind because of the unlink. For Git to update the file after I close the app, is that simply a new commit? - Cleveland
sorry for insisting on this, I just want to make sure I understand what happened - Cleveland
The git submodule update operation normally means "run git -C path/to/submodule checkout <hash>" for some <hash>. The options, if any, to git submodule update control the hash, and in a few cases, the command as well (replacing checkout). Without some argument after update, the update mode comes from git config --get submodule.<name>.update. If you have that set to merge, you could (I think) get the behavior you describe. - Amylene
Meanwhile: the failure to update a file in the working tree (due to the unlink issue) doesn't affect any commits, and does not create a new commit. So the difference in behavior you saw is mysterious. If you can come up with a reproducer, we could track it down precisely. - Amylene
