Why does git say 'Changes not staged for commit' and indicate the submodule folder?

I have a git submodule inside another module, added via git submodule add <...> (command issued from parent repo), so the .gitmodules file is automatically generated inside the parent repo.

Suppose I make a change to the submodule (edit: and do not commit those changes) and then navigate back out to the parent and do git add -A and then git status, it says "Changes not staged for commit: submodule dir name ... etc".

I thought git would read the .gitmodules file (which the parent git generated!), realise it's a git submodule directory, and therefore not mention its unstaged status when I ask the parent for its status?

Pseudoscope answered 28/1, 2019 at 4:44 Comment(8)
"edit: and do not commit those changes": yes, that is the point of git status --ignore-submodules=dirty: ignore the state of the submodule, which is "dirty" since you did not commit anything within the submodule. – Unhand
Is this an accurate understanding: the parent git keeps track of which SHA1 commit of the submodule it "should" be compliant with; if changes are made in the submodule and committed in the submodule, then a git status in the parent will warn you that the submodule has commits you haven't "acknowledged" as a parent, even though the parent still sees the version of the submodule which is checked out. That lasts until you actually commit in the parent repo to tell git "go ahead and update the commit hash for the latest submodule version"? – Pseudoscope
Yes: when you commit in the submodule, you need to push, then go back to the parent repo, add, commit and push: you would add the revised "gitlink" (the SHA1 reference) of the submodule and its new commit. – Unhand
But when you just modify files within the submodule (without committing anything), what a git status (done in the parent repo) shows is that you have a submodule in a "dirty" state: in other words, you do not know if your parent repo compiles or works because of those local non-committed changes done in the submodule, or if it would compile/work without those local changes. – Unhand
So dirty means the submodule has non-committed changes? But it's not really very different from just having changes committed in the submodule while the parent repo is still not up to date with the latest version. The submodule maybe isn't called dirty in the latter case, but in both cases you don't know if the parent repo will compile/work etc. – Pseudoscope
In the second case (changes committed in the submodule) you know with which exact version of the submodule your parent project works: the one with the new commit (immutable). That is the "dirty" state. In the first case, you don't: those local changes can be overridden at any moment and are not referenced by Git: that is "untracked". – Unhand
Forgive me, but two comments ago you wrote that modifications in the submodule which are not committed are what we call the dirty submodule state. Your latest comment begins by saying dirty refers to when the changes are committed in the submodule. :/ – Pseudoscope
Agreed. Between those comments, I re-read the doc ;) (the one I mention in my answer: git-scm.com/docs/…) – Unhand

What's going on here is that your submodule repository is on a different commit than the hash ID recorded in the superproject. Your git status, run in the superproject, is telling you this, without changing it, and your git add -A apparently did not change it either.

This last part seems wrong. When I do something similar, and then use git add -A, I get:

Changes to be committed:
(use "git reset HEAD <file>..." to unstage)

        modified:   [submodule path]

If I then run two more commands, it goes back, as I expect:

$ git reset
Unstaged changes after reset:
M       [submodule path]
$ git submodule update
Submodule path [path]: checked out '[hash]'
$ git status
On branch ...
nothing to commit, working tree clean

(I suspect that you've made some change(s) in the submodule but never committed them there.)
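
To check that suspicion, here is a minimal sketch that rebuilds the situation from the question in a scratch directory. Everything in it is an assumption made for illustration: the repository names lib and parent, the throwaway demo identity, and the g wrapper function (which also passes protocol.file.allow=always, since newer Git versions restrict local-path submodule clones by default).

```shell
# Scratch sandbox; lib, parent and the demo identity are hypothetical names.
work=$(mktemp -d) && cd "$work"
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

g init -q lib
(cd lib && echo v1 > file.txt && g add file.txt && g commit -q -m "lib: initial")
g init -q parent && cd parent
g submodule --quiet add "$work/lib" lib
g commit -q -m "add lib as a submodule"

# Edit a tracked file inside the submodule, but do NOT commit it there.
echo "uncommitted edit" >> lib/file.txt

# Back in the superproject: the gitlink hash has not moved, so add -A
# has nothing it can stage for lib.
g add -A
status_default=$(g status --porcelain)
status_ignoring=$(g status --porcelain --ignore-submodules=dirty)

echo "default:                    [$status_default]"
echo "--ignore-submodules=dirty:  [$status_ignoring]"
```

In this setup the default status should still list lib as changed after git add -A, while the --ignore-submodules=dirty run should not mention it.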

What's going on, in fine grained detail that will let you diagnose the problem

We have one Git repository, called the superproject, that is controlling a second repository, called the submodule. The superproject actually has three separate control knobs, one of which is present in each commit, and is therefore also found in the index (since the index controls what will go into the next commit).

One of these control knobs is the file you mentioned, .gitmodules. It tells the superproject how to clone the submodule if the submodule is not yet git cloned. Once the submodule is cloned, its main job is done.

The second is your .git/config file. It contains information copied out of the .gitmodules file, which you can update if needed, if the .gitmodules file is not quite right for your own purposes (which might differ from those of whoever's in charge of the .gitmodules file). Any settings in your .git/config override those in .gitmodules. Otherwise these two places to put settings are essentially equivalent.
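
A small sketch of those two places, assuming a scratch setup with hypothetical names (lib, parent, a demo identity, a made-up lib-mirror path, and a g wrapper function that are not part of the question). It reads the same setting from .gitmodules and from .git/config, then overrides only the local copy.

```shell
# Scratch sandbox; all names below are hypothetical.
work=$(mktemp -d) && cd "$work"
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

g init -q lib
(cd lib && echo v1 > file.txt && g add file.txt && g commit -q -m "lib: initial")
g init -q parent && cd parent
g submodule --quiet add "$work/lib" lib
g commit -q -m "add lib as a submodule"

# The committed, shared setting lives in .gitmodules ...
from_gitmodules=$(g config -f .gitmodules --get submodule.lib.url)
# ... and "git submodule add" also copied it into this clone's .git/config.
from_local=$(g config --get submodule.lib.url)

# Override the URL for this clone only (the mirror path is made up).
g config submodule.lib.url "$work/lib-mirror"
local_after=$(g config --get submodule.lib.url)
gitmodules_after=$(g config -f .gitmodules --get submodule.lib.url)

echo ".gitmodules: $gitmodules_after"
echo ".git/config: $local_after"
```

The override stays private to the clone because .git/config is never committed, whereas any edit to .gitmodules would show up as a normal file change.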

The last is the one causing the issue. For a submodule to become checked out into your work-tree, and hence to be useful to you, the Git that's in control of the superproject spins up a second set of Git commands. In general, you might run:

git submodule update --init

to get the submodule checked out (though if you use git clone --recursive, Git does this for you).

At this point, the superproject Git has made an almost-empty directory with the correct path. (The directory contains a .git file naming the path to the cloned repository, or in the old days or using the old style backwards-compatibility mode, contains the actual .git directory itself.) The superproject Git chdirs into this directory and tells the submodule Git:

  • run git checkout hash

Once that's happened, the path is full of files extracted from the commit whose ID is hash, which mostly makes the outer Git (the superproject) "done" with the files. But there is a side effect, because the submodule is itself a full Git repository, with everything that this means.

In particular, the subproject has its own HEAD. This HEAD is now detached and the submodule's repository's current commit is hash, so that this is in the index and work-tree of the submodule, which is of course what we wanted: the work-tree of the submodule is the path in the superproject where all the submodule files go.
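
The detached HEAD can be observed with a sketch like the following. It assumes a scratch superproject with a submodule named lib (all names, the demo identity, and the g wrapper are hypothetical), clones it recursively, and then asks the submodule whether HEAD is a symbolic ref.

```shell
# Scratch sandbox; all names below are hypothetical.
work=$(mktemp -d) && cd "$work"
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

g init -q lib
(cd lib && echo v1 > file.txt && g add file.txt && g commit -q -m "lib: initial")
g init -q parent && cd parent
g submodule --quiet add "$work/lib" lib
g commit -q -m "add lib as a submodule"

# A fresh recursive clone lets the superproject Git drive the submodule checkout.
g clone -q --recurse-submodules "$work/parent" "$work/clone"
cd "$work/clone"

recorded=$(g rev-parse HEAD:lib)         # hash stored in the superproject commit
checked_out=$(g -C lib rev-parse HEAD)   # commit the submodule actually has checked out

# symbolic-ref -q exits non-zero when HEAD is not attached to a branch.
if g -C lib symbolic-ref -q HEAD >/dev/null; then
  head_state=attached
else
  head_state=detached
fi

echo "recorded:    $recorded"
echo "checked out: $checked_out"
echo "HEAD is $head_state"
```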

But there's an interesting question to answer: Where did the superproject Git get the hash ID? The answer is: it's stored in every snapshot—well, every snapshot that uses the submodule—in the superproject, the same way every snapshot has a full, complete copy of every file. To make that happen, the index for the superproject contains a special entry of type gitlink.

This gitlink entry in the superproject index tells the superproject which hash ID to give to the submodule Git whenever the superproject tells the submodule Git: check out some particular commit.
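
To look at the gitlink entry directly, something like this sketch should work. The repository names, the demo identity, and the g wrapper are assumptions made for the example.

```shell
# Scratch sandbox; all names below are hypothetical.
work=$(mktemp -d) && cd "$work"
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

g init -q lib
(cd lib && echo v1 > file.txt && g add file.txt && g commit -q -m "lib: initial")
g init -q parent && cd parent
g submodule --quiet add "$work/lib" lib
g commit -q -m "add lib as a submodule"

# ls-files --stage prints "<mode> <hash> <stage>\t<path>" for an index entry.
entry=$(g ls-files --stage lib)
mode=${entry%% *}                                # first field: the entry mode
index_hash=$(echo "$entry" | awk '{print $2}')   # second field: the recorded hash
sub_head=$(g -C lib rev-parse HEAD)              # what the submodule has checked out

echo "index entry:   $entry"
echo "submodule HEAD: $sub_head"
```

Mode 160000 is what marks the entry as a gitlink rather than a regular file or a tree; the hash next to it is a commit ID in the submodule repository, not an object in the superproject.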

If you, manually, navigate into the submodule, and git checkout a branch name, or any other commit by hash ID, the submodule repository's HEAD changes. It either becomes attached to the branch name, or it points to the other commit, still in detached-HEAD mode.

At this point, the submodule and the superproject are out of sync. The superproject Git does not do anything about this yet. You are in control, you choose which commit you want. You can even make new commits and git push them to some upstream. Once you've done all of the committing and git checkout-ing that you want, and have everything arranged correctly, you should climb out of the submodule work-tree back into your superproject.

Now git status and git diff will, by default—there are a ton of control knobs here too—tell you that the superproject is calling for some hash H, but the submodule has some other hash S checked-out. (They may or may not also tell you if the submodule itself needs a commit made, if you set the control knobs for this.) If you wish your next superproject commit to record, in the gitlink for this submodule, this new commit hash S, you run:

git add path-to-submodule

(or git add -A should do the same thing, which is why this is puzzling). That will update the gitlink in your index to record hash ID S rather than H, so that the next superproject commit will, on a git submodule update command, tell the submodule Git: check out commit S, as your detached HEAD.

Once the index in the superproject matches the HEAD in the actual checked-out submodule, the submodule won't be listed in the changes not staged for commit section. If the hash in the gitlink in the index does not match the hash in the gitlink in HEAD, git status will list the submodule's path in changes to be committed.
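
The H-versus-S bookkeeping above can be sketched as follows, assuming a scratch superproject with a submodule called lib (hypothetical names throughout, including the demo identity and the g wrapper). It makes a new commit in the submodule, then compares the hash in the superproject's HEAD, the hash in its index, and the submodule's own HEAD before and after git add.

```shell
# Scratch sandbox; all names below are hypothetical.
work=$(mktemp -d) && cd "$work"
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

g init -q lib
(cd lib && echo v1 > file.txt && g add file.txt && g commit -q -m "lib: initial")
g init -q parent && cd parent
g submodule --quiet add "$work/lib" lib
g commit -q -m "add lib as a submodule"

# Make a new commit inside the submodule: its HEAD moves to S.
(cd lib && echo v2 >> file.txt && g commit -q -am "lib: second commit")

h_hash=$(g rev-parse HEAD:lib)          # H: what the superproject commit records
s_hash=$(g -C lib rev-parse HEAD)       # S: what the submodule has checked out
before_add=$(g status --porcelain)      # submodule shows up as an unstaged change

g add lib                               # copy S into the gitlink in the index
index_hash=$(g ls-files --stage lib | awk '{print $2}')
after_add=$(g status --porcelain)       # now a staged change, ready to commit

echo "H=$h_hash"
echo "S=$s_hash"
echo "index=$index_hash"
echo "before add: [$before_add]"
echo "after add:  [$after_add]"
```

Committing in the superproject at this point would record S, after which all three hashes agree and git status has nothing to report for the submodule.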

Margarite answered 28/1, 2019 at 5:52 Comment(3)
After I check out the submodule, it seems I cannot go back and check out the main module, and every commit I try fails because files are not tracked. – Raffinate
@Mahdi: if you can narrow your problem down to a minimal reproducible example you'll probably find an existing StackOverflow answer, and if not, you'll have the example to post as a new question. – Margarite
God-tier post; still stuck, unfortunately, but I understand all of this so much better now. – Wedding

and therefore not mention its unstaged status when I ask the parent for its status?

It would still report changes in the submodules, unless you are using (with Git 1.7.2 or later):

git status --ignore-submodules=dirty

You can see the original discussion (back in 2010, for Git 1.7.x), which led to that feature:

By the way, I think that route of action would make the resulting git internally consistent in that everything by default will report submodules with untracked paths in its working tree as dirty.

  • In the "Untracked" section of "git status" output, we list an untracked path in the superproject (i.e. the one in which "git status" was run) to remind the user that the path might be a new file forgotten to be added (unless of course it is ignored).
    But it does not make the working tree dirty.

  • When you have an untracked path in a submodule:

    • the submodule is listed in the "Changed but not updated" section.
      This also makes the working tree of the superproject dirty, even though the working tree of the submodule is not.

    • "git diff" output at the superproject level shows that the submodule has modifications (i.e. "-dirty" is shown), but when run inside the submodule, there is no change shown.

I think this is a misdesign at the UI level; reporting an untracked and unignored path as potential mistake to remind the user is a good thing, but the current way "status" and "diff" does so does not make much sense to me.
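
The untracked-path case from that discussion can be tried with a sketch like this one. The setup is hypothetical (repository names, demo identity, the g wrapper), and the ignore mode is passed explicitly on each call so the result does not depend on the defaults of a particular Git version.

```shell
# Scratch sandbox; all names below are hypothetical.
work=$(mktemp -d) && cd "$work"
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

g init -q lib
(cd lib && echo v1 > file.txt && g add file.txt && g commit -q -m "lib: initial")
g init -q parent && cd parent
g submodule --quiet add "$work/lib" lib
g commit -q -m "add lib as a submodule"

# Only an untracked file inside the submodule; nothing tracked is modified.
echo scratch > lib/untracked.txt

with_none=$(g status --porcelain --ignore-submodules=none)
with_untracked=$(g status --porcelain --ignore-submodules=untracked)
inside=$(g -C lib status --porcelain)

echo "superproject, none:      [$with_none]"
echo "superproject, untracked: [$with_untracked]"
echo "inside submodule:        [$inside]"
```

With none, the superproject reports lib as changed even though the submodule itself only lists an untracked file; with untracked, the superproject stays quiet. The same choice can be made persistent per submodule through the submodule.<name>.ignore setting.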

Unhand answered 28/1, 2019 at 5:51 Comment(0)
