TL;DR
I think you have hit a bug in Git. To work around it, use --no-single-branch or configure the branch manually.
Other things to know:
If you have recursive submodules, make sure your Git is recent and use --recommend-shallow to enable shallow submodules recursively, or --no-recommend-shallow to disable them.
You may need to do this in two steps, so I'll show it as a two-step sequence below. I know this code has evolved a lot between Git 1.7 and current (2.26 or so) Git, and I expect the two-step sequence to work for most older versions too.
The two steps are:
N=... # set your depth here, or expand it in the two commands
git submodule update --init --depth $N --no-single-branch
git submodule update --remote --depth $N
The Git folks have been fixing various shallow-clone submodule bugs recently as part of adding --recommend-shallow with recursive submodules, so this might all work as one command. Based on the analysis below, it should all work as one command in current Git. However, --no-single-branch fetches more objects than --single-branch.
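If you want --recommend-shallow to apply to a particular submodule, the recommendation itself lives in .gitmodules as submodule.<name>.shallow, and the branch that --remote follows can live there as submodule.<name>.branch. A minimal sketch, assuming a submodule named libfoo (a placeholder) and using a hypothetical helper name:

```shell
# Hypothetical helper: record a shallow-clone recommendation (and,
# optionally, a branch for --remote to follow) for one submodule in the
# superproject's .gitmodules file.
# Usage: recommend_shallow <superproject-dir> <submodule-name> [branch]
recommend_shallow() {
    repo=$1 name=$2 branch=$3
    git -C "$repo" config -f .gitmodules "submodule.$name.shallow" true || return
    if [ -n "$branch" ]; then
        git -C "$repo" config -f .gitmodules "submodule.$name.branch" "$branch"
    fi
}

# After committing .gitmodules, a clone of the superproject can then use:
#   git submodule update --init --recursive --recommend-shallow
```

The shallow entry is only a recommendation: it takes effect when someone runs git submodule update with --recommend-shallow.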
Another option may be to allow single-branch mode but fix the fetch refspec in the submodule. This requires three steps (well, three separate Git commands, anyway):
branch=... # set this to the branch you want
git submodule update --init --depth $N
(cd path/to/submodule &&
    git config remote.origin.fetch "+refs/heads/$branch:refs/remotes/origin/$branch")
git submodule update --remote --depth $N
(You could do this in all submodules with git submodule foreach, but remember to pick the right branch name per submodule.)
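The per-submodule refspec fix can be sketched as follows. The helper names are hypothetical; the foreach variant assumes each submodule has a submodule.<name>.branch entry in the superproject's .gitmodules, and relies on the $name and $toplevel variables that git submodule foreach sets for each submodule.

```shell
# Hypothetical helper: point one submodule's single-branch fetch refspec at
# the branch you actually want, so a later `git submodule update --remote`
# fetches it and creates refs/remotes/origin/<branch>.
set_submodule_fetch_branch() {
    sm_path=$1 branch=$2
    git -C "$sm_path" config remote.origin.fetch \
        "+refs/heads/$branch:refs/remotes/origin/$branch"
}

# Hypothetical helper: do the same in every submodule, taking each branch
# name from .gitmodules. `git submodule foreach` sets $name and $toplevel.
set_all_submodule_fetch_branches() {
    git submodule foreach '
        b=$(git config -f "$toplevel/.gitmodules" "submodule.$name.branch")
        if [ -n "$b" ]; then
            git config remote.origin.fetch "+refs/heads/$b:refs/remotes/origin/$b"
        else
            echo "no submodule.$name.branch in .gitmodules; skipping" >&2
        fi'
}
```

Run either one between the update --init step and the update --remote step.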
Just in general (this is not specific to your error), I recommend avoiding shallow submodules: they tend not to work very well. If you really want to use them, use a pretty-big depth: e.g., 50, or 100, or more. Tune this based on your own repositories and needs. (Your current setup does allow --depth 1, provided you work around the other problem.)
Long: it's probably a bug in Git
Note that the analysis below is based on the source code. I have not actually tested this, so it's possible I missed something. The principles are all sound, though.
All submodules are always "sha commits", or maybe "sha1" commits: Git used to call them that, but now calls them OIDs, where OID stands for Object ID. A future Git will probably use SHA-2.[1] So "OID", or "hash ID" if one wishes to avoid TLA syndrome,[2] is certainly a better term. So let me put it this way: all submodules use OID / hash-ID commits.
What do I mean by "all submodules always use OIDs / hash IDs"? Well, that's one of the keys to shallow submodules. Shallow submodules are inherently fragile, and it's tricky to get Git to use them correctly in all cases. This claim:
The submodule is tracked by a branch name and not by a sha commit number.
is wrong, in an important way. No matter how hard you try, submodules—or more precisely, submodule commits—are tracked by hash ID.
Now, it's true that there are branch names involved in cloning and fetching in the submodules. When you use shallow clones (--depth) with submodules, this can become very important, because most servers do not allow fetch-by-hash-ID (side note, Jan 2021: this is changing because some new features in Git need it, and GitHub already allows fetch by ID, so over time this situation should improve). The depth you choose, and the single branch name (since --depth implies --single-branch), must therefore be deep enough to reach the commit the superproject Git chooses.
If you override Git's by-hash-ID tracking of submodule commits, you can bypass one fragility issue. That's what you're doing, but you've hit a bug.
[1] And won't that be fun. Git depends rather heavily on each commit having a unique OID; introducing a new OID namespace, so that each object has two OIDs, each one unique within its own namespace, means a commit won't necessarily have the OID a given Git expects. All of the protocols get more complicated: any Git that supports only the old scheme requires a SHA-1 hash for the (single) OID, while any Git that uses the new scheme would like a SHA-2 hash, perhaps along with a SHA-1 hash to give to old Gits. Once we have the object, we can use it to compute the other hash(es), but if we only have one of the two hashes, it needs to be the right one.
The straightforward way to handle this is to put the burden of computing the "other guy's hash" on the Git that has the object, in the case of an object existing in a repository that uses a different OID namespace. But SHA-1 Gits cannot be changed, so we can't use that method. The burden has to be on new SHA-2 Gits.
[2] Note that "SHA" itself is a TLA: a Three Letter Acronym. TLAS, which stands for TLA Syndrome, is an ETLA: an Extended Three Letter Acronym. 😀
How does a superproject Git choose a submodule Git commit?
The git submodule command is currently still a big shell script, but uses a C language helper for much of its operation. While it is a complex shell script, the heart of it is to run:
(cd $path && git $command)
in order to do things within each submodule. The $path is the path for the submodule, and $command is the command to run within that submodule.
There's some chicken-and-egg stuff here, though, because $path is initially just an empty directory: there's no actual clone yet, right after cloning the superproject. Until there is a clone, no Git command will work! Well, nothing except git clone itself, that is.
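The (cd $path && git $command) pattern itself can be sketched as a small shell function; the name in_submodule is hypothetical. Running it in a subshell keeps the cd from affecting the caller, and git -C "$path" ... is the built-in equivalent when the command is a single Git command.

```shell
# Hypothetical helper mirroring the (cd $path && git $command) pattern:
# run any command inside the submodule's directory, in a subshell so the
# caller's working directory is left alone.
in_submodule() {
    (
        sm_path=$1
        shift
        cd "$sm_path" && "$@"
    )
}

# These two lines do the same thing for one Git command:
#   in_submodule path/to/submodule git rev-parse HEAD
#   git -C path/to/submodule rev-parse HEAD
```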
Meanwhile, each superproject commit has two items:
- a .gitmodules file, listing the name of the submodule and any configuration data, and instructions for cloning it if/when needed; and
- a gitlink for the submodule(s).
The gitlink contains the directive: this commit requires that submodule S be checked out as commit hash hash-value. At an interesting point below, we get a chance to use or ignore this hash value, but for now, note that each commit, in effect, says: I need a clone, and in that clone, I need one particular commit, by its hash ID.
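You can look at a gitlink's hash ID directly. git submodule status shows it for every submodule in the current checkout; for a single path in any superproject commit, a sketch (the helper name is hypothetical) can read the tree entry with git ls-tree, which lists a gitlink with mode 160000 and type commit:

```shell
# Hypothetical helper: print the commit hash ID that superproject commit
# $rev records (as a gitlink) for submodule path $sm_path.
# git ls-tree prints "<mode> <type> <hash><TAB><path>"; a gitlink has
# mode 160000 and type "commit".
gitlink_hash() {
    repo=$1 rev=$2 sm_path=$3
    git -C "$repo" ls-tree "$rev" "$sm_path" |
        awk '$2 == "commit" { print $3 }'
}

# Example (placeholder path):
#   gitlink_hash . HEAD path/to/submodule
```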
Cloning a submodule repository
To clone a submodule, we need its URL. We'll run:
git clone $url $path
or maybe:
git clone --depth $N --no-single-branch $url $path
or similar. The URL and path are the most important parts. They're in the .gitmodules file, but that's not where Git wants the URL: Git wants it in the superproject's own configuration file (.git/config).
Running git submodule init copies the data from the .gitmodules file to where Git wants it. This command otherwise does not do anything interesting, really. Nobody seems to use it, because git submodule update --init will do this for you every time. The separate init command exists so that you can, as the documentation puts it, "customize ... submodule locations" (tweak the URLs).
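To see what git submodule init copied, compare the URL in .gitmodules with the one in the superproject's .git/config. A minimal sketch; the helper name is hypothetical, and libfoo and the mirror URL are placeholders:

```shell
# Hypothetical helper: print the URL recorded in .gitmodules next to the
# one `git submodule init` copied into .git/config (the one Git uses).
show_submodule_urls() {
    repo=$1 name=$2
    printf 'gitmodules: %s\n' \
        "$(git -C "$repo" config -f .gitmodules --get "submodule.$name.url")"
    printf 'config:     %s\n' \
        "$(git -C "$repo" config --get "submodule.$name.url")"
}

# Customizing the location after `git submodule init`:
#   git config submodule.libfoo.url https://mirror.example.com/libfoo.git
```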
Running git submodule update (with or without --remote, --init, and/or --depth) will notice whether the clone exists. It does need the information that git submodule init would save, so if you haven't done a git submodule init yet, you need the --init option to make that happen. If the submodule itself is missing (if the superproject does not yet have a clone of the submodule), git submodule update will now run git clone. It's actually the submodule helper that runs git clone; see line 558 ff., though the line numbers will no doubt change in future Git releases.
Note these things about this git clone:
1. It gets a --depth argument if you use --depth.
2. If it does get a --depth argument, it sets --single-branch by default, unless you use --no-single-branch.
3. It creates the actual repository for the submodule, but it is always told --no-checkout, so it never does an initial git checkout of any commit.
4. It never gets a -b / --branch argument. This is surprising to me, and possibly wrong, but see clone_submodule in the submodule--helper.c source.
Now, combine item 2 with item 4. Cloning with --depth implies --single-branch, which sets up the submodule repository to have:
remote.origin.fetch=+refs/heads/<name>:refs/remotes/origin/<name>
as its pre-configured fetch setting. But Git did not supply a branch name here, so the default name is the one recommended by the other Git, i.e., the Git that you're cloning. It's not any name you have configured yourself in your superproject.
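To check whether a submodule was cloned in this single-branch mode, look at its remote.origin.fetch setting. A sketch with a hypothetical helper name; it assumes the remote is called origin and treats any wildcard refspec as "all branches":

```shell
# Hypothetical helper: exit 0 if the clone at $1 fetches only explicitly
# named branches (single-branch mode), 1 if any remote.origin.fetch refspec
# contains a "*" wildcard, and 2 if no remote.origin.fetch is configured.
is_single_branch_clone() {
    sm_path=$1
    refspecs=$(git -C "$sm_path" config --get-all remote.origin.fetch) || return 2
    case $refspecs in
        *'*'*) return 1 ;;   # wildcard refspec: fetches all branches
        *)     return 0 ;;   # only explicit branch refspecs
    esac
}

# Example (placeholder path):
#   is_single_branch_clone path/to/submodule && echo "single-branch clone"
```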
Using --no-single-branch on the git submodule update --init line forces the clone to be made without --single-branch mode. This gets you --depth commits from the tip commit of all branches, and leaves the fetch line configured as:
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
so that your submodule repository has all branch names in it (plus the depth-50, or however deep you specified, commits reachable from those names). Or, as I mentioned at the top, you could use git config in the submodule, at this point, to fix the remote.origin.fetch setting.
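Instead of editing remote.origin.fetch with git config directly, git remote set-branches rewrites the fetch refspec for you. A minimal sketch (the helper name is hypothetical); add --add to the git remote set-branches call if you want to keep the branch the clone already tracks rather than replace it:

```shell
# Hypothetical helper: make a single-branch submodule clone track $branch,
# so the next fetch creates refs/remotes/origin/$branch.
track_branch() {
    sm_path=$1 branch=$2
    git -C "$sm_path" remote set-branches origin "$branch"
}

# Example (placeholder path and depth):
#   track_branch path/to/submodule version/3.2.0-era
#   git -C path/to/submodule fetch --depth 50 origin
```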
Checking out the right commit
Once we have a clone, the remaining task is to run the right git checkout (or other Git command) in the submodule. That is, of the:
(cd $path; git $command)
commands, we now have the path with the submodule work-tree; all we need is to find a hash ID and run git checkout on that hash ID.
The hash ID is stored in the gitlink. Normally, that's what Git would use here. With --remote, though, the git submodule script will now run the submodule helper to figure out the "right" branch name. That is, the submodule helper will find the name you configured, if you configured one, or use the superproject's branch name, if you didn't.
Note that this is rather late: the submodule is already cloned, and already has its remote.origin.fetch set to some other name. (Unless, perhaps, you're lucky: perhaps the other Git recommended the same name you'll get here with --remote. But probably not.)
Here is the interesting bit of code, from those source lines I linked above:
# enter here with:
#   $sm_path: set to the submodule path
#   $sha1: set to the hash from the gitlink
#   $just_cloned: a flag set to 1 if we just ran `git clone`
if test $just_cloned -eq 1
then
    subsha1=    # i.e., set this to the empty string
else
    subsha1=(...find hash ID that is currently checked out...)
fi
if test -n "$remote"
then
    branch=(...find the branch you want...)
    ... fetch_in_submodule "$sm_path" $depth ...
    sha1=(...use git rev-parse to find the hash ID for origin/$branch...)
fi
if test "$subsha1" != "$sha1" || test -n "$force"; then
    ... do stuff to the submodule ...
    ... in this case, git checkout -q $sha1 ...
fi
(I've omitted some irrelevant pieces and replaced a few $(...) sections with descriptions of what they do, rather than actual code.)
What all of this work is about is this:
A submodule repository is normally in detached HEAD mode, with one particular commit checked out by hash ID. Even if it's in the other mode—on a branch, or attached HEAD mode to use the obvious opposite—it still has one particular commit hash ID checked out.
(The only real exception here is right after the initial clone, when literally nothing is checked out.)
The subsha1 code section figures out which hash ID that is. The remainder of the code figures out which hash ID should be checked out. With the --remote option, you tell the superproject Git: ignore the gitlink setting entirely. All other options use the gitlink setting, and any of those can cause trouble with --depth 1.
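As a toy restatement of the decision in the excerpt above (not Git's actual code; the function names and the 0/1 flag arguments are hypothetical), the choice of hash and the "do we need a checkout?" test look like this:

```shell
# Which hash does the superproject want checked out?
# $1: gitlink hash, $2: "1" if --remote was given, $3: hash of origin/$branch
wanted_hash() {
    if [ "$2" = 1 ]; then
        printf '%s\n' "$3"   # --remote: ignore the gitlink entirely
    else
        printf '%s\n' "$1"   # otherwise: use the gitlink hash
    fi
}

# Is a checkout needed? Succeeds if the current hash differs from the
# wanted one, or if --force was given.
# $1: currently checked-out hash ("" right after a clone),
# $2: wanted hash, $3: "1" if --force was given
needs_checkout() {
    [ "$1" != "$2" ] || [ "$3" = 1 ]
}
```

Right after a clone the current hash is empty, so a checkout always happens then.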
Your error message is triggered here
You're using --remote to tell your superproject Git: ignore the gitlink hash ID. This uses the branch=(...) and then sha1=(...) assignments to override the gitlink hash ID.
That sha1= assignment is literally this code:
sha1=$(sanitize_submodule_env; cd "$sm_path" &&
    git rev-parse --verify "${remote_name}/${branch}") ||
    die "$(eval_gettext "Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"
and here you'll recognize the error message you are getting:
Unable to find current origin/version/3.2.0-era revision in submodule path '...'
Now, a git fetch command should, one might hope, have fetched the commit named by the branch name version/3.2.0-era. If it did fetch that commit, one would hope that it would have updated the right remote-tracking name, in this case, origin/version/3.2.0-era.
The only candidate git fetch command, however, is the one invoked by:
fetch_in_submodule "$sm_path" $depth
This command runs git fetch with the --depth parameter you provided. It doesn't provide any branch names! Other fetch_in_submodule calls, particularly this one on line 628, provide a raw hash ID (still not a branch name), but this one only provides the --depth argument, if you gave one.
Without a refspec, such as a branch name, git fetch origin only fetches whatever is configured in remote.origin.fetch. That's the name from the other Git.
If the fetch= setting doesn't fetch the desired branch name (and with a single-branch clone, that's pretty likely here), the git fetch won't fetch the commit we want, and the subsequent git rev-parse to turn the remote-tracking name origin/$branch into a hash ID will fail. That's the error you're seeing.
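You can reproduce the failing check by hand, and work around it with a fetch that names the branch in an explicit refspec, so the remote-tracking name gets created regardless of remote.origin.fetch. A sketch with hypothetical helper names; version/3.2.0-era is the branch from your error message and the path is a placeholder:

```shell
# Hypothetical helper: the same test the script's `git rev-parse --verify`
# performs. Prints the hash of origin/$branch, or fails (silently) if that
# remote-tracking name does not exist in the submodule.
remote_branch_hash() {
    sm_path=$1 branch=$2
    git -C "$sm_path" rev-parse --verify --quiet "refs/remotes/origin/$branch"
}

# Hypothetical helper: fetch $branch with an explicit src:dst refspec, which
# updates refs/remotes/origin/$branch no matter what remote.origin.fetch says.
fetch_branch_explicitly() {
    sm_path=$1 depth=$2 branch=$3
    git -C "$sm_path" fetch --depth "$depth" origin \
        "+refs/heads/$branch:refs/remotes/origin/$branch"
}

# Example:
#   remote_branch_hash path/to/submodule version/3.2.0-era ||
#       fetch_branch_explicitly path/to/submodule 50 version/3.2.0-era
```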
I am not going to try to say exactly where the bug is here (and therefore how to fix it, in terms of setting the right configuration and/or issuing a git fetch with appropriate arguments), but clearly the current Git setup doesn't work for your case. In the end, though, what Git tries to do here is find the right OID, or in this case, fail to find it.
Having found the right OID (using git rev-parse origin/version/3.2.0-era for your particular case), your superproject Git would then run:
(cd $path; git checkout $hash)
in the submodule, leaving you with a detached HEAD pointing to the same hash ID you asked for by branch name. When you fix the problem, you will be in this commit-by-OID detached-HEAD mode. The only way to get out of it is manual: you have to do your own (cd $path; git checkout branch-name) operation.
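To tell which mode a submodule is in, git symbolic-ref -q HEAD succeeds (printing the branch's full ref name) only when HEAD is attached to a branch. A sketch with a hypothetical helper name, assuming the path is a working Git repository:

```shell
# Hypothetical helper: report whether the repository at $1 is on a branch
# or in detached-HEAD mode (as a submodule is after `git submodule update`).
head_state() {
    sm_path=$1
    if ref=$(git -C "$sm_path" symbolic-ref -q HEAD); then
        printf 'on branch %s\n' "${ref#refs/heads/}"
    else
        printf 'detached at %s\n' "$(git -C "$sm_path" rev-parse --short HEAD)"
    fi
}

# Example (placeholder path):
#   head_state path/to/submodule
```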
Whenever you don't use git submodule update --remote (if you have your CI system build the commit that the superproject repository says to build, rather than depending on some branch name that's under someone else's control), a shallow clone must contain that commit after a git fetch. This is where the depth stuff is fragile: how deep should N be? There isn't a right answer, which is why you have to set it yourself.
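One way to cope, sketched here under assumptions (a Git new enough to have git fetch --deepen, added in Git 2.11, a remote named origin, and a hypothetical helper name), is to deepen the shallow clone in steps until the gitlink commit shows up, with a cap on the number of rounds. It can only succeed if the wanted commit is reachable from a branch that the submodule's fetch refspec covers.

```shell
# Hypothetical helper: deepen a shallow clone in steps of $step commits
# until commit $want is present, giving up after $max_rounds fetches.
# Usage: deepen_until_present <path> <commit-hash> [step] [max_rounds]
deepen_until_present() {
    sm_path=$1 want=$2 step=${3:-50} max_rounds=${4:-10}
    round=0
    # `git cat-file -e` exits 0 only if the object exists locally.
    until git -C "$sm_path" cat-file -e "$want^{commit}" 2>/dev/null; do
        round=$((round + 1))
        if [ "$round" -gt "$max_rounds" ]; then
            echo "commit $want not found after $max_rounds deepen rounds" >&2
            return 1
        fi
        git -C "$sm_path" fetch --deepen="$step" origin || return 1
    done
}
```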
If you configure the origin Git with uploadpack.allowReachableSHA1InWant or uploadpack.allowAnySHA1InWant set to true, git fetch by hash ID can fetch the specific commit the gitlink names, allowing --depth 1 to work, but you need to have control over the origin Git repository to do this (and see the caveats in the git config documentation regarding these settings).