How does Git's transfer protocol work
I have been working with Git for more than a year and now I have to explain it to others in our group. That is why I need a bit more background. I went through most of the Git Book in the last year and recently continued with chapter 10. In chapter 10.6 I got completely stuck:

Let’s follow the http-fetch process for the simplegit library:

$ git clone http://server/simplegit-progit.git

The first thing this command does is pull down the info/refs file. This file is written by the update-server-info command, which is why you need to enable that as a post-receive hook in order for the HTTP transport to work properly:

=> GET info/refs
ca82a6dff817ec66f44342007202690a93763949     refs/heads/master

I have a small test repo https://github.com/to_my/repo and git clone works well. But

  • Where is the folder info/refs? I only find a /.git/info/exclude after the clone...
  • How should I use the update-server-info command? Is it part of git clone somehow?
  • I am completely lost with "...which is why you need to enable that as a post-receive hook" although I understand hooks (I thought) and use a pre-commit hook for automatically increasing the package version.
  • I can't get the command GET info/refs to work in git bash.

Sorry if the questions are stupid, but I just don't understand how to put these pieces from the documentation together.

Glasscock answered 23/3, 2017 at 13:17 Comment(1)
Starting Q2 2018 and Git 2.18, you will have the Git transfer protocol v2: see my answer below. – Choctaw

Where is the folder info/refs? I only find a /.git/info/exclude after the clone...

There is no such folder (it's not a directory), but that—.git/info/refs—would be where the file would be, if there were a file there.

How should I use the update-server-info command? Is it part of git clone somehow?

In general, you should not use it: it's only for "dumb" transports. "Smart" (two way conversation) transports don't need it.

I am completely lost with "...which is why you need to enable that as a post-receive hook" although I understand hooks (I thought) and use a pre-commit hook for automatically increasing the package version.

If, for some reason, you want to enable dumb transports, you need to run something to create or update several files every time they need creating or updating. The info/refs file needs to be updated whenever references change, so a good place to run the "something" is in a post-receive hook. The "something" is the command git update-server-info.

Note that if you are not running a push-only bare repository on a server, having a post-receive script run git update-server-info is not sufficient, since commits and other objects can be added by other means (manual git commits for instance). In this case you might use, e.g., a cron job to create-or-update dumb-transport information on a clock-driven basis.

I can't get the command GET info/refs to work in git bash.

If the file existed, you would obtain it via HTTP, e.g., from a browser or with the curl command.
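To make the dumb-transport file concrete: the info/refs payload is plain text, one "<sha> TAB <refname>" line per ref, as in the excerpt quoted in the question. Here is a minimal Python sketch of writing and reading that format; the helper names are invented for illustration, and the object id is the one from the quote.

```python
def format_info_refs(refs):
    """Render a {refname: sha} mapping the way the info/refs file lays it out."""
    return "".join(f"{sha}\t{name}\n" for name, sha in sorted(refs.items()))

def parse_info_refs(text):
    """Invert format_info_refs: split each line on the tab separator."""
    refs = {}
    for line in text.splitlines():
        sha, name = line.split("\t", 1)
        refs[name] = sha
    return refs

advertised = format_info_refs(
    {"refs/heads/master": "ca82a6dff817ec66f44342007202690a93763949"}
)
# Round-trip: parsing what we formatted gives back the same mapping.
assert parse_info_refs(advertised) == {
    "refs/heads/master": "ca82a6dff817ec66f44342007202690a93763949"
}
```

This is exactly the text a dumb HTTP client (a browser, or curl) would see when fetching info/refs.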

Carnauba answered 23/3, 2017 at 13:53 Comment(9)
This clarifies a bit. Your link to larsks's answer is broken? Can you update that? After reading the first sections of chapter 10 I wonder what I can learn from chapter 10.6: either I know everything and I don't need the chapter, or I read it and even with your additional explanations my gut feeling is that I only gain little. I had the hope to get a better understanding of how https-server/master, origin/master and master (locally) play together and why the additional origin/master is really needed... – Glasscock
@Glasscock - the relationship between origin/master and master - and the purpose of origin/master - aren't really related to the transport. I'd check out chapter 3.5 for info on remote branches. I don't know what you mean by https-server/master; do you have a remote named https-server or are you denoting something different here? – Daggett
Ah, he deleted that answer, unfortunately. I'm not sure the Pro Git book's description of the transfer protocols is all that useful anyway, though. I started writing my own book (not specific to Git) but have had no time to work on it; but in it, I do have a chapter on distributing repositories... raw PDF is available here. – Carnauba
@MarkAdelsberger I mean the following: there is a server like GitHub (with a master branch) and a local repo with a master branch. That is always clear. But why is there a second local origin/master branch introduced? My feeling is that it is needed for pushing and pulling correctly (you must know the last status of the server), but I don't feel like I am 100% sure what I say. That is why I got interested in what is going on when you transfer data... – Glasscock
@Glasscock - Well, like I said, you'll learn more about that from Chapter 3 than you will from Chapter 10. As an example of what origin/master is for, consider that only fetch, push, or pull access the remote(s); so how does git status know that your branch is n commits ahead of / behind the origin? – Daggett
Git's problem is that it needs at least one name to find each node in its commit DAG. Names like origin/master provide reachability for these commits. See chapter 2 of my proto-book for details. – Carnauba
@MarkAdelsberger Really good point about git status!!! Perhaps this should go to the docs (unless I didn't get it ;-) Thanks a lot. – Glasscock
@Carnauba Thanks a lot for the book. Seems to be worth reading, I already found good explanations in there! Where can I find something similar to GET info/refs? I found a lot of GETs, but not within this context. – Glasscock
You can't, yet—I have not gotten to those details. But you can demonstrate what one Git can see from another Git, as it starts up the whole git fetch process: from your own repository that has an origin, run git ls-remote origin. This calls up the other Git and queries it for all of its references, then simply prints them out for you. – Carnauba

Note: starting with Git 2.18 (Q2 2018), the Git transfer protocol evolves with an implemented v2.
With Git 2.26 (Q1 2020), it became the default. It is no longer the default in 2.27 (Q2 2020; see the end of this answer, and the follow-up answer), and is the default again in 2.28 (Q3 2020).

See commit a4d78ce, commit 0f1dc53, commit 237ffed, commit 884e586, commit 8ff14ed, commit 49e85e9, commit f08a5d4, commit f1f4d8a, commit edc9caf, commit 176e85c, commit b1c2edf, commit 1aa8dde, commit 40fc51e, commit f7e2050, commit 685fbd3, commit 3145ea9, commit 5b872ff, commit 230d7dd, commit b4be741, commit 1af8ae1 (15 Mar 2018) by Brandon Williams (mbrandonw).
(Merged by Junio C Hamano -- gitster -- in commit 9bfa0f9, 08 May 2018)

The full specification is in Documentation/technical/protocol-v2.txt:

Protocol v2 will improve upon v1 in the following ways:

  • Instead of multiple service names, multiple commands will be supported by a single service
  • Easily extendable as capabilities are moved into their own section of the protocol, no longer being hidden behind a NUL byte and limited by the size of a pkt-line
  • Separate out other information hidden behind NUL bytes (e.g. agent string as a capability and symrefs can be requested using 'ls-refs')
  • Reference advertisement will be omitted unless explicitly requested
  • ls-refs command to explicitly request some refs
  • Designed with http and stateless-rpc in mind. With clear flush semantics the http remote helper can simply act as a proxy

In protocol v2 communication is command oriented.
When first contacting a server, a list of capabilities will be advertised. Some of these capabilities will be commands which a client can request be executed. Once a command has completed, a client can reuse the connection and request that other commands be executed.

info/refs remains the server endpoint to be queried by a client, as explained in the HTTP Transport section:

When using the http:// or https:// transport a client makes a "smart" info/refs request as described in http-protocol.txt and requests that v2 be used by supplying "version=2" in the Git-Protocol header.

C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
C: Git-Protocol: version=2

A v2 server would reply:

   S: 200 OK
   S: <Some headers>
   S: ...
   S:
   S: 000eversion 2\n
   S: <capability-advertisement>

Subsequent requests are then made directly to the service $GIT_URL/git-upload-pack. (This works the same for git-receive-pack).
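The "000e" in that reply is pkt-line framing: four hex digits give the total length of the packet, prefix included, so "version 2\n" (10 bytes) becomes a 14-byte (0x000e) packet. A small Python sketch of that framing (editor's illustration, not Git's code; the function names are made up):

```python
FLUSH = b"0000"   # flush-pkt: "nothing more to send"
DELIM = b"0001"   # delim-pkt (v2): separates sections of one message

def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then data."""
    return b"%04x" % (len(payload) + 4) + payload

def read_pkt(stream: bytes):
    """Read one pkt-line off the front of `stream`; return (kind, payload, rest)."""
    size = int(stream[:4], 16)
    if size == 0:
        return "flush", b"", stream[4:]
    if size == 1:
        return "delim", b"", stream[4:]
    return "data", stream[4:size], stream[size:]

# The first packet of the v2 server reply quoted above:
assert pkt_line(b"version 2\n") == b"000eversion 2\n"
```

The same framing carries the capability advertisement and all later command requests and responses.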

The goal is to have more capabilities:

There are two different types of capabilities:

  • normal capabilities, which can be used to convey information or alter the behavior of a request, and
  • commands, which are the core actions that a client wants to perform (fetch, push, etc).

Protocol version 2 is stateless by default.
This means that all commands must only last a single round and be stateless from the perspective of the server side, unless the client has requested a capability indicating that state should be maintained by the server.

Clients MUST NOT require state management on the server side in order to function correctly.
This permits simple round-robin load-balancing on the server side, without needing to worry about state management.

Finally:

ls-refs is the command used to request a reference advertisement in v2.
Unlike the current reference advertisement, ls-refs takes in arguments which can be used to limit the refs sent from the server.

And:

fetch is the command used to fetch a packfile in v2.
It can be looked at as a modified version of the v1 fetch where the ref-advertisement is stripped out (since the ls-refs command fills that role) and the message format is tweaked to eliminate redundancies and permit easy addition of future extensions.


Since that commit (May 10th), the protocol V2 has officially been announced (May 28th) in the Google blog post "Introducing Git protocol version 2" by Brandon Williams.

In both cases:

Additional features not supported in the base command will be advertised as the value of the command in the capability advertisement in the form of a space separated list of features: "<command>=<feature 1> <feature 2>"
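That "<command>=<feature 1> <feature 2>" shape is easy to parse. A hedged Python sketch (parse_capabilities is a hypothetical helper; the advertised names below are only illustrative):

```python
def parse_capabilities(lines):
    """Parse v2 capability-advertisement values into {name: [features]}.

    A bare capability (no '=') carries no extra features; a command's value
    is a space-separated feature list, per the quoted spec text.
    """
    caps = {}
    for line in lines:
        name, sep, value = line.partition("=")
        caps[name] = value.split() if sep else []
    return caps

caps = parse_capabilities(["agent=git/2.x", "ls-refs", "fetch=shallow filter"])
assert caps["fetch"] == ["shallow", "filter"]
assert caps["ls-refs"] == []
```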


See also commit 5e3548e, commit ff47322, commit ecc3e53 (23 Apr 2018) by Brandon Williams (mbrandonw).
(Merged by Junio C Hamano -- gitster -- in commit 41267e9, 23 May 2018)

serve: introduce the server-option capability

Introduce the "server-option" capability to protocol version 2.
This enables future clients the ability to send server specific options in command requests when using protocol version 2.

fetch: send server options when using protocol v2

Teach fetch to optionally accept server options by specifying them on the cmdline via '-o' or '--server-option'.
These server options are sent to the remote end when performing a fetch communicating using protocol version 2.

If communicating using a protocol other than v2 the provided options are ignored and not sent to the remote end.

Same is done for git ls-remote.


And the transfer protocol v2 learned to support the partial clone seen in Dec. 2017 with Git 2.16.

See commit ba95710, commit 5459268 (03 May 2018), and commit 7cc6ed2 (02 May 2018) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 54db5c0, 30 May 2018)

{fetch,upload}-pack: support filter in protocol v2

The fetch-pack/upload-pack protocol v2 was developed independently of the filter parameter (used in partial fetches), thus it did not include support for it. Add support for the filter parameter.

Like in the legacy protocol, the server advertises and supports "filter" only if uploadpack.allowfilter is configured.

Like in the legacy protocol, the client continues with a warning if "--filter" is specified, but the server does not advertise it.


Git 2.19 (Q3 2018) improves the fetch part of the git transfer protocol v2:

See commit ec06283, commit d093bc7, commit d30fe89, commit af1c90d, commit 21bcf6e (14 Jun 2018), and commit af00855, commit 34c2903 (06 Jun 2018) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit af8ac73, 02 Aug 2018)

fetch-pack: introduce negotiator API

Introduce the new files fetch-negotiator.{h,c}, which contains an API behind which the details of negotiation are abstracted

fetch-pack: use ref adv. to prune "have" sent

In negotiation using protocol v2, fetch-pack sometimes does not make full use of the information obtained in the ref advertisement: specifically, that if the server advertises a commit that the client also has, the client never needs to inform the server that it has the commit's parents, since it can just tell the server that it has the advertised commit and it knows that the server can and will infer the rest.


Git 2.20 (Q4 2018) fixes git ls-remote:

See commit 6a139cd, commit 631f0f8 (31 Oct 2018) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 81c365b, 13 Nov 2018)

git ls-remote $there foo was broken by recent update for the protocol v2 and stopped showing refs that match 'foo' that are not refs/{heads,tags}/foo, which has been fixed.


And Git 2.20 fixes git fetch, which was a bit loose in parsing responses from the other side when talking over the protocol v2.

See commit 5400b2a (19 Oct 2018) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 67cf2fa, 13 Nov 2018)

fetch-pack: be more precise in parsing v2 response

Each section in a protocol v2 response is followed by either a DELIM packet (indicating more sections to follow) or a FLUSH packet (indicating none to follow).

But when parsing the "acknowledgments" section, do_fetch_pack_v2() is liberal in accepting both, but determines whether to continue reading or not based solely on the contents of the "acknowledgments" section, not on whether DELIM or FLUSH was read.

There is no issue with a protocol-compliant server, but can result in confusing error messages when communicating with a server that serves unexpected additional sections. Consider a server that sends "new-section" after "acknowledgments":

  • client writes request
    • client reads the "acknowledgments" section which contains no "ready", then DELIM
    • since there was no "ready", client needs to continue negotiation, and writes request
    • client reads "new-section", and reports to the end user "expected 'acknowledgments', received 'new-section'"

For the person debugging the involved Git implementation(s), the error message is confusing in that "new-section" was not received in response to the latest request, but to the first one.

One solution is to always continue reading after DELIM, but in this case, we can do better.

We know from the protocol that:

  • "ready" means at least the packfile section is coming (hence, DELIM) and that:
  • no "ready" means that no sections are to follow (hence, FLUSH).

So teach process_acks() to enforce this.
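The two rules above amount to a one-line check. A toy Python rendering of what process_acks() now enforces (the function name and string encoding of the terminating packet are invented for illustration):

```python
def acks_section_ok(saw_ready: bool, terminator: str) -> bool:
    """'ready' means the packfile section follows, so the section must end
    with DELIM; no 'ready' means nothing follows, so it must end with FLUSH."""
    return terminator == ("delim" if saw_ready else "flush")

# A compliant server:
assert acks_section_ok(saw_ready=True, terminator="delim")
assert acks_section_ok(saw_ready=False, terminator="flush")
# The confusing "new-section" server from the example above fails here:
assert not acks_section_ok(saw_ready=False, terminator="delim")
```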


Git 2.21 will bring an actual official support of the V2 protocol for fetch pack:

See commit e20b419 (18 Dec 2018) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit d3b0178, 29 Jan 2019)

fetch-pack: support protocol version 2

The scaffolding for protocol version 2 was initially added in 8f6982b ("protocol: introduce enum protocol_version value protocol_v2", 2018-03-14, Git v2.18). As seen in:

git log -p -G'support for protocol v2 not implemented yet' --full-diff --reverse v2.17.0..v2.20.0

Many of those scaffolding "die" placeholders were removed, but we hadn't gotten around to fetch-pack yet.

The test here for "fetch refs from cmdline" is very minimal. There's much better coverage when running the entire test suite under the WIP GIT_TEST_PROTOCOL_VERSION=2 mode; we should ideally have better coverage without needing to invoke a special test mode.


Git 2.22 (Q2 2019) adds: "git clone" learned a new --server-option option when talking over the protocol version 2.

See commit 6e98305, commit 35eb824 (12 Apr 2019) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 6d3df8e, 08 May 2019)

clone: send server options when using protocol v2

Commit 5e3548e ("fetch: send server options when using protocol v2", 2018-04-24, Git v2.18.0-rc0) taught "fetch" the ability to send server options when using protocol v2, but not "clone".
This ability is triggered by "-o" or "--server-option".

Teach "clone" the same ability, except that because "clone" already has "-o" for another parameter, teach "clone" only to receive "--server-option".

Explain in the documentation, both for clone and for fetch, that server handling of server options are server-specific.
This is similar to receive-pack's handling of push options - currently, they are just sent to hooks to interpret as they see fit.


Note: Git 2.18 introduced a git serve command in commit ed10cb9 by Brandon Williams:

serve: introduce git-serve

Introduce git-serve, the base server for protocol version 2.

Protocol version 2 is intended to be a replacement for Git's current wire protocol.
The intention is that it will be a simpler, less wasteful protocol which can evolve over time.

Protocol version 2 improves upon version 1 by eliminating the initial ref advertisement.
In its place a server will export a list of capabilities and commands which it supports in a capability advertisement.
A client can then request that a particular command be executed by providing a number of capabilities and command specific parameters.
At the completion of a command, a client can request that another command be executed or can terminate the connection by sending a flush packet.
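The session shape described above (capability advertisement, then one command after another, terminated by a flush) can be modeled in a few lines. This is a toy editor's sketch, not Git's code; the transcript format and names are invented:

```python
def v2_session(server_caps, commands):
    """Model a v2 session as a list of (side, message) tuples:
    the server advertises capabilities first, then the client issues
    commands on the same connection and finally sends a flush to hang up."""
    transcript = [("S", cap) for cap in server_caps]
    for cmd in commands:
        transcript.append(("C", f"command={cmd}"))
        transcript.append(("C", "flush"))           # end of this request
        transcript.append(("S", f"<{cmd} response>"))
    transcript.append(("C", "flush"))               # terminate the connection
    return transcript

t = v2_session(["agent=git/2.x", "ls-refs", "fetch"], ["ls-refs", "fetch"])
assert t[0] == ("S", "agent=git/2.x")
assert t[-1] == ("C", "flush")
```

The point of the model: unlike v1, nothing is sent by the server up front except the capability list, and the connection can be reused across commands.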

But... Git 2.22 does amend that, with commit b7ce24d by Johannes Schindelin:

Turn git serve into a test helper

The git serve built-in was introduced in ed10cb9 (serve: introduce git-serve, 2018-03-15, Git v2.18.0-rc0) as a backend to serve Git protocol v2, probably originally intended to be spawned by git upload-pack.

However, in the version that the protocol v2 patches made it into core Git, git upload-pack calls the serve() function directly instead of spawning git serve. The only reason in life for git serve to survive as a built-in command is to provide a way to test the protocol v2 functionality.

Meaning that it does not even have to be a built-in that is installed with end-user facing Git installations, but it can be a test helper instead.

Let's make it so.


Git 2.23 (Q3 2019) will make update-server-info more efficient, since it learned not to rewrite the file with the same contents.

See commit f4f476b (13 May 2019) by Eric Wong (ele828).
(Merged by Junio C Hamano -- gitster -- in commit 813a3a2, 13 Jun 2019)

update-server-info: avoid needless overwrites

Do not change the existing info/refs and objects/info/packs files if they match the existing content on the filesystem.
This is intended to preserve mtime and make it easier for dumb HTTP pollers to rely on the If-Modified-Since header.

Combined with stdio and kernel buffering, the kernel should be able to avoid block-layer writes and reduce wear for small files.

As a result, the --force option is no longer needed.
So stop documenting it, but let it remain for compatibility (and debugging, if necessary).
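The write-if-changed idea is simple enough to sketch. This is an editor's illustration of the behaviour described above, in Python rather than Git's C (update_if_changed is a made-up name):

```python
def update_if_changed(path: str, content: str) -> bool:
    """Rewrite `path` only when `content` differs from what is on disk.

    Skipping the write preserves the file's mtime, which is what lets
    dumb HTTP pollers rely on the If-Modified-Since header."""
    try:
        with open(path) as f:
            if f.read() == content:
                return False        # identical: no write, mtime preserved
    except FileNotFoundError:
        pass                        # first write: fall through
    with open(path, "w") as f:
        f.write(content)
    return True
```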

And Git 2.22.1 will also fix the server-side support for "git fetch", which used to show an incorrect value for the HEAD symbolic ref when the namespace feature is in use.

See commit 533e088 (23 May 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 5ca0db3, 25 Jul 2019)

upload-pack: strip namespace from symref data

Since 7171d8c (upload-pack: send symbolic ref information as capability, 2013-09-17, Git v1.8.4.3), we've sent cloning and fetching clients special information about which branch HEAD is pointing to, so that they don't have to guess based on matching up commit ids.

However, this feature has never worked properly with the GIT_NAMESPACE feature. Because upload-pack uses head_ref_namespaced(find_symref), we do find and report on refs/namespaces/foo/HEAD instead of the actual HEAD of the repo.
This makes sense, since the branch pointed to by the top-level HEAD may not be advertised at all.

But we do two things wrong:

  1. We report the full name refs/namespaces/foo/HEAD, instead of just HEAD.
    Meaning no client is going to bother doing anything with that symref, since we're not otherwise advertising it.
  2. We report the symref destination using its full name (e.g., refs/namespaces/foo/refs/heads/master). That's similarly useless to the client, who only saw "refs/heads/master" in the advertisement.

We should be stripping the namespace prefix off of both places (which this patch fixes).
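The fix for both places boils down to dropping the namespace prefix before reporting a ref name. A hedged Python sketch of that stripping (strip_namespace is an invented name; the refs are the ones from the two bug points above):

```python
def strip_namespace(ref: str, namespace: str) -> str:
    """Report a namespaced ref in its plain form, dropping the
    refs/namespaces/<ns>/ prefix the buggy code left in place."""
    prefix = f"refs/namespaces/{namespace}/"
    return ref[len(prefix):] if ref.startswith(prefix) else ref

# Bug (1): the symref name itself.
assert strip_namespace("refs/namespaces/foo/HEAD", "foo") == "HEAD"
# Bug (2): the symref destination.
assert strip_namespace("refs/namespaces/foo/refs/heads/master", "foo") \
    == "refs/heads/master"
```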

Likely nobody noticed because we tend to do the right thing anyway.
Bug (1) means that we said nothing about HEAD (just refs/namespace/foo/HEAD). And so the client half of the code, from a45b5f0 (connect: annotate refs with their symref information in get_remote_head(), 2013-09-17, Git v1.8.4.3), does not annotate HEAD, and we use the fallback in guess_remote_head(), matching refs by object id.
Which is usually right. It only falls down in ambiguous cases, like the one laid out in the included test.

This also means that we don't have to worry about breaking anybody who was putting pre-stripped names into their namespace symrefs when we fix bug (2).
Because of bug (1), nobody would have been using the symref we advertised in the first place (not to mention that those symrefs would have appeared broken for any non-namespaced access).

Note that we have separate fixes here for the v0 and v2 protocols.
The symref advertisement moved in v2 to be a part of the ls-refs command.
This actually gets part (1) right, since the symref annotation piggy-backs on the existing ref advertisement, which is properly stripped.
But it still needs a fix for part (2).


With Git 2.25.1 (Feb. 2020), the unnecessary round-trip when running "ls-remote" over the stateless RPC mechanism is reduced.

See discussion:

A colleague (Jon Simons) today pointed out an interesting behavior of git ls-remote with protocol v2: it makes a second POST request and sends only a flush packet.
This can be demonstrated with the following:

GIT_CURL_VERBOSE=1 git -c protocol.version=2 ls-remote origin

The Content-Length header on the second request will be exactly 4 bytes.

See commit 4d8cab9 (08 Jan 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 45f47ff, 22 Jan 2020)

transport: don't flush when disconnecting stateless-rpc helper

Signed-off-by: Jeff King

Since ba227857d2 ("Reduce the number of connects when fetching", 2008-02-04, Git v1.5.5-rc0 -- merge), when we disconnect a git transport, we send a final flush packet.
This cleanly tells the other side that we're done, and avoids the other side complaining "the remote end hung up unexpectedly" (though we'd only see that for transports that pass along the server stderr, like ssh or local-host).

But when we've initiated a v2 stateless-connect session over a transport helper, there's no point in sending this flush packet. Each operation we've performed is self-contained, and the other side is fine with us hanging up between operations.

But much worse, by sending the flush packet we may cause the helper to issue an entirely new request _just_ to send the flush packet. So we can incur an extra network request just to say "by the way, we have nothing more to send".

Let's drop this extra flush packet. As the test shows, this reduces the number of POSTs required for a v2 ls-remote over http from 2 to 1.


With Git 2.26 (Q1 2020), the test-lint machinery knew to check the "VAR=VAL shell_function" construct, but did not check "VAR= shell_function", which has been corrected.

See commit d6509da, commit a7fbf12, commit c7973f2 (26 Dec 2019) by Jonathan Nieder (artagnon).
(Merged by Junio C Hamano -- gitster -- in commit c7372c9, 30 Jan 2020)

fetch test: mark test of "skipping" haves as v0-only

Signed-off-by: Jonathan Nieder

Since 633a53179e (fetch test: avoid use of "VAR= cmd" with a shell function, 2019-12-26), t5552.5 (do not send "have" with ancestors of commits that server ACKed) fails when run with GIT_TEST_PROTOCOL_VERSION=2.

The cause:

The progression of "have"s sent in negotiation depends on whether we are using a stateless RPC based transport or a stateful bidirectional one (see for example 44d8dc54e7, "Fix potential local deadlock during fetch-pack", 2011-03-29, Git v1.7.5-rc0).

In protocol v2, all transports are stateless transports, while in protocol v0, transports such as local access and SSH are stateful.

In stateful transports, the number of "have"s to send multiplies by two each round until we reach PIPESAFE_FLUSH (that is, 32), and then it increases by PIPESAFE_FLUSH each round.

In stateless transport, the count multiplies by two each round until we reach LARGE_FLUSH (which is 16384) and then multiplies by 1.1 each round after that.

Moreover, in stateful transports, as fetch-pack.c explains:

We keep one window "ahead" of the other side, and will wait for an ACK only on the next one.

This affects t5552.5 because it looks for "have"s from the negotiator that appear in that second window.

With protocol version 2, the second window never arrives, and the test fails.
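The two progressions described above can be written down directly. A Python sketch of one round of the window-size growth (next_have_count is an invented name, and the starting count is not given in the text, so it is left to the caller):

```python
PIPESAFE_FLUSH = 32       # stateful-transport threshold, per the text
LARGE_FLUSH = 16384       # stateless-transport threshold, per the text

def next_have_count(count: int, stateless: bool) -> int:
    """One round of 'have' window growth: double until the threshold,
    then grow additively (stateful) or by 10% (stateless)."""
    if stateless:
        return count * 2 if count < LARGE_FLUSH else int(count * 1.1)
    return count * 2 if count < PIPESAFE_FLUSH else count + PIPESAFE_FLUSH

# Stateful: 16 -> 32 -> 64 -> 96 -> 128 ...
assert next_have_count(32, stateless=False) == 64
assert next_have_count(64, stateless=False) == 96
# Stateless: keeps doubling until 16384, then multiplies by 1.1.
assert next_have_count(8192, stateless=True) == 16384
```

The divergence between these two schedules is exactly why t5552.5 sees different "have" batches under v0 (stateful) and v2 (always stateless).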

Until 633a53179e (2019-12-26), a previous test in the same file contained

GIT_TEST_PROTOCOL_VERSION= trace_fetch client origin to_fetch

In many common shells (e.g. bash when run as "sh"), the setting of GIT_TEST_PROTOCOL_VERSION to the empty string lasts beyond the intended duration of the trace_fetch invocation.

This causes it to override the GIT_TEST_PROTOCOL_VERSION setting that was passed in to the test during the remainder of the test script, so t5552.5 never got run using protocol v2 on those shells, regardless of the GIT_TEST_PROTOCOL_VERSION setting from the environment.

633a53179e fixed that, revealing the failing test.

Choctaw answered 10/5, 2018 at 13:23 Comment(0)

Well, you're getting into plumbing details; even if you have to explain Git to a team of coworkers, I'm surprised by the idea that this level of detail would be needed...

Anyway, the info/refs file would only exist on a remote meant to be accessed by HTTP with a dumb server. You probably won't find it (and don't need it) in your local repo. (The remote in this scenario is probably a bare repo, btw, so info would be at the repo root, as bare repos don't have a work tree and place the files you're used to seeing in .git at the root instead.)

If your remote is on something like GitHub, TFS, etc., then you just don't need to worry about any of this, as the server will manage things just fine. I guess if you served the repo as static content from a plain old web server then this would matter, and you'd have to set up the hook.

Most users will never use or see the update-server-info command; as its name suggests, it's for repos on the server side - remotes - to compensate for the lack of a git-aware HTTP server.

The post-receive hook is invoked after receiving a push; so on a dumb server scenario, you set this hook on the remote so that when you push to it, it responds by updating certain information (like the refs file).

The GET command you're looking at is an HTTP command, run when necessary by the git client when you do a fetch.

Daggett answered 23/3, 2017 at 13:53 Comment(0)

Another aspect of the git transfer protocol is in its packet management, including ACKs when requesting "HAVE":

Before Git 2.27 (Q2 2020), the server end of the v2 protocol to serve "git clone" and "git fetch" was not prepared to see delim packets at unexpected places, which led to a crash.

See commit cacae43 (29 Mar 2020), and commit 4845b77, commit 88124ab (27 Mar 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 5ee5788, 22 Apr 2020)

upload-pack: handle unexpected delim packets

Signed-off-by: Jeff King

When processing the arguments list for a v2 ls-refs or fetch command, we loop like this:

while (packet_reader_read(request) != PACKET_READ_FLUSH) {
        const char *arg = request->line;
        ...handle arg...
}

to read and handle packets until we see a flush. The hidden assumption here is that anything except PACKET_READ_FLUSH will give us valid packet data to read. But that's not true; PACKET_READ_DELIM or PACKET_READ_EOF will leave packet->line as NULL, and we'll segfault trying to look at it.

Instead, we should follow the more careful model demonstrated on the client side (e.g., in process_capabilities_v2): keep looping as long as we get normal packets, and then make sure that we broke out of the loop due to a real flush. That fixes the segfault and correctly diagnoses any unexpected input from the client.


Before Git 2.27 (Q2 2020), the upload-pack protocol v2 gave up too early before finding a common ancestor, resulting in a wasteful fetch from a fork of a project.

This has been corrected to match the behaviour of v0 protocol.

See commit 2f0a093, commit 4fa3f00, commit d1185aa (28 Apr 2020) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 0b07eec, 01 May 2020)

fetch-pack: in protocol v2, in_vain only after ACK

Signed-off-by: Jonathan Tan
Reviewed-by: Jonathan Nieder

When fetching, Git stops negotiation when it has sent at least MAX_IN_VAIN (which is 256) "have" lines without having any of them ACK-ed.
But this is supposed to trigger only after the first ACK, as pack-protocol.txt says:

However, the 256 limit only turns on in the canonical client implementation if we have received at least one "ACK %s continue" during a prior round. This helps to ensure that at least one common ancestor is found before we give up entirely.

The code path for protocol v0 observes this, but not protocol v2, resulting in shorter negotiation rounds but significantly larger packfiles.
Teach the code path for protocol v2 to check this criterion only after at least one ACK was received.
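The criterion can be modelled in a few lines of Python (a toy, not Git's code; only the MAX_IN_VAIN constant comes from the commit message). Without the seen_ack guard, a client whose only common commit lies deep in its history would give up before ever reaching it:

```python
MAX_IN_VAIN = 256  # Git's cutoff for unproductive "have" lines

def negotiate(local_commits, server_has):
    """Toy negotiation: send 'have' lines one at a time; the server
    ACKs commits it also has. Per the v2 fix, the in-vain cutoff
    applies only once at least one ACK has been seen."""
    seen_ack = False
    in_vain = 0
    acks = []
    for c in local_commits:
        in_vain += 1                 # one more "have" sent
        if c in server_has:          # server would ACK this commit
            seen_ack = True
            in_vain = 0
            acks.append(c)
        if seen_ack and in_vain >= MAX_IN_VAIN:
            break                    # give up only after a common base exists
    return acks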


As a result of the work in 2.27 (where v2 was not the default), v2 is again the default with 2.28.

See commit 3697caf:

config: let feature.experimental imply protocol.version=2

Git 2.26 used protocol v2 as its default protocol, but soon after release, users noticed that the protocol v2 negotiation code was prone to fail when fetching from some remotes that are far ahead of others (such as linux-next.git versus Linus's linux.git).
That has been fixed by 0b07eec (Merge branch 'jt/v2-fetch-nego-fix', 2020-05-01, Git v2.27.0-rc0), but to be cautious, we are using protocol v0 as the default in 2.27 to buy some time for any other unanticipated issues to surface.

To that end, let's ensure that users requesting the bleeding edge using the feature.experimental flag do get protocol v2.
This way, we can gain experience with a wider audience for the new protocol version and be more confident when it is time to enable it by default for all users in some future Git version.

Implementation note: this isn't with the rest of the feature.experimental options in repo-settings.c because those are tied to a repository object, whereas this code path is used for operations like "git ls-remote" that do not require a repository.
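In practice, either of these opt-ins enables protocol v2 (a sketch; GIT_TRACE_PACKET simply lets you watch which protocol is actually spoken against your remote):

```shell
# Opt in to bleeding-edge defaults; as of 2.27 this implies protocol v2:
git config --global feature.experimental true

# Or pin the wire protocol explicitly, independent of other experiments:
git config --global protocol.version 2

# Inspect the on-the-wire exchange to see which version was actually used:
GIT_TRACE_PACKET=1 git ls-remote origin 2>&1 | head -n 5
```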


With Git 2.28 (Q3 2020), the "fetch/clone" protocol has been updated to allow the server to instruct the clients to grab pre-packaged packfile(s) in addition to the packed object data coming over the wire.

See commit cae2ee1 (15 Jun 2020) by Ramsay Jones (``).
See commit dd4b732, commit 9da69a6, commit acaaca7, commit cd8402e, commit fd194dd, commit 8d5d2a3, commit 8e6adb6, commit eb05349, commit 9cb3cab (10 Jun 2020) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 34e849b, 25 Jun 2020)

fetch-pack: support more than one pack lockfile

Signed-off-by: Jonathan Tan

Whenever a fetch results in a packfile being downloaded, a .keep file is generated, so that the packfile can be preserved (from, say, a running "git repack") until refs are written referring to the contents of the packfile.

In a subsequent patch, a successful fetch using protocol v2 may result in more than one .keep file being generated. Therefore, teach fetch_pack() and the transport mechanism to support multiple .keep files.

Implementation notes:

  • builtin/fetch-pack.c normally does not generate .keep files, and thus is unaffected by this or future changes.
    However, it has an undocumented "--lock-pack" feature, used by remote-curl.c when implementing the "fetch" remote helper command.
    In keeping with the remote helper protocol, only one "lock" line will ever be written; the rest will result in warnings to stderr.
    However, in practice, warnings will never be written because the remote-curl.c "fetch" is only used for protocol v0/v1 (which will not generate multiple .keep files). (Protocol v2 uses the "stateless-connect" command, not the "fetch" command.)

  • connected.c has an optimization in that connectivity checks on a ref need not be done if the target object is in a pack known to be self-contained and connected. If there are multiple packfiles, this optimization can no longer be done.

Cf. Packfile URIs

This feature allows servers to serve part of their packfile response as URIs. This allows server designs that improve scalability in bandwidth and CPU usage (for example, by serving some data through a CDN), and (in the future) provides some measure of resumability to clients.

This feature is available only in protocol version 2.


"git fetch --depth=<n>" over the stateless RPC / smart HTTP transport handled EOF from the client poorly at the server end.

This is fixed, as part of the transport protocol, in Git 2.30 (Q1 2021).

See commit fb3d1a0 (30 Oct 2020) by Daniel Duvall (marxarelli).
(Merged by Junio C Hamano -- gitster -- in commit d1169be, 18 Nov 2020)

upload-pack: allow stateless client EOF just prior to haves

Signed-off-by: Daniel Duvall

During stateless packfile negotiation where a depth is given, stateless RPC clients (e.g. git-remote-curl) will send multiple upload-pack requests with the first containing only the wants/shallows/deepens/filters and the subsequent containing haves/done.

When upload-pack handles such requests, entering get_common_commits without checking whether the client has hung up can result in unexpected EOF during the negotiation loop and a die() with message "fatal: the remote end hung up unexpectedly".

Real world effects include:

  • A client speaking to git-http-backend via a server that doesn't check the exit codes of CGIs (e.g. mod_cgi) doesn't know and doesn't care about the fatal. It continues to process the response body as normal.
  • A client speaking to a server that does check the exit code and returns an errant HTTP status as a result will fail with the message "error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500."
  • Admins running servers that surface the failure must workaround it by patching code that handles execution of git-http-backend to ignore exit codes or take other heuristic approaches.
  • Admins may have to deal with "hung up unexpectedly" log spam related to the failures even in cases where the exit code isn't surfaced as an HTTP server-side error status.

To avoid these EOF related fatals, have upload-pack gently peek for an EOF between the sending of shallow/unshallow lines (followed by flush) and the reading of client haves.
If the client has hung up at this point, exit normally.

Choctaw answered 2/5, 2020 at 20:24 Comment(0)
C
1

With Git 2.30 (Q1 2021), the transport layer was taught to optionally exchange the session ID assigned by the trace2 subsystem during fetch/push transactions.

See commit a2a066d, commit 8c48700, commit 8295946, commit 1e905bb, commit 23bf486, commit 6b5b6e4, commit 8073d75, commit 791e1ad, commit e97e1cf, commit 81bd549, commit f5cdbe4 (11 Nov 2020) by Josh Steadmon (steadmon).
(Merged by Junio C Hamano -- gitster -- in commit 01b8886, 08 Dec 2020)

serve: advertise session ID in v2 capabilities

Signed-off-by: Josh Steadmon

When transfer.advertiseSID is true, advertise the server's session ID for all protocol v2 connections via the new session-id capability.

And:

docs: new capability to advertise session IDs

Signed-off-by: Josh Steadmon

In future patches, we will add the ability for Git servers and clients to advertise unique session IDs via protocol capabilities. This allows for easier debugging when both client and server logs are available.

technical/protocol-capabilities now includes in its man page:

session-id=<session id>


The server may advertise a session ID that can be used to identify this process across multiple requests. The client may advertise its own session ID back to the server as well.

Session IDs should be unique to a given process. They must fit within a packet-line, and must not contain non-printable or whitespace characters.

technical/protocol-v2 now includes in its man page:

session-id=<session id>


The server may advertise a session ID that can be used to identify this process across multiple requests. The client may advertise its own session ID back to the server as well.

Session IDs should be unique to a given process. They must fit within a packet-line, and must not contain non-printable or whitespace characters.
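Assuming a repository with an `origin` remote, the capability can be exercised like this (a sketch; the grep just surfaces the advertisement on the wire):

```shell
# On the server's repository: opt in to advertising its trace2 session ID.
git config transfer.advertiseSID true

# On the client: the capability shows up in the v2 advertisement.
GIT_TRACE_PACKET=1 git -c protocol.version=2 ls-remote origin 2>&1 | grep session-id
```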

Another fix with Git 2.30 (Q1 2021):

"fetch-pack" could pass a NULL pointer to unlink(2) when it sees an invalid filename; the error checking has been tightened to make this impossible.

See commit 6031af3 (30 Nov 2020) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit eae47db, 08 Dec 2020)

fetch-pack: disregard invalid pack lockfiles

Signed-off-by: René Scharfe
Reviewed-by: Taylor Blau

9da69a6539 ("fetch-pack: support more than one pack lockfile", 2020-06-10, Git v2.28.0-rc0 -- merge listed in batch #5) started to use a string_list for pack lockfile names instead of a single string pointer.
It removed a NULL check from transport_unlock_pack() as well, which is the function that eventually deletes these lockfiles and releases their name strings.

index_pack_lockfile() can return NULL if it doesn't like the contents it reads from the file descriptor passed to it.
unlink(2) is declared to not accept NULL pointers (at least with glibc).
Undefined Behavior Sanitizer together with Address Sanitizer detects a case where a NULL lockfile name is passed to unlink(2) by transport_unlock_pack() in t1060 (make SANITIZE=address,undefined; cd t; ./t1060-object-corruption.sh).

Reinstate the NULL check to avoid undefined behavior, but put it right at the source, so that the number of items in the string_list reflects the number of valid lockfiles.


That transport layer v2 might not be compatible with the commit-graph introduced in Git 2.18 (Q2 2018) (the precomputed information necessary for ancestry traversal, kept in a separate file to optimize graph walking).

Ævar Arnfjörð Bjarmason describes in this thread the error messages you would see:

$ git status
    error: graph version 2 does not match version 1
$ ~/g/git/git --exec-path=$PWD status
    error: commit-graph version 2 does not match version 1
    On branch master
    [...]

With Git 2.31 (Q1 2021), the commit-graph learned to use corrected commit dates instead of the generation number to help topological revision traversal, in a way that older Git versions can still read (bumping the graph version outright would reproduce the errors above).

See commit 5a3b130, commit 8d00d7c, commit 1fdc383, commit e8b6300, commit c1a0911, commit d7f9278, commit 72a2bfc, commit c0ef139, commit f90fca6, commit 2f9bbb6, commit e30c5ee (16 Jan 2021) by Abhishek Kumar (abhishekkumar2718).
(Merged by Junio C Hamano -- gitster -- in commit 8b4701a, 17 Feb 2021)

commit-graph: implement generation data chunk

Signed-off-by: Abhishek Kumar
Reviewed-by: Taylor Blau
Reviewed-by: Derrick Stolee

As discovered by Ævar, we cannot increment the graph version to distinguish between generation numbers v1 and v2.
Thus, one of the prerequisites before implementing generation number v2 was to distinguish between graph versions in a backwards-compatible manner.

We are going to introduce a new chunk called Generation DATa chunk (or GDAT).
GDAT will store corrected committer date offsets whereas CDAT will still store topological level.

Old Git does not understand GDAT chunk and would ignore it, reading topological levels from CDAT.
New Git can parse GDAT and take advantage of newer generation numbers, falling back to topological levels when GDAT chunk is missing (as it would happen with a commit-graph written by old Git).

To minimize the space required to store corrected commit date, Git stores corrected commit date offsets into the commit-graph file, instead of corrected commit dates.
This saves us 4 bytes per commit, decreasing the GDAT chunk size by half, but it's possible for the offset to overflow the 4-bytes allocated for storage.
As such overflows are and should be exceedingly rare, we use the following overflow management scheme:

We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV') to store corrected commit dates for commits with offsets greater than GENERATION_NUMBER_V2_OFFSET_MAX.

If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set the MSB of the offset and the other bits store the position of corrected commit date in GDOV chunk, similar to how Extra Edge List is maintained.

We test the overflow-related code with the following repo history:

          F - N - U
         /         \
U - N - U            N
         \          /
          N - F - N

Where:

  • the commits denoted by U have committer date of zero seconds since Unix epoch,
  • the commits denoted by N have committer date of 1112354055 (default committer date for the test suite) seconds since Unix epoch and
  • the commits denoted by F have committer date of (2 ^ 31 - 2) seconds since Unix epoch.

The largest offset observed is 2 ^ 31, just large enough to overflow.
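The overflow scheme can be modelled roughly as follows (my own sketch; the constant and list names are stand-ins for GENERATION_NUMBER_V2_OFFSET_MAX and the GDOV chunk in Git's commit-graph code):

```python
MSB = 1 << 31              # flag bit: "look in the overflow chunk"
OFFSET_MAX = MSB - 1       # stand-in for GENERATION_NUMBER_V2_OFFSET_MAX

def encode_offset(corrected_date, commit_date, overflow):
    """Store a 32-bit offset; overflowing dates go to the GDOV list."""
    off = corrected_date - commit_date
    if off > OFFSET_MAX:
        overflow.append(corrected_date)   # full 64-bit date in "GDOV"
        return MSB | (len(overflow) - 1)  # MSB set + position in GDOV
    return off

def decode_offset(stored, commit_date, overflow):
    """Recover the corrected commit date from a stored 32-bit value."""
    if stored & MSB:
        return overflow[stored ^ MSB]     # clear the flag -> GDOV index
    return commit_date + stored
```

A small offset round-trips directly; a huge corrected date (as produced by the epoch-0 commits above) sets the MSB and lands in the overflow list instead.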

This is backward compatible with v1 because:

commit-graph: use generation v2 only if entire chain does

Signed-off-by: Derrick Stolee
Signed-off-by: Abhishek Kumar
Reviewed-by: Taylor Blau
Reviewed-by: Derrick Stolee

Since there are released versions of Git that understand generation numbers in the commit-graph's CDAT chunk but do not understand the GDAT chunk, the following scenario is possible:

  1. "New" Git writes a commit-graph with the GDAT chunk.
  2. "Old" Git writes a split commit-graph on top without a GDAT chunk.

If each layer of split commit-graph is treated independently, as it was the case before this commit, with Git inspecting only the current layer for chunk_generation_data pointer, commits in the lower layer (one with GDAT) would have corrected commit date as their generation number, while commits in the upper layer would have topological levels as their generation.
Corrected commit dates usually have much larger values than topological levels.
This means that if we take two commits, one from the upper layer, and one reachable from it in the lower layer, then the expectation that the generation of a parent is smaller than the generation of a child would be violated.

It is difficult to expose this issue in a test.
Since we start with artificially low generation numbers, any commit walk that prioritizes generation numbers will walk all of the commits with high generation number before walking the commits with low generation number.
In all the cases I tried, the commit-graph layers themselves "protect" any incorrect behavior since none of the commits in the lower layer can reach the commits in the upper layer.

This issue would manifest itself as a performance problem in this case, especially with something like "git log --graph", since the low generation numbers would cause the in-degree queue to walk all of the commits in the lower layer before allowing the topo-order queue to write anything to output (depending on the size of the upper layer).

Therefore, when writing the new layer in split commit-graph, we write a GDAT chunk only if the topmost layer has a GDAT chunk.
This guarantees that if a layer has GDAT chunk, all lower layers must have a GDAT chunk as well.

Rewriting layers follows similar approach: if the topmost layer below the set of layers being rewritten (in the split commit-graph chain) exists, and it does not contain GDAT chunk, then the result of rewrite does not have GDAT chunks either.

What is a "corrected commit date"?

commit-graph: implement corrected commit date

Signed-off-by: Abhishek Kumar
Reviewed-by: Taylor Blau
Reviewed-by: Derrick Stolee

With most of preparations done, let's implement corrected commit date.

The corrected commit date for a commit is defined as:

  • A commit with no parents (a root commit) has corrected commit date equal to its committer date.
  • A commit with at least one parent has corrected commit date equal to the maximum of its commit date and one more than the largest corrected commit date among its parents.

As a special case, a root commit with timestamp of zero (01.01.1970 00:00:00Z) has corrected commit date of one, to be able to distinguish from GENERATION_NUMBER_ZERO (that is, an uncomputed corrected commit date).

To minimize the space required to store corrected commit date, Git stores corrected commit date offsets into the commit-graph file.
The corrected commit date offset for a commit is defined as the difference between its corrected commit date and actual commit date.

Storing corrected commit date requires sizeof(timestamp_t) bytes, which in most cases is 64 bits (uintmax_t).
However, corrected commit date offsets can be safely stored using only 32-bits.
This halves the size of GDAT chunk, which is a reduction of around 6% in the size of commit-graph file.

However, using offsets can be problematic if a commit is malformed but valid and has a committer date of 0 Unix time, as the offset would be the same as the corrected commit date and thus require 64 bits to be stored properly.

While Git does not write out offsets at this stage, Git stores the corrected commit dates in member generation of struct commit_graph_data.
It will begin writing commit date offsets with the introduction of generation data chunk.
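The definition above is easy to model directly. A sketch (my own helper, not Git's code), with `parents` mapping each commit to its parent list and `commit_date` mapping each commit to its committer timestamp:

```python
def corrected_dates(parents, commit_date):
    """Corrected commit date: the committer date for roots (or 1 if
    that date is 0, to stay distinguishable from "uncomputed");
    otherwise max(own date, 1 + max of the parents' corrected dates)."""
    memo = {}
    def visit(c):
        if c not in memo:
            ps = parents.get(c, [])
            if not ps:
                memo[c] = commit_date[c] or 1   # epoch-0 root -> 1
            else:
                memo[c] = max(commit_date[c],
                              1 + max(visit(p) for p in ps))
        return memo[c]
    for c in commit_date:
        visit(c)
    return memo
```

On a chain A <- B <- C with committer dates 100, 50, 200 this yields 100, 101, 200: B's date is older than its parent's, so it is corrected upward, restoring the invariant that a child's generation exceeds its parents'.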

And that improves performance:

commit-reach: use corrected commit dates in paint_down_to_common()

Signed-off-by: Abhishek Kumar
Reviewed-by: Taylor Blau
Reviewed-by: Derrick Stolee

091f4cf ("commit: don't use generation numbers if not needed", 2018-08-30, Git v2.19.0-rc2 -- merge) changed paint_down_to_common() to use commit dates instead of generation numbers v1 (topological levels) as the performance regressed on certain topologies.
With generation number v2 (corrected commit dates) implemented, we no longer have to rely on commit dates and can use generation numbers.

For example, the command git merge-base v4.8 v4.9 on the Linux repository walks 167468 commits (taking 0.135s) with committer dates, and 167496 commits (taking 0.157s) with corrected committer dates.

While using corrected commit dates, Git walks nearly the same number of commits as with commit dates; the process is slower because for each comparison we have to access a commit-slab (for corrected committer dates) instead of a struct member (for committer dates).

As this has already caused problems (as noted in 859fdc0 (commit-graph: define GIT_TEST_COMMIT_GRAPH, 2018-08-29, Git v2.20.0-rc0 -- merge listed in batch #1)), we disable the commit graph within t6404-recursive-merge.


Then, still with Git 2.31 (Q1 2021), fix incremental update of commit-graph file around corrected commit date data.

See commit bc50d6c, commit fde55b0, commit 9c2c0a8, commit 448a39e (02 Feb 2021), and commit 90cb1c4, commit c4cc083 (01 Feb 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 5bd0b21, 17 Feb 2021)

commit-graph: compute generations separately

Signed-off-by: Derrick Stolee
Reviewed-by: Taylor Blau

The compute_generation_numbers() method was introduced by 3258c66 ("commit-graph: compute generation numbers", 2018-05-01, Git v2.19.0-rc0 -- merge listed in batch #1) to compute what is now known as "topological levels".
These are still stored in the commit-graph file for compatibility sake while c1a0911 ("commit-graph: implement corrected commit date", 2021-01-16, Git v2.31.0 -- merge listed in batch #9) updated the method to also compute the new version of generation numbers: corrected commit date.

It makes sense why these are grouped.
They perform very similar walks of the necessary commits and compute similar maximums over each parent.
However, having these two together conflates them in subtle ways that is hard to separate.

In particular, the topo_level slab is used to store the topological levels in all cases, but the commit_graph_data_at(c)->generation member stores different values depending on the state of the existing commit-graph file:

  • If the existing commit-graph file has a "GDAT" chunk, then these values represent corrected commit dates.
  • If the existing commit-graph file doesn't have a "GDAT" chunk, then these values are actually the topological levels.

This issue occurs only when upgrading an existing commit-graph file into one that has the "GDAT" chunk.
The current change does not resolve this upgrade problem, but splitting the implementation into two pieces here helps with that process, which will follow in the next change.

The important thing this helps with is the case where the num_generation_data_overflows was being incremented incorrectly, triggering a write of the overflow chunk.

And:

commit-graph: be extra careful about mixed generations

Signed-off-by: Derrick Stolee
Reviewed-by: Taylor Blau

When upgrading to a commit-graph with corrected commit dates from one without, there are a few things that need to be considered.

When computing generation numbers for the new commit-graph file that expects to add the generation_data chunk with corrected commit dates, we need to ensure that the 'generation' member of the commit_graph_data struct is set to zero for these commits.

Unfortunately, the fallback to topological levels for generation numbers when corrected commit dates are not available causes us harm here: parsing commits notices that read_generation_data is false and populates 'generation' with the topological level.

The solution is to iterate through the commits, parse the commits to populate initial values, then reset the generation values to zero to trigger recalculation.
This loop only occurs when the existing commit-graph data has no corrected commit dates.


And also:

With Git 2.32 (Q2 2021), over-the-wire protocol learns a new request type to ask for object sizes given a list of object names.

See commit a2ba162 (20 Apr 2021) by Bruno Albuquerque (brunoga2).
(Merged by Junio C Hamano -- gitster -- in commit eede711, 14 May 2021)

object-info: support for retrieving object info

Signed-off-by: Bruno Albuquerque

Sometimes it is useful to get information about an object without having to download it completely.

Add the "object-info" capability that lets the client ask for object-related information with their full hexadecimal object names.

Only sizes are returned for now.

technical/protocol-v2 now includes in its man page:

object-info

object-info is the command to retrieve information about one or more objects. Its main purpose is to allow a client to make decisions based on this information without having to fully fetch objects. Object size is the only information that is currently supported.

An object-info request takes the following arguments:

  • size
    Requests size information to be returned for each listed object id.

  • oid <oid>
    Indicates to the server an object which the client wants to obtain information for.

The response of object-info is a list of the requested object ids and the associated requested information, each separated by a single space.

output = info flush-pkt

info = PKT-LINE(attrs) LF
       *PKT-LINE(obj-info LF)

attrs = attr | attrs SP attrs

attr = "size"

obj-info = obj-id SP obj-size
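Following that grammar, a request for object sizes could be framed like this (a sketch of the pkt-line encoding; in protocol v2 a delim-pkt separates the command/capability section from its arguments):

```python
def pkt(line):
    """Frame one pkt-line: 4 hex digits of total length, then payload."""
    data = line.encode()
    return ("%04x" % (len(data) + 4)).encode() + data

def object_info_request(oids):
    """Build a v2 object-info request asking for sizes only."""
    out = [pkt("command=object-info\n"),
           b"0001",                      # delim-pkt: arguments follow
           pkt("size\n")]
    out += [pkt("oid %s\n" % oid) for oid in oids]
    out.append(b"0000")                  # flush-pkt: end of request
    return b"".join(out)
```

For instance, `object_info_request(["ca82a6dff817ec66f44342007202690a93763949"])` (the OID from the question) produces a byte string starting with `0018command=object-info` and ending with the flush-pkt.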

"git receive-pack" (which responds to "git push" requests) failed to clean up a stale lockfile when killed in the middle; this has been corrected with Git 2.41 (Q2 2023).

See commit c55c306 (10 Mar 2023) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit ea09dff, 21 Mar 2023)

receive-pack: fix stale packfile locks when dying

Helped-by: Jeff King
Signed-off-by: Patrick Steinhardt

When accepting a packfile in git-receive-pack, we feed that packfile into git-index-pack to generate the packfile index.
As the packfile would often only contain unreachable objects until the references have been updated, concurrently running garbage collection might be tempted to delete the packfile right away and thus cause corruption.
To fix this, we ask git-index-pack to create a .keep file before moving the packfile into place, which is getting deleted again once all of the reference updates have been processed.

Now in production systems we have observed that those .keep files are sometimes not getting deleted as expected, where the result is that repositories tend to grow packfiles that are never deleted over time.
This seems to be caused by a race when git-receive-pack is killed after we have migrated the kept packfile from the quarantine directory into the main object database.
While this race window is typically small it can be extended for example by installing a proc-receive hook.

Fix this race by registering the lockfile as a tempfile so that it will automatically be removed at exit or when receiving a signal.


Here is another use-case/fix which illustrates how the protocol is working:

Transports that do not support protocol v2 did not correctly fall back to protocol v0 under certain conditions, which has been corrected with Git 2.41 (Q2 2023).

See commit eaa0fd6 (17 Mar 2023) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit f879501, 28 Mar 2023)

git_connect(): fix corner cases in downgrading v2 to v0

Signed-off-by: Jeff King

There's code in git_connect() that checks whether we are doing a push with protocol_v2, and if so, drops us to protocol_v0 (since we know how to do v2 only for fetches).
But it misses some corner cases:

  1. it checks the "prog" variable, which is actually the path to receive-pack on the remote side.
By default this is just "git-receive-pack", but it could be an arbitrary string (like "/path/to/git receive-pack", etc.).
    We'd accidentally stay in v2 mode in this case.
besides "receive-pack" and "upload-pack", there's one other value we'd expect: "upload-archive" for handling "git archive --remote".
    Like receive-pack, this doesn't understand v2, and should use the v0 protocol.

In practice, neither of these causes bugs in the real world so far.
We do send a "we understand v2" probe to the server, but since no server implements v2 for anything but upload-pack, it's simply ignored.
But this would eventually become a problem if we do implement v2 for those endpoints, as older clients would falsely claim to understand it, leading to a server response they can't parse.

We can fix (1) by passing in both the program path and the "name" of the operation.
I treat the name as a string here, because that's the pattern set in transport_connect(), which is one of our callers (we were simply throwing away the "name" value there before).

We can fix (2) by allowing only known-v2 protocols ("upload-pack"), rather than blocking unknown ones ("receive-pack" and "upload-archive").
That will mean whoever eventually implements v2 push will have to adjust this list, but that's reasonable.
We'll do the safe, conservative thing (sticking to v0) by default, and anybody working on v2 will quickly realize this spot needs to be updated.
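The allow-list approach from the commit reduces to a few lines (a hypothetical helper; the real check lives in git_connect()):

```python
V2_CAPABLE = {"git-upload-pack"}   # today, only the fetch service speaks v2

def negotiated_version(service_name, requested):
    """Permit v2 only for services known to understand it; everything
    else (receive-pack, upload-archive, ...) downgrades to v0."""
    if requested == 2 and service_name not in V2_CAPABLE:
        return 0
    return requested
```

Note the direction of the check: unknown services are downgraded by default, so whoever implements v2 push later must consciously extend V2_CAPABLE.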

Choctaw answered 13/12, 2020 at 3:54 Comment(0)
