Are concurrent git pushes always safe if the second push only has fast-forwards from the first push?
I want to automatically push commits in the post-receive hook from a central repo on our LAN to another central repo in the cloud. The LAN repo is created using git clone --mirror git@cloud:/path/to/repo or equivalent commands.

Because the files being committed will be large relative to our upstream bandwidth, it's entirely possible something like this could happen:

  1. Alice initiates a push to the LAN repo.
  2. Bill pulls from the LAN repo while the post-receive hook is running.
    • The LAN repo is in the middle of pushing to the cloud repo.
    • This also means Bill's local repo contains the commits Alice pushed. Confirmed through testing.
  3. Bill initiates a push to the LAN repo.
    • Bill's push is a fast-forward of Alice's push, so the LAN repo will accept it.

When the post-receive hook for the LAN repo executes, a second push from the LAN repo to the cloud repo will start and the two will run concurrently.
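For reference, git passes a post-receive hook one line per updated ref on standard input, in the form `<old-value> <new-value> <ref-name>`. The following is a minimal sketch, not the hook actually in use here, of how those lines could be turned into refspecs for a plain, non-forced push. The function name `refspecs_from_stdin` and the remote name `cloud` are assumptions made for illustration. The sketch also assumes the remote is not set up in mirror mode, because `git push --mirror` cannot be combined with explicit refspecs.

```shell
#!/bin/sh
# Sketch of a post-receive hook body (hypothetical names throughout).
# Git feeds one line per updated ref on stdin:
#   <old-value> <new-value> <ref-name>

# Convert the stdin lines into refspecs for a non-forced push.
refspecs_from_stdin() {
    while read -r old new ref; do
        case $new in
            *[!0]*)
                # New value is a real object id: push the ref as-is.
                # No leading "+", so the remote may reject non-fast-forwards.
                printf '%s:%s\n' "$ref" "$ref"
                ;;
            *)
                # An all-zero new value means the ref was deleted.
                printf ':%s\n' "$ref"
                ;;
        esac
    done
}

# In a real hook this could be wired up as (remote name "cloud" is assumed):
#   refspecs_from_stdin | xargs git push cloud
```

Because the refspecs carry no `+` prefix, a push that would move a ref backwards should be rejected by the receiving side instead of silently overwriting it.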

I'm not worried about the git objects. The worst-case scenario is that both pushes upload all of the objects from Alice's push, but that shouldn't matter as far as I understand git's internals.

I'm concerned about the refs. Suppose Alice pushed using a much slower connection, so that Bill's push finishes first. Suppose packet loss or something else causes the hook's push of Bill's commits from the LAN repo to the cloud to finish before the hook's push of Alice's commits. If both Alice and Bill are pushing the master branch and Bill's push finishes first, what will the master ref be on the cloud repo? I want it to be Bill's HEAD, since that's the later push, but I'm concerned it will be Alice's HEAD.

Further clarification:

I realize Alice's push from her machine to the LAN repo will fail if Bill's push from his machine to the LAN repo finishes first. In that case, the LAN repo's post-receive hook will not execute. Furthermore, please assume nobody will be doing force pushes, so if the post-receive hook runs on the LAN repo, all ref changes are fast-forwards.

Execrable answered 7/12, 2011 at 22:0 Comment(1)
Note: atomic pushes will soon be a reality (Git 2.3.1+ Q1/Q2 2015): see my answer below. - Trichinopoly
E
4

If Bill's push finishes first, Alice's push will fail: before updating a ref, git checks that the ref in the repo still has the value it had when the push started, and in this scenario it will not. Alice will see an error message and will need to resolve the issue. The same goes for Bill in the reverse case. So in your post-receive hook you must make sure that the original and new refs for the repo are actually different. If they are not, skip the push to the other repo and save some work.
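The same kind of check can be observed locally with `git update-ref`, which accepts an expected old value as a third argument and refuses the update if the ref has moved. This is only a local analogy in a throwaway repository, not a reproduction of what the server does during a push. The function name `demo_cas` and the branch name `shared` are made up for the example.

```shell
# Sketch: compare-and-swap on a ref using "git update-ref <ref> <new> <old>".
demo_cas() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q . || exit 1
        # Identity is passed inline so the sketch does not rely on global config.
        c() { git -c user.name=demo -c user.email=demo@example.invalid \
                  commit -q --allow-empty -m "$1"; }
        c base;  base=$(git rev-parse HEAD)
        c alice; alice=$(git rev-parse HEAD)
        c bill;  bill=$(git rev-parse HEAD)   # bill contains alice

        git update-ref refs/heads/shared "$base"

        # Bill's update lands first; the expected old value ($base) matches.
        git update-ref refs/heads/shared "$bill" "$base" || exit 1

        # Alice's late update still expects $base, so it is refused.
        if git update-ref refs/heads/shared "$alice" "$base" 2>/dev/null; then
            echo "alice-overwrote"
        elif [ "$(git rev-parse refs/heads/shared)" = "$bill" ]; then
            echo "bill-kept"
        fi
    )
    status=$?
    rm -rf "$repo"
    return $status
}
```

Calling `demo_cas` prints which outcome occurred; the late update is expected to be refused, leaving the ref at Bill's commit.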

I still see a problem in your scenario, though, and it is with the push to the cloud. You can have the same issue with the hook pushing two valid refs up to the cloud location. Except now, if a push fails, the script won't know whether it needs to push again, because it won't know whether the failed ref was older or newer than the one already pushed, especially if the updates weren't simple fast-forwards, which can happen from time to time. If you simply force the push no matter what, there is a chance the cloud will hold an old ref until another hook run pushes something else later. In Alice's case, she would have merged the changes from upstream or chosen any number of other solutions, but the script probably shouldn't have that kind of decision-making capability.

In the hook you might be able to do some script magic on the current repo to compare timestamps and the like and only push if the update is a fast-forward, but that seems messy, and it is more likely a merge is needed anyway. I think a better solution than a post-receive hook is a cron (or otherwise scheduled) task that runs every five minutes, or however often you want, and simply runs a git pull on the master branch of your remote mirror. If you don't have access to that repo, you can do the force push from your LAN repo with a cron job instead. I think this is safer and less complicated than the hook. It ensures the branch on the backup cloud is in the correct place every few minutes, and unlike the hook it doesn't risk pushing an older ref and never delivering the newest one until another user pushes.
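A scheduled sync along these lines might look like the following sketch. The helper names `run_exclusive` and `sync_to_cloud`, the remote name `cloud`, and the lock directory are all hypothetical. The lock is an addition that is not part of the suggestion above; it is included so that a slow upload does not overlap with the next scheduled run.

```shell
#!/bin/sh
# Hypothetical sync script, intended to be run from cron, for example:
#   */5 * * * * /path/to/sync-to-cloud.sh

# run_exclusive LOCKDIR CMD...: run CMD only if no other run holds LOCKDIR.
run_exclusive() {
    lock=$1
    shift
    # mkdir fails if the directory already exists, so it doubles as a lock.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous sync still running; skipping" >&2
        return 75
    fi
    "$@"
    rc=$?
    rmdir "$lock"
    return $rc
}

# sync_to_cloud REPO: mirror-push REPO to the assumed remote "cloud".
sync_to_cloud() {
    run_exclusive "$1/sync.lock" git -C "$1" push --mirror cloud
}
```

One caveat of this design: if the script is killed mid-push, the lock directory is left behind and has to be removed by hand before the next run can proceed.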

Engelbert answered 8/12, 2011 at 16:18 Comment(4)
"in your post-receive hook you must make sure that the original and new refs for the repo are different now" -- I don't understand what you mean by that. If the old and new values for a given ref are the same, it appears the ref is not passed to the post-receive hook anyway. If no refs change, the post-receive hook is not even called. So I don't see a purpose in checking for the difference. - Execrable
Actually, I'm only concerned about the issue between the LAN repo and the cloud, but I see that I didn't state that part of my question very clearly and will edit it. - Execrable
I'm interpreting your answer as a No, the push to the cloud repo from the LAN repo is not safe. I think the problem is that git push --mirror (which is default after git clone --mirror) force updates the refs. - Execrable
Correct, the push to the cloud from the LAN repo is not safe. I think the crontab job is the safest option. - Engelbert
T
3

Git 2.4+ (Q2 2015) will introduce atomic pushes, which should make it easier for the server to manage the order of pushes.
See the work done by Stefan Beller (stefanbeller):

  • commit ad35eca t5543-atomic-push.sh: add basic tests for atomic pushes

This adds tests for the atomic push option.
The first four tests check if the atomic option works in good conditions and the last three patches check if the atomic option prevents any change to be pushed if just one ref cannot be updated.

Use an atomic transaction on the remote side if available.
Either all refs are updated, or on error, no refs are updated.
If the server does not support atomic pushes the push will fail.

This adds support to send-pack to negotiate and use atomic pushes iff the server supports it. Atomic pushes are activated by a new command line flag --atomic.

This adds the atomic protocol option to allow receive-pack to inform the client that it has atomic push capability.
This commit makes the functionality introduced in the previous commits go live for the serving side.
The changes in documentation reflect the protocol capabilities of the server.

   atomic
   ------

If the server sends the 'atomic' capability it is capable of accepting atomic pushes.
If the pushing client requests this capability, the server will update the refs in one atomic transaction.
Either all refs are updated or none.
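The all-or-nothing behavior can be tried out in a throwaway setup like the sketch below, which assumes a git version with `--atomic` support. Two branches are pushed together; one would be a fast-forward and the other a non-fast-forward, so the expectation is that neither is updated. The function name `demo_atomic` and the branch names `a` and `b` are made up for the example.

```shell
# Sketch: push two refs with --atomic where one of them must be rejected.
demo_atomic() {
    tmp=$(mktemp -d) || return 1
    (
        cd "$tmp" || exit 1
        git init -q --bare remote.git || exit 1
        git init -q work || exit 1
        cd work || exit 1
        # Identity is passed inline so the sketch does not rely on global config.
        c() { git -c user.name=demo -c user.email=demo@example.invalid \
                  commit -q --allow-empty -m "$1"; }
        c one;   one=$(git rev-parse HEAD)
        c two;   two=$(git rev-parse HEAD)
        git branch a
        git branch b
        git push -q ../remote.git a b >/dev/null 2>&1 || exit 1

        c three; three=$(git rev-parse HEAD)
        git branch -f a "$three"   # a: fast-forward relative to the remote
        git branch -f b "$one"     # b: rewound, so a non-fast-forward

        if git push -q --atomic ../remote.git a b >/dev/null 2>&1; then
            echo "push-succeeded"
        elif [ "$(git -C ../remote.git rev-parse refs/heads/a)" = "$two" ]; then
            echo "nothing-updated"
        else
            echo "partially-updated"
        fi
    )
    status=$?
    rm -rf "$tmp"
    return $status
}
```

Without `--atomic`, the same push would normally update `a` and reject only `b`.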


With Git 2.29 (Q4 2020), a "git push" that wants to be atomic and wants to send a push certificate learned not to prepare and sign the push certificate when it fails the local check (due to atomicity, it is then known that no certificate is needed).

See commit a4f324a (19 Sep 2020) by Han Xin (chiyutianyi).
(Merged by Junio C Hamano -- gitster -- in commit b5847b9, 25 Sep 2020)

send-pack: run GPG after atomic push checking

Signed-off-by: Han Xin

The refs update commands can be sent to the server side in two different ways: GPG-signed or unsigned.
We should run these two operations in the same "Finally, tell the other end!" code block, but they are separated by the "Clear the status for each ref" code block.
This will result in a slight performance loss, because the failed atomic push will still perform unnecessary preparations for shallow advertise and GPG-signed commands buffers, and user may have to be bothered by the (possible) GPG passphrase input when there is nothing to sign.

Add a new test case to t5534 to ensure GPG will not be called when the GPG-signed atomic push fails.

Trichinopoly answered 15/2, 2015 at 4:39 Comment(2)
While this is interesting and I'll be happy to upgrade and use --atomic when it's available, I'm not terribly concerned about multiple refs in a single push. Our team almost always pushes just a single ref. I'm concerned about the behavior of git regarding multiple pushes of a single ref. On the cloud server, if Bill's push of ref/some-branch somehow finishes before Alice's push of ref/some-branch, will Alice's fail? I don't see those commits changing this behavior. This implies that git is already safe for the single-ref question that I'm asking. - Execrable
And regarding my previous comment, assume Bill's ref/some-branch contains Alice's ref/some-branch. - Execrable
