What is the difference between git push and git pull?

I just stumbled over something peculiar today. I asked a co-worker at my summer job to help me set up a new remote git repo for my code and there was a lot of confusion about what he did and what I wanted to do. I asked him to send over his config to be able to see the path to his remote and found out that he didn't have a remote. When I asked him about this he explained his workflow like this:

  1. Change something locally
  2. Commit
  3. Move to remote dir
  4. git pull c:\localdir

So instead of pushing to a remote, he constantly pulled from his local repo into the one on our server. Sort of working backwards. When I confronted him about this he asked me what the difference was, and I could not really answer him, but I think there is something, right?

So my question to you all is: What is the difference between pushing to a remote and pulling from a remote?

Oversupply answered 28/6, 2012 at 8:23

Pushing to a remote: send some commits you have to another Git repo. That repo is considered "remote", but it can just as well be a repo in another folder on your own hard drive. Pulling from a remote: get some commits from a remote repo and merge them into your current HEAD (your current checkout of your repo).
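To make the "direction" point concrete, here is a toy Python model. The Repo, transfer, push and pull names are made up for illustration and are not Git's API. The model assumes the receiving branch has not diverged, so the "merge" part of pulling is a simple fast-forward. Under that assumption, pushing from the laptop and pulling on the server from the laptop leave the server in the same state.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    # Hypothetical toy model, not Git's real data structures.
    commits: dict = field(default_factory=dict)   # commit ID -> parent ID (None for a root)
    branches: dict = field(default_factory=dict)  # branch name -> tip commit ID


def transfer(sender: Repo, receiver: Repo, branch: str) -> None:
    """Copy the sender's commits and move the receiver's branch to the sender's tip.

    Assumes the receiver's branch has not diverged (a fast-forward);
    combining parallel work is deliberately left out.
    """
    receiver.commits.update(sender.commits)
    receiver.branches[branch] = sender.branches[branch]


def push(sender: Repo, receiver: Repo, branch: str) -> None:
    # Typed on the sender's machine: "take my commits".
    transfer(sender, receiver, branch)


def pull(receiver: Repo, sender: Repo, branch: str) -> None:
    # Typed on the receiver's machine: "give me your commits".
    transfer(sender, receiver, branch)


laptop = Repo({"a": None, "b": "a"}, {"master": "b"})
server_pushed = Repo({"a": None}, {"master": "a"})
server_pulled = Repo({"a": None}, {"master": "a"})

push(laptop, server_pushed, "master")  # usual workflow, run on the laptop
pull(server_pulled, laptop, "master")  # co-worker's workflow, run on the server
print(server_pushed == server_pulled)  # True
```

The only thing that changes between the two workflows is which machine issues the command. Which machine issues it determines which side has to be reachable.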

Your coworker might have used pull instead of push because your repository might not have been available (no git daemon, gitweb, or SSH server running), while his was available from your computer. Since it is a server, he might not want to expose a git daemon/service that could become an attack vector.

But if your repository had been shared/available, he could simply have done this:

  1. change something locally
  2. commit
  3. push to your repository
Decline answered 28/6, 2012 at 8:32
I think this was the case, since I also had some problems pushing to the server. – Oversupply
Then that's it. And as eckes said, the server might already have a checked-out working directory that reflects master as the production version. In that case you won't be able to push from your local branch to the remote master branch, since that branch is checked out there for production use. – Decline
There's something I just wanted clarification on: when you pull, do you only get the commits in a particular branch, or is it automatically the same branch as on your local system (since you say you merge them into your current HEAD)? – Digitalism

In my view, you can either let users push their commits to some repository that's considered "the master", or let them send pull requests to a single user who has permission to modify said "master".

GitHub, for example, won't let non-contributors push to a repository, but will allow them to send pull requests, so that the contributors can integrate their changes.

Abacus answered 28/6, 2012 at 8:43

TL;DR

Push, fetch, and pull let two different Gits talk to each other. In a special case—including the one that's the basis of the question, with c:\localdir—the two different Git repositories are on the same computer, but in general, the two different repositories can be on any two different computers.

  • Push: sends commits and asks them to update their branch. This requires that things be right on their end. This cannot combine parallel development.

  • Pull: runs git fetch, which gets commits and has your Git update your remote-tracking name, then runs a second Git command to update your branch. The second command can combine parallel development.

When the repositories are on different computers, the direction of transfer tends to be much more important, since you can't easily switch your point of view.

Long

Besides the accepted answer, which is accurate enough as far as it goes, there are a few other key differences between git pull and git push. We need to start with this:

The opposite of push is fetch

Git accidentally used the wrong verb here. In Mercurial, we have hg pull to get commits from another repository, and hg push to send commits to another repository. But Git made git pull do two things: (1) get commits; (2) check out or merge those commits. Git then had to separate out the two steps because sometimes you don't want to do step 2 right away.

This means that in Git, the actual opposite of git push is not git pull, but rather git fetch. The git pull command means:

  1. run git fetch; then
  2. run a second Git command.

This second command is where things get the most complicated. If we can leave it out—if we address just fetch vs push—it's simpler. We can add the second command back later.

git fetch is always safe, but git push isn't

The next problem we have here is simple enough, but if you haven't "gotten it" yet, it's terribly confusing until suddenly you "get it" and it makes sense.

When we have a Git repository, we really have three things:

  1. We have a database of commits (and other objects, but commits are the interesting part). The commits are numbered, but the numbers look random. They aren't simple counting numbers: commit #1 is not followed by commit #2, and in fact there is no "commit #1" in the first place. The numbers are hash IDs and they look like random scribbles: 84d06cdc06389ae7c462434cb7b1db0980f63860 for instance.

    The stuff inside a commit is completely read-only. Each commit acts like a full snapshot of every file. This is great for archival, and useless for getting any new work done. Therefore, in a normal (non-bare) repository, we also have:

  2. A normal everyday repository has a place where we get work done. We won't cover this in any detail at all here, but this is important and matters for fetch-vs-push. Some repositories deliberately omit this work area. These are called bare repositories, and we find them on servers, typically.

  3. Last, each repository has a database of names, including branch names. These names allow your Git to find your commits. They mean that you don't have to memorize 84d06cdblahblahwhatever.
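The "random-looking" numbers in item 1 are not actually random. Git computes each object's ID by hashing a short header plus the object's contents, using SHA-1 in default repositories. Commits are hashed the same way, but their contents (tree, parent IDs, author, committer, message) take more setup. This sketch therefore hashes a blob, Git's object type for file contents. git_blob_id is a hypothetical helper name.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a file with this content.

    Git hashes the header "blob <size>" plus a NUL byte, followed by the
    raw bytes. Commit IDs use the same scheme with the type "commit".
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Same result as: echo 'test content' | git hash-object --stdin
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Change a single byte of the content and the ID changes completely. That is why the IDs look like random scribbles even though they are fully determined by the data.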

When you run git fetch, your Git calls up some other Git, often over the Internet-phone at an https:// or ssh:// address. You can call up some other Git with a c:\localdir or /mnt/some/path or whatever. In that special case, your computer might be talking to itself—but usually it's talking to another computer, with its own totally separate Git repository. That other Git repository can also have all three of these. If it's on a server, it might be a bare repository, and not have a work-area. It always, however, has its own database of commits, and its own database of names.

What this means is that your Git has your commits (and maybe theirs too) and your branch names. Their Git has their commits (and maybe yours too) and their branch names. With git fetch, you have your Git call up their Git and get their commits (so now you have yours and theirs); with git push, you have your Git call up their Git, and give them your commits (so now they have theirs and yours).

So far, the key difference between fetch and push is the direction of the data transfer. With fetch, you get commits, and with push, you give commits. But the difference does not stop here.

When git fetch finishes, your Git knows about all the commits. That's great—but we just noted that the commit numbers, by which Git finds the commits, are big ugly random-looking messes. So what git fetch does is take all their branch names—the names they are using to find their commits—and copy them into your own Git, but change them into remote-tracking names. Their main becomes your origin/main, for instance. If they have a develop, your Git creates or updates your origin/develop, and so on. This means git fetch never touches any of your own branches at all, which is why it's always safe. You either get new commits, or you don't. You never lose any of your own commits. Then your Git updates your remote-tracking names if necessary. And then it's done. That's the entire normal git fetch action: bring in some commits if appropriate, and update some non-branch names if appropriate.[1]
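Here is a toy Python model of that normal fetch action. The Repo class and fetch function are illustrative stand-ins, not Git's implementation, and pruning of deleted names, refspecs and tags are ignored. It shows three things: commits are only ever added, remote-tracking names are overwritten, and your own branch names are left alone.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    commits: dict = field(default_factory=dict)          # commit ID -> parent ID
    branches: dict = field(default_factory=dict)         # e.g. "main" -> commit ID
    remote_tracking: dict = field(default_factory=dict)  # e.g. "origin/main" -> commit ID


def fetch(mine: Repo, theirs: Repo, remote: str = "origin") -> None:
    # 1. Get any commits we don't have; nothing we already have is removed.
    for cid, parent in theirs.commits.items():
        mine.commits.setdefault(cid, parent)
    # 2. Copy their branch names into our remote-tracking names.
    for name, tip in theirs.branches.items():
        mine.remote_tracking[f"{remote}/{name}"] = tip
    # mine.branches is never touched, which is why fetch is safe.


theirs = Repo({"H": None, "K": "H", "L": "K"}, {"main": "L", "develop": "H"})
mine = Repo({"H": None, "I": "H", "J": "I"}, {"main": "J"}, {"origin/main": "H"})
fetch(mine, theirs)
print(mine.branches)         # {'main': 'J'}
print(mine.remote_tracking)  # {'origin/main': 'L', 'origin/develop': 'H'}
```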

The last part of git push, just before it finishes, though, consists of a request. Your Git asks their Git to please, if it's OK, change some of their names. For instance, if you run git push origin develop, your Git sends over any commits you have, that they don't, that they need to complete the operation, and then it sends a polite request: please, if it's OK, make your branch name develop find commit ________. Your Git fills in this blank with the commit that your branch name develop finds.

The key difference here is that git fetch updates your remote-tracking name but git push asks them to update their branch name. If they are doing development, they might think it's not OK to update their branch name.
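A toy sketch of that two-part push: first send commits, then make a request the other side may refuse. The accept callback is a hypothetical stand-in for whatever checks their side applies; this model does not say what those checks are. The class and function names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Repo:
    commits: dict = field(default_factory=dict)   # commit ID -> parent ID
    branches: dict = field(default_factory=dict)  # branch name -> commit ID


def push(mine: Repo, theirs: Repo, branch: str,
         accept: Callable[[Optional[str], str], bool]) -> bool:
    """Send commits, then ask the receiver to set `branch` to our tip."""
    proposed = mine.branches[branch]
    # Part 1: send the commits they don't have yet.
    for cid, parent in mine.commits.items():
        theirs.commits.setdefault(cid, parent)
    # Part 2: the polite request. Only the receiver decides whether to obey.
    if accept(theirs.branches.get(branch), proposed):
        theirs.branches[branch] = proposed
        return True
    return False


mine = Repo({"H": None, "I": "H"}, {"develop": "I"})
theirs = Repo({"H": None}, {"develop": "H"})
first = push(mine, theirs, "develop", accept=lambda old, new: False)
print(first, theirs.branches)   # False {'develop': 'H'}
second = push(mine, theirs, "develop", accept=lambda old, new: True)
print(second, theirs.branches)  # True {'develop': 'I'}
```

Compared with the fetch model, the asymmetry is visible: here the name that changes belongs to them, so they get the final say.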


[1] There are ways you can run git fetch and tell it to update your own branch names. It does not happen by accident; you have to make Git do it. You should not make Git do it. If you're a Git Master, this rule becomes: you probably should not make Git do it.


The second command

It's now time to look at the second command that git pull invokes. Well, almost time. First we should look at how Git finds commits.

I mentioned earlier that Git finds commits using branch names. That's true, but not a complete picture. I also mentioned remote-tracking names. Git can find commits with remote-tracking names. That's more complete, but still not really complete. Here's the complete set of tricks Git has:

  • Git can always find a commit if you give it the raw hash ID, as long as the commit is actually in your repository. If Git can't find a commit from the hash ID, that just means it's not in your repository yet: use git fetch to get it from some Git that does have it, and then you're good.

  • Git can find a commit from a name. All kinds of names work here: branch names like main and develop, remote-tracking names like origin/main and origin/develop, tag names like v1.2, and even funky special purpose names. Git has a bunch of those that you don't see very often. The rules for turning a name into a hash ID are described in the gitrevisions documentation.

  • Git can find a commit from another commit. This leads to many of the rules in gitrevisions. This point is especially important.

  • Last, Git can find commits with various searching operations, also described in gitrevisions.
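A toy resolver can cover three of these tricks: a raw hash ID, a name, and walking from a commit to its parent with a "~N" suffix. In gitrevisions, main~2 means "two first-parent steps back from main". This is a small hypothetical subset, not Git's resolution rules. Chained suffixes like main~2~1, the ^ syntax and searching are not handled, and resolve is an illustrative name.

```python
def resolve(rev: str, commits: dict, names: dict) -> str:
    """Turn a tiny subset of gitrevisions syntax into a commit ID.

    commits maps commit ID -> tuple of parent IDs; names maps branch,
    remote-tracking, and tag names -> commit IDs (a hypothetical setup).
    """
    base, tilde, steps = rev.partition("~")
    if base in names:
        cid = names[base]            # a name finds a commit
    elif base in commits:
        cid = base                   # a raw hash ID finds itself
    else:
        raise KeyError(f"{base!r} not found; maybe git fetch it first?")
    count = (int(steps) if steps else 1) if tilde else 0  # "main~" means "main~1"
    for _ in range(count):
        parents = commits[cid]
        if not parents:
            raise KeyError(f"{rev!r} goes back past a root commit")
        cid = parents[0]             # a commit finds its first parent
    return cid


commits = {"F": (), "G": ("F",), "H": ("G",)}
names = {"main": "H", "origin/main": "G", "v1.2": "F"}
print(resolve("main~2", commits, names))        # F
print(resolve("origin/main~", commits, names))  # F
print(resolve("H", commits, names))             # H
```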

There is a lot of stuff in gitrevisions, and you don't need to memorize all of it. Just remember that there are lots of ways to find commits. Using git log, then cutting and pasting hash IDs is a fine way to do it, but sometimes you might want to experiment with the various shortcuts. But, do remember one more thing: git log works by finding commits using commits to find commits!

Each commit stores two things: it has a full snapshot of all files, as we mentioned earlier, but it also has metadata: information about the commit itself. This includes the name and email address of the person who made the commit, for instance. It includes another name and email address as well (the "committer" vs the "author"), and two date-and-time stamps. It has a bunch of stuff in this metadata, and the key thing for Git itself here is that it has the raw hash ID of the commit that comes before this commit.

What this all means is that commits, in Git, form a backwards-looking chain. Merge commits store two or more previous commit hash IDs, so from a merge, we can go backwards down two chains, or perhaps even more than two. In any non-empty repository there is also at least one root commit, that doesn't point backwards: that's where history ends, or starts, depending on how you look at it. But most commits just store one hash ID, giving us the simple chain:

... <-F <-G <-H

If H here stands in for the hash ID of the last commit in some chain, and if we have some way to find commit H, we'll be able to find commit G too. That's because commit H stores the raw hash ID of earlier commit G. Likewise, from G, we can find commit F, because G stores the hash ID of F. F of course also stores a hash ID, and so on—so by starting at H, and then working backwards, one commit at a time, we can find all the commits that end at H.

A branch name in Git just records the hash ID of that last commit. We say that the branch name points to the last commit, and the last commit then points to the next-to-last commit, which points back to a still-earlier commit, and so on.
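Here is a toy Python model in which each commit records only its parent's ID (merges are ignored). In it, a branch name is nothing more than a dictionary entry holding the last commit's ID. Listing history is a loop that follows parent links backwards from there. The names log, parent_of and branches are illustrative.

```python
from typing import Dict, Iterator, Optional


def log(tip: str, parent_of: Dict[str, Optional[str]]) -> Iterator[str]:
    """Yield commit IDs from tip backwards, like git log on a linear chain."""
    cid: Optional[str] = tip
    while cid is not None:
        yield cid
        cid = parent_of[cid]  # each commit stores its parent's hash ID


parent_of = {"F": None, "G": "F", "H": "G"}     # F is a root commit here
branches = {"main": "H"}                        # the name only records the last commit
print(list(log(branches["main"], parent_of)))  # ['H', 'G', 'F']
```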

Parallel development

Suppose we clone some repository from some central server (e.g., GitHub). We get a large collection of commits. Our git clone operation actually works by creating a new empty repository, then copying all of their commits, but none of their branch names. Then, after filling up our repository's commit-database with commits, and creating remote-tracking names for their branch names, our Git creates one new branch name.

The branch name we get is the one we pick with git clone's -b option. If we don't pick one, the name we get is the one their Git recommends. Typically these days that's main. Sometimes that's their only branch name. If so, we'll get some series of commits, plus the one remote-tracking name origin/main:

...--F--G--H   <-- origin/main

and then our Git will create our own main to match their main (and then git checkout or git switch to our new main):

...--F--G--H   <-- main (HEAD), origin/main
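The naming side of git clone can be sketched as a toy Python function. The names clone, recommended and b are hypothetical: b plays the part of git clone -b, and recommended is the branch their Git recommends. Commits are copied wholesale, every branch name becomes an origin/ name, and exactly one local branch is created and checked out.

```python
from typing import Optional


def clone(their_commits: dict, their_branches: dict,
          recommended: str, b: Optional[str] = None):
    """Return (commits, branches, remote_tracking, head) for a fresh clone."""
    commits = dict(their_commits)                       # copy all of their commits
    remote_tracking = {f"origin/{name}": tip            # their names, renamed
                       for name, tip in their_branches.items()}
    chosen = b or recommended                           # -b wins, else their pick
    branches = {chosen: remote_tracking[f"origin/{chosen}"]}  # one new branch
    head = chosen                                       # and check it out
    return commits, branches, remote_tracking, head


commits, branches, tracking, head = clone(
    {"F": None, "G": "F", "H": "G"}, {"main": "H"}, recommended="main")
print(branches, tracking, head)  # {'main': 'H'} {'origin/main': 'H'} main
```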

We can now work and make new commits. Whatever new commits we make, they will get new, universally-unique hash IDs. Let's make two new commits on our main:

             I--J   <-- main (HEAD)
            /
...--F--G--H   <-- origin/main

Now let's suppose that, by whatever means, their Git has gotten two new commits added to their main. Those new commits will get new universally-unique hash IDs. When we run git fetch origin, we'll pick up the new commits:

             I--J   <-- main (HEAD)
            /
...--F--G--H
            \
             K--L   <-- origin/main

Note how our work and their work have diverged. This happens when there is parallel development. It doesn't happen when there isn't parallel development: if they don't get two new commits, we will still have our origin/main—our memory of their main—pointing to commit H. Our new I-J commits add on to H.
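Whether work has diverged can be checked by counting commits reachable from one side but not the other. On a real repository, git rev-list --left-right --count main...origin/main reports these two counts. The toy Python version below does the same set arithmetic on the graph in the last diagram; ancestors and ahead_behind are hypothetical names.

```python
def ancestors(tip: str, parents: dict) -> set:
    """All commits reachable from tip by following parent links (tip included)."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return seen


def ahead_behind(ours: str, theirs: str, parents: dict) -> tuple:
    mine, other = ancestors(ours, parents), ancestors(theirs, parents)
    return len(mine - other), len(other - mine)


parents = {"F": (), "G": ("F",), "H": ("G",),
           "I": ("H",), "J": ("I",),   # our new commits on main
           "K": ("H",), "L": ("K",)}   # their new commits, seen as origin/main
print(ahead_behind("J", "L", parents))  # (2, 2): both nonzero means diverged
```

Without parallel development the second count would be 0, as with ahead_behind("J", "H", parents).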

If we don't have parallel development, we can probably git push now

Let's say we didn't have any parallel development. We now run:

git push origin main

to send our new I-J commits to them, and ask them to set their main to point to commit J. If they obey, they will get this:

...--F--G--H--I--J   <-- main

(Note that they don't have an origin/main, and we don't care what their HEAD is, though I haven't explained here what our own HEAD is about.)

If we do have parallel development, this is a problem

If they have:

...--F--G--H--K--L   <-- main

in their repository when we run git push, we will send them our I-J. But our commit I connects back to commit H. Our Git will then ask them to set their main to point to commit J:

             I--J   <-- (polite-request: set main to point here)
            /
...--F--G--H--K--L   <-- main

If they were to obey this request, they would lose their K-L. So they will reject the request. The specific error we will see is the claim that this is not a fast-forward.
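The rule behind "not a fast-forward" is an ancestry test: moving a branch from old to new loses nothing only if old is an ancestor of new. git merge-base --is-ancestor performs that test on real commits. The toy Python below applies it to both pictures above; is_ancestor and update_allowed are hypothetical names, and force only loosely models a forced push.

```python
def is_ancestor(old: str, new: str, parents: dict) -> bool:
    """True if old can be reached from new by following parent links."""
    seen, stack = set(), [new]
    while stack:
        cid = stack.pop()
        if cid == old:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return False


def update_allowed(current: str, proposed: str, parents: dict,
                   force: bool = False) -> bool:
    # A fast-forward keeps every commit reachable; force skips the check.
    return force or is_ancestor(current, proposed, parents)


parents = {"F": (), "G": ("F",), "H": ("G",), "I": ("H",), "J": ("I",),
           "K": ("H",), "L": ("K",)}
print(update_allowed("H", "J", parents))  # True: no parallel development
print(update_allowed("L", "J", parents))  # False: K and L would be lost
```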

It is possible, depending on permissions,[2] to force them to obey anyway. As in footnote 1, though, this is not something you should do, at least not until you really understand the idea of "losing" commits.


[2] Git as distributed does not have this kind of permission checking, but most hosting services, such as GitHub, have added it. If you set up your own hosting service, you should consider a way to add it, too.


In the face of parallel development, we need a way to combine work

Let's suppose that, in whatever way, we find ourselves in this situation:

             I--J   <-- main (HEAD)
            /
...--F--G--H
            \
             K--L   <-- origin/main

What we need now is a way to combine our work—the stuff we did to make commits I and J—with their work, whoever they are: the stuff they did to make commits K-L.

Git has many ways of combining work, but we won't go into a lot of detail here. The two principal ways of doing this are with git merge and with git rebase. So, after a git fetch that results in this kind of fork, where we and they both have new commits, we will need a second Git command, probably either git merge or git rebase.
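The two approaches produce differently shaped history. The sketch below is a toy Python model of the commit graph; merge_commit and rebase_onto are hypothetical helpers, and file contents and conflicts are ignored entirely. Merging adds one commit with two parents. Rebasing copies I and J onto L as new commits, marked I' and J' here because real copies get new hash IDs.

```python
def merge_commit(ours: str, theirs: str, parents: dict, new_id: str = "M") -> str:
    parents[new_id] = (ours, theirs)  # a merge commit points back at both tips
    return new_id


def rebase_onto(tip: str, base: str, onto: str, parents: dict) -> str:
    """Copy the linear run base..tip so it starts after onto; return the new tip."""
    run = []
    cid = tip
    while cid != base:                # collect J, I (walking backwards)
        run.append(cid)
        cid = parents[cid][0]
    new_parent = onto
    for cid in reversed(run):         # re-create I, then J, on top of onto
        copy = cid + "'"
        parents[copy] = (new_parent,)
        new_parent = copy
    return new_parent


graph = {"H": (), "I": ("H",), "J": ("I",), "K": ("H",), "L": ("K",)}
merged = dict(graph)
print(merge_commit("J", "L", merged), merged["M"])  # M ('J', 'L')
rebased = dict(graph)
new_tip = rebase_onto("J", "H", "L", rebased)
print(new_tip, rebased["J'"], rebased["I'"])        # J' ("I'",) ('L',)
```

In both cases the original I and J are still in the graph. After a rebase they simply stop being reachable from the branch name once the name is moved to J'.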

The correct choice of second command is partly a matter of opinion. There's no one universally correct choice here. But what git pull does is this:

  • You pick a choice in advance, before you even see if you have this kind of "need to combine work" as a result of the git fetch you are about to run. Note that you have not yet run this git fetch, even though you're making this decision right now.

  • Then, having decided, you run a git pull either with an option or two, or with a configuration setting, to say how to deal with this; or with no options at all, which means use git merge if necessary.

Your git pull now runs the git fetch. This obtains any new commits they have that you don't, and updates your remote-tracking name.[3] Then it looks to see if that special second combine-work operation is required. If so, it uses it to combine work. If not, it just does a git checkout or git switch to the latest commit while also bringing your current branch name forward.[4]
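The decision the second step makes after the fetch can be sketched as a toy Python function. second_step and its mode values are hypothetical names that loosely mirror git pull's merge, rebase and fast-forward-only choices. The actual merge or rebase work is not modelled.

```python
def is_ancestor(old: str, new: str, parents: dict) -> bool:
    """True if old can be reached from new by following parent links."""
    seen, stack = set(), [new]
    while stack:
        cid = stack.pop()
        if cid == old:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return False


def second_step(branch_tip: str, fetched_tip: str, parents: dict,
                mode: str = "merge") -> str:
    """Describe what pull does after fetch; mode is "merge", "rebase" or "ff-only"."""
    if is_ancestor(fetched_tip, branch_tip, parents):
        return "already up to date"
    if is_ancestor(branch_tip, fetched_tip, parents):
        return f"fast-forward to {fetched_tip}"  # no work to combine
    if mode == "ff-only":
        raise RuntimeError("diverged: cannot fast-forward, stopping after fetch")
    return f"{mode} needed to combine work"


parents = {"H": (), "I": ("H",), "J": ("I",), "K": ("H",), "L": ("K",)}
print(second_step("H", "L", parents))            # fast-forward to L
print(second_step("J", "L", parents, "rebase"))  # rebase needed to combine work
```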


[3] In extremely out-of-date versions of Git (predating 1.8.4), git pull doesn't update the remote-tracking name. Be aware of this in case you encounter one of these ancient Git versions.

[4] There are two things of note here:

  • Git calls this a fast forward merge. This is not actually a merge, so this is a poor name. (Mercurial just calls it an update.) Since Git 2.0, you can tell git pull to do only a fast-forward operation: if work-combining is required, git pull will do the fetch, but then stop with an error. This is probably what git pull should have done since the beginning, and probably what it will do eventually, but for compatibility reasons, it does not do that today. Edit, July 2022: the day has more or less arrived, and git pull now sometimes defaults to this. I'll leave out any further detail though.

    If you do have the option, and if you like git pull, I recommend using git pull --ff-only or configuring pull.ff to only, with git config pull.ff only. (I personally tend to just run git fetch, then git log or some similar operation to check, and then run git merge --ff-only manually, but my habits were set long before Git 2.0.)

  • The git switch command was new in Git 2.23. There's no real difference between git switch and git checkout for this particular case. The new command was added because the Git folks found that git checkout is too complicated—it has a lot of modes—and that some of its modes were destructive. This destruction hit even experienced Git users sometimes. (That's been fixed: since 2.23, git checkout errors out for these cases now.) To make Git more user-friendly, git checkout got split into two separate commands. It's a good idea to use the new commands, but the old one still works, because Git has to be compatible for a long time.


Summary

Push sends commits and asks them to update their branch. This requires that things be right on their end. This cannot combine parallel development.

Pull gets commits and has your Git update your remote-tracking name, then runs a second Git command to update your branch. The second command can combine parallel development.

You can avoid running the second command immediately by using git fetch instead of git pull. This is useful if you want to see what you're dealing with, before you make any decisions about how to use it.

Studer answered 29/3, 2021 at 8:52

None: the repos are copies of each other, and pull and push just move commits in opposite directions. The only difference in your co-worker's method is that he added a fourth, unneeded step.

Arzola answered 28/6, 2012 at 8:27

Yes, it is working backwards.

The usual workflow is:

  1. change something locally
  2. commit
  3. push to remote dir

One use case for not pushing to the remote (another is explained by Dolanor) is that a working copy is checked out on the remote (i.e. it's not a bare repo). If he wants to push to a branch that is checked out on the remote box (e.g. master:master), this will not succeed, since pushes to checked-out branches are refused by default.
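On the receiving side, this is controlled by the receive.denyCurrentBranch setting. In a non-bare repository its default is to refuse such pushes. The toy Python check below mirrors the documented values refuse, warn, ignore and updateInstead in simplified form; push_allowed and its parameters are hypothetical names. In this sketch, updateInstead is only allowed when the remote working tree is clean.

```python
from typing import Optional


def push_allowed(branch: str, checked_out: Optional[str],
                 deny_current_branch: str = "refuse",
                 worktree_clean: bool = True) -> bool:
    """Simplified receiving-side check; checked_out is None for a bare repo."""
    if checked_out is None or branch != checked_out:
        return True                 # no working copy of this branch to go stale
    if deny_current_branch in ("refuse", "true"):
        return False                # the default
    if deny_current_branch == "updateInstead":
        return worktree_clean       # also updates the remote working tree
    return True                     # "warn" and "ignore"/"false" allow it


print(push_allowed("master", checked_out="master"))   # False
print(push_allowed("master", checked_out=None))       # True (bare repo)
print(push_allowed("feature", checked_out="master"))  # True
```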

In my opinion, that's the only use case for hopping over to the remote machine and pulling instead of pushing from the local machine.

Pail answered 28/6, 2012 at 8:28
I explained the other case, where the repo is unavailable (no git daemon, etc.). I forgot about the already checked-out repository. So there seem to be two cases for that kind of workflow. – Decline
So basically pushing and pulling is just tossing commits in opposite directions? – Oversupply
Yes. And they are merged into the HEAD branch (or the one given on the command line) automatically. – Decline
