TL;DR
Push, fetch, and pull let two different Gits talk to each other. In a special case (including the one that's the basis of the question, with c:\localdir), the two different Git repositories are on the same computer, but in general, the two different repositories can be on any two different computers.
Push: sends commits and asks them to update their branch. This requires that things be right on their end. This cannot combine parallel development.
Pull: runs git fetch, which gets commits and has your Git update your remote-tracking name, then runs a second Git command to update your branch. The second command can combine parallel development.
When the repositories are on different computers, the direction of transfer tends to be much more important, since you can't easily switch your point of view.
Long
Besides the accepted answer, which is accurate enough as far as it goes, there are a few other key differences between git pull and git push. We need to start with this:
The opposite of push is fetch
Git accidentally used the wrong verb here. In Mercurial, we have hg pull to get commits from another repository, and hg push to send commits to another repository. But Git made git pull do two things: (1) get commits; (2) check out or merge those commits. Git then had to separate out the two steps because sometimes you don't want to do step 2 right away.
This means that in Git, the actual opposite of git push is not git pull, but rather git fetch. The git pull command means:
- run git fetch; then
- run a second Git command.
This second command is where things get the most complicated. If we can leave it out—if we address just fetch vs push—it's simpler. We can add the second command back later.
git fetch is always safe, but git push isn't
The next problem we have here is simple enough, but if you haven't "gotten it" yet, it's terribly confusing until suddenly you "get it" and it makes sense.
When we have a Git repository, we really have three things:
- We have a database of commits (and other objects, but commits are the interesting part). The commits are numbered, but the numbers look random. They aren't simple counting numbers: commit #1 is not followed by commit #2, and in fact there is no "commit #1" in the first place. The numbers are hash IDs and they look like random scribbles: 84d06cdc06389ae7c462434cb7b1db0980f63860 for instance.
  The stuff inside a commit is completely read-only. Each commit acts like a full snapshot of every file. This is great for archival, and useless for getting any new work done. Therefore, in a normal (non-bare) repository, we also have:
- A normal everyday repository has a place where we get work done. We won't cover this in any detail at all here, but this is important and matters for fetch-vs-push. Some repositories deliberately omit this work area. These are called bare repositories, and we find them on servers, typically.
- Last, each repository has a database of names, including branch names. These names allow your Git to find your commits. They mean that you don't have to memorize 84d06cdblahblahwhatever.
When you run git fetch, your Git calls up some other Git, often over the Internet-phone at an https:// or ssh:// address. You can call up some other Git with a c:\localdir or /mnt/some/path or whatever. In that special case, your computer might be talking to itself, but usually it's talking to another computer, with its own totally separate Git repository. That other Git repository can also have all three of these. If it's on a server, it might be a bare repository, and not have a work-area. It always, however, has its own database of commits, and its own database of names.
What this means is that your Git has your commits (and maybe theirs too) and your branch names. Their Git has their commits (and maybe yours too) and their branch names. With git fetch, you have your Git call up their Git and get their commits (so now you have yours and theirs); with git push, you have your Git call up their Git, and give them your commits (so now they have theirs and yours).
So far, the key difference between fetch and push is the direction of the data transfer. With fetch, you get commits, and with push, you give commits. But the difference does not stop here.
When git fetch finishes, your Git knows about all the commits. That's great, but we just noted that the commit numbers, by which Git finds the commits, are big ugly random-looking messes. So what git fetch does is take all their branch names (the names they are using to find their commits) and copy them into your own Git, but change them into remote-tracking names. Their main becomes your origin/main, for instance. If they have a develop, your Git creates or updates your origin/develop, and so on. This means git fetch never touches any of your own branches at all, which is why it's always safe. You either get new commits, or you don't. You never lose any of your own commits. Then your Git updates your remote-tracking names if necessary. And then it's done. That's the entire normal git fetch action: bring in some commits if appropriate, and update some non-branch names if appropriate.[1]
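Here is a minimal sketch of that claim, using two throwaway repositories on one machine. The directory names theirs and ours, the file name file.txt, and the commit_in helper are all made up for this demo; only the git commands themselves are real.

```shell
#!/bin/sh
# Sketch: git fetch moves origin/main but never touches our own main.
set -e
cd "$(mktemp -d)"

# Identity for the throwaway commits, so no global config is needed.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit_in DIR FILE MESSAGE (hypothetical helper): append a line and commit.
commit_in() {
    echo "$3" >> "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

# "Their" repository, with one commit on main.
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/main
commit_in theirs file.txt "first"

# "Our" repository: a clone reached through a local path.
git clone -q theirs ours

# They gain a commit after we cloned.
commit_in theirs file.txt "second"

main_before=$(git -C ours rev-parse main)
tracking_before=$(git -C ours rev-parse origin/main)

git -C ours fetch -q origin

main_after=$(git -C ours rev-parse main)
tracking_after=$(git -C ours rev-parse origin/main)
their_main=$(git -C theirs rev-parse main)

echo "our main:    $main_before -> $main_after"
echo "origin/main: $tracking_before -> $tracking_after"
```

After the fetch, our main still names the same commit as before, while origin/main has caught up with their main.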
The last part of git push, just before it finishes, though, consists of a request. Your Git asks their Git to please, if it's OK, change some of their names. For instance, if you run git push origin develop, your Git sends over any commits you have, that they don't, that they need to complete the operation, and then it sends a polite request: please, if it's OK, make your branch name develop find commit ________. Your Git fills in this blank with the commit that your branch name develop finds.
The key difference here is that git fetch updates your remote-tracking name but git push asks them to update their branch name. If they are doing development, they might think it's not OK to update their branch name.
[1] There are ways you can run git fetch and tell it to update your own branch names. It does not happen by accident; you have to make Git do it. You should not make Git do it. If you're a Git Master, this rule becomes: you probably should not make Git do it.
The second command
It's now time to look at the second command that git pull invokes. Well, almost time. First we should look at how Git finds commits.
I mentioned earlier that Git finds commits using branch names. That's true, but not a complete picture. I also mentioned remote-tracking names. Git can find commits with remote-tracking names. That's more complete, but still not really complete. Here's the complete set of tricks Git has:
- Git can always find a commit if you give it the raw hash ID, as long as the commit is actually in your repository. If Git can't find a commit from the hash ID, that just means it's not in your repository yet. Just use git fetch to get it from some Git that does have it, and then you're good.
- Git can find a commit from a name. All kinds of names work here: branch names like main and develop, remote-tracking names like origin/main and origin/develop, tag names like v1.2, and even funky special purpose names. Git has a bunch of those that you don't see very often. The rules for turning a name into a hash ID are described in the gitrevisions documentation.
- Git can find a commit from another commit. This leads to many of the rules in gitrevisions. This one is emphasized because it's so important.
- Last, Git can find commits with various searching operations, also described in gitrevisions.
There is a lot of stuff in gitrevisions, and you don't need to memorize all of it. Just remember that there are lots of ways to find commits. Using git log, then cutting and pasting hash IDs is a fine way to do it, but sometimes you might want to experiment with the various shortcuts. But do remember one more thing: git log itself works by using commits to find more commits!
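To make these ways of finding a commit concrete, here is a small sketch in a throwaway repository. The repository name repo, the file name, and the commit messages are invented for the demo; the tag name v1.2 is borrowed from the example above. Each git rev-parse call turns one kind of specifier into a raw hash ID.

```shell
#!/bin/sh
# Sketch: several gitrevisions-style specifiers resolving to hash IDs.
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

# Three commits with made-up messages "commit 1" .. "commit 3".
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
git tag v1.2                     # lightweight tag on the latest commit

tip=$(git rev-parse main)        # by branch name
by_tag=$(git rev-parse v1.2)     # by tag name
by_head=$(git rev-parse HEAD)    # by a special-purpose name

# By raw hash ID: Git can tell us what kind of object it names.
tip_type=$(git cat-file -t "$tip")

# From another commit: one step back from the commit main names.
parent=$(git rev-parse main~1)

# By searching: youngest reachable commit whose message matches the text.
by_search=$(git rev-parse ':/commit 2')

echo "main, v1.2 and HEAD: $tip $by_tag $by_head"
echo "main~1:              $parent"
echo "search 'commit 2':   $by_search"
```

The three names all resolve to the same commit here, and the search for "commit 2" lands on the same commit as main~1.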
Each commit stores two things: it has a full snapshot of all files, as we mentioned earlier, but it also has metadata: information about the commit itself. This includes the name and email address of the person who made the commit, for instance. It includes another name and email address as well (the "committer" vs the "author"), and two date-and-time stamps. It has a bunch of stuff in this metadata, and the key thing for Git itself here is that it has the raw hash ID of the commit that comes before this commit.
What this all means is that commits, in Git, form a backwards-looking chain. Merge commits store two or more previous commit hash IDs, so from a merge, we can go backwards down two chains, or perhaps even more than two. In any non-empty repository there is also at least one root commit, that doesn't point backwards: that's where history ends, or starts, depending on how you look at it. But most commits just store one hash ID, giving us the simple chain:
... <-F <-G <-H
If H here stands in for the hash ID of the last commit in some chain, and if we have some way to find commit H, we'll be able to find commit G too. That's because commit H stores the raw hash ID of earlier commit G. Likewise, from G, we can find commit F, because G stores the hash ID of F. F of course also stores a hash ID, and so on; so by starting at H, and then working backwards, one commit at a time, we can find all the commits that end at H.
A branch name in Git just records the hash ID of that last commit. We say that the branch name points to the last commit, and the last commit then points to the next-to-last commit, which points back to a still-earlier commit, and so on.
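The backwards walk can be sketched by hand. The script below builds a three-commit chain F, G, H (the repository name repo and the file name are made up), starts from the hash ID stored in the branch name, and repeatedly reads the parent line out of each commit's metadata. This is only an illustration of the idea for a simple, merge-free chain; git log does the real job far more carefully.

```shell
#!/bin/sh
# Sketch: walk from the branch tip back to the root via stored parent IDs.
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

for name in F G H; do
    echo "$name" >> file.txt
    git add file.txt
    git commit -q -m "$name"
done

# The branch name records the hash ID of the last commit (H).
commit=$(git rev-parse main)
visited=""
count=0

while [ -n "$commit" ]; do
    subject=$(git log -1 --format=%s "$commit")
    visited="$visited$subject"
    count=$((count + 1))
    # Read the "parent <hash>" header from the raw commit object.
    # The headers end at the first blank line; a root commit has no parent
    # line, so this yields an empty string and the loop stops.
    commit=$(git cat-file -p "$commit" | sed -n -e '/^$/q' -e 's/^parent //p')
done

echo "visited, newest first: $visited ($count commits)"
```

The walk visits H, then G, then F, and stops at F because the root commit stores no parent.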
Parallel development
Suppose we clone some repository from some central server (e.g., GitHub). We get a large collection of commits. Our git clone operation actually works by creating a new empty repository, then copying all of their commits, but none of their branch names. Then, after filling up our repository's commit-database with commits, and creating remote-tracking names for their branch names, our Git creates one new branch name.
The branch name we get is the one we pick with git clone's -b option. If we don't pick one, the name we get is the one their Git recommends. Typically these days that's main. Sometimes that's their only branch name. If so, we'll get some series of commits, plus the one remote-tracking name origin/main:
...--F--G--H <-- origin/main
and then our Git will create our own main to match their main (and then git checkout or git switch to our new main):
...--F--G--H <-- main (HEAD), origin/main
We can now work and make new commits. Whatever new commits we make, they will get new, universally-unique hash IDs. Let's make two new commits on our main:
             I--J   <-- main (HEAD)
            /
...--F--G--H   <-- origin/main
Now let's suppose that, by whatever means, their Git has gotten two new commits added to their main. Those new commits will get new universally-unique hash IDs. When we run git fetch origin, we'll pick up the new commits:
             I--J   <-- main (HEAD)
            /
...--F--G--H
            \
             K--L   <-- origin/main
Note how our work and their work have diverged. This happens when there is parallel development. It doesn't happen when there isn't parallel development: if they don't get two new commits, we will still have our origin/main (our memory of their main) pointing to commit H. Our new I-J commits add on to H.
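One way to see this kind of divergence for yourself is to count the commits on each side of the fork. The sketch below recreates the H, I-J, K-L picture with two throwaway repositories; the directory names, file names, and the commit_in helper are made up for the demo.

```shell
#!/bin/sh
# Sketch: detect that main and origin/main have diverged after a fetch.
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit_in DIR FILE MESSAGE (hypothetical helper): append a line and commit.
commit_in() {
    echo "$3" >> "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/main
commit_in theirs shared.txt "H"
git clone -q theirs ours

# Parallel development: we add I-J, they add K-L.
commit_in ours ours.txt "I"
commit_in ours ours.txt "J"
commit_in theirs theirs.txt "K"
commit_in theirs theirs.txt "L"

git -C ours fetch -q origin

# Left count: commits only on main. Right count: only on origin/main.
counts=$(git -C ours rev-list --left-right --count main...origin/main)
set -- $counts
ahead=$1
behind=$2

# The commit where the two lines of work split apart (H in the drawing).
base=$(git -C ours merge-base main origin/main)
h=$(git -C theirs rev-parse main~2)

git --no-pager -C ours log --oneline --graph main origin/main
echo "ahead by $ahead, behind by $behind"
```

Both counts being nonzero is the "fork" in the drawing; if only one side had new commits, one of the two counts would be zero.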
If we don't have parallel development, we can probably git push now
Let's say we didn't have any parallel development. We now run:
git push origin main
to send our new I-J commits to them, and ask them to set their main to point to commit J. If they obey, they will get this:
...--F--G--H--I--J <-- main
(note that they don't have an origin/main, and we don't care what their HEAD is, not that I've told you what our HEAD is about here).
If we do have parallel development, this is a problem
If they have:
...--F--G--H--K--L <-- main
in their repository when we run git push, we will send them our I-J. But our commit I connects back to commit H. Our Git will then ask them to set their main to point to commit J:
             I--J   <-- (polite-request: set main to point here)
            /
...--F--G--H--K--L   <-- main
If they were to obey this request, they would lose their K-L. So they will reject the request. The specific error we will see is the claim that this is not a fast-forward.
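The rejection is easy to reproduce locally. This sketch assumes a bare repository standing in for the server, plus two clones: ours and a second one called coworker that gets K-L onto the server first. All of those names, the file names, and the commit_in helper are invented for the demo. It checks only whether the push succeeds, not the exact wording of the error.

```shell
#!/bin/sh
# Sketch: a push that would drop their K-L commits gets rejected.
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit_in DIR FILE MESSAGE (hypothetical helper): append a line and commit.
commit_in() {
    echo "$3" >> "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

# Seed commit H, then a bare "server" repository holding it.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
commit_in seed shared.txt "H"
git clone -q --bare seed server.git

git clone -q server.git ours
git clone -q server.git coworker

# The coworker's K-L reach the server first.
commit_in coworker theirs.txt "K"
commit_in coworker theirs.txt "L"
git -C coworker push -q origin main

# We build I-J on top of H and ask the server to move main to J.
commit_in ours ours.txt "I"
commit_in ours ours.txt "J"
if git -C ours push -q origin main 2>/dev/null; then
    pushed=yes
else
    pushed=no
fi

server_main=$(git -C server.git rev-parse main)
coworker_main=$(git -C coworker rev-parse main)
echo "push accepted: $pushed"
```

The server's main still names commit L afterwards, so nothing of the coworker's was lost.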
It is possible, depending on permissions,[2] to force them to obey anyway. As in footnote 1, though, this is not something you should do, at least not until you really understand the idea of "losing" commits.
[2] Git as distributed does not have this kind of permission checking, but most hosting services, such as GitHub, have added it. If you set up your own hosting service, you should consider a way to add it, too.
In the face of parallel development, we need a way to combine work
Let's suppose that, in whatever way, we find ourselves in this situation:
             I--J   <-- main (HEAD)
            /
...--F--G--H
            \
             K--L   <-- origin/main
What we need now is a way to combine our work (the stuff we did to make commits I and J) with their work, whoever they are: the stuff they did to make commits K-L.
Git has many ways of combining work, but we won't go into a lot of detail here. The two principal ways of doing this are with git merge and with git rebase. So, after a git fetch that results in this kind of fork (where we and they both have new commits) we will need a second Git command, probably either git merge or git rebase.
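As a rough illustration of the merge choice, the sketch below recreates the fork and then runs git fetch followed by git merge by hand. The repository names, file names, and the commit_in helper are made up. The two sides touch different files on purpose so the merge cannot conflict; real work is not always that polite.

```shell
#!/bin/sh
# Sketch: combine diverged work with fetch followed by merge.
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit_in DIR FILE MESSAGE (hypothetical helper): append a line and commit.
commit_in() {
    echo "$3" >> "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/main
commit_in theirs shared.txt "H"
git clone -q theirs ours

commit_in ours ours.txt "I"
commit_in ours ours.txt "J"
commit_in theirs theirs.txt "K"
commit_in theirs theirs.txt "L"

# Step 1: get their commits; only origin/main moves.
git -C ours fetch -q origin

# Step 2: combine. --no-edit accepts the default merge message.
# (git rebase origin/main would be the other common choice here.)
git -C ours merge -q --no-edit origin/main

# The new tip is a merge commit: its metadata stores two parent hash IDs.
parents=$(git -C ours rev-list --parents -n 1 main)
set -- $parents
parent_count=$(($# - 1))

has_both=no
if [ -f ours/ours.txt ] && [ -f ours/theirs.txt ]; then
    has_both=yes
fi
echo "merge commit has $parent_count parents; both files present: $has_both"
```

The resulting merge commit points back to both J and L, which is how the two lines of work end up joined.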
The correct choice of second command is partly a matter of opinion. There's no one universally correct choice here. But what git pull does is this:
- You pick a choice in advance, before you even see if you have this kind of "need to combine work" as a result of the git fetch you are about to run. Note that you have not yet run this git fetch, even though you're making this decision right now.
- Then, having decided, you run a git pull with either an option or two, to say how to deal with this, or with a configuration setting, to say how to deal with this, or with no options at all, which means use git merge if necessary.
- Your git pull now runs the git fetch. This obtains any new commits they have that you don't, and updates your remote-tracking name.[3] Then it looks to see if that special second combine-work operation is required. If so, it uses it to combine work. If not, it just does a git checkout or git switch to the latest commit while also bringing your current branch name forward.[4]
[3] In extremely out of date versions of Git (predating 1.8.4), git pull doesn't update the remote-tracking name. Be aware of this in case you encounter one of these ancient Git versions.
[4] There are two things of note here:
- Git calls this a fast forward merge. This is not actually a merge, so this is a poor name. (Mercurial just calls it an update.) Since Git 2.0, you can tell git pull to do only a fast-forward operation: if work-combining is required, git pull will do the fetch, but then stop with an error. This is probably what git pull should have done since the beginning, and probably what it will do eventually, but for compatibility reasons, it does not do that today. Edit, July 2022: the day has more or less arrived, and git pull now sometimes defaults to this. I'll leave out any further detail though.
  If you do have the option, and if you like git pull, I recommend using git pull --ff-only or configuring pull.ff to only, with git config pull.ff only. (I personally tend to just run git fetch, then git log or some similar operation to check, and then run git merge --ff-only manually, but my habits were set long before Git 2.0.)
- The git switch command was new in Git 2.23. There's no real difference between git switch and git checkout for this particular case. The new command was added because the Git folks found that git checkout is too complicated (it has a lot of modes) and that some of its modes were destructive. This destruction hit even experienced Git users sometimes. (That's been fixed: since 2.23, git checkout errors out for these cases now.) To make Git more user-friendly, git checkout got split into two separate commands. It's a good idea to use the new commands, but the old one still works, because Git has to be compatible for a long time.
Summary
Push sends commits and asks them to update their branch. This requires that things be right on their end. This cannot combine parallel development.
Pull gets commits and has your Git update your remote-tracking name, then runs a second Git command to update your branch. The second command can combine parallel development.
You can avoid running the second command immediately by using git fetch instead of git pull. This is useful if you want to see what you're dealing with, before you make any decisions about how to use it.
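A fetch-first workflow along those lines might look like the sketch below: fetch, look at what came in, then fast-forward only if that is possible. The repository names, file names, and the commit_in helper are made up for the demo. The second half shows the same git merge --ff-only refusing to do anything once the two sides have diverged.

```shell
#!/bin/sh
# Sketch: fetch, inspect, then fast-forward only when it is safe.
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit_in DIR FILE MESSAGE (hypothetical helper): append a line and commit.
commit_in() {
    echo "$3" >> "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/main
commit_in theirs file.txt "H"
git clone -q theirs ours

# Case 1: only they have new work, so a fast-forward is possible.
commit_in theirs file.txt "K"
git -C ours fetch -q origin
incoming=$(git -C ours rev-list --count main..origin/main)
git --no-pager -C ours log --oneline main..origin/main   # what we would get
git -C ours merge -q --ff-only origin/main
ff_ok=no
if [ "$(git -C ours rev-parse main)" = "$(git -C ours rev-parse origin/main)" ]; then
    ff_ok=yes
fi

# Case 2: both sides have new work, so --ff-only must refuse.
commit_in ours ours.txt "I"
commit_in theirs file.txt "L"
git -C ours fetch -q origin
before=$(git -C ours rev-parse main)
if git -C ours merge -q --ff-only origin/main 2>/dev/null; then
    diverged_ff=yes
else
    diverged_ff=no
fi
after=$(git -C ours rev-parse main)

echo "incoming commits in case 1: $incoming, fast-forwarded: $ff_ok"
echo "fast-forward in case 2: $diverged_ff"
```

In the diverged case our main is left exactly where it was, so we can decide between merge and rebase at our leisure.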