Definition of "downstream" and "upstream"

6

1060

I've started playing with Git and have come across the terms "upstream" and "downstream". I've seen these before but never understood them fully. What do these terms mean in the context of SCMs (Software Configuration Management tools) and source code?

Judicature answered 29/4, 2010 at 17:18 Comment(3)
There are two different contexts for upstream/downstream in git: remotes, and time/history. With respect to remotes, the downstream repo pulls from the upstream repo (changes naturally flow downstream). With respect to time/history, the terms can be confusing, because upstream in time means downstream in history, and vice versa (genealogy terminology works much better here: parent/ancestor/child/descendant).Supercilious
Related: What does 'upstream' mean? at OSBasso
Related: Difference between origin and upstream on gitHubConsole
839

In terms of source control, you're downstream when you copy (clone, checkout, etc) from a repository. Information flowed "downstream" to you.

When you make changes, you usually want to send them back "upstream" so they make it into that repository so that everyone pulling from the same source is working with all the same changes. This is mostly a social issue of how everyone can coordinate their work rather than a technical requirement of source control. You want to get your changes into the main project so you're not tracking divergent lines of development.
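As a rough illustration of that round trip, here is a minimal sketch using throwaway repositories in a temporary directory. The names upstream.git and mine, the file name, and the commit message are all made up for the demo, and HOME is overridden only so that a personal Git configuration does not interfere.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore global/system Git config for this demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository stands in for the shared upstream project (hypothetical name).
git init -q --bare upstream.git

# Cloning makes "mine" a downstream copy (the empty-repository warning is silenced).
git clone -q upstream.git mine 2>/dev/null
cd mine
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo

echo "fix" > file.txt
git add file.txt
git commit -q -m "my change"

# Send the change back upstream so that everyone pulling from there sees it.
git push -q origin master

# The commit is now part of the upstream repository's history.
git -C "$work/upstream.git" log --format=%s master
```

The last command lists the commit subjects on the upstream master branch, which now include the change made in the downstream clone.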

Sometimes you'll read about package or release managers (the people, not the tool) talking about submitting changes to "upstream". That usually means they had to adjust the original sources so they could create a package for their system. They don't want to keep making those changes, so if they send them "upstream" to the original source, they shouldn't have to deal with the same issue in the next release.

Agna answered 29/4, 2010 at 17:36 Comment(8)
"Download" and "upload" are verbs. "Upstream" and "downstream" describe a relative position.Agna
I would say upstream and downstream are adjectivesRescue
They are adjectives when they are used as modifiers, but those terms are often used as nouns.Agna
When "upstream" and "downstream" describe a relative position, I think, technically, that makes them adjectives. This is just a comment on what I understand, not correcting anyone.Petersham
@Petersham words can be used as adjectives and nouns depending on the contextHanuman
In fact, "upstream" and "downstream" can be used not only as adjectives ("an upstream repository") or nouns ("I don't care about the downstream"), but as adverbs (as in "that repository is upstream of mine" or "everything you need is already downstream").Picul
Let us ask in english.stackexchange.com. whether it is noun or adj. or adv :DCosmopolitan
@NusratNuriyev, let's consider "preposition" while we're at it : )Congratulation
289

When you read in the git tag man page:

One important aspect of git is it is distributed, and being distributed largely means there is no inherent "upstream" or "downstream" in the system.

that simply means there is no absolute upstream repo or downstream repo.
Those notions are always relative between two repos and depend on the way data flows:

If "yourRepo" has declared "otherRepo" as a remote one, then:

  • you are pulling from upstream "otherRepo" ("otherRepo" is "upstream from you", and you are "downstream for otherRepo").
  • you are pushing to upstream ("otherRepo" is still "upstream", where the information now goes back to).

Note the "from" and "for": you are not just "downstream", you are "downstream from/for", hence the relative aspect.
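The two bullets above can be tried out with a small sketch like the following. It keeps the names yourRepo and otherRepo from the text; the branch names work and from-yourRepo, the file, and the commit messages are assumptions made for the demo, and everything lives in a temporary directory.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore global/system Git config for this demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "otherRepo" already has some history.
git init -q otherRepo
cd otherRepo
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
echo "v1" > file.txt
git add file.txt
git commit -q -m "initial work in otherRepo"
cd ..

# "yourRepo" declares otherRepo as a remote, which makes otherRepo upstream of it.
git init -q yourRepo
cd yourRepo
git remote add otherRepo "$work/otherRepo"

# Data flows downstream to you: fetch, then start a branch from what you received.
git fetch -q otherRepo
git checkout -q -b work otherRepo/master

# Data flows back to the same upstream: push your commit to a new branch there.
echo "v2" >> file.txt
git commit -q -am "change made downstream"
git push -q otherRepo work:refs/heads/from-yourRepo

git -C "$work/otherRepo" log -1 --format=%s from-yourRepo
```

Nothing in otherRepo records that yourRepo exists, which is the point of the next paragraph: the upstream side has no list of its downstreams.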


The DVCS (Distributed Version Control System) twist is: you have no idea what downstream actually is, besides your own repo relative to the remote repos you have declared.

  • you know what upstream is (the repos you are pulling from or pushing to)
  • you don't know what downstream is made of (the other repos pulling from or pushing to your repo).

Basically:

In terms of "flow of data", your repo is at the bottom ("downstream") of a flow coming from upstream repos ("pull from") and going back to (the same or other) upstream repos ("push to").


You can see an illustration in the git-rebase man page with the paragraph "RECOVERING FROM UPSTREAM REBASE":

It means you are pulling from an "upstream" repo where a rebase took place, and you (the "downstream" repo) are stuck with the consequences (lots of duplicate commits, because the branch rebased upstream recreated the commits of the same branch you have locally).

That is bad because for one "upstream" repo, there can be many downstream repos (i.e. repos pulling from the upstream one, with the rebased branch), all of them having potentially to deal with the duplicate commits.

Again, with the "flow of data" analogy, in a DVCS, one bad command "upstream" can have a "ripple effect" downstream.
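The duplicate-commit effect can be reproduced with a small sketch. Note the substitution: instead of a real rebase, the upstream history is rewritten with git commit --amend, which recreates commit "B" under a new hash in the same way a rebase would. The repository names, file names, and the fixed committer date are assumptions made for the demo.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore global/system Git config for this demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream publishes commits A and B.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
echo a > a.txt; git add a.txt; git commit -q -m "A"
echo b > b.txt; git add b.txt; git commit -q -m "B"
cd ..

# Downstream clones and builds commit C on top of B.
git clone -q upstream downstream
cd downstream
echo c > c.txt; git add c.txt; git commit -q -m "C"
cd ..

# Upstream rewrites B: same content and message, but a new commit hash.
cd upstream
GIT_COMMITTER_DATE="2001-01-01 00:00:00 +0000" git commit -q --amend --no-edit
cd ..

# Downstream pulls (merge mode) and ends up with both the old and the new B.
cd downstream
git pull -q --no-rebase --no-edit origin master
git log --format=%s | grep -c '^B$'
```

The final count is the number of commits titled "B" reachable from the downstream branch after the pull; every other clone made before the rewrite would face the same situation.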


Note: this is not limited to data.
It also applies to parameters, as git commands (like the "porcelain" ones) often call internally other git commands (the "plumbing" ones). See rev-parse man page:

Many git porcelainish commands take mixture of flags (i.e. parameters that begin with a dash '-') and parameters meant for the underlying git rev-list command they use internally and flags and parameters for the other commands they use downstream of git rev-list. This command is used to distinguish between them.

Sandisandidge answered 1/5, 2010 at 7:10 Comment(2)
you pull from upstream, and you push to upstream. pushing to downstream sounds very wrong to meRobustious
@knittl: you are right. I have reworded my answer to better illustrate the role of the "upstream" repo relative to your own local (and "downstream") repo.Sandisandidge
91

Upstream (as related to) Tracking

The term upstream also has an unambiguous meaning within the Git tool suite, especially with regard to tracking.

For example :

   $ git rev-list --count --left-right "@{upstream}"...HEAD
   >4   12

This prints the number of commits by which your current branch is behind (left) and ahead of (right) the remote branch it tracks, if any. The counts are based on the last fetched (cached) state of that remote branch, not on its live state. If no upstream is configured, it prints an error message instead:

    >error: No upstream branch found for ''
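To see where such numbers come from, and why they are only as fresh as the last fetch, here is a minimal sketch with two throwaway repositories. The names upstream and downstream and the commit messages are made up; the clone's master branch tracks origin/master automatically.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore global/system Git config for this demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
git commit -q --allow-empty -m "base"
cd ..

git clone -q upstream downstream

# Upstream moves on by two commits...
git -C upstream commit -q --allow-empty -m "upstream 1"
git -C upstream commit -q --allow-empty -m "upstream 2"

# ...while the downstream clone adds one commit of its own.
cd downstream
git commit -q --allow-empty -m "local 1"

# Before fetching, Git only knows the stale remote-tracking ref: 0 behind, 1 ahead.
git rev-list --count --left-right "@{upstream}"...HEAD

# After fetching, the counts reflect the current upstream state: 2 behind, 1 ahead.
git fetch -q
git rev-list --count --left-right "@{upstream}"...HEAD
```

The two numbers on each output line are separated by a tab.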
  • As has already been said, you may have any number of remotes for one local repository. For example, if you fork a repository on GitHub and then issue a 'pull request', you almost certainly have at least two: origin (your forked repo on GitHub) and upstream (the repo on GitHub you forked from). Those names are just interchangeable labels; only the 'git@...' URL identifies each remote.

Your .git/config reads:

   [remote "origin"]
       fetch = +refs/heads/*:refs/remotes/origin/*
       url = git@github.com:myusername/reponame.git
   [remote "upstream"]
       fetch = +refs/heads/*:refs/remotes/upstream/*
       url = git@github.com:authorname/reponame.git
  • On the other hand, the meaning of @{upstream} for Git is unique:

it is the branch (if any) on said remote that the current branch of your local repository is tracking.

It's the branch you fetch/pull from whenever you issue a plain git fetch/git pull, without arguments.

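Let's say you want to set the remote branch origin/master as the upstream (tracked) branch of the local master branch you've checked out. Just issue (using --set-upstream-to, since the older git branch --set-upstream master origin/master form is no longer supported):

   $ git branch --set-upstream-to=origin/master master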
   > Branch master set up to track remote branch master from origin.

This adds two settings to .git/config:

   [branch "master"]
       remote = origin
       merge = refs/heads/master

Now try (provided the 'upstream' remote has a 'dev' branch):

   $ git branch --set-upstream-to=upstream/dev master
   > Branch master set up to track remote branch dev from upstream.

.git/config now reads:

   [branch "master"]
       remote = upstream
       merge = refs/heads/dev
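The effect on the configuration can be checked with a self-contained sketch. It uses the currently supported --set-upstream-to spelling, and local stand-in repositories named author and fork (hypothetical) in place of the two GitHub remotes.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore global/system Git config for this demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-ins for the author's repo ("upstream") and your fork ("origin").
git init -q author
cd author
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
git commit -q --allow-empty -m "base"
git branch dev
cd ..
git clone -q author fork

git init -q local
cd local
git remote add origin "$work/fork"
git remote add upstream "$work/author"
git fetch -q --all
git checkout -q --no-track -b master origin/master   # no upstream configured yet

git branch --set-upstream-to=origin/master master
git config branch.master.remote    # origin
git config branch.master.merge     # refs/heads/master

git branch --set-upstream-to=upstream/dev master
git config branch.master.remote    # upstream
git config branch.master.merge     # refs/heads/dev
git rev-parse --abbrev-ref "master@{upstream}"   # upstream/dev
```

Reading the values back with git config avoids depending on the exact wording of the confirmation message, which differs between Git versions.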

git-push(1) Manual Page :

   -u
   --set-upstream

For every branch that is up to date or successfully pushed, add upstream (tracking) reference, used by argument-less git-pull(1) and other commands. For more information, see branch.<name>.merge in git-config(1).

git-config(1) Manual Page :

   branch.<name>.merge

Defines, together with branch.<name>.remote, the upstream branch for the given branch. It tells git fetch/git pull/git rebase which branch to merge and can also affect git push (see push.default). (...)

   branch.<name>.remote

When in branch <name>, it tells git fetch and git push which remote to fetch from/push to. It defaults to origin if no remote is configured. origin is also used if you are not on any branch.

Upstream and Push (Gotcha)

Take a look at the git-config(1) Manual Page:

   git config --global push.default upstream
   git config --global push.default tracking   # deprecated synonym for upstream

This is to prevent accidental pushes to branches which you’re not ready to push yet.
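A minimal sketch of what push.default upstream does in practice: a plain git push sends the current branch to whatever branch it tracks, even when the names differ. The setting is applied per repository here rather than with --global, and the names seed, origin.git, local, and feature are assumptions made for the demo.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore global/system Git config for this demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a bare "origin" that already contains one commit on master.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo
git commit -q --allow-empty -m "base"
cd ..
git clone -q --bare seed origin.git
git clone -q origin.git local
cd local

# "feature" is created from origin/master, so origin/master becomes its upstream.
git checkout -q -b feature origin/master
git commit -q --allow-empty -m "work on feature"

# With push.default=upstream, an argument-less push targets the upstream branch.
git config push.default upstream
git push -q

git -C "$work/origin.git" log -1 --format=%s master
```

Only the current branch is pushed, and only to the branch it tracks, so other local branches stay untouched until you push them yourself.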

Colicroot answered 5/6, 2011 at 17:14 Comment(1)
Excerpt of git branch --help as of 2018: As this option had confusing syntax, it is no longer supported. Please use --track or --set-upstream-to instead.Flywheel
65

That's a bit of informal terminology.

As far as Git is concerned, every other repository is just a remote.

Generally speaking, upstream is where you cloned from (the origin). Downstream is any project that integrates your work with other works.

The terms are not restricted to Git repositories.

For instance, Ubuntu is a Debian derivative, so Debian is upstream for Ubuntu.

Parabolize answered 30/4, 2010 at 21:6 Comment(0)
60

Upstream Called Harmful

There is, alas, another use of "upstream" that the other answers here are not getting at, namely to refer to the parent-child relationship of commits within a repo. Scott Chacon in the Pro Git book is particularly prone to this, and the results are unfortunate. Do not imitate this way of speaking.

For example, he says of a merge resulting in a fast-forward that this happens because

the commit pointed to by the branch you merged in was directly upstream of the commit you’re on

He wants to say that commit B is the only child of the only child of ... of the only child of commit A, so to merge B into A it is sufficient to move the ref A to point to commit B. Why this direction should be called "upstream" rather than "downstream", or why the geometry of such a pure straight-line graph should be described as "directly upstream", is completely unclear and probably arbitrary. (The man page for git-merge does a far better job of explaining this relationship when it says that "the current branch head is an ancestor of the named commit." That is the sort of thing Chacon should have said.)

Indeed, Chacon himself appears to use "downstream" later to mean exactly the same thing, when he speaks of rewriting all child commits of a deleted commit:

You must rewrite all the commits downstream from 6df76 to fully remove this file from your Git history

Basically he seems not to have any clear idea what he means by "upstream" and "downstream" when referring to the history of commits over time. This use is informal, then, and not to be encouraged, as it is just confusing.

It is perfectly clear that every commit (except one) has at least one parent, and that parents of parents are thus ancestors; and in the other direction, commits have children and descendants. That's accepted terminology, and describes the directionality of the graph unambiguously, so that's the way to talk when you want to describe how commits relate to one another within the graph geometry of a repo. Do not use "upstream" or "downstream" loosely in this situation.
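Git itself speaks in these genealogy terms. As a small sketch (the repository and the three empty commits A, B, C are made up for the demo), git merge-base --is-ancestor answers the ancestor question through its exit status, and the ^ suffix names a commit's first parent.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore global/system Git config for this demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A straight line of history: A <- B <- C.
git init -q repo
cd repo
git commit -q --allow-empty -m "A"
a=$(git rev-parse HEAD)
git commit -q --allow-empty -m "B"
git commit -q --allow-empty -m "C"
c=$(git rev-parse HEAD)

# A is an ancestor of C (exit status 0)...
if git merge-base --is-ancestor "$a" "$c"; then
  echo "A is an ancestor of C"
fi

# ...but C is not an ancestor of A (exit status 1).
if ! git merge-base --is-ancestor "$c" "$a"; then
  echo "C is not an ancestor of A"
fi

# The parent of C is B.
git log -1 --format=%s "$c^"
```

This is the same relationship the git-merge man page relies on when it describes a fast-forward: the current branch head is an ancestor of the commit being merged.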

[Additional note: I've been thinking about the relationship between the first Chacon sentence I cite above and the git-merge man page, and it occurs to me that the former may be based on a misunderstanding of the latter. The man page does go on to describe a situation where the use of "upstream" is legitimate: fast-forwarding often happens when "you are tracking an upstream repository, you have committed no local changes, and now you want to update to a newer upstream revision." So perhaps Chacon used "upstream" because he saw it here in the man page. But in the man page there is a remote repository; there is no remote repository in Chacon's cited example of fast-forwarding, just a couple of locally created branches.]

Monicamonie answered 27/9, 2012 at 14:21 Comment(5)
The git-rebase man page also suffers from this overloading: the commit that is checked out before rebasing is termed the "upstream". This, too, may have affected Chacon's usage.Pater
@Pater strange - in the git html documentation, the branch checked out before rebasing is referred to as <branch>.Papa
Good point. Would be kind of helpful to gather common "git-terminology" somewhere. Especially for newbies ( or ppl contributing to git ). Would have saved me good time getting used to the wording of the git man pages.Wanids
Came here from the git-rebase docs because I was totally confused why a commit ref would be called "upstream" there (in fact, I was doubting myself as I haven't seen this terminology before). Thanks @Pater & @Monicamonie for clearing things up!Dunbar
git-scm docs/ref/book is actively harmful to understanding what are already questionable design decisions (e.g. overloaded methods, inconsistent syntax).Nursling
9

Using the analogy of a river, we can follow a resource upstream from us until we find the headwaters (the source of a stream or river).

Continuing with the river analogy, downstream is the direction that the water in a river flows. Downhill.

So, if I fork someone's project, the project I forked is upstream, and my fork is downstream.

If someone forks my forked project, then my fork becomes upstream relative to that new fork.

And the fork of my fork becomes downstream.

Example Time!

Suppose Project B forked Project A and Project C forked Project B.

Then, Project A is the upstream project.

Project B is the downstream project relative to Project A.

Project B is the upstream project relative to Project C.

Project C is the downstream project relative to Project B.

And the circle of life continues.
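The A, B, C chain above can be imitated locally. In this sketch plain git clone stands in for a hosted fork (an assumption, since a real fork lives on the hosting service), and each clone records the project it came from as its origin remote.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore global/system Git config for this demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/A"
git -C "$work/A" commit -q --allow-empty -m "initial commit in Project A"

git clone -q "$work/A" "$work/B"   # Project B "forks" Project A
git clone -q "$work/B" "$work/C"   # Project C "forks" Project B

# Each project only knows its own upstream, never its downstreams.
basename "$(git -C "$work/B" config remote.origin.url)"   # A
basename "$(git -C "$work/C" config remote.origin.url)"   # B
```

Project A has no remote at all here, which matches its role as the headwaters of the chain.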

NOTE: It is a rather common development style in open source projects to create a fork of a project, fix a bug or add a feature in that fork, and then submit a patch to the original project.

Also note that a clear lesson from the “quality movement” and statistical process control is that interventions that fix quality problems at their source are almost always a better investment than repeated work to fix problems that were preventable. So please contribute patches (send pull requests).

Whimsy answered 19/8, 2021 at 15:40 Comment(0)
