What exactly do we mean by "branch"?

Long story short...

As far as I can tell, the term "branch" (in Git parlance) may refer to related but different things:

  1. a non-symbolic reference/pointer to a commit,
  2. the name of such a reference (e.g. "master"),
  3. the subgraph of the repository's commit DAG composed of all the commits reachable from the commit pointed to by such a reference.

However, I've seen the term used to apparently refer to something other than those three possible usages (more details below). In a Git context, are there other valid and unambiguous usages of the term "branch" that my list above is missing?

More details

After using Git for about a year, I'm preparing a short tutorial for CS students. I really want to nail down the Git terminology, so as to avoid any confusion.

Of course, I've been using Git branches for a while now; I'm comfortable using them and find the Git branching model awesome. However, I still find the term "branch" problematic and ambiguous, because it seems to refer to at least two different things, depending on the context in which it's used... sometimes even in the same tutorial/manual.

Usage 1: branch = pointer/reference to a commit

The Pro Git book (in 3.1 - What a branch is), after showing the following diagram,

[Image: commit diagram from Pro Git, section 3.1]

goes on to define a branch as

simply a lightweight movable pointer to one of these commits.

As far as I can tell, this is also the meaning "branch" has in the Git man pages.

I'm perfectly comfortable with this definition. I think of a branch as just a reference that points to a particular commit in the DAG, and the "tip commit" of a branch is the commit pointed to by that reference. So far, so good. But wait...
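To make the "lightweight movable pointer" idea concrete, here is a minimal Python sketch that looks up the commit a branch name points to inside a repository's `.git` directory. It assumes Git's traditional file-based ref storage. Under that storage, a loose ref is a small file such as `.git/refs/heads/master` that holds a commit hash, and refs may also be collected into a single `.git/packed-refs` file. Newer Git versions can optionally use a different ref backend, which this sketch does not handle. The function name `read_ref` is hypothetical. In practice, `git rev-parse refs/heads/master` performs this lookup for you.

```python
import os
from typing import Optional


def read_ref(git_dir: str, refname: str) -> Optional[str]:
    """Return the commit hash stored under a full ref name such as
    'refs/heads/master', or None if the ref does not exist.

    Hypothetical helper; assumes the traditional 'files' ref storage."""
    # A loose ref is a file whose path mirrors the ref name.
    loose_path = os.path.join(git_dir, *refname.split("/"))
    if os.path.isfile(loose_path):
        with open(loose_path, encoding="ascii") as f:
            return f.read().strip()

    # Otherwise the ref may have been packed into .git/packed-refs,
    # one "<hash> <refname>" entry per line.
    packed_path = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed_path):
        with open(packed_path, encoding="ascii") as f:
            for line in f:
                line = line.rstrip("\n")
                # Skip the header comment and "^<hash>" peeled-tag lines.
                if not line or line.startswith("#") or line.startswith("^"):
                    continue
                sha, name = line.split(" ", 1)
                if name == refname:
                    return sha
    return None
```

The point of the sketch is how little a branch is on disk: a name plus one commit hash. Creating a branch only writes such an entry, and advancing a branch only overwrites it.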

Usage 2: branch = a subgraph of the DAG

The Atlassian Git tutorial introduces branches as follows:

A branch represents an independent line of development.

What they mean by that, I guess, is a string of commits. Let me refine that thought... The only interpretation that makes sense to me is that the term "branch" can also refer to the subgraph of the repository's commit DAG composed of all the commits reachable from the tip commit considered.
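The "all commits reachable from the tip" interpretation can be stated precisely as a graph traversal. Below is a minimal Python sketch over a hypothetical in-memory commit graph (commit IDs like `C1` and the dictionaries `parents` and `refs` are illustrative, not Git's storage format). It follows parent links from the tip until nothing new is found.

```python
from typing import Dict, List, Set

# Hypothetical toy commit graph: each commit ID maps to its parent IDs.
Parents = Dict[str, List[str]]


def reachable(parents: Parents, tip: str) -> Set[str]:
    """All commits reachable from `tip` by following parent links,
    including `tip` itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


# A small history that forks after C2 and merges again at C5.
parents: Parents = {
    "C1": [],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C2"],
    "C5": ["C3", "C4"],  # merge commit with two parents
}
refs = {"refs/heads/master": "C5", "refs/heads/side": "C4"}
```

With this data, `reachable(parents, refs["refs/heads/side"])` is `{"C1", "C2", "C4"}`, while the set for `master` contains all five commits. Because C5 is a merge, the "branch" in this sense spreads over both C3 and C4 when you walk back in history, so it is not a single line of development.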

However, the Pro Git book, for instance, also contains the following diagram (see 3.4 - Branching workflows),

[Image: diagram from Pro Git, section 3.4, showing commits C1 to C7 grouped under three branches, including develop and topic]

which seems to contradict my interpretation, because it seems to imply that only commits C2-C5 (not C1) belong to the develop branch, and that only commits C6-C7 (not C1-C5) belong to the topic branch.

I find this usage ambiguous and vague because, if I were to draw the DAG at that stage, without knowing where the branch references have pointed to in the past, and without any assumption of any hierarchy between the three branches, all I would get is

[Image: the same commits C1 to C7 drawn as a single linear chain]
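One way to reconcile the Pro Git groupings with the reachability definition is to read each group as a set difference. This is an assumption about the diagram: suppose master points at C1, develop at C5, and topic at C7 on a linear history. Then "only C2 to C5" is the set of commits reachable from develop but not from master, which is what Git's two-dot range `git log master..develop` lists. The minimal Python sketch below uses hypothetical names (`only_on`, `tips`) and a toy graph.

```python
from typing import Dict, List, Set


def reachable(parents: Dict[str, List[str]], tip: str) -> Set[str]:
    """Commits reachable from `tip` via parent links, including `tip`."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def only_on(parents: Dict[str, List[str]], include: str, exclude: str) -> Set[str]:
    """Commits reachable from `include` but not from `exclude`:
    the set that `git log exclude..include` would list."""
    return reachable(parents, include) - reachable(parents, exclude)


# Linear history C1 <- C2 <- ... <- C7 (each commit has one parent).
parents = {f"C{i}": ([f"C{i - 1}"] if i > 1 else []) for i in range(1, 8)}

# Assumed positions of the three branch names in the Pro Git diagram.
tips = {"master": "C1", "develop": "C5", "topic": "C7"}
```

Here `only_on(parents, tips["develop"], tips["master"])` is `{"C2", "C3", "C4", "C5"}` and `only_on(parents, tips["topic"], tips["develop"])` is `{"C6", "C7"}`, matching the diagram's groups, even though the full reachable set of topic still contains C1 to C7.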

I also find some diagrams in other Git learning resources confusing. Consider, in particular, the following one (taken from the introduction video of Lynda.com's Git Essential Training):

[Image: commit diagram from the introduction video of Lynda.com's Git Essential Training]

Here, the tip of master is actually 534de (and HEAD points to master), but the position of the "master" label on the diagram is very misleading. What that label is supposed to describe in this case is unclear to me...

Edit: I've since found this excellent post on Marc's blog; the Branches section echoes my remarks above.

Nelsonnema asked 31/7, 2014 at 20:41. Comments (2):
This is the most helpful question I've ever read about git. I learned something even before reading the answer. Well done. (Parisparish)
The other way in which the term "branch" defined as "the subgraph of the repository's commit DAG composed of all the commits reachable from the tip commit considered" is problematic is when one encounters merge commits in the chain of reachable commits. All of a sudden we would call a branch something that could split up into several ramifications when going back in history, which probably wasn't the intention. (Faxon)

You are correct.

We can further split your item 1 by separating "local" and "remote" branch labels. Local branches (local labels) are names that start (internally; many front-end commands hide this) with refs/heads/, while "remote branches", which are also called "remote-tracking branches", start with refs/remotes/ and then have one more path component naming the specific remote before the part naming the branch. (Edit, April 2018: I dislike the phrase "remote branch" or "remote-tracking branch"; I think it's better to just call these remote-tracking names. But there is a lot of existing documentation that uses the other two phrases, so we need to be aware of this usage.)
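The prefix rules above can be expressed as a small classifier. This is a minimal Python sketch; the names `RefKind` and `classify` are hypothetical. It assumes the remote's name contains no `/`, so the first path component after `refs/remotes/` names the remote and the rest names the branch on that remote.

```python
from typing import NamedTuple, Optional


class RefKind(NamedTuple):
    kind: str              # "local", "remote-tracking", or "other"
    remote: Optional[str]  # remote name, for remote-tracking refs only
    branch: Optional[str]  # branch part, e.g. "master" or "hacks/feep"


def classify(refname: str) -> RefKind:
    """Classify a full ref name by its prefix (hypothetical helper)."""
    if refname.startswith("refs/heads/"):
        return RefKind("local", None, refname[len("refs/heads/"):])
    if refname.startswith("refs/remotes/"):
        rest = refname[len("refs/remotes/"):]
        # Assumption: the remote name has no "/", so split at the first one.
        remote, _, branch = rest.partition("/")
        return RefKind("remote-tracking", remote, branch or None)
    # Tags (refs/tags/...) and other namespaces are not branches at all.
    return RefKind("other", None, None)
```

For example, `classify("refs/remotes/bob/hacks/feep")` gives remote `bob` and branch `hacks/feep`, while `classify("refs/heads/master")` is a local branch named `master`.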

For instance, you are no doubt familiar with refs/remotes/origin/master, but if you have a remote named bob you might also have refs/remotes/bob/hacks/feep that tracks Bob's hacks/feep.

A local branch name refs/heads/branch has the distinguishing feature that git checkout will put you "on" that branch by default, by writing that name into the special HEAD reference; and once you are set up this way, new commits (created by git commit, git merge, git cherry-pick, etc.) cause the new commit's SHA-1 to be written into the branch-file. (The new commit has as its parent, or one of its parents, the old tip-of-branch.)
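The "on a branch" mechanism can be modelled in a few lines. This is a hypothetical in-memory sketch in Python, not Git's real storage: `ToyRepo` and its fake commit IDs are illustrative only (real Git hashes the full commit object). HEAD holds a symbolic value like `ref: refs/heads/master`, and committing writes the new commit's ID into whichever branch HEAD names, with the old tip as parent.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyRepo:
    """Hypothetical in-memory model of HEAD and branch refs."""
    refs: Dict[str, str] = field(default_factory=dict)            # ref name -> commit ID
    parents: Dict[str, List[str]] = field(default_factory=dict)   # commit ID -> parent IDs
    head: str = "ref: refs/heads/master"                           # symbolic HEAD

    def checkout(self, branch: str) -> None:
        # Being "on" a branch only means HEAD names it.
        self.head = f"ref: refs/heads/{branch}"

    def commit(self, message: str) -> str:
        assert self.head.startswith("ref: "), "detached HEAD is not modelled"
        branch_ref = self.head[len("ref: "):]
        old_tip: Optional[str] = self.refs.get(branch_ref)
        parent_ids = [old_tip] if old_tip is not None else []
        # Fake ID derived from message and parents (illustration only).
        new_id = hashlib.sha1((message + "".join(parent_ids)).encode()).hexdigest()[:7]
        self.parents[new_id] = parent_ids
        # Only the branch HEAD points to moves; other branches stay put.
        self.refs[branch_ref] = new_id
        return new_id
```

A short session: commit once on master, create `dev` by writing master's tip under a new name, check out `dev`, and commit again. Afterwards master still points at the first commit, while dev points at the second, whose parent is the first.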

I have attempted to use terms like "branch tip" to denote specifically the commit to which a branch name like refs/heads/master points, "branch name" or "local branch name" to refer to the name itself (whether prefixed by refs/heads/ or not), and—I think this is the least successful—"branch structure" to refer to the DAG subset. However, given a DAG with a fork-and-merge like this:

         o--o
        /    \
...-o--o      o--o-...
        \    /
         o--o

I sometimes want to refer to one or the other half of the little benzene-ring-like object as "a branch" as well, and I have no really good term for this.
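Even without a good name, the two halves can at least be pinned down as sets. For a merge commit M with exactly two parents, one half is what is reachable from the first parent but not from the second, which is what `git log M^2..M^1` lists, and the other half is the reverse. A minimal Python sketch, with a hypothetical graph mirroring the drawing above and a hypothetical helper `merge_sides`:

```python
from typing import Dict, List, Set, Tuple


def reachable(parents: Dict[str, List[str]], tip: str) -> Set[str]:
    """Commits reachable from `tip` via parent links, including `tip`."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def merge_sides(parents: Dict[str, List[str]], merge: str) -> Tuple[Set[str], Set[str]]:
    """Commits found only on each side of a merge.

    Assumes exactly two parents; octopus merges would need more sides."""
    first, second = parents[merge]
    a = reachable(parents, first)
    b = reachable(parents, second)
    return a - b, b - a


# Toy version of the fork-and-merge drawing.
parents = {
    "F0": [],
    "F": ["F0"],                 # fork point
    "A1": ["F"], "A2": ["A1"],   # upper half
    "B1": ["F"], "B2": ["B1"],   # lower half
    "M": ["A2", "B2"],           # merge commit
}
```

Here `merge_sides(parents, "M")` returns `({"A1", "A2"}, {"B1", "B2"})`: the fork point and everything before it are shared, so they drop out of both halves.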

(Incidentally, if you were a topologist, the fact that the Atlassian diagram can also be drawn linearly would not bother you. However, as the old joke goes, topologists keep trying to drink out of their donuts and eat their coffee mugs since each one is just a torus.)

Salzburg answered 1/8, 2014 at 9:4. Comments (9):
I should have mentioned the distinction between local and remote branches in my question; thanks for adding that. I'm also uncomfortable about the term "branch" (or "branch structure") to refer to a DAG subset. Strictly from a graph-theory perspective, the term "branch" is only valid in case the graph considered is a tree; however, the Git commit DAG is not necessarily a tree. (Nelsonnema)
Although I also dislike "branch structure", I salute your endeavour to distinguish "branch tip", "branch name", and "branch structure". I think I'll make the distinction in my tutorial as well. (Nelsonnema)
If you come up with a better term (graphlet? graph chunk?) let me know; I'll probably start using it myself. :-) (Salzburg)
How about "branch history"? That may be misconstrued as where the branch reference has pointed to in the past (i.e. the contents of the branch's reflog), though. (Nelsonnema)
Hmm, yeah, "branch history" sounds like you're talking about Git's reflogs. (Salzburg)
"Branch ancestry" might be less ambiguous, but doesn't seem that widespread (Google only reports 1300 or so hits on "git 'branch ancestry'", at the time of writing). (Nelsonnema)
And it has technical merit, since we're looking at the ancestor/descendant relationships of the commit nodes. Though "DAGlet" has the dubious advantage of being easy and kind of fun to say. I wonder if it would work well to describe it as "branch ancestry", pointing to the parent/child relationship, and then just start using "DAGlet" after that :-) (Salzburg)
You write: I dislike the phrase "remote branch" or "remote-tracking branch"; I think it's better to just call these remote-tracking names [sic]... I'm confused. Do you prefer "remote-tracking branch" to "remote branch" (as I do) or not? (Nelsonnema)
@jub0bs: I prefer remote-tracking name now, i.e., drop the word "branch" entirely. I wish there were a phrase for "branch name" that didn't use the word "branch" at all, too :-) (Salzburg)

In the second case, we mean "the commits that are reachable from the commit pointed to by the branch".

In the Pro Git example, assuming the topic branch points to commit C7, that branch contains commits C7, C6, C5, C4, C3, C2, and C1. There is no other notion of a commit being "on" a branch than this in Git, and you are correct that you could redraw the DAG linearly.
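This "contained in a branch" notion is the same test that `git branch --contains <commit>` applies: a branch contains a commit when that commit is reachable from the branch's tip. Below is a minimal Python sketch of that test on a toy linear history. The helper name `branches_containing` and the tip positions (master at C1, develop at C5, topic at C7) are assumptions for illustration.

```python
from typing import Dict, List, Set


def reachable(parents: Dict[str, List[str]], tip: str) -> Set[str]:
    """Commits reachable from `tip` via parent links, including `tip`."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def branches_containing(parents: Dict[str, List[str]],
                        branch_tips: Dict[str, str],
                        commit: str) -> List[str]:
    """Names of branches whose tip has `commit` in its ancestry (sorted)."""
    return sorted(name for name, tip in branch_tips.items()
                  if commit in reachable(parents, tip))


# Linear history C1 <- C2 <- ... <- C7.
parents = {f"C{i}": ([f"C{i - 1}"] if i > 1 else []) for i in range(1, 8)}
branch_tips = {"master": "C1", "develop": "C5", "topic": "C7"}
```

Under these assumptions C1 is "on" all three branches, C5 is on develop and topic, and only C6 and C7 are on topic alone, which is the sense in which one commit can belong to several branches at once.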

The Lynda.com diagram is terribly unclear, and I suspect you're right that it's misleading.

Portauprince answered 31/7, 2014 at 21:15. Comments (2):
Thank you VERY much for the link to the "Think Like a Git" tutorial, the most helpful reference I have seen to date for Git, and for the references section of the site as well. (Parisparish)
@Wildcard, I'm glad you found it helpful. It's one of my favourite Git references as well. (Portauprince)
