Part of your question is:
Do these lines represent branches? If so, how to figure out which branch does each line represent?
I mentioned in two comments that they do represent ... something we might well call branches:
The word branch tends to lose all meaning in Git, so it's not really clear when to use the word at all, other than as a modifier to name. If we say that master is a branch name, and develop is a branch name, and so on, these are pretty clear. Technically master is shorthand for refs/heads/master, which is a ref or reference: it's a way to hold a commit hash ID.¹
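To see that the short name and the full ref name resolve to the same hash ID, here is a minimal sketch in a throwaway repository. The temporary directory, the placeholder identity, and the empty commit are assumptions made only for this demonstration.

```shell
set -eu
# Placeholder identity so the commit works anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, whatever the local default is
git commit -q --allow-empty -m "first commit"

short=$(git rev-parse master)             # resolve the short name
full=$(git rev-parse refs/heads/master)   # resolve the full ref name
echo "$short"
echo "$full"                              # the same hash ID is printed twice
```

Both commands print one hash ID, which is all that the name itself stores.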
But we also need a way to refer to some group of commits. We tend to call that a branch as well.
Git offers remote-tracking names such as origin/master. We often call the set of commits found with such a name a remote branch, and Git itself calls these things remote-tracking branch names even though they're not branch names.²
I find that the word branch is overused to the point of often becoming meaningless. But the lines represent links from commits to other, earlier commits.
We find these links by going from commit to commit. When we use a branch name to find the starting commit for following these commit-to-commit links, we call the result a branch. But this isn't the same thing as the branch name, and the problem here becomes clear when we look at how these evolve over time.
Let's start with a repository with just three commits in it, all on master or main, and draw those three commits using uppercase letters instead of the big ugly hash IDs that Git really uses for them:
A <-B <-C <--main
The name main holds the hash ID of the last of the three commits, commit C. Commit C itself holds the raw hash ID of earlier commit B, and commit B holds the raw hash ID of still-earlier commit A. We say that C points to B, and B points to A, and of course main itself points to C.
Commit A was the very first commit. There is no earlier commit! So commit A doesn't point anywhere. This makes it a root commit, and Git can stop going backwards here.
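The backwards links and the root commit can be inspected directly. This is a sketch under assumptions: a throwaway repository, a placeholder identity, and three empty commits whose messages A, B, and C stand in for the letters in the drawing.

```shell
set -eu
# Placeholder identity so commits work anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main, whatever the local default is
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done

# Each output line is a commit hash followed by the hash of its parent, if it has one.
git rev-list --parents main

# A commit with no parents at all is a root commit.
root=$(git rev-list --max-parents=0 main)
root_subject=$(git log -1 --format=%s "$root")
echo "root commit: $root_subject"
```

The last line of the `git rev-list --parents` output has only one hash on it: that is the root commit, the one with nowhere further back to go.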
We now add a new branch name to our collection of commits. This new name also points to commit C, like this:
A--B--C <-- main, develop
We pick one name to use with git checkout, which attaches HEAD to that name:
A--B--C <-- main, develop (HEAD)
We're now using commit C, via the name develop. If we git checkout main we're still using C, just via a different name, but we'll stick with develop for now.
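A short sketch of this step, again in a throwaway repository with a placeholder identity and empty commits (all assumptions for the demonstration): creating the name makes no new commit, and checking it out only changes which name HEAD is attached to.

```shell
set -eu
# Placeholder identity so commits work anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main, whatever the local default is
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done

git branch develop                        # new name, pointing to the current commit C
git checkout -q develop                   # attach HEAD to develop

head_ref=$(git symbolic-ref HEAD)         # which name HEAD is attached to
main_tip=$(git rev-parse main)
develop_tip=$(git rev-parse develop)
echo "HEAD -> $head_ref"
echo "main:    $main_tip"
echo "develop: $develop_tip"              # same hash ID as main
```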
Each commit holds a snapshot of every file, frozen for all time, in a special, read-only, Git-only, compressed and de-duplicated form. So the fact that most of the files in C are probably identical to those in B means that C doesn't take much extra space. If B and C share a 100 MB file unchanged, there's only one copy of that file.
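One way to observe the de-duplication is to compare the object ID that each commit records for the unchanged file. In this sketch the file names big.txt and small.txt are hypothetical, the file is tiny rather than 100 MB, and the repository and identity are throwaway placeholders.

```shell
set -eu
# Placeholder identity so commits work anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q

printf 'pretend this is 100 MB\n' > big.txt     # hypothetical large file
git add big.txt
git commit -q -m "B"

printf 'a small new file\n' > small.txt         # the second commit leaves big.txt alone
git add small.txt
git commit -q -m "C"

blob_b=$(git rev-parse HEAD~1:big.txt)          # object ID of big.txt in the earlier commit
blob_c=$(git rev-parse HEAD:big.txt)            # object ID of big.txt in the later commit
echo "$blob_b"
echo "$blob_c"                                  # identical: both snapshots share one stored object
```

Both snapshots name the same object, so the content is stored once.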
But these files can't be used by anything else. So git checkout has to copy them out to a usable form. That's what is in your working tree: the usable-form copies. We won't go any further with this idea, but it's worth keeping in mind and revisiting later.
Anyway, let's now make a new commit D, in the usual way. Commit D will hold a (de-duplicated) snapshot of every file as usual, and will point backwards to existing commit C. When Git has finished making D, Git will update the current branch name, whatever that is (based on the attachment of HEAD), to point to D, like this:
A--B--C <-- main
       \
        D <-- develop (HEAD)
Commits A-B-C are now on main, right? And D is only on develop. Well, the last part is right, but commits A-B-C are also on develop.
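We can ask Git which branch names contain a given commit. The sketch below assumes a throwaway repository with a placeholder identity and empty commits labelled A through D.

```shell
set -eu
# Placeholder identity so commits work anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main, whatever the local default is
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done
c=$(git rev-parse main)                   # remember commit C

git checkout -q -b develop                # create develop at C and attach HEAD to it
git commit -q --allow-empty -m "D"        # develop now points to D; main still points to C
d=$(git rev-parse develop)

# Branch names from which each commit can be reached.
on_c=$(git for-each-ref --contains "$c" --format='%(refname:short)' refs/heads/)
on_d=$(git for-each-ref --contains "$d" --format='%(refname:short)' refs/heads/)
echo "C is on:"; echo "$on_c"             # lists both develop and main
echo "D is on:"; echo "$on_d"             # lists only develop
```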
The lines your viewer draws, that connect D backwards to C and so on, don't tell you which branch something is on, in part because the branches don't matter. Only the commits matter. Commit C matters, and can be found from both names, so it's on both branches. That one line linking C backwards to B represents two branches right now.
We might go on and make a few more commits:
A--B--C <-- main
       \
        D--E--F <-- develop (HEAD)
It's tempting to think of these as separate, as if develop has only D-E-F. That's not the case in Git: the branch names only matter for getting started. Following the links backwards from there, we never know when to stop, unless we hit a root commit like A.
To get Git to stop earlier than that, we use expressions like main..develop. This is shorthand for develop ^main. This uses set theory: via develop, we find all commits, and then via main we find the A-B-C set, and we subtract the excluded set (the ^main part) from the whole. That leaves us with D-E-F, which are the commits we might want to think of as being "on branch develop". But to get there, we had to say: commits that are on develop, minus commits that are on main, because A-B-C are on both branches.
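The set subtraction can be checked with git rev-list. This is a sketch in a throwaway repository (placeholder identity, empty commits labelled A through F, all assumed for the demonstration) showing that the two spellings select the same commits.

```shell
set -eu
# Placeholder identity so commits work anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main, whatever the local default is
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done
git checkout -q -b develop
for msg in D E F; do git commit -q --allow-empty -m "$msg"; done

range=$(git rev-list main..develop)       # two-dot shorthand
explicit=$(git rev-list develop ^main)    # reachable from develop, minus reachable from main
all=$(git rev-list --count develop)       # no exclusion: walks all the way back to the root

git log --format=%s main..develop         # subjects of the commits left after the subtraction
echo "without exclusion: $all commits"
```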
Now we may go on to merge the new commits into main. When we do that, we can let Git do a fast-forward, which is not a merge at all:
git checkout main && git merge --ff-only develop
results in:
A--B--C--D--E--F <-- develop, main (HEAD)
Now all six commits are on both branches. There's now only a single line, instead of two lines: we don't have to break up our drawing to have main point to C any more.
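A sketch of the fast-forward case, under the same assumptions as before (throwaway repository, placeholder identity, empty commits A through F): the commit count does not change, and main simply ends up holding the same hash ID as develop.

```shell
set -eu
# Placeholder identity so commits work anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main, whatever the local default is
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done
git checkout -q -b develop
for msg in D E F; do git commit -q --allow-empty -m "$msg"; done

before=$(git rev-list --count --all)      # commits in the repository before the "merge"
git checkout -q main
git merge -q --ff-only develop            # just slides the name main forward to F
after=$(git rev-list --count --all)

echo "commits before: $before, after: $after"
git rev-parse main develop                # one hash ID, printed twice
```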
Or, we can use an explicit merge, or maybe make a commit on main first:
git checkout main; (... make new commit ...); git merge develop
which results in:
A--B--C------G--H <-- main (HEAD)
       \       /
        D--E--F <-- develop
where commit H is a merge commit. There are two backwards-pointing arrows coming out of H: one connects to earlier commit G (if we made it, or C directly if not), and the other points to commit F. The merge commit H is only on main (at least for now), but because it points backwards to both ... whatever-we-call-a-set-of-commits, all those commits are reachable from commit H, so all commits are now "on" main. There are two lines, but we used only one branch name to find them so far.
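The two parents of a merge commit are visible with git rev-list --parents. This sketch assumes a throwaway repository, a placeholder identity, and empty commits labelled to match the drawing; the merge message H is supplied with -m so no editor opens.

```shell
set -eu
# Placeholder identity so commits work anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main, whatever the local default is
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done
git checkout -q -b develop
for msg in D E F; do git commit -q --allow-empty -m "$msg"; done

git checkout -q main
git commit -q --allow-empty -m "G"        # main and develop have now diverged
git merge -q -m "H" develop               # a true merge: makes commit H with two parents

parents=$(git rev-list --parents -n 1 main)   # hash of H, then its first and second parents
echo "$parents"
git log -1 --format=%s main^1             # subject of the first parent
git log -1 --format=%s main^2             # subject of the second parent
total=$(git rev-list --count main)        # every commit reachable from main
echo "reachable from main: $total"
```

The first parent is the commit main pointed to before the merge, and the second is the tip of develop.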
Of course there's also that name develop, which finds commit F, which finds commit E, and so on, all the way back to A. Commit F is therefore on branch develop, just like it used to be. It's just that it's on branch main too now.
But: now we can delete the name develop entirely, with git branch -d develop. When we do, we are left with this:
A--B--C------G--H <-- main (HEAD)
       \       /
        D--E--F
All the commits remain. There are still two lines. But now there is only one branch name involved.
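To confirm that deleting the name removes no commits, here is a sketch that rebuilds the same history in a throwaway repository (placeholder identity and empty commits are assumptions) and then deletes develop.

```shell
set -eu
# Placeholder identity so commits work anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main, whatever the local default is
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done
git checkout -q -b develop
for msg in D E F; do git commit -q --allow-empty -m "$msg"; done
git checkout -q main
git commit -q --allow-empty -m "G"
git merge -q -m "H" develop

f=$(git rev-parse develop)                # remember the hash ID of commit F
git branch -q -d develop                  # delete only the name

remaining=$(git for-each-ref --format='%(refname:short)' refs/heads/)
kind=$(git cat-file -t "$f")              # the object for F still exists
total=$(git rev-list --count main)        # and is still reachable through H
echo "branch names left: $remaining"
echo "F is still a $kind; main reaches $total commits"
```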
This is why I like to say that branch names do not matter, other than for finding that one commit, of course. We need some way to find commit H. As long as we have that, we find all the earlier commits too, because they're linked, one (or for merge commit H, two) at a time, backwards.
The lines in your visualizer represent these linkages. The linkages are branches, or aren't branches, or are whatever you like, depending on how you want to view them.
¹ We also end up with, in .git/config, a configuration section:
[branch "master"]
    remote = origin
    merge = refs/heads/master
for instance, so there's more to a branch name than just the commit hash ID. But these two entries are mostly static: the remote and merge settings change only if we run, for instance, git branch --set-upstream-to to change the upstream setting of master.
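These settings can be read back with git config. The sketch below assumes a throwaway pair of repositories, one standing in for origin and one cloned from it, with a placeholder identity; cloning is what creates the section here.

```shell
set -eu
# Placeholder identity so the commit works anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/origin"                                # stand-in for the remote repository
git -C "$work/origin" symbolic-ref HEAD refs/heads/master
git -C "$work/origin" commit -q --allow-empty -m "A"

git clone -q "$work/origin" "$work/clone"                 # the clone gets a [branch "master"] section
cd "$work/clone"

remote=$(git config --get branch.master.remote)
merge=$(git config --get branch.master.merge)
echo "remote = $remote"
echo "merge = $merge"
```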
² Running git checkout origin/master and then git status, we see that this produces a detached HEAD status, rather than an on branch origin/master status. So origin/master, whose full spelling is refs/remotes/origin/master, is not a branch name. Since the word branch is so badly beaten-up from overuse in Git, calling this a remote-tracking branch name is, I think, a bad idea. The phrase remote-tracking name works just as well for identifying what kind of name this is.
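The detached-HEAD behavior can be reproduced with a throwaway clone. As a sketch under assumptions (temporary directories, a placeholder identity, one empty commit), git symbolic-ref -q HEAD fails exactly when HEAD is not attached to a branch name.

```shell
set -eu
# Placeholder identity so the commit works anywhere (an assumption for this demo).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/origin"                                # stand-in for the remote repository
git -C "$work/origin" symbolic-ref HEAD refs/heads/master
git -C "$work/origin" commit -q --allow-empty -m "A"
git clone -q "$work/origin" "$work/clone"
cd "$work/clone"

git checkout -q origin/master             # check out via the remote-tracking name

# symbolic-ref succeeds only when HEAD is attached to a branch name.
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "HEAD is $state"
git status                                # reports a detached HEAD at origin/master
```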