An article by the Git maintainer, Junio C. Hamano, is instructive for grasping the difference between "cache" and "index":
(emphasis mine)
When Linus started writing git, his aim was to allow him to reproduce each and every intermediate state produced by his original "tarball and patches" workflow he used before the BitKeeper days.
Starting from a 2.6.12 tarball, he queues patch-1, patch-2,... so 2.6.12 itself, 2.6.12 with patch-1 applied, 2.6.12 with both patch-1 and patch-2 applied, become three versions.
But this won't obviously scale if you have to shuffle hundreds of patches a day. So he invented "directory cache"; as a concept, this roughly corresponds to "tree" objects in today's git: a collection of records, each of which is a compact representation of what a whole directory structure contains.
The way to build it was to "add the contents to the cache, or update the contents in the cache".
The control directory to host the collection of such version control records was named ".dircache" (this was renamed to ".git" after some time).
There was a file called ".dircache/index", and the contents of this file was read and manipulated in a set of variables in C that were named after a noun, "cache".
Back then, the concept of what we today call the index, a buffer area to build up the collection of contents you intend to write out as a tree object, was called "cache".
Everybody talked about "cache" and "index" interchangeably, as the file that records what is in the "cache" was named "index". It was (and it still is) an index to allow you to find the contents in the cache by giving it a pathname.
As more and more people started using git without having to read its code at all, the use of the word "index" has become more prevalent for obvious reasons.
As something that is on the filesystem, it is much more visible than the variable name in the C source code.
Eventually, we stopped using "cache" as a noun to name what we call "the index" today when explaining the use of git as the end-user.
The word "cache" however is still used as a noun when we want to talk about the internal data structure in the context of discussing git implementation (e.g. "Let's make it possible for programs to work with more than one cache at the same time").
At the end user level, "cache" is only used as an adjective these days; "cached", meaning "contents cached in the index, not the contents in the work tree".
We could have called it "indexed", but "cached contents" was an already established phrase from very early days to mean that exact concept, and we did not need another word that meant the same thing.
[...] In the earlier days, there was a distinction between "adding a new file to the index" and "updating a file that is already in the index with new contents".
[...] Modern (and medieval) versions of git uses "git add" for both. We could have been just honest and called the act of updating-or-adding-to-the-index "add", but some people in "git training" industry started teaching the index as "the staging area for the next commit", and as an inevitable consequence, a verb "to stage" started to appear in many documentation to mean "the act of adding contents to the index".
I sometimes use this verb myself, but that is only when I suspect that the audience might have learned git first from these new people. Strictly speaking this is a redundant and fairly recent word in git vocabulary.
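To make the quoted description concrete, here is a toy model (not Git's actual implementation) of the index as a lookup from pathname to cached contents, where a single `add` covers both "adding a new file" and "updating a file that is already there". The `ToyIndex` class and its method names are hypothetical, invented for illustration. The only detail taken from Git itself is how a blob ID is computed in a repository using the SHA-1 object format; the real index file also stores file modes, stat data, and more.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Git names a blob by hashing a "blob <size>" header, a NUL byte,
    # and then the raw content (SHA-1 object format assumed here).
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyIndex:
    """Hypothetical stand-in for the index: pathname -> cached blob ID."""

    def __init__(self):
        self.entries = {}

    def add(self, path: str, content: bytes) -> None:
        # One operation for both cases: a new path is added,
        # an existing path is updated with the new contents.
        self.entries[path] = blob_id(content)

    def lookup(self, path: str) -> str:
        # "Find the contents in the cache by giving it a pathname."
        return self.entries[path]


index = ToyIndex()
index.add("README", b"hello\n")         # adds a new entry
first = index.lookup("README")
index.add("README", b"hello, world\n")  # updates the same entry
second = index.lookup("README")
index.add("Makefile", b"")              # adds another entry
```

After these calls the toy index holds two paths, and `README` points at the ID of its latest cached contents rather than the first version. That "cached" state is what the adjective refers to: the contents recorded in the index, as opposed to whatever is currently in the work tree.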