Summary
You need to wrap your head around the idea that Git stores at least three, and sometimes up to five, active copies of each file: one in the current commit, one (or two or three!) in the index, and one (the only one you can see and work with) in your work-tree. The git ls-files command looks at the index and work-tree copies, then tells you something about some of them, depending on the flags you supply to git ls-files.
Without this idea of three-to-five copies of each file, lots of things in Git will never make any sense. (Well, some things are still tricky even with it, but that's another problem entirely. 😀)
Long
I think there are two issues here. One requires some terminology and then the other should fall into place:
Does [git ls-files] show files from the local repository,
Sort of, but:
the staging repository,
Git does not have a staging repository. Each repository has something that is called, in different Git documentation, either the index or the staging area. (There's an obsoleted third name, cache, that also appears in the Git glossary.)
the remote repository
Definitely not: there need not be any remote repositories (i.e., other Gits with their own repositories) at all, and if there are, only git fetch and git push have your Git call up their Git and exchange data with them. (Well, git ls-remote does the first little bit of git fetch, and git pull runs git fetch, so these two also exchange data with a remote. But git ls-files doesn't.)
or from somewhere else?
Yes, sort of. That gets us back to the first part. So let's take these three bits of terminology as defined in the Git glossary. Italic (including bold italic) text below is taken directly from the linked documentation:
repository
A collection of refs together with an object database containing all objects which are reachable from the refs, possibly accompanied by meta data from one or more porcelains. A repository can share an object database with other repositories via alternates mechanism. (all links theirs)
This of course is full of yet more terminology. To attempt to de-mystify it a bit, what they're saying here is that the repository proper doesn't include the index and work-tree: it's mostly made up of the commits (and their contents). Of course, that requires that we define "index" and "work-tree", so let's move on to:
index
A collection of files with stat information, whose contents are stored as objects. The index is a stored version of your working tree. Truth be told, it can also contain a second, and even a third version of a working tree, which are used when merging.
working tree (I usually call this work-tree):
The tree of actual checked out files. The working tree normally contains the contents of the HEAD commit’s tree, plus any local changes that you have made but not yet committed.
Commits are frozen forever
When you run git commit, Git makes a snapshot of all of your files (well, all of your tracked files, anyway) and stores that, plus some metadata like your name and email address, in a commit. This commit is mostly permanent (you can get rid of commits, usually with a fair bit of difficulty, but just think of them as permanent for convenience) and is totally, completely, 100% read-only. It's read-only like this on purpose, because that allows other commits to share identical copies of files, so that if you commit the same file once, ten times, or even a million times, there's really only one copy of that file in the repository. It's only when you change the file to a new version that Git has to commit a new, separate copy.
The commits are numbered, but not by a nice easy sequential numbering system. That is, we might draw them as a series of simple numbered or lettered things:
... <-C4 <-C5 <-C6 ...
where each later commit points back to its immediate predecessor. But their actual names are big ugly hash IDs. Each one is guaranteed to be unique, which is why they have to be so big and ugly and random-looking. Each hash ID is actually a cryptographic checksum, calculated over the commit's contents, so that every Git everywhere in the universe will agree that that commit, and only that commit, gets that checksum. That's the other reason you—and even Git—can't change it: if you take a commit out of the repository database, tinker with it, and change even one single bit and then put it back into the database, what you get is a new commit with a new and different hash ID.
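As a small illustration of the checksum idea, here is a sketch using git hash-object, which computes the ID Git would assign to some content. It hashes file contents (blobs) rather than commits, simply because blobs are easier to produce on the command line; the same content-addressing applies to commits. The sample strings are made up for the demonstration.

```shell
# `git hash-object --stdin` only computes an object ID; it stores nothing.
# Hash the same bytes twice, then a one-character variation.
id_a=$(printf 'hello\n' | git hash-object --stdin)
id_b=$(printf 'hello\n' | git hash-object --stdin)
id_c=$(printf 'hellp\n' | git hash-object --stdin)

echo "same content:       $id_a"
echo "same content again: $id_b"
echo "one byte changed:   $id_c"
```

The first two IDs are identical, which is why identical files can be shared between commits, and the third is completely different even though only one character changed.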
So commits are totally frozen, forever. The files inside them are frozen forever as well, and compressed, and in a special Git-only format. I like to call these files "freeze-dried". What this means is that, hey, they're great for archiving, but they are utterly useless for getting any new work done ... and that means that Git must provide some way of taking these freeze-dried files and rehydrating them into a useful form.
The work-tree provides the useful-form copies
Things don't really get much simpler than this: the work-tree has the useful-form, rehydrated copies of your files. Because they're just ordinary everyday files on your computer, you can see them, use them, change them around however you like, and otherwise work with them. They're technically not in the repository at all; they just sit right next to it. In a typical setup, the repository itself is in the .git directory/folder at the top level of your work-tree.
Obviously, if there's a commit you've extracted to make the work-tree, there must now be two copies of each file: the freeze-dried committed one, plus the regular working one. Git could stop here. Mercurial does stop here: if you use Mercurial instead of Git, you don't need to concern yourself with a third copy, because there is no third copy. But Git goes on to store yet more copies of the files.
The index / staging-area sits between the commit and the work-tree
What Git does here is to interpose a third copy of each file, between the freeze-dried committed copy and the work-tree copy. This third copy is in the committed-file format (i.e., pre-dehydrated), but because it is not in a commit, it's not actually frozen: it can be replaced at any time. That's what git add does: git add takes the ordinary copy of the file from the work-tree, compresses it down into the freeze-dried format, and replaces the copy that's in the index. Or, if the file wasn't in the index at all, it puts a copy into the index.
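To watch git add replace the index copy, here is a minimal sketch in a throwaway repository. The temporary directory, the demo identity, and the file contents are all invented for the example; git ls-files --stage prints the blob hash ID currently held in the index.

```shell
# Throwaway repository with a demo identity (both are assumptions).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'version 1\n' > file.ext
git add file.ext
git commit -q -m "first"
before=$(git ls-files --stage file.ext)

# Edit the work-tree copy only: the index still holds the old blob.
printf 'version 2, now longer\n' > file.ext
unchanged=$(git ls-files --stage file.ext)

# git add compresses the new content and swaps it into the index.
git add file.ext
after=$(git ls-files --stage file.ext)

echo "after commit: $before"
echo "after edit:   $unchanged"
echo "after add:    $after"
```

The first two lines show the same hash ID, and only the third changes: editing the work-tree file does nothing to the index until git add runs.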
This is why you have to git add files all the time. In Mercurial, you only hg add a file once. After that, you just run hg commit, and Mercurial looks at all the files it knows about, and freezes them into a new commit. This can take a long time in a big repository. Git, by contrast, already has all the files it's supposed to know about, already dehydrated, in the index, so git commit can just package up those dehydrated files into a new frozen commit. The cost of this speed is git add, but if you get into playing clever tricks with the index copies (e.g., using git add -p) you get more benefits than just the speedup.
As the Git glossary mentioned in its description of the index, the index takes on an expanded role during a conflicted merge. When you do a merge operation (whether that's from git merge, or from git revert or git cherry-pick or any other Git command that uses the merge engine) and it doesn't go smoothly, Git winds up putting all three inputs for each conflicted file into the index, so that instead of just one copy of file.ext, you get three. But as long as you're not in the middle of a merge, there's only one copy in the index.
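The three-copies-in-the-index situation can be reproduced with a deliberately conflicting merge. This is only a sketch: the repository, the branch name side, the identity, and the file contents are made up, and the merge is expected to fail.

```shell
# Throwaway repository with a demo identity (both are assumptions).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
start=$(git symbolic-ref --short HEAD)   # whatever the initial branch is called

printf 'base\n' > file.ext
git add file.ext
git commit -q -m "base"

# Change the same line differently on two branches.
git checkout -q -b side
printf 'side change\n' > file.ext
git commit -q -am "side"

git checkout -q "$start"
printf 'main change\n' > file.ext
git commit -q -am "main"

# The merge stops with a conflict, which is the point of the demo.
git merge side >/dev/null 2>&1 || true

# One path, three index entries: slot 1 = merge base, 2 = ours, 3 = theirs.
git ls-files --stage
```

Resolving the conflict and running git add file.ext collapses the three entries back into a single entry in slot 0.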
Usually the index copy matches the HEAD frozen copy, or matches the work-tree copy, or both. For instance, after a fresh git checkout, all three copies match. Then you modify file.ext in the work-tree: now the commit and the index match, but they're not the same as the work-tree copy. Then you git add file.ext, and now the index and work-tree match, but they're different from the frozen copy. Then you git commit to make a new commit, which becomes the current commit, and all three copies match again.
Note that you can modify the work-tree copy:
vim file.ext
then copy the updated one into the index:
git add file.ext
then edit it again:
vim file.ext
and that way, you can make all three copies different. If you do that, git status will say that you have changes staged for commit, because the index copy is different from the current-commit copy, and say that you have changes not staged for commit, because the work-tree copy is different from the index copy.
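The edit, add, edit-again sequence can be scripted as follows. This is a sketch with invented file contents in a throwaway repository, and printf stands in for the interactive vim sessions.

```shell
# Throwaway repository with a demo identity (both are assumptions).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'one\n' > file.ext
git add file.ext
git commit -q -m "initial"      # commit, index and work-tree all match

printf 'two\n' >> file.ext      # first edit (stands in for vim)
git add file.ext                # index now differs from the commit
printf 'three\n' >> file.ext    # second edit: work-tree differs from index

git status --short              # prints "MM file.ext"
git diff --cached --name-only   # index vs. current commit
git diff --name-only            # work-tree vs. index
```

In the short status format, the first M reports the staged difference (index versus commit) and the second M the unstaged one (work-tree versus index).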
The work-tree can contain files that aren't in the index at all
The index is initially just a copy of the current commit. Git then also copies those files to the work-tree, so that you can use them. But you can create files in the work-tree and not run git add on them. Those files aren't in the index now, and if you run git commit, they won't be in the new commit either, because Git builds the new commit from the index.
You can also remove files from the index, without removing them from the work-tree:
git rm --cached file.ext
removes the index copy. It can't touch the current commit frozen copy, of course, but if you now make a new commit, the new commit won't have file.ext in it at all. (The previous commit still does, of course.)
Any file that is in your work-tree right now, and is not in your index right now, is an untracked file. Its untracked-ness comes from the fact that it's not in your index. Put that file into your index and it's tracked, no matter how you got it into your index. Remove it from your index and it's untracked, no matter how you got it out of your index. So that's the last role of the index: to determine which files are tracked, and will therefore be in the next commit.
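The tracked/untracked flip can be seen directly. Again, this is a sketch in a throwaway repository with made-up contents.

```shell
# Throwaway repository with a demo identity (both are assumptions).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'data\n' > file.ext
git add file.ext
git commit -q -m "add file.ext"

tracked_before=$(git ls-files)            # file.ext is in the index
git rm -q --cached file.ext               # drop only the index copy
tracked_after=$(git ls-files)             # index is now empty
untracked=$(git ls-files --others)        # work-tree file, not in the index

echo "tracked before: $tracked_before"
echo "tracked after:  $tracked_after"
echo "untracked now:  $untracked"
ls file.ext                               # the work-tree file is still there
```

Running git add file.ext afterwards puts the file back into the index, which makes it tracked again.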
Now we can see clearly what git ls-files does
What git ls-files does is to read the index and, when needed, the work-tree. (It does not read the current commit.) Depending on what arguments you give to git ls-files, it then prints the names of some or all files that are in the index and/or in the work-tree:
git ls-files --stage
lists the files that are in the index / staging-area, along with their staging slot numbers. (It says nothing about the copies in the HEAD commit and the work-tree.) Or:
git ls-files --others
lists the (names of the) files that are in the work-tree, but not in the index. (It says nothing about the copies in the HEAD commit.) Or:
git ls-files --modified
lists the (names of the) files that are in the index and whose work-tree copies are different from their index copies, i.e., files with unstaged modifications. (It says nothing about the copies in the HEAD commit.) With no options:
git ls-files
lists the (names of the) files that are in the index, with no regard for what files are in the HEAD commit or the work-tree.
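To compare these options side by side, here is a sketch that sets up one tracked file with an unstaged edit plus one untracked file. The repository, identity, and file names tracked.txt and new.txt are invented for the example.

```shell
# Throwaway repository with a demo identity (both are assumptions).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'a\n' > tracked.txt
git add tracked.txt
git commit -q -m "initial"

printf 'more\n' >> tracked.txt   # unstaged edit to a tracked file
printf 'x\n' > new.txt           # never added: untracked

git ls-files              # tracked.txt  (everything in the index)
git ls-files --others     # new.txt      (in the work-tree, not in the index)
git ls-files --modified   # tracked.txt  (work-tree copy differs from index)
git ls-files --stage      # mode, blob hash ID, slot 0, tracked.txt

# Staging the edit makes index and work-tree match again,
# so --modified goes quiet even though nothing new is committed.
git add tracked.txt
modified_after_add=$(git ls-files --modified)
```

Note that git ls-files --others without --exclude-standard also lists ignored files; there are none in this small example.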
git status showing? – Acidulous
git help ls-files says: "git-ls-files - Show information about files in the index and the working tree" – Teamster