Rather than directly answering the question itself—see VonC's answer to the linked question for that—let's concentrate on this:
I currently have a repository with multiple corrupted files which I want to convert into crlf representation.
First, let's note that none of these options can change any existing commit. This is a fundamental Git property: once made, no existing commit can be altered. What you can do is make new commits. That's usually not too big a deal, since we generally just want new stuff to be correct (but see git filter-branch, which copies commits after applying filters to their contents, and can be used to re-copy an entire repository: the new repo is no longer compatible with the old one, but you can "fix history" this way).
Next, I think this is the key to understanding all of these end of line / CRLF attribute options: transformations are applied to files when they move into or out of the index.
Remember that Git's index is where you build the next commit. The contents of the index are initially the same as whatever commit is current: you run git checkout master, for instance, and Git resolves the name master to a commit-ID and copies that particular commit to your work-tree, but the copy goes through the index.
In other words, Git first finds that file foo.txt is in the commit (and needs to be extracted). So Git moves that version of foo.txt to the index. The index's version exactly matches the HEAD commit's version. Git does not apply any filters to the index version, nor change any line endings.
Once the index version is updated, Git copies that version of the file from the index to the work-tree.1 Some transformations take place now, during this extraction process. If there is a smudge filter, Git applies it now. If there are line-ending conversions to make, Git applies those now.
The work-tree file may, during this process, become different from the index version. Now Git has a problem, because now the file is "dirty" (modified in the work-tree). This is where things get particularly confusing, although most of the time, the details here are invisible.
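To make the "out of the index" direction concrete, here is a minimal sketch in a throwaway repository. The file name foo.txt, the *.txt pattern, and the "Example" identity are all made up for illustration; the only assumption is that a .gitattributes entry of `text eol=crlf` is the conversion in effect.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf 'one\ntwo\n' > foo.txt            # LF-only content
git add foo.txt
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "add foo.txt with LF endings"

# From now on, ask for CRLF line endings in the work-tree.
printf '*.txt text eol=crlf\n' > .gitattributes

# Force a fresh extraction from the index to the work-tree.
rm foo.txt
git checkout -- foo.txt

# Compare the size of the index copy with the size of the work-tree copy.
index_bytes=$(git cat-file -p :foo.txt | wc -c | tr -d ' ')
worktree_bytes=$(wc -c < foo.txt | tr -d ' ')
echo "index: $index_bytes bytes, work-tree: $worktree_bytes bytes"
```

The index copy stays exactly as committed; only the extracted work-tree copy gains one carriage return per line.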
Eventually, after working with your work-tree, you may run git add on some file path-name (or use git add -A or whatever to add many files). This copies the file from the work-tree, into the index.2 More transformations happen now, during this copy: if there is a clean filter, Git applies it now. If there are line-ending conversions to make, Git applies them now.
In other words, after git add-ing these files, the index version may not match the work-tree version. However, Git marks the index version as "matching" anyway. A git status will skip right over the work-tree version, because Git now claims that the index version matches the work-tree version. It sort of does, because the index version matches what would be added if you ran git add again.
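The same thing can be seen in the other direction with a small sketch. Again the repository and the file names are invented for the example, and the assumed conversion is a plain `text` attribute on *.txt.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf '*.txt text\n' > .gitattributes
printf 'one\r\ntwo\r\n' > foo.txt        # CRLF content in the work-tree
git add .gitattributes foo.txt           # conversion happens during this copy

worktree_bytes=$(wc -c < foo.txt | tr -d ' ')
index_bytes=$(git cat-file -p :foo.txt | wc -c | tr -d ' ')
echo "work-tree: $worktree_bytes bytes, index: $index_bytes bytes"

# The byte counts differ, yet status reports no work-tree modification
# for foo.txt (there is no "M" in the second status column).
status_line=$(git status --porcelain -- foo.txt)
echo "$status_line"
```

The two copies are not byte-identical, but Git still treats the file as clean, because re-adding it would produce the same index content.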
The actual implementation uses time stamps, usually with one-second resolution. Git will continue to believe that the index version matches the work-tree version unless and until the OS touches the time-stamp on the work-tree version of the file. This is true even if you change the set of filters and/or line-ending conversions to apply. Git doesn't realize that you have changed the way the line endings should work, or changed the "clean" filter to do something different: it just sees that the index's "cache" aspect says "I match work-tree version time-stamp T". As long as the work-tree version's time-stamp is still T, the file must be "clean".
Hence, to update these things after changing any text-conversion settings, you need to make Git realize that the file is not clean. You can touch <path> to set a new time-stamp of "now", which won't match the older time stamp in the index. Now git add -A (or whatever) will scan as usual, but since the time stamps don't match, it will find the file this time, and will re-filter it to add it to the index.
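As a rough sketch of that recipe, assume a file that was committed with CRLF endings before any conversion was configured, followed by a newly added `*.txt text` attribute. The names are hypothetical, and the sleep is only there so that the new time-stamp differs even at one-second resolution.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config core.autocrlf false           # start with no conversions at all

printf 'one\r\ntwo\r\n' > foo.txt        # CRLF content
git add foo.txt
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "foo.txt stored with CRLF"
before=$(git cat-file -p :foo.txt | wc -c | tr -d ' ')

# Change the conversion settings: *.txt should now be normalized to LF.
printf '*.txt text\n' > .gitattributes

sleep 1            # let the clock move on to a later second
touch foo.txt      # the time-stamp cached in the index no longer matches
git add foo.txt    # Git re-reads the file and re-applies the conversion

after=$(git cat-file -p :foo.txt | wc -c | tr -d ' ')
echo "index blob: $before bytes before, $after bytes after"
```

Newer Git versions also have git add --renormalize, which is aimed at this situation; check the documentation for your version before relying on it.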
Again, these transformations occur when you git add the file.
Normally, on a Windows-like system, your goal here will be to take LF-only repository-format files and turn them into CR-LF files for Windows to deal with. That transformation occurs on the way out of the index, to the work-tree: i.e., during git checkout. Then you would want to transform these CR-LF work-tree files into LF-only format during the git add process, so that the in-repository form is the way Linux (and Linus Torvalds and hence Git :-) ) prefer them. But you can store them inside the repository in CR-LF format, if you really want to annoy all the Unix/Linux folks. It's all a matter of which transforms, if any, you apply at which steps: git checkout time, and git add time.
The .gitattributes file specifies which transforms to apply to which files. The core.autocrlf and core.eol settings don't: Git must make its best guess about which files get which transformations at which step.
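For illustration, here is one possible .gitattributes and a way to ask Git which attributes it would apply to a given path. The patterns and path names are invented for the example, not taken from the question; git check-attr does not need the files to exist.

```shell
# Throwaway repository; patterns and path names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .

cat > .gitattributes <<'EOF'
*.txt text eol=crlf
*.sh  text eol=lf
*.png binary
EOF

# Ask Git which text/eol attributes apply to each path.
attrs=$(git check-attr text eol -- notes.txt build.sh logo.png)
echo "$attrs"
```

Each output line has the form path: attribute: value, so you can see per file whether a conversion is requested at all and in which direction.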
1Technically, all that's in the index is the hash ID of the file. The file itself is stored as a Git blob object in the repository database. Just as with commit objects, these blob objects are immutable. That's why it cannot be changed in the index: it's really just a hash ID.
2The git add process simply writes a new blob, with the new blob written after any filtering. If the new blob exactly matches some existing blob, bit-for-bit, the new blob re-uses the existing blob's database entry and hash ID, and is not actually saved: the existing blob suffices. If not, the blob's data gets stored as a new file, with a new ID. It's the new hash ID that goes into the index.
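The blob re-use described in footnote 2 can be sketched with git hash-object, which writes a blob much as git add does but without touching the index. The file names are hypothetical.

```shell
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf 'same content\n'  > a.txt
printf 'same content\n'  > b.txt
printf 'other content\n' > c.txt

# -w writes the blob into the object database and prints its hash ID.
id_a=$(git hash-object -w a.txt)
id_b=$(git hash-object -w b.txt)
id_c=$(git hash-object -w c.txt)

echo "a.txt: $id_a"
echo "b.txt: $id_b"
echo "c.txt: $id_c"
```

Identical content yields the same hash ID regardless of the file name, so a.txt and b.txt share one stored blob, while c.txt gets its own.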