What is the difference between autocrlf and eol?
I'm reading the Git documentation about .gitattributes to fix my problems with mixed line endings, and found that there are two similar settings.

AUTOCRLF:

End-of-line conversion

While Git normally leaves file contents alone, it can be configured to normalize line endings to LF in the repository and, optionally, to convert them to CRLF when files are checked out.

If you simply want to have CRLF line endings in your working directory regardless of the repository you are working with, you can set the config variable "core.autocrlf" without using any attributes.

[core]
    autocrlf = true

This does not force normalization of text files, but does ensure that text files that you introduce to the repository have their line endings normalized to LF when they are added, and that files that are already normalized in the repository stay normalized.

And EOL:

This attribute sets a specific line-ending style to be used in the working directory. It enables end-of-line conversion without any content checks, effectively setting the text attribute.

Set to string value "crlf": This setting forces Git to normalize line endings for this file on checkin and convert them to CRLF when the file is checked out.

Set to string value "lf": This setting forces Git to normalize line endings to LF on checkin and prevents conversion to CRLF when the file is checked out.

Backwards compatibility with crlf attribute

For backwards compatibility, the crlf attribute is interpreted as follows:

crlf            text
-crlf           -text
crlf=input      eol=lf

It seems that both do the same thing, but there is something about compatibility. Does this mean that autocrlf is deprecated and eol is the new flavor, or something? I currently have a repository with multiple corrupted files which I want to convert into crlf representation. As you can see, the documentation confuses things instead of clarifying them.

What should I apply in this situation?

Posology answered 20/2, 2017 at 8:52 Comment(1)
This question is similar but I'm not sure if it's a duplicate. On the other hand, the answer from VonC explains the difference between core.eol and core.autocrlf in quite some detail. - Kiersten

Rather than directly answering the question itself—see VonC's answer to the linked question for that—let's concentrate on this:

I currently have a repository with multiple corrupted files which I want to convert into crlf representation.

First, let's note that none of these options can change any existing commit. This is a fundamental Git property: once made, no existing commit can be altered. What you can do is make new commits. That's usually not too big a deal since usually we just want new stuff to be correct (but see git filter-branch, which copies commits after applying filters to their contents, and can be used to re-copy an entire repository: the new repo is no longer compatible with the old one, but you can "fix history" this way).

Next, I think this is the key to understanding all of these end of line / CRLF attribute options: transformations are applied to files when they move into or out of the index.

Remember that Git's index is where you build the next commit. The contents of the index are initially the same as whatever commit is current: you run git checkout master, for instance, and Git resolves the name master to a commit-ID and copies that particular commit to your work-tree—but the copy goes through the index.

In other words, Git first finds that file foo.txt is in the commit (and needs to be extracted). So Git moves that version of foo.txt to the index. The index's version exactly matches the HEAD commit's version. Git does not apply any filters to the index version, nor change any line endings.

Once the index version is updated, Git copies that version of the file from the index to the work-tree.[1] Some transformations take place now, during this extraction process. If there is a smudge filter, Git applies it now. If there are line-ending conversions to make, Git applies those now.
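A minimal sketch of that extraction step, assuming a POSIX shell and Git 2.8 or newer (for git ls-files --eol). The throwaway repository, the file contents, and the *.txt text eol=crlf rule are illustrative assumptions, not something taken from the question:

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"
git config core.autocrlf false   # rely on .gitattributes only

# Illustrative rule: treat *.txt as text and check it out with CRLF.
printf '*.txt text eol=crlf\n' > .gitattributes
printf 'one\ntwo\n' > foo.txt    # LF-only content
git add .gitattributes foo.txt
git commit -q -m "add foo.txt"

# Remove the work-tree copy and let Git extract it from the index again.
rm foo.txt
git checkout -- foo.txt

# i/... describes the index (repository) form, w/... the work-tree form.
git ls-files --eol foo.txt
```

The first two columns should read i/lf and w/crlf: the index copy is untouched, and only the work-tree copy was converted on the way out.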

The work-tree file may, during this process, become different from the index version. Now Git has a problem, because now the file is "dirty" (modified in the work-tree). This is where things get particularly confusing, although most of the time, the details here are invisible.

Eventually, after working with your work-tree, you may run git add on some file path-name (or use git add -A or whatever to add many files). This copies the file from the work-tree into the index.[2] More transformations happen now, during this copy: if there is a clean filter, Git applies it now. If there are line-ending conversions to make, Git applies them now.
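The add-time conversion can be observed by comparing sizes. This is only a sketch, assuming a POSIX shell; the *.txt text rule and the file contents are made up for illustration:

```shell
#!/bin/sh
set -eu

# Throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf false   # rely on .gitattributes only

# Illustrative rule: *.txt is text, so Git normalizes it to LF on git add.
printf '*.txt text\n' > .gitattributes
printf 'one\r\ntwo\r\n' > foo.txt   # CRLF content, 10 bytes

git add .gitattributes foo.txt

# Size of the work-tree file versus size of the blob recorded in the index.
worktree_size=$(wc -c < foo.txt | tr -d ' ')
index_size=$(git cat-file -s :foo.txt)
echo "work-tree: $worktree_size bytes, index: $index_size bytes"
```

The work-tree file keeps its 10 bytes, while the index blob is 8 bytes: the two carriage returns were dropped on the way in. Git may print a warning about the CRLF replacement; that is expected here.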

In other words, after git add-ing these files, the index version may not match the work-tree version. However, Git marks the index version as "matching" anyway. A git status will skip right over the work-tree version, because Git now claims that the index version matches the work-tree version. It sort of does, because the index version matches what would be added if you ran git add again.

The actual implementation uses time stamps, usually with one-second resolution. Git will continue to believe that the index version matches the work-tree version unless and until the OS touches the time-stamp on the work-tree version of the file. This is true even if you change the set of filters and/or line-ending conversions to apply. Git doesn't realize that you have changed the way the line endings should work, or changed the "clean" filter to do something different: it just sees that the index's "cache" aspect says "I match work-tree version time-stamp T". As long as the work-tree version's time-stamp is still T, the file must be "clean".

Hence, to update these things after changing any text-conversion settings, you need to make Git realize that the file is not clean. You can touch <path> to set a new time-stamp of "now", which won't match the older time stamp in the index. Now git add -A (or whatever) will scan as usual, but since the time stamps don't match, it will find the file this time, and will re-filter it to add it to the index.
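A sketch of that touch-and-re-add sequence, assuming a POSIX shell and Git 2.8 or newer. The repository, file name, and the *.txt text rule are hypothetical; in a real repository you would touch the affected files instead:

```shell
#!/bin/sh
set -eu

# Throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"
git config core.autocrlf false

# Commit a CRLF file before any conversion rules exist,
# so the blob stored in the repository contains CRLF.
printf 'one\r\ntwo\r\n' > foo.txt
git add foo.txt
git commit -q -m "add foo.txt with CRLF"
before=$(git ls-files --eol foo.txt | awk '{print $1}')

# Now change the text-conversion settings.
printf '*.txt text\n' > .gitattributes

# Give the work-tree file a new time stamp, then re-add everything.
touch foo.txt
git add -A
after=$(git ls-files --eol foo.txt | awk '{print $1}')

echo "index before: $before, index after: $after"
git commit -q -m "normalize line endings"
```

The index entry goes from i/crlf to i/lf, and the fix is recorded as a new commit; the old commit still holds the CRLF blob. On Git 2.16 and newer, git add --renormalize . re-applies the "clean" conversion to tracked files without the touch step.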

Again, these transformations occur when you git add the file.


Normally, on a Windows-like system, your goal here will be to take LF-only repository-format files and turn them into CR-LF files for Windows to deal with. That transformation occurs on the way out of the index, to the work-tree: i.e., during git checkout. Then you would want to transform these CR-LF work-tree files into LF-only format during the git add process, so that the in-repository form is the way Linux (and Linus Torvalds and hence Git :-) ) prefer them. But you can store them inside the repository in CR-LF format, if you really want to annoy all the Unix/Linux folks. It's all a matter of which transforms, if any, you apply at which steps: git checkout time, and git add time.

The .gitattributes file specifies which transforms to apply to which files. The core.autocrlf and core.eol settings don't: Git must make its best guess about which files get which transformations at which step.
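As a rough sketch of what such a .gitattributes might look like for a Windows-oriented project, assuming a POSIX shell. The patterns and the file names notes.txt and sheet.xls are hypothetical; git check-attr reports what Git would apply to a path without the path having to exist:

```shell
#!/bin/sh
set -eu

# Throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical rules; adjust the patterns to your own file types.
cat > .gitattributes <<'EOF'
# Let Git guess which files are text and normalize those to LF in the repository
*       text=auto
# Always text, always checked out with CRLF
*.txt   text eol=crlf
# Never convert spreadsheets
*.xls   binary
EOF

# Ask Git which attributes apply to each path.
git check-attr text eol -- notes.txt
git check-attr text -- sheet.xls
```

This should report text as set and eol as crlf for notes.txt, and text as unset for sheet.xls. Because the file is committed along with everything else, every clone gets the same rules, unlike core.autocrlf.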


[1] Technically, all that's in the index is the hash ID of the file. The file itself is stored as a Git blob object in the repository database. Just as with commit objects, these blob objects are immutable. That's why it cannot be changed in the index: it's really just a hash ID.

[2] The git add process simply writes a new blob, with the new blob written after any filtering. If the new blob exactly matches some existing blob, bit-for-bit, the new blob re-uses the existing blob's database entry and hash ID, and is not actually saved, since the existing blob suffices. If not, the blob's data gets stored as a new file, with a new ID. It's the new hash ID that goes into the index.

Palestra answered 20/2, 2017 at 18:2 Comment(6)
Great answer, but core.autocrlf in gitattributes and eol settings (*.txt text eol=crlf) are designed for this very issue: EOL transformation when extracting from and adding to the index. I was just confused by having two settings for the same thing. After some research I found out that the latter setting just allows more precise configuration: you configure files by extension rather than for all files in the repository. autocrlf already corrupted some files in my repository (xls, for example), whereas with eol I can mark them as binary. - Posology
If you use *.blah text=auto, then yes, you are doing it by extension (and auto-classification). If you use foo.* text=auto you are doing it by prefix, and if you use * text=auto you are explicitly calling for it globally, rather than depending on core.autocrlf (whose setting is outside the Git-controlled files within the repository). I think the fact that core.autocrlf is not part of the version-controlled files can be a key reason to use it, or to avoid it (personally I avoid it just as I avoid Windows :-) ). - Palestra
I expected that core.autocrlf could be stored in gitattributes just like text=.... However, I didn't try it and stuck with the new shiny eol. - Posology
@AlexZhukovskiy: no, core.autocrlf is a git config setting, and those cannot be stored in the repository. - Palestra
Ok, I got it. Then, the last question: is there any difference between the -text and binary settings here? I found the former on the git documentation page, while the latter is used in every .gitattributes I see. - Posology
According to the gitattributes docs, binary is a "macro" that expands to -text -diff. This is not directly related, but there are two places where the literal string binary is meaningful: inside a diff driver (see the discussion near the end of the textconv section) and as a merge driver (see "Built-in merge drivers"). - Palestra
