What is the difference between `* text=auto` and `* text eol=lf` in .gitattributes?
I have read the .gitattributes documentation again and again, but I cannot find a clear answer on what the difference is between these two:

* text=auto

* text eol=lf

Also, is text=auto intended only for use with *, or can it also be used with specific extensions? In that case, what is the difference?

*.txt text=auto

*.txt text eol=lf

Apostil answered 5/10, 2017 at 16:2 Comment(0)

TL;DR

Before Git 2.36.0, the eol=lf setting overrides any text setting, and since you have chosen to apply this to every path, only the eol=lf setting will matter, if you use that. After Git 2.36.0, the eol=lf only applies if text is set, unspecified, or set to auto and Git determines it is a text file.

Full explanation

Let's start with this and work outwards:

Also, is text=auto intended only for use with *, or can it also be used with specific extensions?

Patterns may include extensions. The text=auto part is an attribute setting, and the patterns select which attributes to apply to which file(s).

How Git Reads a .gitattributes File

Each line in gitattributes matches, or does not match, some path name such as dir1/dir2/file.ext or README.md or whatever. As the gitattributes documentation says:

Each line in gitattributes file is of form:

pattern attr1 attr2 ...

That is, a pattern followed by an attributes list, separated by whitespaces. Leading and trailing whitespaces are ignored. Lines that begin with # are ignored. Patterns that begin with a double quote are quoted in C style. When the pattern matches the path in question, the attributes listed on the line are given to the path.

Hence, * is the pattern. These "patterns" are the same as those in .gitignore files, except that negative patterns are disallowed. Thus, you can use patterns like *.txt and *.jpg to match file name extensions, or patterns like dir1/* to match files within a specific directory. Both .gitignore and .gitattributes files can be local to a specific directory as well, in which case they apply to files in that directory and its subdirectories, but not to paths higher in the tree.

Now, for text vs text=auto, and for eol=lf or not, we find the following:

Each attribute can be in one of these states for a given path:

Set
The path has the attribute with special value "true"; this is specified by listing only the name of the attribute in the attribute list.

Unset [details snipped, but see below]

Set to a value
The path has the attribute with specified string value; this is specified by listing the name of the attribute followed by an equal sign = and its value in the attribute list.

Unspecified
No pattern matches the path, and nothing says if the path has or does not have the attribute, the attribute for the path is said to be Unspecified.

(The wording on the last one is particularly poor, in my opinion. It really means "of all patterns matching the path, none said anything about this attribute.")

So for text, the attribute is set, and for text=auto, the attribute is set to a value. The value part in this case is auto. Since the pattern is *, it applies to all files.

This same logic applies to the eol=lf item. If, firstly, this eol=lf occurs in some pattern, and secondly, that pattern matches the file in question, then the eol attribute is set to a value, and the value is lf. Since your suggested line was * text eol=lf, this would make eol set to a value, and would make text set, but not set to a value.

Suppose you write, in a single .gitattributes file, the two-line sequence:

* text=auto
* text eol=lf

The second line's text setting overrides the first one's, so that text is set (but not to a value) and eol is set to a value, with the value being lf. Both lines matched, and the second line overrode the first.

If you reverse the two lines:

* text eol=lf
* text=auto

Then again both lines match but now the second line only overrides the text setting, so now you have text set to auto and eol set to lf.
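The override behavior described above can be sketched in Python. This is a rough model only, not Git's implementation: the names parse_attr and resolve are hypothetical, Python's fnmatch stands in for Git's pattern matching (it is not identical, for example in how * treats /), and the !attr syntax is ignored.

```python
from fnmatch import fnmatch

def parse_attr(token):
    """Turn one attribute token into (name, state).

    States are modeled as tuples: ("set", None), ("unset", None),
    or ("value", "<string>"). This encoding is an assumption of the sketch.
    """
    if token.startswith("-"):
        return token[1:], ("unset", None)
    if "=" in token:
        name, value = token.split("=", 1)
        return name, ("value", value)
    return token, ("set", None)

def resolve(lines, path):
    """Apply matching lines top to bottom; later lines override earlier
    ones, but only for the attributes they actually mention."""
    attrs = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        pattern, *tokens = line.split()
        if fnmatch(path, pattern):  # simplification of Git's matching
            for token in tokens:
                name, state = parse_attr(token)
                attrs[name] = state
    return attrs  # attributes missing from the dict are "unspecified"

print(resolve(["* text=auto", "* text eol=lf"], "a.txt"))
print(resolve(["* text eol=lf", "* text=auto"], "a.txt"))
```

In the first call text ends up set and eol is lf. In the second, text ends up set to auto while eol stays lf, because the later line never mentions eol.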

How the text Attribute Applies to Files

The very next section of the gitattributes documentation says:

This attribute [text] enables and controls end-of-line normalization... [If it is]

Set
... enables end-of-line normalization and marks the path as a text file ...

Unset
... tells Git not to attempt any end-of-line conversion upon checkin or checkout ...

Set to string value "auto"
... If Git decides that the content is text ...

Unspecified
... Git uses the core.autocrlf configuration variable ...

(which means you have to go chase down the git config documentation to find out what core.autocrlf does if you leave text unspecified).

You have chosen to either set it for every file or set it to auto for every file. The former means "do conversion for every file" and the latter (the auto setting) means: Hey, Git, please decide for me whether the file is text or not. If you decide that it is text, do the conversion.
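As a rough summary of the four documented states of text, here is a small Python lookup. The function name text_behavior and the tuple encoding of states are hypothetical, chosen only for illustration. The sketch treats any string value other than auto like the unspecified case, which is an assumption rather than something the quoted documentation spells out.

```python
def text_behavior(text_state):
    """Map a state of the text attribute to the documented behavior.

    text_state is one of ("set", None), ("unset", None),
    ("value", "<string>"), or ("unspecified", None).
    """
    kind, value = text_state
    if kind == "set":
        return "normalize line endings (path is marked as text)"
    if kind == "unset":
        return "no end-of-line conversion on checkin or checkout"
    if kind == "value" and value == "auto":
        return "normalize only if Git decides the content is text"
    # Unspecified (and, by assumption in this sketch, other string values).
    return "defer to core.autocrlf"

print(text_behavior(("set", None)))            # * text
print(text_behavior(("value", "auto")))        # * text=auto
print(text_behavior(("unset", None)))          # * -text
print(text_behavior(("unspecified", None)))    # no matching line mentions text
```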

How eol=lf Applies to Files

Just below the description for the text setting is this description for the eol setting. Before Git 2.36.0, it read:

This attribute sets a specific line-ending style to be used in the working directory. It enables end-of-line conversion without any content checks, effectively setting the text attribute.

Set to string value "crlf"
... [snipped because you set lf]

Set to string value "lf"
This setting forces Git to normalize line endings to LF on checkin and prevents conversion to CRLF when the file is checked out.

So, if you have eol=lf set for a path (and with * as the pattern, it will be set for every path), Git will treat every file as text and convert CRLF line endings to LF line endings on "checkin" (this is poorly phrased, again: the conversion actually occurs during the git add step). Git will do nothing during checkout (this, too, is not perfectly phrased: the conversion, or in this case non-conversion, happens during extraction from the index to the work-tree).
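To make the two directions concrete, here is a toy Python model of the eol=lf behavior as that older description puts it. The names on_add and on_checkout are hypothetical, and the byte replacement is a simplification of what Git does, not its actual conversion code.

```python
def on_add(worktree_bytes: bytes) -> bytes:
    """Input side (roughly, git add): CRLF becomes LF."""
    return worktree_bytes.replace(b"\r\n", b"\n")

def on_checkout(index_bytes: bytes) -> bytes:
    """Output side (index to work-tree): eol=lf means no conversion."""
    return index_bytes

windows_file = b"first line\r\nsecond line\r\n"
stored = on_add(windows_file)
restored = on_checkout(stored)

print(stored)
print(restored == stored)
```

Once the content has passed through the input side, it contains only LF endings, so "no conversion" and "convert to LF" are indistinguishable on the way out.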

After Git 2.36.0, the description now reads:

This attribute sets a specific line-ending style to be used in the working directory. This attribute has effect only if the text attribute is set or unspecified, or if it is set to auto, the file is detected as text, and it is stored with LF endings in the index.
[rest of the description omitted for brevity]

This means that text is now taken into account for eol. In your case, you either set text or set it to auto. In the first case, the eol attribute always applies to paths matching the pattern. In the second case, it applies only if Git determines that the file is a text file.
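The sentence quoted from the newer documentation can be read as a small predicate. The Python below is a literal reading of that one sentence, not Git's code. The name eol_has_effect, its parameters, and the tuple encoding of states are hypothetical. Returning False for string values other than auto is an assumption, since the quoted sentence does not address them.

```python
def eol_has_effect(text_state, detected_as_text=False, lf_in_index=False):
    """Literal reading of the post-2.36.0 wording for the eol attribute.

    text_state is ("set", None), ("unset", None), ("unspecified", None),
    or ("value", "<string>").
    """
    kind, value = text_state
    if kind in ("set", "unspecified"):
        return True
    if kind == "value" and value == "auto":
        # Needs both: content detected as text AND stored with LF in the index.
        return detected_as_text and lf_in_index
    # Unset, or a string value the quoted sentence does not cover (assumption).
    return False

# "* text eol=lf": text is set, so eol always applies.
print(eol_has_effect(("set", None)))
# "* text=auto eol=lf" on a file Git considers binary: eol does not apply.
print(eol_has_effect(("value", "auto"), detected_as_text=False, lf_in_index=True))
```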

If You Use Different Patterns, You Get Different Results

Note that if you choose a pattern like *.txt, then these attributes are set only for paths that match the pattern. For other paths, these attributes remain unspecified. You should, therefore, look back at the documentation and see what happens when these attributes are unspecified.

You can, of course, do this:

* -text
*.txt eol=lf

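The first line explicitly unsets text on all files and leaves eol unspecified on all files. The second line then sets eol to the value lf for *.txt files, overriding the unspecified state. Under the pre-2.36.0 description, Git applies the eol=lf rules to all files whose names match *.txt, and uses the rules for unspecified eol and unset text for all remaining files. Under the newer description quoted above, eol has no effect while text is unset, so the *.txt line would also need to set text (for example, *.txt text eol=lf).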

This special -text syntax is the stuff I snipped above. Using text=false does not unset text, but rather leaves text set to the string value false. This has the same effect as leaving text unspecified (not specifically unset). Using -text gives it the special unset setting.

The difference between an unset text and an unspecified text is that when text is unspecified, Git could attempt to guess (based on the core.* settings like core.autocrlf) whether to do conversions. However, when text is specifically unset, Git will not do any guessing or conversion at all for that file.

Kassey answered 5/10, 2017 at 18:12 Comment(6)
Thank you for the extended explanation. Can I conclude from the above that on this page: help.github.com/articles/dealing-with-line-endings the "text eol=lf: Git will always convert line endings to LF on checkout", is wrong? Should it be, "git will not perform any conversion on checkout", or do they actually have the exact same meaning? - Apostil
Yes: with eol=lf the setting for text is irrelevant, and Git will apply the CRLF-to-LF rule on "input" conversion (roughly, at git add time), and no end-of-line changes on "output" conversion (roughly, at git checkout). Note that once there has been any input-side conversion (add and commit), the sacrosanct committed version has LF-only endings, so that "no conversion" and "conversion to LF-only" would have the same effect. This makes it hard to detect that there is no conversion occurring, but the source says "no conversion" here. - Kassey
This explanation is very complete and helpful. I think you meant 'the _two line sequence: \n * text=auto \n * text eol=lf' instead of that last text eol being set to elf (unless I'm missing some assembler programming going on here.) - Bolling
@bballdave025: oops, yes, that was supposed to be eol=lf, Typos! Will fix. - Kassey
I believe this answer is wrong. "eol" is used for checkout/output (working dir), not checkin/input. See this recent documentation: github.com/bk2204/git/commit/…. - Total
@BrunoGomes: It's very messy, and version-dependent as a bug was fixed in Git 2.10 (Sep 2016). The documentation I used while writing the answer still referred to the pre-2.10 behavior. I'll try to update this answer soon, but it's really tough to describe because every fifth Git release has to fix another bug in it. :-) - Kassey
