What is the purpose of `text=auto` in `.gitattributes` file?

Most .gitattributes files contain * text=auto. What is the purpose of text=auto in that file?

Pedrick answered 31/1, 2014 at 5:22 Comment(0)

From the docs:

Each line in .gitattributes (or .git/info/attributes) file is of form:

pattern attr1 attr2 ...

So here, the pattern is *, which means all files, and the attribute is text=auto.

What does text=auto do? From the documentation:

When text is set to "auto", the path is marked for automatic end-of-line normalization. If Git decides that the content is text, its line endings are normalized to LF on checkin.

What's the default behaviour if it's not enabled?

Unspecified

If the text attribute is unspecified, Git uses the core.autocrlf configuration variable to determine if the file should be converted.

What does core.autocrlf do? From the docs:

   core.autocrlf

Setting this variable to "true" is almost the same as setting the text attribute to "auto" on all files except that text files are not guaranteed to be normalized: files that contain CRLF in the repository will not be touched. Use this setting if you want to have CRLF line endings in your working directory even though the repository does not have normalized line endings. This variable can be set to input, in which case no output conversion is performed.

If you think this is all as clear as mud, you're not alone.

Here's what * text=auto does in my words: when someone commits a file, Git guesses whether that file is a text file or not, and if it is, it will commit a version of the file where every CR + LF byte pair is replaced with a single LF byte. It doesn't directly affect what files look like in the working tree; there are other settings that will convert LF bytes to CR + LF bytes when checking out a file.
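
As a rough illustration of the check-in step described above, here is a minimal Python sketch. The function names are hypothetical, and the NUL-byte test is only a simplified stand-in for Git's real text-detection heuristic, which is more involved.

```python
def looks_like_text(data: bytes) -> bool:
    """Crude stand-in for Git's text detection (assumption: a NUL byte means binary)."""
    return b"\x00" not in data


def normalize_on_checkin(data: bytes) -> bytes:
    """Return the bytes that would be stored for a path marked text=auto."""
    if not looks_like_text(data):
        return data  # judged binary: stored byte for byte
    return data.replace(b"\r\n", b"\n")  # judged text: CRLF becomes LF


# A text file with CRLF line endings is stored with LF line endings.
stored_text = normalize_on_checkin(b"line one\r\nline two\r\n")

# Content containing a NUL byte is judged binary and left untouched.
stored_binary = normalize_on_checkin(b"\x89PNG\r\n\x1a\n\x00\x00")
```

The working tree copy is not modified by this step; only the stored version differs.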

Recommendation:

I would not recommend putting * text=auto in the .gitattributes file. Instead, I would recommend something like this:

*.txt text
*.html text
*.css text
*.js text

This explicitly designates which files are text files, which get CRLF converted to LF in the object database (but not necessarily in the working tree). We had a repo with * text=auto, and Git guessed wrong for an image file that it was a text file, causing it to corrupt it as it replaced CR + LF bytes with LF bytes in the object database. That was not a fun one to debug.
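
To see why a wrong guess is destructive, consider this small Python sketch. The payload is made up, and the `replace` calls only stand in for the conversions Git applies to content it treats as text.

```python
# Hypothetical binary payload: it contains one CR LF pair and one lone LF byte.
original = b"BM\r\n\x10\n\xff"

# What ends up stored if the content is wrongly treated as text on check-in.
stored = original.replace(b"\r\n", b"\n")

# Converting LF back to CRLF on checkout does not undo the damage,
# because the lone LF byte gets expanded as well.
restored = stored.replace(b"\n", b"\r\n")

corrupted_in_repo = stored != original
round_trip_fails = restored != original
```

Once the stored bytes differ from the original, nothing in the repository records which LF bytes used to be CR + LF pairs.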

If you must use * text=auto, put it as the first line in .gitattributes, so that the later lines can override it. This seems to be becoming an increasingly popular practise.
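
The reason ordering matters is that, for a given attribute, a later matching line overrides an earlier one. Here is a minimal Python sketch of that rule. The rule list is a hypothetical .gitattributes, and `fnmatchcase` only approximates Git's real pattern matching.

```python
from fnmatch import fnmatchcase

# Hypothetical .gitattributes content as (pattern, value of the text attribute),
# listed in file order.
RULES = [
    ("*", "auto"),       # * text=auto
    ("*.png", "unset"),  # *.png -text
    ("*.js", "set"),     # *.js text
]


def text_attribute(filename: str) -> str:
    """Return the value from the last rule whose pattern matches the file name."""
    value = "unspecified"
    for pattern, attr in RULES:
        if fnmatchcase(filename, pattern):
            value = attr  # a later match overrides an earlier one
    return value


png_value = text_attribute("logo.png")
js_value = text_attribute("app.js")
other_value = text_attribute("README")
```

If the `*` rule came last instead of first, it would win for every file and the specific lines above it would have no effect on the text attribute.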

Sailing answered 24/6, 2016 at 16:8 Comment(14)
Why does everyone call LF "normal" but not CRLF? Is there any reference to prove it? – Stupa
@YoushaAleayoub What do you mean? – Sailing
@YoushaAleayoub If by "everyone" you mean git-scm, it's probably because they're developing a *nix package, so using the *nix newline character is normal. – Overside
@YoushaAleayoub LF is considered "normal" because it is common in many dev tools. Popular dev tools like git-scm come from *nix. MacOS uses LF. Only Windows (considering mainstream OSs only) uses CRLF. This makes things harder for devs using *nix tools on Windows and for everyone when exchanging files. See also Why CRLF. – Twiggy
@Flimm, can you explain the difference between *.txt text=auto and *.txt text please? I thought all 4 lines in your example above should have been text=auto, not just text after the file extension. KiCad footprint files, for instance (".kicad_mod" extension), are normalized using this line in their gitattributes file: *.kicad_mod text=auto (kicad-pcb.org/libraries/klc/G1.7). – Lissie
"I would not recommend putting * text=auto in the .gitattributes file." Why? git-scm.com/docs/gitattributes recommends doing this (see the sample around echo "* text=auto" >.gitattributes). – Gallivant
@MateuszKonieczny The reason why is explained later in the answer: "We had a repo with * text=auto, and Git guessed wrong for an image file that it was a text file, causing it to corrupt it as it replaced CR + LF bytes with LF bytes in the object database. That was not a fun one to debug." – Sailing
@YoushaAleayoub Don't confuse "normalization" / "to normalize" with "normal". "To normalize" means to make uniform and "normalization" is the corresponding process. The resulting uniform state is called "normalized", not "normal". For unification, one has to choose (and stick to) some convention. Git's convention (for the object database, not necessarily for the working directory) is LF line endings. See the Mind the End of Your Line blog post. – Cosper
(So we can assume no implication that unnormalized is "not normal" or even "abnormal" is intended in the Git documentation.) – Cosper
@RoiDanton >"MacOS uses LF". This is provably not true. Mac OS always used CR. Only with Mac OS X did Apple break its standard and switch from CR to LF. – Jemappes
@Sailing Can you explain the difference between *.js text eol=lf and *.js eol=lf? – Reims
I suspect the reason it's becoming popular to add * text=auto to .gitattributes is that GitHub has been recommending it in their documentation: Configuring Git to handle line endings. Unfortunately they don't mention the risk that Git could miscategorise a binary file as text. – Pharisaic
"there are other settings that will convert LF bytes to CR + LF bytes when checking out a file" What are those settings? I suspect it makes sense to use both of these together. – Mantle
This is a great explanation for what .gitattributes does. For the recommendation though, I think the other way around is also good. You set * text=auto at the top, then specify below it the binary files (*.jpg, *.png, etc.) and those you want to have a CRLF line ending (*.bat, *.sln, etc.). It probably depends on the project, but it should be hard to have a supposed-to-be binary file escape Git's detection. – Antemortem

It ensures line endings are normalized. Source: Kernel.org

When text is set to "auto", the path is marked for automatic end-of-line normalization. If git decides that the content is text, its line endings are normalized to LF on checkin.

If you want to interoperate with a source code management system that enforces end-of-line normalization, or you simply want all text files in your repository to be normalized, you should instead set the text attribute to "auto" for all files.

This ensures that all files that git considers to be text will have normalized (LF) line endings in the repository.

Baez answered 31/1, 2014 at 5:31 Comment(6)
What do you mean by normalized line endings? – Pedrick
When a text file is normalized, its line endings are converted to LF in the repository. – Baez
Important to know: this overwrites the local core.autocrlf setting on your machine. See this great answer by @Daniel Jomphe. – Pancreatin
It would be awfully nice if git simply did not $%# with any of the files being checked in to the repository. I've worked with SLM, PerForce, MsBuild, Source Depot, TFS, SVM, none of these will change even one byte in any of your files. This is an insidious git hack IMO and it has caused me a lot of pain. – Revolting
What happens on checkout is only half the story - what happens upon a get? Would it be right to say that on checkout, line endings stay as LF, even on windows? – Mouthpart
@spankmaster79, .gitattributes OVERRIDES core.autocrlf, not overwrites. – Arlettaarlette

That setting controls how line endings are handled. When it is enabled, the line endings of files that Git considers text are converted to LF in the repository. There are other flags to deal with how line endings are converted in your working directory. Full info on the issue is here: https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html

Luau answered 31/1, 2014 at 5:29 Comment(0)
