Why does Git want to correct my line endings to CRLF, even though I want them to be in LF?

I work on a relatively large project whose policy is to check out files with CRLF and commit them with LF. To do so, my system uses:

git config --global core.autocrlf true

However when committing a file, in this case the .gitattributes file, a warning is returned:

LF would be replaced by CRLF in .gitattributes

The .gitattributes file itself contains the line * text=auto !eol and the file itself uses LF line endings.

Why is this happening? Why does Git tell me to be careful as it will convert LF to CRLF, even though I want this file to be normalized with LF endings in the repository?

I must be missing something entirely obvious, since I have been through:

And more, but this is still not working the way I thought it did.

Kong answered 11/4, 2019 at 12:56 Comment(2)
How about using git config --global core.autocrlf false, and using only .gitattributes core.eol directives? Especially considering that core.autocrlf true overrides .gitattributes directives... See Git 2.21: https://mcmap.net/q/950763/-why-does-gitattributes-not-override-core-autocrlf-configuration-on-linux. See also https://mcmap.net/q/11257/-handeling-line-ending-in-git. Langland
Will look into it. Kong

Let's look at this in several parts:

  • !eol has no function here. This sets eol to unspecified, but that's already the default, and an unspecified value of eol does not disable LF-to-CRLF translation.

  • Since you did specify text=auto, Git will check whether the contents of .gitattributes appear to be text or binary, and of course they should appear to be text.

Hence this particular entry tells Git that it should perform translations on .gitattributes.
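To check how Git resolves these attributes, git check-attr can be pointed at the file. The following is a minimal sketch in a throwaway repository (the temporary directory is an assumption made for illustration); it should report text as auto and eol as unspecified.

```shell
# Throwaway repository; the temp directory is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q

# The attributes line from the question.
printf '* text=auto !eol\n' > .gitattributes

# Ask Git which values it resolves for the .gitattributes file itself.
attrs=$(git check-attr text eol -- .gitattributes)
printf '%s\n' "$attrs"
```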

Meanwhile, it's useful to realize that line-ending transforms are a special case of the general clean-and-smudge-filter concept. VonC's accepted answer at your third link has a nice drawing of the way the smudge filter works, but lacks one for how the clean filter works, so let's dive into this, with a bit of background.

Git-ified ("freeze-dried") vs work-tree ("rehydrated") files, and the index

Git's normal[1] atomic unit of storage is the commit. A commit holds a full snapshot of your source tree (plus the commit metadata that I won't go into here). For many good reasons, the files within a commit are kept in a compressed, frozen, read-only, and Git-only storage format. I've lately taken to calling these files freeze-dried. This helps to distinguish them from files that you actually work with / on.

Like everything inside Git's internal key-value object database, these commits and their files are all read-only. That means they're preserved forever (or as long as the commit itself continues to exist), which is great for archival, but completely useless for getting any new work done. So Git has to provide a way to "rehydrate" the files, turning them into ordinary files you can work with.

Your work-tree is where Git puts the rehydrated files. They have their ordinary form, in ordinary files under ordinary names. Every program on your computer can deal with them, and you can manipulate them as you please.

Git could stop here: you'd have your frozen committed files, and your malleable work-tree files, and Git would build new commits from the work-tree. Mercurial, which in many ways is quite similar to Git, does stop here. But Git doesn't stop here. Instead, it goes on to throw into the mix an intermediary, sitting between the current frozen commit and the work-tree. This intermediary is Git's index. Git sometimes calls this the staging area, or the cache, depending on who / which part of Git documentation is doing the calling. All three are names for the same entity, though.

The index / staging-area simply holds an extra copy of every file. The format of this extra copy is the freeze-dried, internal, Git-only storage format. Files in this format are automatically shared across all commits that have the same file, so this means that when the copy that's in the index is the same as the copy in any commit, it's actually shared with that commit.

This also means that git commit, which has to freeze-dry each file to store it forever, really has almost zero work to do: the files are already freeze-dried! The freeze-drying process took place earlier, when you ran git add. That's what gets Git much of its speed. It's also why Git keeps requiring that you git add all the time.[2] Note that it means that when you run git commit, Git doesn't even need to look at your work-tree. (It still does a quick half-of-git status run by default though, to create the comment text for your commit message.)
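One way to see that the freeze-drying happens at git add time is to compare blob IDs. This is a sketch in a throwaway repository; the file name, commit message, and identity passed with -c are made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'hello\n' > greeting.txt

# git add stores the file as a blob and records the blob ID in the index.
git add greeting.txt
index_id=$(git ls-files -s greeting.txt | awk '{print $2}')

# The ID is derived from the content, so hash-object computes the same value.
computed_id=$(git hash-object greeting.txt)

# After committing, the commit's tree points at that same blob.
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'add greeting'
commit_id=$(git rev-parse HEAD:greeting.txt)

printf 'index:  %s\nhashed: %s\ncommit: %s\n' "$index_id" "$computed_id" "$commit_id"
```

All three IDs should match, which is the sharing described above: the commit reuses the object that git add already wrote.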


[1] I say normal here because Git also offers low-level access to simple key-value storage through what it calls blob objects. To use this, though, you must resort to using some of the so-called plumbing commands, rather than the ones that are, at least in theory, user-friendly. :-)
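As a rough illustration of that key-value view, the plumbing commands git hash-object and git cat-file can store and fetch a blob directly. The stored string is arbitrary and only there for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Store arbitrary content as a blob; the hash that comes back is the key.
key=$(printf 'any value at all\n' | git hash-object -w --stdin)

# Look the value up again by its key.
value=$(git cat-file -p "$key")
kind=$(git cat-file -t "$key")
printf '%s (%s) -> %s\n' "$key" "$kind" "$value"
```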

[2] Mercurial, which uses the work-tree as the proposed next commit, doesn't require you to keep hg add-ing your files. Once you've done the initial hg add, an hg commit scans your work-tree and commits whatever you have changed. This is much friendlier to newcomers, but it also means that in a big project, when you run hg commit, be prepared to wait.


The role of the index / staging area in line-ending transformations

Remember that the index stores freeze-dried, Git-ified copies of each file. This means that the index-to-work-tree "rehydration" step is a great place to do any transformations you want done. This is where the smudge filters in the linked answer come in: the smudge filter can modify the committed text so that the work-tree text is more useful.

Likewise, the work-tree-to-index "freeze-dry" step—the one that occurs when you run git add—is a great place to do any transformations you want done. This is where the clean filters come in: the clean filters can remove stuff that shouldn't go into the actual commit in the repository.
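To make the clean side concrete, here is a sketch of a hand-written filter that strips carriage returns during git add. The filter name stripcr and the file name are hypothetical; the built-in line-ending conversion is switched off so that only the custom filter acts. It assumes a POSIX shell with tr available.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf false   # keep the built-in conversion out of the way

# Hypothetical filter "stripcr": clean deletes carriage returns,
# smudge (cat) passes content through unchanged.
git config filter.stripcr.clean "tr -d '\r'"
git config filter.stripcr.smudge cat
printf '*.txt filter=stripcr\n' > .gitattributes

printf 'one\r\ntwo\r\n' > notes.txt   # CRLF endings in the work-tree
git add notes.txt

# Count carriage returns in the work-tree copy and in the index copy.
worktree_cr=$(tr -cd '\r' < notes.txt | wc -c | tr -d ' ')
index_cr=$(git cat-file -p :notes.txt | tr -cd '\r' | wc -c | tr -d ' ')
printf 'work-tree CRs: %s, index CRs: %s\n' "$worktree_cr" "$index_cr"
```

The work-tree file keeps its two carriage returns, while the copy in the index has none: the clean filter ran on the way in.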

Line ending transformations, in Git, are just special cases of clean and smudge filters. A freeze-dried, in-repository file can have any line endings you like.[3] When we have Git copy that file from the index / staging area, to the work-tree, during a git checkout, we can have Git change those line endings from LF-only to CRLF, for instance. When we have Git copy that file from the work-tree, to the index / staging area, we can have Git change those line endings from CRLF to LF-only.

And that's the default for CRLF transformations for a text file. Those transformations will change LF-only freeze-dried files to CRLF rehydrated files, and will change CRLF rehydrated files to LF-only freeze-dried files.
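Both directions can be observed with git ls-files --eol, which reports the line endings of the index copy (i/) and the work-tree copy (w/). A sketch with core.autocrlf set to true in a throwaway repository; the file name is illustrative, and core.safecrlf is turned off only to keep the output quiet.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf true
git config core.safecrlf false   # silence conversion warnings for this demo

printf 'alpha\r\nbeta\r\n' > sample.txt   # CRLF endings in the work-tree
git add sample.txt

# Index copy should be LF-only, work-tree copy still CRLF.
eol_info=$(git ls-files --eol sample.txt)
printf '%s\n' "$eol_info"

# Delete the work-tree copy and let Git rehydrate it from the index.
rm sample.txt
git checkout -- sample.txt
restored_cr=$(tr -cd '\r' < sample.txt | wc -c | tr -d ' ')
printf 'CRs after checkout: %s\n' "$restored_cr"
```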

You are supposed to get a warning whenever Git can detect that this might do something different from what is already being done. So, suppose that the .gitattributes file in your work-tree right now has LF-only line endings. Suppose further that the freeze-dried copy in the commit and/or in the index/staging-area also has LF-only line endings. And suppose the directives say that index -> work-tree should change LF-only to CRLF: why, then, something's hinky, and Git should warn.

I have found that these warnings are sometimes a little trigger-happy. I can't pin that to specific cases in specific Git versions, because I myself do my best to never, ever let Git fiddle with my data. I want the work-tree copy to match the freeze-dried copy, all the way through, every time, because I avoid OSes that require silly line-ending special-ness. But the above is the general rule, and the warning you are getting now makes sense: the actual freeze-dried files and the work-tree files all have LF-only line endings right now, but your settings tell Git that text from .gitattributes should have been converted to have CRLF line endings in your work-tree.
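The situation from the question can be reproduced in a throwaway repository. This is a sketch, assuming the default core.safecrlf behavior (warn), which is set explicitly here; the exact wording of the warning differs between Git versions, so the script only captures and prints it.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf true
git config core.safecrlf warn   # the default, set explicitly for the demo

# The attributes file from the question, written with LF-only endings.
printf '* text=auto !eol\n' > .gitattributes

# git add writes its warnings to stderr; capture them.
warning=$(git add .gitattributes 2>&1)
printf '%s\n' "$warning"
```

The file is still added with LF endings; the warning only says that a later checkout would write it out with CRLF.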


[3] And Linus Torvalds demands that you shall like LF-only line endings. :-) Kidding aside, Git sort of prefers this. If you disable all transformations, either by not enabling CRLF conversion at all or by marking all files as -text, Git will permanently store whatever line endings you give it. If you then change your mind, you are stuck with the line endings you already froze because nothing in any commit can ever be changed. If those commits are wrong, the only thing you can do is stop using them. You can make new, improved, corrected ones and use those instead.

I think it's these "frozen committed copy is wrong because it has CRLF endings" cases that usually trigger bogus CRLF line ending warning issues. Since I don't actually use the line-ending-transforming code myself, it's hard to be sure about that.
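For the -text case mentioned in this footnote, a sketch like the following shows CRLF endings going into the index untouched even with core.autocrlf enabled. The file name is made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf true   # would otherwise convert CRLF to LF on add

# Unset the text attribute for every path, disabling line-ending conversion.
printf '* -text\n' > .gitattributes
printf 'frozen\r\n' > legacy.txt
git add legacy.txt

# Inspect the index copy: it should still contain the carriage return.
index_eol=$(git ls-files --eol legacy.txt)
index_cr=$(git cat-file -p :legacy.txt | tr -cd '\r' | wc -c | tr -d ' ')
printf '%s\nindex CRs: %s\n' "$index_eol" "$index_cr"
```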

Retrocede answered 11/4, 2019 at 16:55 Comment(2)
This makes sense to me. Thank you very much, this issue was driving me insane today. Kong
An extra thank you for the highly detailed answer! Kong
