Trying to fix line-endings with git filter-branch, but having no luck
Q

9

287

I have been bitten by the Windows/Linux line-ending issue with git. It seems, via GitHub, MSysGit, and other sources, that the best solution is to have your local repos set to use linux-style line endings, but set core.autocrlf to true. Unfortunately, I didn't do this early enough, so now every time I pull changes the line endings are borked.

I thought I had found an answer here but I can't get it to work for me. My Linux command line knowledge is limited at best, so I am not even sure what the "xargs fromdos" line does in his script. I keep getting messages about no such file or directory existing, and when I manage to point it to an existing directory, it tells me I don't have permissions.

I've tried this with MSysGit on Windows and via the Mac OS X terminal.

Quickie answered 2/10, 2009 at 17:12 Comment(2)
I can't upvote this thread even nearly enough. +1 ++ for it providing the best answer on the matter.Meal
Agree with Charles. However, in my case (using Mac OS X 10.8) > git config core.autocrlf false worked, not > git config core.autocrlf inputKatekatee
L
204

The git documentation for gitattributes now documents another approach for "fixing" or normalizing all the line endings in your project. Here's the gist of it:

$ echo "* text=auto" >.gitattributes
$ git add --renormalize .
$ git status        # Show files that will be normalized
$ git commit -m "Introduce end-of-line normalization"

If any files that should not be normalized show up in git status, unset their text attribute before running git add -u.

manual.pdf -text

Conversely, text files that git does not detect can have normalization enabled manually.

weirdchars.txt text

This leverages a new --renormalize flag added in git v2.16.0, released Jan 2018.
But it may fail if you have "un-staged deleted files", hence stage those first, like:

git ls-files -z --deleted | xargs -0 git add

For older versions of git, there are a few more steps:

$ echo "* text=auto" >>.gitattributes
$ rm .git/index     # Remove the index to force git to
$ git reset         # re-scan the working directory
$ git status        # Show files that will be normalized
$ git add -u
$ git add .gitattributes
$ git commit -m "Introduce end-of-line normalization"
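To check what the normalization actually changed, git ls-files --eol (available in reasonably recent git, 2.8 or later as far as I know) reports the line endings stored in the index (i/) and in the working tree (w/) for each file. Here is a minimal sketch in a throwaway repository. The file name crlf.txt, the demo identity and the local core.autocrlf / core.safecrlf settings are assumptions made only so the demo behaves the same everywhere; they are not part of the procedure above.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
git config core.autocrlf false   # demo only: let the CRLF file be committed as-is
git config core.safecrlf false   # demo only: avoid safecrlf aborting the conversion

# Commit a file that has CRLF line endings.
printf 'one\r\ntwo\r\n' > crlf.txt
git add crlf.txt
git commit -q -m "Add file with CRLF endings"
before=$(git ls-files --eol crlf.txt)

# Apply the newer procedure: add .gitattributes, then renormalize tracked files.
echo "* text=auto" > .gitattributes
git add --renormalize .
git add .gitattributes
git commit -q -m "Introduce end-of-line normalization"
after=$(git ls-files --eol crlf.txt)

# The i/ column should change from crlf to lf; the working tree copy is left alone.
echo "before: $before"
echo "after:  $after"
```

In a real repository you would only run git ls-files --eol before and after the normalization commit and compare the i/ column.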
Library answered 13/1, 2011 at 18:32 Comment(13)
Could you tell me what the purpose of the git reset is, please?Pax
It forces git to rebuild the index, during which it scans each file to make a guess about whether it's binary. The rm deletes the old index, reset builds the new index.Library
Whenever I've done this I've omitted the git reset bit because the results have always been the same whether I did it or not. The git status after rm .git/index shows me all the files that need to be normalised, whether I reset or not, which is why I asked. Perhaps it's not necessary with new versions?Pax
Hm. It seems that by doing git add . I was effectively doing the same thing.Pax
Thanks, this worked for me. A useful command after running git status is to run git diff --ignore-space-at-eol just to be sure that the only changes you are committing are the line endings.Forbade
Great solution. I had to change my core.safecrlf=true setting temporarily to false before running git add -u, then change it back to true. I did this in the repos I was doing this in by running git config core.safecrlf false && git add -u && git config --unset core.safecrlf in place of the $ git add -u line that Russ has above. This left my global setting as core.safecrlf=true because my one-liner only sets the local (non-global) git config, then unsets it when done.Sever
This approach looks to be better since it doesn't try to "fix" the binary files.Arcuation
Note: The only "real" difference between this and the "old" solution is in the presence of .gitattributes (with the appropriate content). Without this, git reset will detect no modifications, and is thus useless.Etheleneethelin
Important: with this new procedure you can even git reset --hard as third step. Contrary to what one would expect, a hard reset still indicates a modified state of files (git status) and a hard reset can ONLY be effective as intended when .gitattributes is deleted or parts of it are outcommented (git 1.9).Etheleneethelin
I'm a newbie using Git (used SVN for years until now). Two questions: a) the reset or reset --hard will erase all my changes, right? b) If "a" is true, and I back up my changes, I will re-introduce the problem all over again upon restoring, right?Sherr
It depends what changes you are talking about. This procedure is best performed when your working directory is clean. In other words, before starting this, run git status. If it shows any pending changes, commit them first. This procedure will then result in a new commit on top of that which just normalizes your line endings. You won't lose any other changes.Library
Migrated a project from TFS to Git and had major problems with line endings: files suddenly showed up as changed and the changes could not be undone, etc. Tried a bunch of solutions, but this actually solved it!Lelahleland
The instructions on the gitattributes page have been updated to take advantage of the --renormalize flag added in git v2.16.0 which was released in January 2018. The --renormalize flag consolidates the process of re-processing line endings for each tracked file into a single command: git add --renormalize ..Fullmouthed
H
407

The easiest way to fix this is to make one commit that fixes all the line endings. Assuming that you don't have any modified files, then you can do this as follows.

# From the root of your repository remove everything from the index
git rm --cached -r .

# Change the autocrlf setting of the repository (you may want 
#  to use true on windows):
git config core.autocrlf input

# Re-add all the deleted files to the index
# (You should get lots of messages like:
#   warning: CRLF will be replaced by LF in <file>.)
git diff --cached --name-only -z | xargs -0 git add

# Commit
git commit -m "Fixed crlf issue"

# If you're doing this on a Unix/Mac OSX clone then optionally remove
# the working tree and re-check everything out with the correct line endings.
git ls-files -z | xargs -0 rm
git checkout .
Hudnut answered 2/10, 2009 at 19:3 Comment(11)
I got the same recommendation from Github. Not in love with it, but rewriting history also makes me nervous. I may just go with this solution for now.Quickie
I'm actually going to mark your answer the fix because it was the one I went with. It was the only one that actually did seem to get all files at once. Others seemed to only get a few files at a time and each pull and checkout exposed more files that needed to be changed. Thanks!Quickie
P.S. I recommended your fix to the guys at github.com and they updated their help guide to use your solution (previously it had just recommended a fresh clone and a hard reset, which did not seem to get all files.) help.github.com/dealing-with-lineendingsQuickie
You may also want to check out core.safecrlf to ensure that you aren't changing CRLFs in non-text files (such as binaries). Check it out in the docs kernel.org/pub/software/scm/git/docs/git-config.html.Scale
@vrish88: If you're in this situation, though, you're likely to be suffering from mixed line endings and core.safecrlf may actually prevent you from doing what you need to do. It's probably easier to not use safecrlf. git doesn't often get binary file detection wrong and if it does you can manually mark the file as binary in .gitattributes and recover the correct version from the previous commit.Hudnut
Seems you want autocrlf=input on OSX/Unix and autocrlf=true on WindowsBrandabrandais
The newer solution recommended in Russ Egan's answer below is simpler and does not involve scary things like deleting all your source code, so I would really recommend people use that, even though this old solution has 10 times as many votes!Release
Just be careful when some files (like test data) intentionally have mixed line endings.Boxer
Seconding Russ Egan's answer below. I had bad line endings from a TFS -> GIT migration that I had to fix. The approach above missed a large number of files while it appears that Russ's did not. Unfortunately I don't know enough about Git to know why that is.Dagmar
Two notes: 1) you may want to do git commit ... --no-verify to skip pre-commit hooks, 2) I've noticed (on Windows) that generally I have to repeat the procedure twice to really normalize all files (yep it's pretty weird). See my answerJaborandi
These steps will help you fix the EOL issue, but note that the files' history (git blame) will be corrupted.Playwright
J
13

My procedure for dealing with the line endings is as follows (battle tested on many repos):

When creating a new repo:

  • put .gitattributes in the very first commit along with other typical files such as .gitignore and README.md

When dealing with an existing repo:

  • Create / modify .gitattributes accordingly
  • git commit -a -m "Modified gitattributes"
  • git rm --cached -r . && git reset --hard && git commit -a -m 'Normalize CRLF' -n
    • -n (--no-verify) is to skip pre-commit hooks
    • I have to do it often enough that I defined it as an alias alias fixCRLF="..."
  • repeat the previous command
    • yep, it's voodoo, but generally I have to run the command twice, first time it normalizes some files, second time even more files. Generally it's probably best to repeat until no new commit is created :)
  • go back-and-forth between the old (just before normalization) and new branch a few times. After switching the branch, sometimes git will find even more files that need to be renormalized!

In .gitattributes I declare all text files explicitly as having LF EOL since generally Windows tooling is compatible with LF while non-Windows tooling is not compatible with CRLF (even many nodejs command line tools assume LF and hence can change the EOL in your files).

Contents of .gitattributes

My .gitattributes usually looks like:

*.html eol=lf
*.js   eol=lf
*.json eol=lf
*.less eol=lf
*.md   eol=lf
*.svg  eol=lf
*.xml  eol=lf

To figure out what distinct extensions are tracked by git in the current repo, look here
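One possible way to get that list, as a rough sketch: pipe the tracked paths through a small filter that keeps only the part after the last dot. The helper name list_extensions is made up for this example, and the sample paths fed to it are invented.

```shell
#!/bin/sh
# Hypothetical helper: read paths on stdin, print each distinct extension once.
# Paths without a dot in the file name are skipped; dotfiles such as
# .gitignore are reported as "gitignore".
list_extensions() {
    sed -n 's/.*\.\([^./][^./]*\)$/\1/p' | sort -u
}

# Demo with a few invented paths; prints "js" and "md", one per line.
printf 'src/app.js\nREADME\ndocs/a.md\nlib/b.js\n' | list_extensions
```

Inside a repository the input would come from git instead: git ls-files | list_extensions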

Issues after normalization

Once this is done, there's one more common caveat though.

Say your master is already up-to-date and normalized, and then you checkout outdated-branch. Quite often right after checking out that branch, git marks many files as modified.

The solution is to do a fake commit (git add -A . && git commit -m 'fake commit') and then git rebase master. After the rebase, the fake commit should go away.

Jaborandi answered 4/12, 2015 at 13:39 Comment(2)
I thought I was going crazy, until I read your post, because I had to run the specified sequence of commands several times too. Voodoo! ;)Prefigure
With git version 2.7.0.windows.1, I used the following: git rm --cached -r . && git reset --hard && git add . && git commit -m "Normalize EOL" -nPrefigure
F
4
git status --short|grep "^ *M"|awk '{print $2}'|xargs fromdos

Explanation:

  • git status --short

    This displays one line per file that has changes or that git is not tracking. Files that are not under git control are marked at the beginning of the line with a '?'. Files that are modified are marked with an M.

  • grep "^ *M"

    This keeps only those files that have been modified.

  • awk '{print $2}'

    This shows only the filename without any markers.

  • xargs fromdos

    This takes the filenames from the previous command and runs them through the utility 'fromdos' to convert the line-endings.
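If file names may contain spaces, a variant that avoids parsing git status output is to let git print the modified files NUL-separated. The sketch below is self-contained and runs in a throwaway repository. As an assumption for the demo, tr -d '\r' stands in for fromdos / dos2unix so that nothing extra has to be installed; note that tr removes every CR, not only those before a newline. The file name "my file.txt" and the demo identity are invented.

```shell
#!/bin/sh
set -eu

# Throwaway repository for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
git config core.autocrlf false   # demo only: keep git from converting anything itself

# Commit a file with LF endings whose name contains a space.
printf 'one\ntwo\n' > "my file.txt"
git add "my file.txt"
git commit -q -m "Add file"

# Simulate an editor rewriting the file with CRLF endings.
printf 'one\r\ntwo\r\n' > "my file.txt"

# List modified tracked files NUL-separated and strip the CRs from each one.
# (With GNU xargs, adding -r skips the command when the list is empty.)
git ls-files -m -z | xargs -0 -n 1 sh -c '
    tr -d "\r" < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
' sh

# The file now matches the committed version again, so this should print nothing.
git status --short
```

With dos2unix installed, the conversion step would shrink to: git ls-files -m -z | xargs -0 dos2unix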

Fairminded answered 9/3, 2012 at 12:36 Comment(1)
This is awesome. Thank you. For anyone looking for solution using Homebrew use dos2unix instead of fromdos.Comptometer
S
4

Here's how I fixed all line endings in the entire history using git filter-branch. The ^M character needs to be entered using CTRL-V + CTRL-M. I used dos2unix to convert the files since this automatically skips binary files.

$ git filter-branch --tree-filter 'grep -IUrl "^M" | xargs -I {} dos2unix "{}"'
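If typing the literal ^M is awkward, the carriage return can be produced with printf instead. The sketch below only demonstrates the grep half of the filter in a scratch directory, so it does not need git or dos2unix; the file names are invented for the demo.

```shell
#!/bin/sh
set -eu

# Scratch directory with one CRLF file and one LF file.
dir=$(mktemp -d)
cd "$dir"
printf 'dos line\r\n' > crlf.txt
printf 'unix line\n' > lf.txt

# Build the CR character with printf instead of typing CTRL-V + CTRL-M.
cr=$(printf '\r')

# -I skips binary files, -U keeps grep from stripping CRs itself (relevant on
# Windows builds), -r recurses, -l prints only the names of matching files.
found=$(grep -IUrl "$cr" .)
echo "$found"
```

Only crlf.txt is reported. The same substitution should also work inside the --tree-filter string, since the filter is evaluated by the shell, but try it on a clone first: filter-branch rewrites history.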
Starrstarred answered 2/4, 2015 at 20:26 Comment(1)
G
3

The "| xargs fromdos" reads from standard input (the files find finds) and uses it as arguments for the command fromdos, which converts the line endings. (Is fromdos standard in those environments? I'm used to dos2unix). Note that you can avoid using xargs (especially useful if you have enough files that the argument list is too long for xargs):

find <path, tests...> -exec fromdos '{}' \;

or

find <path, tests...> | while IFS= read -r file; do fromdos "$file"; done

I'm not totally sure about your error messages. I successfully tested this method. What program is producing each? What files/directories do you not have permissions for? However, here's a stab at guessing what the problem might be:

One easy way to get a 'file not found' error for the script is by using a relative path - use an absolute one. Similarly you could get a permissions error if you haven't made your script executable (chmod +x).

Add comments and I'll try and help you work it out!

Gallery answered 2/10, 2009 at 17:50 Comment(1)
I saw another example with dos2unix and I thought this was somehow copying files into a folder named that, but now I get it. Wow, seems obvious now. Thanks for your help!Quickie
P
1

okay... under cygwin we don't have fromdos easily available, and that awk substep blows up in your face if you have any spaces in paths to modified files (which we had), so I had to do that somewhat differently:

git status --short | grep "^ *M" | sed 's/^ *M//' | xargs -n 1 dos2unix

kudos to @lloyd for the bulk of this solution

Piano answered 30/4, 2012 at 11:44 Comment(0)
S
0

I had the same problem in one of my repos. If you are using both windows and linux systems for the same code repo and pulling and pushing simultaneously, try this:

First, set your git config as follows for windows:

git config --global core.autocrlf true

This will make sure to convert CRLF to LF when writing into the object database and then again replace LF with CRLF when writing out into the working directory. As a result, your repo will be safe with only one type of line endings and locally you'll have windows line ending on the windows system.

For linux/MAC set the git config as follows:

git config --global core.autocrlf input

This will make sure to convert CRLF to LF when writing into the object database but will not do the reverse, preserving LF which is needed for linux/MAC.
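To see the effect of the input setting for yourself, here is a small sketch in a throwaway repository. It sets core.autocrlf locally rather than with --global so your real configuration is untouched; the file name notes.txt and the local core.safecrlf setting are assumptions for the demo. It relies on git ls-files --eol, which needs a reasonably recent git.

```shell
#!/bin/sh
set -eu

# Throwaway repository; settings are local to it.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config core.autocrlf input
git config core.safecrlf false   # demo only: no warning or abort on conversion

# Stage a new file that has Windows line endings.
printf 'one\r\ntwo\r\n' > notes.txt
git add notes.txt

# i/ is what was stored in the index, w/ is what is on disk.
eol=$(git ls-files --eol notes.txt)
echo "$eol"
```

The line should start with i/lf and also contain w/crlf: git stored LF, while the copy on disk keeps its CRLF until it is converted or checked out again.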

For the wrong line endings that are already there on your linux/MAC use dos2unix

For MAC:

brew install dos2unix # Installs dos2unix Mac
find . -type f ! -path './.git/*' -exec dos2unix {} \; # recursively converts CRLF to LF, leaving git's own files under .git alone

For Linux:

sudo apt-get install -y dos2unix # Installs dos2unix Linux
find . -type f ! -path './.git/*' -exec dos2unix {} \; # recursively converts CRLF to LF; run without sudo so the files keep their owner

Hope this solves your problem.

Scarlett answered 9/5, 2022 at 19:17 Comment(0)
H
-3

Follow these steps if none of the other answers work for you:

  1. If you are on Windows, do git config --global core.autocrlf true; if you are on Unix, do git config core.autocrlf input
  2. Run git rm --cached -r .
  3. Delete the file .gitattributes
  4. Run git add -A
  5. Run git reset --hard

Your local working tree should now be clean.

Hesson answered 6/5, 2014 at 20:1 Comment(2)
Really? Deleting .gitattributes file is the solution to line endings problem?Demetriusdemeyer
Yes please address the comment by @AleksandrMNeisa
