Git: Removing carriage returns from source-controlled files

I've got a Git repository that has some files in DOS format (\r\n line endings). I would like to just run the files through dos2unix (which would convert all of them to UNIX format, with \n line endings), but how badly would this affect history, and is it recommended at all?

I assume that the standard is to always use UNIX line endings for source-controlled files, and optionally switch to OS-specific line endings locally?

Visualize asked 18/3, 2010 at 0:54 Comment(1)
Related question for people interested in this: #446744 – Visualize

The approach you’ll have to use depends on how public your repository is.

If you don’t mind changing all the SHAs (because you’re more or less the only one using the repository) and want this issue sorted out for all time, you can run git filter-branch and apply dos2unix to every file in each commit. (If you’re sharing the repository, everyone else will more or less have to re-clone it, so this is potentially dangerous.)

So the better, and also easier, option is to convert the files only in the current heads. Your past commits will still have \r\n endings, but unless you’re doing a lot of cherry-picking from old commits this should not be a problem. Diff tools might complain a bit more often, of course, but normally you’ll only diff against nearby commits, so the issue resolves itself as new commits accumulate.
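Converting only the current head can also be done with Git itself instead of dos2unix. This is a minimal sketch, assuming Git 2.16 or later (for `git add --renormalize`) and a clean working tree; `normalize_current_head` is a hypothetical helper name, not a Git command. It overrides core.autocrlf for a single `git add`, so Git's CRLF-to-LF conversion on the way into the index is re-applied to every tracked file, and the result is recorded as one commit while older commits keep their CRLF endings.

```shell
#!/bin/sh
# normalize_current_head: hypothetical helper, not part of Git.
# Assumes Git >= 2.16 (for `git add --renormalize`) and a clean working tree.
normalize_current_head() {
  # core.autocrlf=input makes Git convert CRLF to LF when content enters the
  # index; --renormalize re-runs that conversion on every tracked file, even
  # ones that were committed with CRLF. core.safecrlf=false disables the
  # CRLF round-trip safety check for this one command.
  git -c core.autocrlf=input -c core.safecrlf=false add --renormalize . &&
    git commit -q -m "Normalize line endings to LF"
}
```

Run it from the top of the working tree. Only the index is rewritten: the files on disk keep their CRLF endings until they are checked out again or run through dos2unix.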

And UNIX line endings are the standard; you’re correct about that. The best approach is to set up your editor to write only these endings, even on Windows. Otherwise, there is also the core.autocrlf setting, which you can use.


Addendum on the history-rewriting part:

The last time I did this, I used the following script to convert all files to UNIX line endings:

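#!/bin/sh
# Rewrites every commit on all branches and tags (all SHAs change).
# The filter is inlined because filter-branch evaluates it with its own
# shell (usually /bin/sh), which may not see bash's `export -f` functions.
# dos2unix skips binary files by default; -type f skips directories.
git filter-branch -f \
  --tree-filter 'find . -type f -exec dos2unix -q {} +' \
  --tag-name-filter cat --prune-empty -- --all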
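After a rewrite like this, it is worth checking that no commit still carries CRLF text files. The sketch below is a hypothetical helper (`find_crlf_in_history` is not a Git command) that walks every commit reachable from any ref with `git rev-list --all` and uses `git grep -I` to look for a carriage return, skipping files Git considers binary. It prints `<commit>:<path>` for each hit and nothing when the history is clean.

```shell
#!/bin/sh
# find_crlf_in_history: hypothetical helper, not part of Git.
# Prints "<commit>:<path>" for every text file that still contains a
# carriage return in any commit reachable from a branch, tag, or other ref.
find_crlf_in_history() {
  cr=$(printf '\r')   # a literal CR character, portable across shells
  git rev-list --all | while IFS= read -r commit; do
    # -I skips files Git detects as binary, -l prints only matching names,
    # -F treats the pattern as a fixed string. Exit status 1 (no match) is
    # expected for clean commits and is simply ignored by the loop.
    git grep -I -l -F -e "$cr" "$commit"
  done
}
```

Note that filter-branch keeps the pre-rewrite history under refs/original/, and `--all` includes those refs too, so the old commits will show up until that backup is deleted.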
Puffin answered 18/3, 2010 at 1:5 Comment(5)
Thanks. Right now I'm the only person working on the repository, since it's pretty "young", so rewriting history shouldn't be a problem. But how well would git filter-branch play with GitHub (I've put the repository there)? – Visualize
I think you’d have to delete all branches and tags on GitHub to make sure they can be created again. (It might work without that, but it’s probably better to start anew.) Alternatively, delete the whole repo and then just push it again. This should be fine with GitHub unless other people have cloned from it; then they will need to do the same, depending on how fluent they are with Git. – Puffin
Alright. I just removed the repository and re-pushed it with the reworked history. I needed to fix some issues with some old multi-line commit messages anyway. – Visualize
The code you posted didn't work well for me, so I wrote the following: git filter-branch --tree-filter 'grep -Irl --exclude-dir=.git "" . | xargs sudo dos2unix -p' HEAD – Visualize
Can Git recognize whether a file is a text file or not? Since dos2unix doesn't work on binary files, how does this work when run in a Git repo that contains both text and binary files? – Quadragesima

This CRLF thing drove us crazy when we converted from SVN to Git in a central (bare-repository) SCM environment. What ultimately got us was that we copied the global .gitconfig file to everyone's home directory (yes, on both Windows and Linux), and the initial one came from a Windows system with core.autocrlf=true and core.safecrlf=false. That played havoc with the Linux users (bash scripts didn't work, and all those awful ^M's showed up). So at first we used checkout and clone scripts that ran dos2unix after those commands. Then I ran across the core.autocrlf and core.safecrlf config items and set them based on the OS:

Windows: core.autocrlf=true and core.safecrlf=false
Linux: core.autocrlf=input and core.safecrlf=false

These were set with:

---on Windows---

git config --global core.autocrlf true
git config --global core.safecrlf false

---on Linux---

git config --global core.autocrlf input
git config --global core.safecrlf false
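If the same setup script has to run on every machine, the per-OS choice above can be automated. This is a sketch under the assumption that Windows users run it from Git Bash, MSYS2, or Cygwin, whose `uname -s` output starts with MINGW, MSYS, or CYGWIN; `set_crlf_config` is a hypothetical helper name. Everything else (Linux, and macOS, which reports Darwin) gets the Linux settings.

```shell
#!/bin/sh
# set_crlf_config: hypothetical helper that applies the per-OS settings above.
# Assumes Windows users run it from Git Bash/MSYS2 or Cygwin.
set_crlf_config() {
  case "$(uname -s)" in
    MINGW*|MSYS*|CYGWIN*)
      autocrlf=true ;;   # Windows: CRLF in the working tree, LF in the repo
    *)
      autocrlf=input ;;  # Linux/macOS: no conversion on checkout, CRLF->LF on commit
  esac
  git config --global core.autocrlf "$autocrlf"
  git config --global core.safecrlf false
}
```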

Then, for our Linux developers, we set up a little shell script, /usr/local/bin/gitfixcrlf:

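#!/bin/sh
# Run from the top of the working tree. WARNING: this discards any
# uncommitted changes to tracked files.
# Remove the local copies of all tracked files
git ls-files -z | xargs -0 rm -f
# Check them out again so the current core.autocrlf setting is applied
git checkout .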

They only had to run it once on their local sandbox clones. Any future clones, pushes, and pulls were then handled correctly, so this solved our multi-OS line-ending issues. Also note that Mac falls under the same configuration as Linux.
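To confirm on a given clone that the one-time re-checkout worked, Git 2.8 and later can report the line endings it sees in each tracked file with `git ls-files --eol`. The sketch below is a hypothetical helper (`list_worktree_crlf` is not a Git command) that prints the paths whose working-tree copy still contains CRLF, or a mix of CRLF and LF; an empty result means the checkout is clean.

```shell
#!/bin/sh
# list_worktree_crlf: hypothetical helper, not part of Git (needs Git >= 2.8).
# Each `git ls-files --eol` line looks like
#   i/lf    w/crlf  attr/                 <TAB>path
# where i/ describes the index (repository) copy and w/ the working-tree copy.
list_worktree_crlf() {
  git ls-files --eol |
    awk '$2 == "w/crlf" || $2 == "w/mixed"' |  # working-tree copy has CRLF
    cut -f2                                     # the path follows the tab
}
```

If a listed file also shows i/crlf, it was committed with CRLF; with core.autocrlf=input such files are checked out unchanged, so re-checking out will not fix them and they need a one-time conversion commit instead.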

Hach answered 25/10, 2010 at 21:2 Comment(0)

For an ongoing solution, have a look at the core.autocrlf (and core.safecrlf) config parameters.

Converting your whole repository once will create a single commit that is pretty much impossible to merge across (since every line in the affected files changes), but once you get past it, it should be no big deal. (Yes, you could use git filter-branch to make the change all the way through history, but that's a bit scary.)

Sevenup answered 18/3, 2010 at 1:5 Comment(0)

If your version-controlled files include binaries, or you can't easily change history, here is a handy-dandy one-liner:

https://unix.stackexchange.com/a/365679/112190
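On the question in the comments about binaries: Git has its own binary-file heuristic, and `git grep -I` uses it to skip binary files. A sketch along those lines (not necessarily what the linked answer does; `crlf_text_files` is a hypothetical helper name) lists the tracked text files that contain a carriage return, NUL-separated so the list can be piped straight into dos2unix.

```shell
#!/bin/sh
# crlf_text_files: hypothetical helper, not part of Git.
# Lists tracked files in the working tree that Git treats as text (-I) and
# that contain a carriage return, one NUL-terminated path per file (-z).
crlf_text_files() {
  cr=$(printf '\r')   # a literal CR character
  git grep -I -l -z -F -e "$cr"
}
```

With dos2unix installed, `crlf_text_files | xargs -0 dos2unix` converts only those files, after which the change can be reviewed and committed as usual.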

Erie answered 17/5, 2017 at 17:50 Comment(0)
