Convert line-endings for whole directory tree (Git)
U

9

227

The situation is as follows:

I'm working on a Mac running OS X and recently joined a project whose members so far all use Windows. One of my first tasks was to set up the codebase in a Git repository, so I pulled the directory tree from FTP and tried to check it into the Git repo I had prepared locally. When I tried, all I got was this error:

fatal: CRLF would be replaced by LF in blog/license.txt.

Since this affects all files below the "blog" folder, I'm looking for a way to conveniently convert ALL files in the tree to Unix line endings. Is there a tool that does that out of the box, or do I have to script something myself?

For reference, my Git config concerning line-endings:

core.safecrlf=true
core.autocrlf=input
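
To gauge how many files are affected before converting anything, the files containing carriage returns can be listed first. A minimal sketch, assuming a grep that supports -r and -I (GNU and BSD grep both do); the function name list_crlf_files is made up for illustration:

```shell
# list_crlf_files [DIR]
# Print every text file under DIR (default: current directory) that
# contains at least one carriage return. -I makes grep treat binary
# files as non-matching, so images and the like are not listed.
list_crlf_files() {
    cr=$(printf '\r')              # a literal CR; no \r escape support needed in grep
    grep -rlI -- "$cr" "${1:-.}"
}
```

Running list_crlf_files blog prints one path per affected text file; files that grep classifies as binary are left out.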
Undirected answered 15/8, 2011 at 17:15 Comment(0)
S
377

dos2unix does that for you. It's a fairly straightforward process.

dos2unix filename

Thanks to toolbear, here is a one-liner that recursively replaces line endings and properly handles whitespace, quotes, and shell meta chars.

find . -type f -exec dos2unix {} \;

If you're using dos2unix 6.0, binary files will be ignored.
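
Several comments point out that running this at the top of a repository also rewrites files under .git. A possible variant that prunes that directory, sketched as a shell function (dos2unix_tree is a hypothetical name, and dos2unix is assumed to be on PATH):

```shell
# dos2unix_tree [DIR]
# Run dos2unix on every regular file under DIR (default: current
# directory) without descending into any .git directory.
# "{} +" batches many paths into as few dos2unix invocations as
# possible and is safe for names with spaces or quotes.
dos2unix_tree() {
    find "${1:-.}" -name .git -prune -o -type f -exec dos2unix {} +
}
```

Called as dos2unix_tree blog, this converts the blog tree in place; other directories can be excluded the same way by adding more -name ... -prune -o clauses.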

Spake answered 15/8, 2011 at 17:20 Comment(15)
find blog -type f | xargs dos2unix should be faster. You don't need the -name *.* either, unless you specifically want only files with a period somewhere in the name. That's a Windows glob, not a *nix one. – Xerophilous
Piping find to xargs will fail if find matches any files with whitespace, quotes, or other shell meta characters in their path. At the least use find blog -type f -print0 | xargs -0 dos2unix to handle the case of whitespace. You have to use find's -exec instead of piping to avoid problems with quotes, etc. The dos2unix man page doesn't specify what its behavior is if you invoke it on binary files. If it converts CRLF in binary files, it will corrupt them. See my answer for a safer, albeit longer alternative. – Portuguese
@lukmdo which is not the version installed on CentOS 6.4, which does clobber them. Instead I had to download it from here: rpmfind.net/linux/rpm2html/search.php?query=dos2unix – Theodoratheodore
Addendum: The dos2unix CLI is most easily installed via Homebrew (and not npm). – Heartwood
How would one ignore directories using this approach, if possible? – Charge
I needed to use dumb quotes after all because my file names contained (at least) commas: find . -type f -exec dos2unix "{}" \;. – Treat
KDiff3 tells me that this command changed the encoding from UTF-8-BOM to System (?), which wasn't great. It looks like the default encoding for dos2unix is ASCII, when it should be UTF-8? Any idea what happened there? – Eustache
Worth noting: don't run this tool directly in your repo, otherwise it will destroy your .git and you will have to clone from origin :) – Vince
@DenisDrescher Could you please explain what {} and \ mean? – Bloodless
@Bloodless The man page can explain it better: “The expression must be terminated by a semicolon (;). If you invoke find from a shell you may need to quote the semicolon if the shell would otherwise treat it as a control operator. If the string {} appears anywhere in the utility name or the arguments it is replaced by the pathname of the current file.” – Treat
Windows / PowerShell users, see this script which also uses dos2unix - #26676861 – Pilate
I had to make the opposite change; unix2dos also exists and works the same way. – Chilt
You can use the one-liner git grep --cached -Ilz '' | xargs -0 dos2unix to change all Git-managed text files; it will not include anything else if you're in a Git project. – Caffeine
Why use additional commands like find, perl or xargs if you can just use dos2unix? Like this: dos2unix * – Lovellalovelock
You could also simply use find blog -type f -exec dos2unix {} + which will substitute as many files as possible for the placeholder, but could result in multiple invocations depending on how many matches are found. For "few" files this results in a single invocation, too (like xargs). It's specific to GNU find, however. – Monandrous
P
63

Assuming you have GNU grep and perl this will recursively convert CRLF to LF in non-binary files under the current directory:

find . -type f -exec grep -qIP '\r\n' {} ';' -exec perl -pi -e 's/\r\n/\n/g' {} '+'

How it Works

Search recursively under the current directory; change . to blog or whatever subdirectory you want, to limit the replacement:

find .

Only match regular files:

  -type f

Test if file contains CRLF. Exclude binary files. Runs grep command for every regular file. That's the price of excluding binaries. If you have an old grep you could try building a test using the file command:

  -exec grep -qIP '\r\n' {} ';'

Replace CRLF with LF. The '+' with the second -exec tells find to accumulate matching files and pass them to one (or as few as possible) invocations of the command -- like piping to xargs, but without problems if file path contains spaces, quotes, or other shell meta characters. The i in -pi tells perl to modify the file in place. You could use sed or awk here with some work, and you'll probably change '+' to ';' and invoke a separate process for each match:

  -exec perl -pi -e 's/\r\n/\n/g' {} '+'
Portuguese answered 23/9, 2011 at 19:6 Comment(8)
In case it helps anyone: grep -qIP '\r\n' never matches anything on my CentOS system. Changing it to grep -qIP '\r$' worked. – Grecian
Hate to ask in comments, but is there a way to exclude a folder like node_modules? – Charge
@Charge take a look at #4210542 for how to modify the find portion of the command to exclude directories. They suggest using -path, but you can also use -regex or -iregex, i.e. -not -regex '.*/node_modules/.*' which will exclude a node_modules at any depth. – Portuguese
Sorry if I come off as a regex or bash noob, but what about multiple exclusions, say node_modules and dist for example? – Charge
GNU grep is required for the -P flag. OS X switched from GNU grep to BSD grep. Some alternatives for OS X: #16658833 – Portuguese
I also needed to use "\r$" on Linux Mint as @SteveOnorato suggested. Weird. – Friedrich
Thanks for the detailed explanation of how it works. That helps us understand and learn better than just copying and pasting! – Elburt
So, on Linux Mint the command becomes find . -type f -exec grep -qIP '\r$' {} ';' -exec perl -pi -e 's/\r$/\n/g' {} '+' – Ernest
S
35

Here's a better option: Swiss File Knife. It works recursively across subdirectories and properly handles spaces and special characters.

All you have to do is:

sfk remcr -dir your_project_directory

Bonus: sfk also does lots of other conversions. See below for the full list:

SFK - The Swiss File Knife File Tree Processor.
Release 1.6.7 Base Revision 2 of May  3 2013.
StahlWorks Technologies, http://stahlworks.com/
Distributed for free under the BSD License, without any warranty.

type "sfk commandname" for help on any of the following.
some commands require to add "-help" for the help text.

   file system
      sfk list       - list directory tree contents.
                       list latest, oldest or biggest files.
                       list directory differences.
                       list zip jar tar gz bz2 contents.
      sfk filefind   - find files by filename
      sfk treesize   - show directory size statistics
      sfk copy       - copy directory trees additively
      sfk sync       - mirror tree content with deletion
      sfk partcopy   - copy part from a file into another one
      sfk mkdir      - create directory tree
      sfk delete     - delete files and folders
      sfk deltree    - delete whole directory tree
      sfk deblank    - remove blanks in filenames
      sfk space [-h] - tell total and free size of volume
      sfk filetime   - tell times of a file
      sfk touch      - change times of a file

   conversion
      sfk lf-to-crlf - convert from LF to CRLF line endings
      sfk crlf-to-lf - convert from CRLF to LF line endings
      sfk detab      - convert TAB characters to spaces
      sfk entab      - convert groups of spaces to TAB chars
      sfk scantab    - list files containing TAB characters
      sfk split      - split large files into smaller ones
      sfk join       - join small files into a large one
      sfk hexdump    - create hexdump from a binary file
      sfk hextobin   - convert hex data to binary
      sfk hex        - convert decimal number(s) to hex
      sfk dec        - convert hex number(s) to decimal
      sfk chars      - print chars for a list of codes
      sfk bin-to-src - convert binary to source code

   text processing
      sfk filter     - search, filter and replace text data
      sfk addhead    - insert string at start of text lines
      sfk addtail    - append string at end of text lines
      sfk patch      - change text files through a script
      sfk snapto     - join many text files into one file
      sfk joinlines  - join text lines split by email reformatting
      sfk inst       - instrument c++ sourcecode with tracing calls
      sfk replace    - replace words in binary and text files
      sfk hexfind    - find words in binary files, showing hexdump
      sfk run        - run command on all files of a folder
      sfk runloop    - run a command n times in a loop
      sfk printloop  - print some text many times
      sfk strings    - extract strings from a binary file
      sfk sort       - sort text lines produced by another command
      sfk count      - count text lines, filter identical lines
      sfk head       - print first lines of a file
      sfk tail       - print last lines of a file
      sfk linelen    - tell length of string(s)

   search and compare
      sfk find       - find words in binary files, showing text
      sfk md5gento   - create list of md5 checksums over files
      sfk md5check   - verify list of md5 checksums over files
      sfk md5        - calc md5 over a file, compare two files
      sfk pathfind   - search PATH for location of a command
      sfk reflist    - list fuzzy references between files
      sfk deplist    - list fuzzy dependencies between files
      sfk dupfind    - find duplicate files by content

   networking
      sfk httpserv   - run an instant HTTP server.
                       type "sfk httpserv -help" for help.
      sfk ftpserv    - run an instant FTP server
                       type "sfk ftpserv -help" for help.
      sfk ftp        - instant anonymous FTP client
      sfk wget       - download HTTP file from the web
      sfk webrequest - send HTTP request to a server
      sfk tcpdump    - print TCP conversation between programs
      sfk udpdump    - print incoming UDP requests
      sfk udpsend    - send UDP requests
      sfk ip         - tell own machine's IP address(es).
                       type "sfk ip -help" for help.
      sfk netlog     - send text outputs to network,
                       and/or file, and/or terminal

   scripting
      sfk script     - run many sfk commands in a script file
      sfk echo       - print (coloured) text to terminal
      sfk color      - change text color of terminal
      sfk alias      - create command from other commands
      sfk mkcd       - create command to reenter directory
      sfk sleep      - delay execution for milliseconds
      sfk pause      - wait for user input
      sfk label      - define starting point for a script
      sfk tee        - split command output in two streams
      sfk tofile     - save command output to a file
      sfk toterm     - flush command output to terminal
      sfk loop       - repeat execution of a command chain
      sfk cd         - change directory within a script
      sfk getcwd     - print the current working directory
      sfk require    - compare version text

   development
      sfk bin-to-src - convert binary data to source code
      sfk make-random-file - create file with random data
      sfk fuzz       - change file at random, for testing
      sfk sample     - print example code for programming
      sfk inst       - instrument c++ with tracing calls

   diverse
      sfk media      - cut video and binary files
      sfk view       - show results in a GUI tool
      sfk toclip     - copy command output to clipboard
      sfk fromclip   - read text from clipboard
      sfk list       - show directory tree contents
      sfk env        - search environment variables
      sfk version    - show version of a binary file
      sfk ascii      - list ISO 8859-1 ASCII characters
      sfk ascii -dos - list OEM codepage 850 characters
      sfk license    - print the SFK license text

   help by subject
      sfk help select   - how dirs and files are selected in sfk
      sfk help options  - general options reference
      sfk help patterns - wildcards and text patterns within sfk
      sfk help chain    - how to combine (chain) multiple commands
      sfk help shell    - how to optimize the windows command prompt
      sfk help unicode  - about unicode file reading support
      sfk help colors   - how to change result colors
      sfk help xe       - for infos on sfk extended edition.

   All tree walking commands support file selection this way:

   1. short format with ONE directory tree and MANY file name patterns:
      src1dir .cpp .hpp .xml bigbar !footmp
   2. short format with a list of explicite file names:
      letter1.txt revenues9.xls report3\turnover5.ppt
   3. long format with MANY dir trees and file masks PER dir tree:
      -dir src1 src2 !src\save -file foosys .cpp -dir bin5 -file .exe

   For detailed help on file selection, type "sfk help select".

   * and ? wildcards are supported within filenames. "foo" is interpreted
   as "*foo*", so you can leave out * completely to search a part of a name.
   For name start comparison, say "\foo" (finds foo.txt but not anyfoo.txt).

   When you supply a directory name, by default this means "take all files".

      sfk list mydir                lists ALL  files of mydir, no * needed.
      sfk list mydir .cpp .hpp      lists SOME files of mydir, by extension.
      sfk list mydir !.cfg          lists all  files of mydir  EXCEPT .cfg

   general options:
      -tracesel tells in detail which files and/or directories are included
                or excluded, and why (due to which user-supplied mask).
      -nosub    do not process files within subdirectories.
      -nocol    before any command switches off color output.
      -quiet    or -nohead shows less output on some commands.
      -hidden   includes hidden and system files and dirs.
      For detailed help on all options, type "sfk help options".

   beware of Shell Command Characters.
      command parameters containing characters < > | ! & must be sur-
      rounded by quotes "". type "sfk filter" for details and examples.

   type "sfk ask word1 word2 ..."   to search ALL help text for words.
   type "sfk dumphelp"              to print  ALL help text.

EDIT: a word of caution: be careful when running this on folders that contain binary files, as it will effectively destroy them, particularly .git directories. If that applies to you, do not run sfk on the entire folder, but select specific file extensions instead (*.rb, *.py, etc.). Example: sfk remcr -dir chef -file .rb -file .json -file .erb -file .md

Struggle answered 2/6, 2013 at 19:5 Comment(5)
Works great on OSX Mavericks. No need to install anything, just run the script from the mounted dmg and your terminal appears ready to go. – Electroencephalogram
@Gui Ambros You don't need to worry about the files inside the .git folder. sfk does not update files inside hidden folders by default. – Veto
@bittusarkar: At the time of my answer, sfk effectively processed my entire .git folder and destroyed a bunch of binaries (hence my edit; I don't remember if it was Linux or Mac). They may have changed the default behavior in more recent versions, but I'd still recommend specifying the extensions, to be safe. – Struggle
This worked well for me, after having spent too much time trying to normalize my repos using recommended git commands that simply did not fix all the relevant files. – Shuman
Thanks! Just used this to convert a whole bunch of files quickly and painlessly and now I can add them to the staging area in Git. On OSX 10.9.5, and not sure where the files were created. – Larrylars
R
25
find . -not \( -name .svn -prune -o -name .git -prune \) -type f -exec perl -pi -e 's/\r\n|\n|\r/\n/g' {} \;

This is much safer as it avoids corrupting your Git repo. Add to or replace .git and .svn in the exclusion list with .bzr, .hg, or whatever source control you're using.
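
The same -prune idea extends to several directories at once, which also makes for a cheap dry run: print the files that would be converted and review the list before running perl on them. A sketch, where list_candidates and the choice of excluded names (.git, .svn, node_modules, dist) are assumptions for illustration:

```shell
# list_candidates [DIR]
# Print the regular files under DIR (default: current directory) that a
# conversion would touch, skipping version-control and build directories.
# Add or remove "-o -name X" entries to change what is excluded.
list_candidates() {
    find "${1:-.}" \
        \( -name .git -o -name .svn -o -name node_modules -o -name dist \) -prune \
        -o -type f -print
}
```

Once the list looks right, -print can be swapped for an -exec clause to do the actual conversion. Note that this filters by directory only; binary files outside the excluded directories are still listed.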

Respectively answered 7/11, 2017 at 18:31 Comment(1)
This is the best answer if you don't want to install anything like dos2unix. It allows exclusion of file types and avoids corrupting source code files. – Counterspy
L
10

On OS X, this worked for me:

find ./ -type f -exec perl -pi -e 's/\r\n|\n|\r/\n/g' {} \;

Warning: Please back up your directory before executing this command.

Langille answered 5/3, 2016 at 8:16 Comment(4)
Just want to note that this corrupted my git repository. I tried again by moving out the .git folder before running and moving it back in afterwards with better success. – Australasia
I'll also note that this doesn't exclude binary files, so it will e.g. corrupt your jpgs. – Henriques
This needs a bigger warning; not really a great solution. – Norvin
This is a very, very naive solution. – Pita
P
10

The current accepted answer uses find -exec with dos2unix but this is unnecessary nowadays because the vast majority of shells, including Bash, support the use of a wildcard to operate on all files in a directory (known as pathname expansion or globbing). The answers that don't use dos2unix are even worse because they do naive search-and-replaces that will irreversibly corrupt binary files like executables, images and videos, and even the contents of the .git directory.

Both dos2unix and unix2dos are ubiquitous, lightweight tools that have been pre-installed on every UNIX-based system I've ever used, meaning they're almost definitely already installed on your system. They also skip non-text files by default, making them safe to use on entire directories in this way unlike other answers.

To convert all line endings to UNIX line endings (LF)

dos2unix -v *

To convert all line endings to Windows line endings (CRLF)

unix2dos -v * 

The -v/--verbose switch isn't required but will output which files are being converted to the console.

Pita answered 12/4, 2022 at 1:49 Comment(0)
P
4

Here is a solution using sed:

find . -type f -exec sed -i 's/\r$//' {} \;

-i stands for in-place; if you want to create a backup as well, use -i.bak

's/\r$//' removes the carriage return (\r) at the end of each line
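
Two caveats: BSD sed (the default on OS X) expects a suffix argument after -i and may not interpret \r in a pattern, and the command above touches every file, including binaries and anything under .git. A hedged sketch that sidesteps these by using -i.bak (accepted by both GNU and BSD sed), a literal carriage return, and a file name filter; strip_cr_in and its default *.txt pattern are hypothetical:

```shell
# strip_cr_in [DIR] [NAME_PATTERN]
# Remove a trailing CR from each line of the files under DIR whose name
# matches NAME_PATTERN (default: *.txt). .git is pruned, and every
# edited file keeps a copy of its original content as FILE.bak.
strip_cr_in() {
    cr=$(printf '\r')                       # literal CR instead of the \r escape
    find "${1:-.}" -name .git -prune -o -type f -name "${2:-*.txt}" \
        -exec sed -i.bak "s/$cr\$//" {} +
}
```

For example, strip_cr_in blog '*.php' would convert only PHP files below blog. The .bak files can be deleted once the result has been checked.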

Piedadpiedmont answered 28/6, 2020 at 22:25 Comment(3)
When you run it in a git repository, you will probably see sed: cannot rename ./.git/objects/16/sed68Vezl: Permission denied. You must exclude .git folder in that case. – Fruitful
Be careful not to run this command on binary files, like .mp4, .jpg or .png, because it will corrupt them. – Fruitful
Another naive and irresponsible solution, this shouldn't have upvotes until it's made clear what the risks of running it are. – Pita
V
0
 find ./ -type f -name "*.java" -exec perl -pi -e 's/\r\n|\n|\r/\n/g' {} \;

This worked for me on WSL2 to change all the Java files from CRLF to LF.

Votive answered 20/7, 2022 at 12:24 Comment(0)
U
0

The same as the others, but with Node.js:

find ./src/test/resources/com/sonalake/bss/tests/bdd/ -type f -exec node -e "const fs = require('fs'); const val = fs.readFileSync(process.argv[1], 'utf8'); fs.writeFileSync(process.argv[1], val.replace(/\r\n/g, '\n'))" {} ';'

The inline script, expanded:

// load the file system module and bind it to a name
const fs = require('fs');
// read the file whose path find passes as the first argument
const val = fs.readFileSync(process.argv[1], 'utf8');
// write it back with every CRLF replaced by LF
fs.writeFileSync(process.argv[1], val.replace(/\r\n/g, '\n'));
Ule answered 6/4, 2023 at 11:11 Comment(0)
