How to clean a codebase: trailing whitespace, newlines, etc.
I have a code base that is driving me nuts with conflicts due to trailing whitespace. I'd like to clean it up.

I want to:

  • Remove all trailing whitespace
  • Remove any newline characters at the end of files
  • Convert all line endings to Unix style (dos2unix)
  • Convert all leading spaces to tabs, i.e. every 4 spaces to one tab
  • Do all of this while ignoring the .git directory

I'm on OS X Snow Leopard, using zsh.

So far, I have:

sed -i "" 's/[ \t]*$//' **/*(.)

which works great, but sed adds a newline to the end of every file it touches, which is no good. I don't think sed can be stopped from doing this, so how can I remove these newlines? There's probably some awk magic to be applied here.

(Complete answers also welcome)

Ingurgitate answered 16/2, 2011 at 1:35 Comment(0)

[EDIT: Fixed whitespace trimming]
[EDIT #2: Strip trailing blank lines from end of file]

perl -i.bak -pe 'if (defined $x && /\S/) { print $x; $x = ""; } $x .= "\n" x chomp; s/\s*?$//; 1 while s/^(\t*)    /$1\t/; if (eof) { $_ .= "\n"; $x = ""; }' **/*(.)

This strips trailing blank lines from the file, but leaves exactly one \n at the end of the file. Most tools expect this, and it will not show up as a blank line in most editors. However, if you do want to strip that very last \n, just delete the $_ .= "\n"; part from the command.

The command works by "saving up" \n characters until a line containing a non-blank character is seen -- then it prints them all before processing that line.
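
The "saving up" idea can be seen in isolation with a small awk filter. This is only a sketch of the buffering technique, not the Perl command itself: strip_trailing_blank_lines is a hypothetical helper name, it reads stdin rather than editing files in place, and it treats a line made only of spaces, tabs or \r as blank.

```shell
# Hypothetical helper: drop trailing blank lines from stdin, keep one final \n.
strip_trailing_blank_lines() {
  awk '
    /[^ \t\r]/ {                 # line contains a non-blank character
      printf "%s", pending       # flush the newlines saved up so far
      printf "%s", $0            # then print the line itself, without its \n
      pending = "\n"             # and start saving again with this line'"'"'s \n
      next
    }
    { pending = pending "\n" }   # blank line: only save its newline
    END { if (NR > 0) printf "\n" }   # saved newlines are dropped, one \n is added
  '
}

# Example: two trailing blank lines go in, none come out.
printf 'a\n\nb\n\n\n' | strip_trailing_blank_lines
```

Blank lines in the middle of the input survive because their saved newlines are flushed as soon as the next non-blank line arrives; only the ones still pending at the end are discarded.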

Remove .bak (i.e. use -i on its own) to avoid creating backups of the original files (use at your own risk!).

\s*? matches zero or more whitespace characters non-greedily, including \r, which is the first character of the \r\n DOS line-break sequence. In Perl, $ matches either at the end of the string or immediately before a final \n, so combined with the fact that *? matches non-greedily (trying a 0-width match first, then a 1-width match, and so on), it does the right thing.

1 while s/^(\t*)    /$1\t/ (note the four spaces after the closing parenthesis) is just a loop that repeatedly replaces any number of leading tabs followed by 4 spaces with one more tab than there was, until this is no longer possible. So it will work even if some lines have been partially converted to tabs already, provided all \t characters start at a column divisible by 4.
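
The same loop can be written with sed's branch-on-substitution command. This is a sketch of the technique in another tool, not part of the command above; leading_spaces_to_tabs is a hypothetical name, and the tab is passed in through a shell variable because \t inside a sed pattern is not portable.

```shell
# Hypothetical helper: convert each leading run of 4 spaces to a tab (stdin to stdout).
leading_spaces_to_tabs() {
  tab=$(printf '\t')
  # :again marks the loop start; "t again" jumps back whenever the
  # substitution succeeded, so one tab is added per pass until no
  # group of 4 leading spaces remains.
  sed -e ':again' -e "s/^\(${tab}*\)    /\1${tab}/" -e 't again'
}

# 8 leading spaces become 2 tabs; 6 leading spaces become 1 tab plus 2 spaces.
printf '        x\n      y\n' | leading_spaces_to_tabs | od -c
```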

I haven't seen the **/*(.) syntax before; presumably that's a zsh extension? If it worked with sed, it will work with perl.
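
For shells without that glob, a find-based listing is one way to get a similar file set while also skipping .git, as the question asks. This is a sketch: list_regular_files is a hypothetical helper name, and it assumes that "every regular file outside any directory named .git" is the set you want.

```shell
# Hypothetical helper: print every regular file under a directory
# (default: the current one), without descending into .git.
list_regular_files() {
  # -prune stops find from entering anything named .git;
  # -type f keeps only regular files, like the (.) qualifier in zsh.
  find "${1:-.}" -name .git -prune -o -type f -print
}

list_regular_files .
```

The same find expression can feed the cleanup command directly by replacing -print with -exec ... {} +, which passes the files as arguments in batches.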

Tomfool answered 16/2, 2011 at 1:56 Comment(9)
**/*(.) is a zsh glob. It's just another way of performing an action on many files, so that perl statement is applied to each file it finds. - Ingurgitate
Also, this doesn't work? It removes all whitespace, newlines, everything. - Ingurgitate
Whoops! Fixed it now and tested it. - Tomfool
That works pretty admirably now. About the only thing missing is removing newline characters from the ends of files. Any chance of adding that in? Going to mark this as the answer either way. - Ingurgitate
Thanks, I've now added trailing-blank-line stripping. Note that you probably do want to keep a single \n at the very end, but see the post for how to get rid of that too if you want. I think I've finally covered all the bases... :) - Tomfool
I'm getting "syntax error near unexpected token `('" with this. Any ideas? - Uranie
@danherd: I'm guessing you're running on Windows? If so, you'll need to change each ' to ", and each "..." to e.g. q{...}. - Tomfool
I got the error from the Mac terminal: -bash: syntax error near unexpected token `(' - Sacci
@SazzadHissainKhan: The question was asked for zsh, whereas your error message shows you are running bash, and the **/*(.) syntax is a zsh extension. - Tomfool

On a Mac:

find . -iname '*.swift' -type f -exec sed -i '' 's/[[:space:]]\{1,\}$//' {} \+

This removes trailing whitespace from every .swift file under the current directory, recursively. Change the file pattern to match other file types as needed.
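
Before rewriting files in place, it can help to list which ones would change. The following is a rough sketch rather than part of the command above: list_trailing_ws is a hypothetical helper name, and its two arguments (directory and file pattern) are assumptions that default to the current directory and *.swift.

```shell
# Hypothetical helper: print the names of files that contain at least
# one line ending in whitespace, without modifying anything.
list_trailing_ws() {
  # grep -l prints only the names of files with a matching line.
  find "${1:-.}" -type f -iname "${2:-*.swift}" \
    -exec grep -l '[[:space:]]$' {} +
}

list_trailing_ws . '*.swift'
```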

Sacci answered 19/11, 2019 at 12:50 Comment(0)
