How to find/fix files with MIXED line endings (0x0d 0x0d 0x0a)

I know I can "probably" fix them by using "flip -u" (cygwin flip), which basically removes one of the 0x0d bytes, leaving the file with DOS-style line endings (0x0d 0x0a). (Technically speaking, of course, this might be considered a bug!)

But the other side of it is that I'd like to do this selectively, ensuring that what I'm fixing really is a "non-binary" file and EXPLICITLY replacing the 0x0d 0x0d 0x0a sequence with 0x0d 0x0a, not running a buggy program that appears to do what I want (and possibly more).

Note that grep -P '\x0d\x0d\x0a' and grep -P '\x0d\x0d' do not find these lines.

Although people say that grep -P '\x0d\x0a' properly finds line endings, I have to surmise that something else is going on, since it can't match the other patterns in a file with mixed line endings (0x0d 0x0d 0x0a).

Luzluzader answered 22/9, 2010 at 21:7 Comment(1)
grep -IUPrl "\x0d\x0d$" - Abercrombie

Here's an easy way to identify the files that contain mixed line endings:

cat -A "$FILE" | grep '\^M\^M\$'

The -A option is equivalent to -vET, so line endings and other hidden characters become visible: each LF is shown as $, each CR as ^M, and each tab as ^I. For example, let's create a test file. Its text mirrors the line endings it contains (each d stands for a 0x0d and each a for a 0x0a):

$ od -x test1.txt 
0000000 6464 2061 0d20 0a0d 6464 6161 2020 0d0d
0000020 0a0a 6164 2020 0a0d
0000030
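
To reproduce this test file, the dump can be decoded by hand: on a little-endian machine od -x prints each pair of bytes swapped, so the words 0d20 0a0d are the bytes 20 0d 0d 0a. The following is a minimal sketch, assuming a POSIX printf, that writes the same 24 bytes and then dumps them one byte at a time:

```shell
# Recreate test1.txt: "dda  " CR CR LF, "ddaa  " CR CR LF LF, "da  " CR LF.
printf 'dda  \r\r\nddaa  \r\r\n\nda  \r\n' > test1.txt

# -c prints one byte per column, with CR and LF shown as \r and \n,
# which avoids the byte swapping seen with od -x.
od -c test1.txt
```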

Now let's see what cat gives us:

$ cat -vE test1.txt
dda  ^M^M$
ddaa  ^M^M$
$
da  ^M$

cat is indeed showing us the CRs and LFs (though the LFs don't show up on the same line -- and justifiably so), so now we can find them:

find /path -yourPredicatesOfInterest -print | while IFS= read -r fn ; do
    # Quote "$fn" so names with spaces survive; grep -q replaces the redirect to /dev/null.
    cat -A "$fn" | grep -q '\^M\^M\$' && echo "$fn contains CR CR LF line endings"
done
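
A variant that tests the raw bytes instead of cat's rendering might look like the sketch below. It is not part of the original answer: the helper names has_crcrlf and list_crcrlf are hypothetical, and it assumes a grep that supports -I (skip binary files) and -U (do not strip CRs on Windows ports), as GNU grep does. Because grep matches line by line with the LF removed, the pattern is "CR CR at end of line" rather than a literal CR CR LF; a final line ending in CR CR with no LF would therefore match as well.

```shell
# Literal carriage return; command substitution only strips trailing newlines.
cr=$(printf '\r')

# has_crcrlf FILE (hypothetical helper): succeed if some line ends in CR CR LF.
has_crcrlf() {
    # -q: quiet; -I: treat binary files as non-matching;
    # -U: keep CRs on Windows ports of GNU grep (no effect elsewhere).
    LC_ALL=C grep -qIU "${cr}${cr}\$" -- "$1"
}

# list_crcrlf DIR (hypothetical helper): print each file under DIR with such a line.
list_crcrlf() {
    LC_ALL=C find "$1" -type f -exec grep -lIU "${cr}${cr}\$" -- {} +
}
```

Since the match is on real CR bytes, a text file that merely contains the characters ^M^M$ is not reported.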
Affricate answered 22/9, 2010 at 21:42 Comment(7)
Thanks, I had high hopes, but even though the file has 0x0d 0x0d 0x0a line endings, cat shows ^M$ at the end of the lines, not ^M^M$ (Windows/cygwin). Basically the 0x0d 0x0d 0x0a mix seems not to work for most of the patterns (sed, perl, grep, etc.) that have been posted on this topic! - Luzluzader
Hmm, for the non-mixed (DOS) file, cat -A shows simply $. Now I need to know whether this is as expected or yet another variant of the problems I'm seeing with the other programs. For example, although \x0d\x0a works with egrep to find the lines, for some reason \x0d\x0d will not find any lines, so it's as if the programs are treating these characters as a special case rather than doing a literal search of all characters in the file. Like I said, I want to know that I am literally matching 0x0d 0x0d 0x0a before making any changes. - Luzluzader
Guess it's time to write my own program! - Luzluzader
Sorry, I misinterpreted your response based on platform. The $ is actually because of the cat -E option, which shows line endings, so on DOS the 0x0d 0x0a combination is the $ and the ^M is the "extra" 0x0d. So this is a usable approach for my case, but the grep pattern is just '\^M\$'. - Luzluzader
Unfortunately this doesn't translate well to the "find files" part of the question, e.g. how to use the cat+grep combo in a find-type context to list all the files in a tree with the bad line endings. - Luzluzader
In the end, I used gnuwin32 file.exe (sourceforge.net/project/shownotes.php?release_id=662480) to determine the type (note that it nicely detects and reports the different line endings I am dealing with but doesn't fix them), then a modified flip.exe (since it doesn't properly deal with "combined DOS/MAC", which is what I have). - Luzluzader
Searching for ^M in cat -A output risks finding those characters in the text file itself, so you may get false positives. See also this thread: #74333 - Wholehearted

You can try bbe (http://bbe-.sourceforge.net/):

bbe -e 's/\x0d\x0d\x0a/\x0a/'

which replaces each 0x0d 0x0d 0x0a sequence with a Unix line ending (0x0a); or:

bbe -e 's/\x0d\x0d\x0a/\x0d\x0a/'

which replaces each of them with a DOS line ending (0x0d 0x0a).
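
If bbe is not available, roughly the same explicit replacement can be sketched with grep and sed. This is an illustration under stated assumptions, not something the tools above provide: fix_crcrlf is a hypothetical helper, grep is assumed to support -I and -U, and mktemp is assumed to exist. It leaves untouched any file that grep considers binary or that has no line ending in CR CR LF. Because sed works on lines with the LF removed, a final line ending in CR CR with no LF would also be rewritten.

```shell
# Literal carriage return; command substitution only strips trailing newlines.
cr=$(printf '\r')

# fix_crcrlf FILE (hypothetical helper): rewrite CR CR LF as CR LF in place.
fix_crcrlf() {
    # Do nothing for binary files or files without a CR CR LF line ending.
    LC_ALL=C grep -qIU "${cr}${cr}\$" -- "$1" || return 0

    fix_tmp=$(mktemp) || return 1
    # Replace the two CRs at the end of each line with a single CR, then
    # copy the result back so the original file keeps its permissions.
    LC_ALL=C sed -e "s/${cr}${cr}\$/${cr}/" -- "$1" > "$fix_tmp" &&
        cat "$fix_tmp" > "$1"
    fix_status=$?
    rm -f "$fix_tmp"
    return "$fix_status"
}
```

Calling fix_crcrlf on each file of interest changes only the CR CR LF endings; lines that already end in CR LF or a bare LF are passed through as they are.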

Infernal answered 23/9, 2010 at 19:49 Comment(0)
