How to check the end-of-line characters of a text file to see if it is in Unix or DOS format?

I need to convert the text file to DOS format (ending each line with 0x0d 0x0a rather than 0x0a only) if the file is in Unix format (0x0a only at the end of each line).

I know how to convert it (sed 's/$/^M/'), but I don't know how to detect the end-of-line character(s) of a file.

I am using ksh.

Any help would be appreciated.

[Update]: Kind of figured it out, and here is my ksh script to do the check.

[qiangxu@host:/my/folder]# cat eol_check.ksh
#!/usr/bin/ksh

if ! head -1 "$1" | grep '^M$' >/dev/null 2>&1; then
  echo UNIX
else
  echo DOS
fi

In the above script, ^M should be inserted in vi with Ctrl-V and Ctrl-M.
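
A variant of the same first-line check that avoids embedding the raw control character in the script is sketched below; it builds the carriage return with printf (POSIX, available in ksh), so the script stays plain ASCII. This is only a sketch of the same test, not a different detection method:

# Same check as above, but the CR is built at run time instead of being
# typed into the script as a literal ^M.
CR=$(printf '\r')
if ! head -1 "$1" | grep "${CR}\$" >/dev/null 2>&1; then
  echo UNIX
else
  echo DOS
fi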

Want to know if there is any better method.

Windbound answered 6/8, 2013 at 15:31 Comment(0)

Simply use the file command. If the file contains lines ending in CR LF, file reports this in its output: 'ASCII text, with CRLF line terminators'.

e.g.

if file myFile | grep "CRLF" >/dev/null 2>&1
then
  ....    # the file has CRLF (DOS) line terminators
fi
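
For illustration, the grep above keys on output like the following (the file names are only examples; as the comment below notes, not every file implementation reports line terminators):

$ file dos.txt unix.txt
dos.txt:  ASCII text, with CRLF line terminators
unix.txt: ASCII text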
Dustin answered 6/8, 2013 at 15:40 Comment(1)
Yet, file on my AIX machine only told me test.txt: ascii text, no matter what kind of end-of-line test.txt uses. It doesn't tell me whether CRLF is present or not. – Windbound

The dos2unix (and unix2dos) command that installs with Cygwin and some recent Linux distributions has, as of version 7.1 (2014-10-06), a handy --info option which prints a count of the different types of newline in each file. See http://waterlan.home.xs4all.nl/dos2unix.html

From the man page:

--info[=FLAGS] FILE ...
       Display file information. No conversion is done.

       The following information is printed, in this order: number of DOS
       line breaks, number of Unix line breaks, number of Mac line breaks,
       byte order mark, text or binary, file name.

       Example output:
            6       0       0  no_bom    text    dos.txt
            0       6       0  no_bom    text    unix.txt
            0       0       6  no_bom    text    mac.txt
            6       6       6  no_bom    text    mixed.txt
           50       0       0  UTF-16LE  text    utf16le.txt
            0      50       0  no_bom    text    utf8unix.txt
           50       0       0  UTF-8     text    utf8dos.txt
            2     418     219  no_bom    binary  dos2unix.exe

       Optionally extra flags can be set to change the output. One or more
       flags can be added.

       d   Print number of DOS line breaks.
       u   Print number of Unix line breaks.
       m   Print number of Mac line breaks.
       b   Print the byte order mark.
       t   Print if file is text or binary.
       c   Print only the files that would be converted.

       With the "c" flag dos2unix will print only the files that contain
       DOS line breaks, and unix2dos will print only the file names that
       have Unix line breaks.

Thus:

if [[ -n $(dos2unix --info=c "${filename}") ]] ; then echo DOS; fi

Conversely:

if [[ -n $(unix2dos --info=c "${filename}") ]] ; then echo UNIX; fi
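
If you also need to distinguish mixed files, one option is to parse the full --info output documented above (DOS count first, then Unix count). A rough sketch; classify_eol is just a name made up here, and myfile.txt is a placeholder:

# Parse the documented --info columns: DOS, Unix, Mac, BOM, type, name.
classify_eol() {
  dos2unix --info "$1" | {
    read -r dos unix mac rest
    if   [ "$dos" -gt 0 ] && [ "$unix" -eq 0 ]; then echo DOS
    elif [ "$dos" -eq 0 ] && [ "$unix" -gt 0 ]; then echo UNIX
    else echo "MIXED or no line breaks"
    fi
  }
}

classify_eol myfile.txt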
Guth answered 15/10, 2014 at 9:35 Comment(0)

# awk exits 0 (DOS) if the first line ends in \r, and 1 otherwise
if awk '/\r$/{exit 0;} 1{exit 1;}' myFile
then
  echo "is DOS"
fi
Dustin answered 7/8, 2013 at 15:28 Comment(0)

I can't test on AIX, but try:

if [[ "$(head -1 filename)" == *$'\r' ]]; then echo DOS; fi
Wyman answered 6/8, 2013 at 15:55 Comment(1)
It doesn't work for me: it always says the file is in UNIX format even though the file is actually in DOS format. – Windbound

You can simply remove any existing carriage return from every line and then add a carriage return to the end of every line. Then it doesn't matter what format the incoming file is in; the outgoing file will always be in DOS format.

sed 's/\r$//;s/$/\r/'
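
If your sed does not understand \r (the comment below reports exactly that on AIX), the same idea can be written with a carriage return built by printf; a sketch, where input.txt and output.txt are placeholder names:

# CR holds a literal carriage return, so sed sees the real character and
# no \r escape support is required.
CR=$(printf '\r')
sed -e "s/${CR}\$//" -e "s/\$/${CR}/" input.txt > output.txt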
Gopherwood answered 6/8, 2013 at 17:51 Comment(2)
That's a way out, but \r doesn't work; it needs to be replaced by ^M (Ctrl-V then Ctrl-M in vi's insert mode). Still, I don't want to convert unconditionally. Isn't there a way to check the end-of-line characters of a text file? – Windbound
@QiangXu - I'm not a regular user of sed, and I'm more of a Windows guy, so I'm not sure. But I believe you would need the regex look-behind feature, and I don't think sed supports it. – Gopherwood

I'm probably late on this one, but I've had the same issue and I did not want to put the special ^M character in my script (I was worried some editors might not display the special character properly, or that some later programmer might replace it with the two normal characters ^ and M...).

The solution I found feeds the special character to grep, by letting the shell convert its hex value:

if head -1 "${filename}" | grep $'[\x0D]' >/dev/null
then
  echo "Win"
else
  echo "Unix"
fi

Unfortunately I cannot make the $'[\x0D]' construct work in ksh. In ksh, I found this instead:

if head -1 "${filename}" | od -x | grep '0d0a$' >/dev/null
then
  echo "Win"
else
  echo "Unix"
fi

od -x displays the text as hex codes. '0d0a$' is the hex for CR-LF (the DOS/Windows line terminator); the Unix line terminator shows up as '0a00$'.

Leotaleotard answered 16/10, 2014 at 9:10 Comment(0)
