How to find out line-endings in a text file?

12

474

I'm trying to use something in bash to show me the line endings in a file printed rather than interpreted. The file is a dump from SSIS/SQL Server being read in by a Linux machine for processing.

  • Are there any switches within vi, less, more, etc?

  • In addition to seeing the line-endings, I need to know what type of line end it is (CRLF or LF). How do I find that out?

Mirisola answered 25/8, 2010 at 20:36 Comment(1)
General tip: If you have an idea of which *nix/cygwin command you might use, you can always view its manpage to search for switches that might give you the functionality you need. E.g., man less. - Thiele
590

You can use the file utility to give you an indication of the type of line endings.

Unix:

$ file testfile1.txt
testfile1.txt: ASCII text

"DOS":

$ file testfile2.txt
testfile2.txt: ASCII text, with CRLF line terminators

To convert from "DOS" to Unix:

$ dos2unix testfile2.txt

To convert from Unix to "DOS":

$ unix2dos testfile1.txt

Converting an already converted file has no effect so it's safe to run blindly (i.e. without testing the format first) although the usual disclaimers apply, as always.
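
If file is not installed, or only reports a content type, roughly the same check can be done with grep. This is a minimal sketch and not part of the original answer: the function name line_ending_type is hypothetical, and it only assumes a POSIX shell and grep.

```shell
# Hypothetical helper: classify a file's line endings by counting lines.
# Note: the variables are global, since plain POSIX sh has no "local".
line_ending_type() {
    cr=$(printf '\r')
    # grep -c '' counts every line; "|| true" hides grep's exit status 1
    # when the count is 0.
    total=$(grep -c '' -- "$1" || true)
    # Lines whose last character before the LF is a CR, i.e. CRLF lines.
    crlf=$(grep -c "${cr}\$" -- "$1" || true)
    if [ "$total" -eq 0 ]; then
        echo "empty"
    elif [ "$crlf" -eq 0 ]; then
        echo "LF"
    elif [ "$crlf" -eq "$total" ]; then
        echo "CRLF"
    else
        echo "mixed"
    fi
}
```

Usage would be something like line_ending_type testfile2.txt. Unlike file, this does not detect classic Mac (CR-only) endings.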

Rimmer answered 25/8, 2010 at 22:0 Comment(8)
These are now sometimes named "fromdos" and "todos", respectively (as is the case in Ubuntu 10.04+). - Typecast
@JessChadwick: Yes, but only if you explicitly install the tofrodos package with sudo apt-get install tofrodos, just as you'd have to run sudo apt-get install dos2unix to get dos2unix and unix2dos. - Levity
Actually, dos2unix can't do all the work. I think https://mcmap.net/q/81163/-dos2unix-doesn-39-t-convert-m gives the best answer. - Forebrain
@nathan: What does dos2unix fail at? The OP at that question only vaguely describes the issue. - Rimmer
@DennisWilliamson The file command gave the same output before and after dos2unix: xxx.c C source, ASCII text, with CR, LF line terminators. I found that this C file has a ^M in the middle of a line, like xxxxxxx ^M xxxxxxx. - Forebrain
Just to chime in: on Debian jessie you may need to install the file package to have the file command in the first place. - Brigand
This won't work for files like ssh keys, which are nonetheless ASCII (use file -bi to verify that), since file will name them aptly and won't show information about line endings. Also, the OP asked for line endings to be printed, not just what type they are (LF or CRLF), so the answer by @Alex Shelemin is more appropriate. - Galan
This won't work for JSON files: file testfile1.json outputs JSON data. In that case I had to use file testfile1.json --exclude json. I think file -k is the correct command. - Wallen
317

Ubuntu 14.04:

A simple cat -e <filename> works just fine.

This displays Unix line endings (\n or LF) as $ and Windows line endings (\r\n or CRLF) as ^M$.
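
To see this without a file at hand, here is a quick sketch on made-up sample text. It assumes GNU or BSD cat, since the -e option is not specified by POSIX.

```shell
# First line ends in LF, second line ends in CRLF.
printf 'unix line\nwindows line\r\n' | cat -e
# unix line$
# windows line^M$
```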

Stucco answered 20/12, 2015 at 0:49 Comment(8)
Also works on OSX. Good solution. Simple and worked for me while the accepted answer did not. (Note: was not a .txt file) - Falcone
Is the display of M$ an easter egg/Windows bashing? - Vivanvivarium
Does not work on Solaris, although the man page says that it should. - Sixteenmo
@TomM No. The caret in ^M$ inverts this into an easter egg for Microsoft cultists. - Belaud
I find that I have to use cat -vE <filename> to see the \r characters (displayed as ^M) and the \n characters (displayed as a $). This is using GNU cat on Linux. - Hiphuggers
cat -e to show line endings works on Debian 10 buster with WSL. - Quartersaw
This actually works, but I prefer the file testfile.txt answer: it does not write out the whole file and is easier to run on several files. For instance, I had one script among others that had DOS/Windows endings and was not executed properly, so I simply checked all the others for the same issue with file $(find . -name '*.sh'). - Harilda
This is very useful. I had a file with CRLF endings (copied from Windows to a Unix Docker container) and Unix did not recognise it. The error was "no such file or directory". - Butler
147

In vi...

:set list to see line-endings.

:set nolist to go back to normal.

While I don't think you can see \n or \r\n in vi, you can see which type of file it is (UNIX, DOS, etc.) to infer which line endings it has...

:set ff

Alternatively, from bash you can use od -t c <filename> or just od -c <filename> to display the returns.
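
When only the totals matter, the CR and LF bytes can be counted instead of read off the od dump. This is a minimal sketch using tr and wc; the function name count_cr_lf is hypothetical.

```shell
# Hypothetical helper: count CR and LF bytes in a file.
count_cr_lf() {
    # Delete everything except CR (or LF), then count the bytes that remain.
    cr_count=$(tr -cd '\r' < "$1" | wc -c)
    lf_count=$(tr -cd '\n' < "$1" | wc -c)
    # $((...)) also strips the padding that some wc implementations add.
    echo "CR=$((cr_count)) LF=$((lf_count))"
}
```

CR=0 suggests plain LF endings, and equal counts suggest CRLF throughout. Equal counts do not strictly prove that every CR sits directly before an LF, so od -c remains the authoritative view.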

Naif answered 25/8, 2010 at 20:42 Comment(10)
Thank you, this has indeed worked. Now I'm trying to tell if it's a \n or \r\n. Is there an additional switch for that in vi? - Mirisola
Unfortunately, I don't think vi can show those specific characters. You can try od -c <filename> which I believe will display \n or \r\n. - Naif
In the "for what it's worth" category, you can grep for DOS-style CRLF by issuing grep --regex="^M" where ^M is CTRL+V CTRL+M. You can remove those with a sed command. This does essentially the same thing as dos2unix. - Orthoepy
In vim: :set fileformat will report which of unix or dos vim thinks the file's line endings are in. You can change it by :set fileformat=unix. - Dubonnet
Use the -b flag when starting vi/vim and then use :set list to see CR (^M) and LF ($) endings. - Prepossess
@RyanBerger - Looks like you're missing a -t. It should be od -t c file/path, but thanks for the new program. Worked great! - Aggravation
@RyanBerger you should edit the answer to include "od -c". First thing I've come across that just simply showed the returns. Thank you. - Switzerland
The utility command od should be the correct answer to this ticket. - Sight
Note that the fileformat inferred by Vim (and reported by :set fileformat) leans towards reporting unix, i.e. it reports dos only if every line in the file has CRLF as its line ending; otherwise it reports the file as unix format. If you think the file might be one of those mixed line-ending abominations, vim can't tell you which lines have \n and which have \r\n. - Glaikit
To add to @Samuel's comment, using :set list might be sufficient to find out line endings. See :help binary; option -b only changes a few options. - Flare
118

In the bash shell, try cat -v <filename>. This should display carriage returns for Windows files.

(This worked for me in rxvt via Cygwin on Windows XP).

Editor's note: cat -v visualizes \r (CR) chars. as ^M. Thus, line-ending \r\n sequences will display as ^M at the end of each output line. cat -e will additionally visualize \n, namely as $. (cat -et will additionally visualize tab chars. as ^I.)

Dutyfree answered 25/8, 2010 at 21:1 Comment(4)
@ChrisK: Try echo -e 'abc\ndef\r\n' | cat -v and you should see a ^M after the "def". - Rimmer
I wanted to see if the file has ^M (Windows/DOS EOL) and only cat -v showed me that. +1 for that. - Perky
^M = DOS/Windows style - Heterochromatin
Correction: Thus, line-ending \r\n sequences will display as ^M$ - Montes
51

Try file, then file -k, then dos2unix -ih

file will usually be enough. But for tough cases try file -k or dos2unix -ih.

Details below.


Try file -k

Short version: file -k somefile.txt will tell you line terminators:

  • It will output with CRLF line terminators for DOS/Windows line terminators.
  • It will output with CR line terminators for MAC line terminators.
  • It will just output text for Linux/Unix "LF" line terminators. (So if it does not explicitly mention any kind of line terminators then this means: "LF line terminators".)

And for extra weird cases: When you have mixed line terminators:

  • $ echo -ne '1\n2\r\n3\r' | file -k -
    /dev/stdin: ASCII text, with CRLF, CR, LF line terminators
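
file -k reports that a file is mixed, but not which lines are affected. One way to locate them is sketched below with awk. The function name list_crlf_lines is hypothetical and is not something file or dos2unix provides.

```shell
# Hypothetical helper: print the line numbers of all CRLF-terminated lines.
list_crlf_lines() {
    # awk splits records on LF, so a CRLF line still carries a trailing CR.
    awk '/\r$/ { print NR }' "$1"
}
```

Lines terminated by a lone CR (classic Mac style) are not counted as separate lines here, because awk only splits on LF.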

For the long version, see below.


Real world example: Certificate Encoding

I sometimes have to check this for PEM certificate files.

The trouble with regular file is this: Sometimes it's trying to be too smart/too specific.

Let's try a little quiz: I've got some files. And one of these files has different line terminators. Which one?

(By the way: this is what one of my typical "certificate work" directories looks like.)

Let's try regular file:

$ file -- *
0.example.end.cer:         PEM certificate
0.example.end.key:         PEM RSA private key
1.example.int.cer:         PEM certificate
2.example.root.cer:        PEM certificate
example.opensslconfig.ini: ASCII text
example.req:               PEM certificate request

Huh. It's not telling me the line terminators. And I already knew that those were cert files. I didn't need "file" to tell me that.

Some network appliances are really, really picky about how their certificate files are encoded. That's why I need to know.

What else can you try?

You might try dos2unix with the --info switch like this:

$ dos2unix --info -- *
  37       0       0  no_bom    text    0.example.end.cer
   0      27       0  no_bom    text    0.example.end.key
   0      28       0  no_bom    text    1.example.int.cer
   0      25       0  no_bom    text    2.example.root.cer
   0      35       0  no_bom    text    example.opensslconfig.ini
   0      19       0  no_bom    text    example.req

So that tells you that: yup, "0.example.end.cer" must be the odd man out. But what kind of line terminators are there? Do you know the dos2unix output format by heart? (I don't.)

But fortunately there's the --keep-going (or -k for short) option in file:

$ file --keep-going -- *
0.example.end.cer:         PEM certificate\012- , ASCII text, with CRLF line terminators\012- data
0.example.end.key:         PEM RSA private key\012- , ASCII text\012- data
1.example.int.cer:         PEM certificate\012- , ASCII text\012- data
2.example.root.cer:        PEM certificate\012- , ASCII text\012- data
example.opensslconfig.ini: ASCII text\012- data
example.req:               PEM certificate request\012- , ASCII text\012- data

Excellent! Now we know that our odd file has DOS (CRLF) line terminators. (And the other files have Unix (LF) line terminators. This is not explicit in this output. It's implicit. It's just the way file expects a "regular" text file to be.)

(If you wanna share my mnemonic: "L" is for "Linux" and for "LF".)

Now let's convert the culprit and try again:

$ dos2unix -- 0.example.end.cer

$ file --keep-going -- *
0.example.end.cer:         PEM certificate\012- , ASCII text\012- data
0.example.end.key:         PEM RSA private key\012- , ASCII text\012- data
1.example.int.cer:         PEM certificate\012- , ASCII text\012- data
2.example.root.cer:        PEM certificate\012- , ASCII text\012- data
example.opensslconfig.ini: ASCII text\012- data
example.req:               PEM certificate request\012- , ASCII text\012- data  

Good. Now all certs have Unix line terminators.

Try dos2unix -ih

I didn't know this when I was writing the example above but:

Actually it turns out that dos2unix will give you a header line if you use -ih (short for --info=h) like so:

$ dos2unix -ih -- *
 DOS    UNIX     MAC  BOM       TXTBIN  FILE
   0      37       0  no_bom    text    0.example.end.cer
   0      27       0  no_bom    text    0.example.end.key
   0      28       0  no_bom    text    1.example.int.cer
   0      25       0  no_bom    text    2.example.root.cer
   0      35       0  no_bom    text    example.opensslconfig.ini
   0      19       0  no_bom    text    example.req

And another "actually" moment: The header format is really easy to remember: Here's two mnemonics:

  1. It's DUMB (left to right: d for Dos, u for Unix, m for Mac, b for BOM).
  2. And also: "DUM" is just the alphabetical ordering of D, U and M.


Osullivan answered 22/11, 2017 at 13:19 Comment(5)
It generates output like: Accounts.java: Java source, ASCII text\012- on Windows in MinTTY - Govea
@standalone: interesting. I've read weird things about an option called "igncr", and what you're saying sounds like that. But I can't reproduce what you describe. (I tried inside the Bash inside mintty that comes with Git-for-Windows, "git version 2.24.0.windows.1".) - Osullivan
Hm, I tried file -k Accounts.java inside the mintty that comes with git-for-windows too, but my version is git version 2.21.0.windows.1 - Govea
Working solution for me is cat -e file_to_test - Govea
Thank you for file -k, that's exactly what I've been looking for. The help description of "don't stop at the first match" didn't click for me, but makes sense. - Dialyse
22

To show CR as ^M in less use less -u or type -u once less is open.

man less says:

-u or --underline-special

      Causes backspaces and carriage returns to be treated as printable
      characters; that is, they are sent to the terminal when they appear
      in the input.
Divergent answered 27/7, 2015 at 15:3 Comment(1)
Please clarify your answer. - Kenelm
8

You can use xxd to show a hex dump of the file, and hunt through it for the byte sequences "0d0a" (CRLF) or "0a" (LF).

You can use cat -v <filename> as @warriorpostman suggests.

Anchoveta answered 10/9, 2013 at 16:50 Comment(2)
It works for me with cat v 8.23. Unix line endings will not print any extra info, but DOS line endings will print a "^M". - Anchoveta
That must be what I'm running into with 8.21, given the fact that I'm using unix line endings. - Retaretable
6

You may use the command todos filename to convert to DOS endings, and fromdos filename to convert to UNIX line endings. To install the package on Ubuntu, type sudo apt-get install tofrodos.

Humidity answered 28/10, 2012 at 22:13 Comment(0)
5

You can use vim -b filename to edit a file in binary mode. Carriage returns are then shown as ^M, while each LF simply appears as a line break, so a ^M at the end of every line indicates Windows CRLF line endings. By LF I mean \n and by CR I mean \r. Note that when you use the -b option the file will always be edited in UNIX mode by default, as indicated by [unix] in the status line, meaning that if you add new lines they will end with LF, not CRLF. If you use normal vim without -b on a file with CRLF line endings, you should see [dos] shown in the status line, and inserted lines will have CRLF as end of line. The vim documentation for the fileformats setting explains the complexities.

Also, I don't have enough points to comment on the Notepad++ answer, but if you use Notepad++ on Windows, use the View / Show Symbol / Show End of Line menu to display CR and LF. In this case LF is shown whereas for vim the LF is indicated by a new line.

Elbowroom answered 15/9, 2017 at 6:1 Comment(0)
1

I dump my output to a text file. I then open it in Notepad++ and click the "Show All Characters" button. Not very elegant, but it works.

Indomitability answered 13/10, 2015 at 18:56 Comment(1)
This question is tagged as Linux and I don't think Notepad++ is for Linux. This should work for Windows though. - Morley
0

Vim - always show Windows newlines as ^M

If you prefer to always see the Windows newlines in vim render as ^M, you can add this line to your .vimrc:

set ffs=unix

This will make vim interpret every file you open as a unix file. Since unix files have \n as the newline character, a Windows file with a newline sequence of \r\n will still render properly (thanks to the \n) but will have ^M at the end of each line (which is how vim renders the \r character).


Vim - sometimes show Windows newlines

If you'd prefer just to set it on a per-file basis, you can use :e ++ff=unix when editing a given file.


Vim - always show file format (unix vs dos)

If you want the bottom line of vim to always display which file format you're editing (and you didn't force the fileformat to unix), you can add it to your statusline with
set statusline+=\[%{&fileformat}\].

My full statusline is provided below. Just add it to your .vimrc.

" Make statusline stay, otherwise alerts will hide it
set laststatus=2
set statusline=
set statusline+=%#PmenuSel#
set statusline+=%#LineNr#
" This says 'show filename and parent dir'
set statusline+=%{expand('%:p:h:t')}/%t
" This says 'show filename as would be read from the cwd'
" set statusline+=\ %f
set statusline+=%m\
set statusline+=%=
set statusline+=%#CursorColumn#
set statusline+=\ %y
set statusline+=\ %{&fileencoding?&fileencoding:&encoding}
set statusline+=\[%{&fileformat}\]
set statusline+=\ %p%%
set statusline+=\ %l:%c
set statusline+=\ 

It'll render like

.vim/vimrc\                                    [vim] utf-8[unix] 77% 315:6

at the bottom of the Vim window.


Vim - sometimes show file format (unix vs dos)

If you just want to see which file format you have, you can use :set fileformat (this will not tell you anything if you've forced the fileformat). It will return unix for unix files and dos for Windows files.

Quadrangle answered 19/11, 2019 at 19:40 Comment(0)
0

More portable, maybe even POSIX.

Given the example above

$ printf "abc\ndef\r\n"
abc
def

Use sed

$ printf "abc\ndef\r\n" | sed -n l
abc$
def\r$

Use od

$ printf "abc\ndef\r\n" | od -c  ## optional "-t a"
0000000   a   b   c  \n   d   e   f  \r  \n
0000011
Niemi answered 26/3, 2023 at 21:56 Comment(0)
