Try file, then file -k, then dos2unix -ih
file will usually be enough. But for tough cases, try file -k or dos2unix -ih.
Details below.
Try file -k
Short version: file -k somefile.txt will tell you the line terminators:
- It will output "with CRLF line terminators" for DOS/Windows line terminators.
- It will output "with CR line terminators" for Mac line terminators.
- It will just output "text" for Linux/Unix "LF" line terminators. (So if it does not explicitly mention any kind of line terminators, this means "LF line terminators".)
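To see these three outputs for yourself, you can generate one small file per terminator style with printf and run file -k over them. This is a sketch: the scratch directory from mktemp -d and the file names are made up for illustration, and the exact wording of file's output can vary a little between versions.

```shell
# Hypothetical scratch directory; any writable location works.
dir=$(mktemp -d)

# printf interprets \r (CR) and \n (LF) in its format string.
printf 'one\r\ntwo\r\n' > "$dir/dos.txt"   # CRLF: DOS/Windows
printf 'one\rtwo\r'     > "$dir/mac.txt"   # CR only: classic Mac
printf 'one\ntwo\n'     > "$dir/unix.txt"  # LF only: Linux/Unix

# Only the CRLF and CR files should mention line terminators explicitly.
file -k -- "$dir"/*.txt
```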
And for extra weird cases, when you have mixed line terminators:
$ echo -ne '1\n2\r\n3\r' | file -k -
/dev/stdin: ASCII text, with CRLF, CR, LF line terminators
For the long version, see below.
Real world example: Certificate Encoding
I sometimes have to check this for PEM certificate files.
The trouble with regular file is this: sometimes it tries to be too smart/too specific.
Let's try a little quiz: I've got some files, and one of them has different line terminators. Which one?
(By the way: this is what one of my typical "certificate work" directories looks like.)
Let's try regular file:
$ file -- *
0.example.end.cer: PEM certificate
0.example.end.key: PEM RSA private key
1.example.int.cer: PEM certificate
2.example.root.cer: PEM certificate
example.opensslconfig.ini: ASCII text
example.req: PEM certificate request
Huh. It's not telling me the line terminators. And I already knew that those were cert files. I didn't need "file" to tell me that.
Some network appliances are really, really picky about how their certificate files are encoded. That's why I need to know.
What else can you try?
You might try dos2unix with the --info switch like this:
$ dos2unix --info -- *
37 0 0 no_bom text 0.example.end.cer
0 27 0 no_bom text 0.example.end.key
0 28 0 no_bom text 1.example.int.cer
0 25 0 no_bom text 2.example.root.cer
0 35 0 no_bom text example.opensslconfig.ini
0 19 0 no_bom text example.req
So that tells you: yup, "0.example.end.cer" must be the odd man out. But what kind of line terminators does it have? Do you know the dos2unix output format by heart? (I don't.)
But fortunately there's the --keep-going (or -k for short) option in file:
$ file --keep-going -- *
0.example.end.cer: PEM certificate\012- , ASCII text, with CRLF line terminators\012- data
0.example.end.key: PEM RSA private key\012- , ASCII text\012- data
1.example.int.cer: PEM certificate\012- , ASCII text\012- data
2.example.root.cer: PEM certificate\012- , ASCII text\012- data
example.opensslconfig.ini: ASCII text\012- data
example.req: PEM certificate request\012- , ASCII text\012- data
Excellent! Now we know that our odd file has DOS (CRLF) line terminators. (And the other files have Unix (LF) line terminators. This is not explicit in the output; it's implicit. It's just the way file expects a "regular" text file to be.)
(If you wanna share my mnemonic: "L" is for "Linux" and for "LF".)
Now let's convert the culprit and try again:
$ dos2unix -- 0.example.end.cer
$ file --keep-going -- *
0.example.end.cer: PEM certificate\012- , ASCII text\012- data
0.example.end.key: PEM RSA private key\012- , ASCII text\012- data
1.example.int.cer: PEM certificate\012- , ASCII text\012- data
2.example.root.cer: PEM certificate\012- , ASCII text\012- data
example.opensslconfig.ini: ASCII text\012- data
example.req: PEM certificate request\012- , ASCII text\012- data
Good. Now all certs have Unix line terminators.
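To double-check a whole directory after converting, you can list every file that still contains a carriage return (CR) byte, which catches both CRLF and lone-CR terminators. This is a sketch: report_cr_files is a hypothetical helper name, and it assumes a grep that accepts a literal CR character as its search pattern.

```shell
# Hypothetical helper: print each regular file in a directory (default: .)
# that contains at least one CR byte, i.e. CRLF or CR-only line terminators.
report_cr_files() {
  _cr=$(printf '\r')            # a single literal CR character
  for f in "${1:-.}"/*; do
    [ -f "$f" ] || continue     # skip subdirectories and an unmatched glob
    if grep -q -- "$_cr" "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

Running report_cr_files . in the certificate directory should print nothing once every file uses LF only.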
Try dos2unix -ih
I didn't know this when I was writing the example above, but it turns out that dos2unix will give you a header line if you use -ih (short for --info=h), like so:
$ dos2unix -ih -- *
DOS UNIX MAC BOM TXTBIN FILE
0 37 0 no_bom text 0.example.end.cer
0 27 0 no_bom text 0.example.end.key
0 28 0 no_bom text 1.example.int.cer
0 25 0 no_bom text 2.example.root.cer
0 35 0 no_bom text example.opensslconfig.ini
0 19 0 no_bom text example.req
And another "actually" moment: the header format is really easy to remember. Here are two mnemonics:
- It's DUMB (left to right: d for DOS, u for Unix, m for Mac, b for BOM).
- And also: "DUM" is just the alphabetical ordering of D, U and M.
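If dos2unix isn't installed, the first three columns (DOS, UNIX, MAC counts, in that DUM order) can be approximated with od and awk. This is a sketch assuming POSIX od and awk; count_eol is a hypothetical helper name, and it only counts line-break bytes, so it does none of dos2unix's BOM or text/binary detection.

```shell
# Hypothetical helper: print "DOS UNIX MAC" line-break counts for one file.
count_eol() {
  # od dumps every byte as two hex digits; awk walks them in order.
  od -An -v -t x1 -- "$1" | awk '
    {
      for (i = 1; i <= NF; i++) {
        b = tolower($i)
        if (prev_cr) {
          prev_cr = 0
          if (b == "0a") { dos++; continue }  # CR followed by LF: one CRLF break
          mac++                               # CR not followed by LF: lone CR
        }
        if (b == "0d") prev_cr = 1            # hold the CR until the next byte
        else if (b == "0a") lf++              # LF with no CR before it
      }
    }
    END {
      if (prev_cr) mac++                      # file ends with a lone CR
      printf "%d %d %d\n", dos, lf, mac
    }'
}
```

For example, count_eol 0.example.end.cer prints three numbers in the same order as the DOS, UNIX and MAC columns above.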
Further reading
man less
. – Thiele