Why does Windows use CR LF?

I understand the difference between the two so there's no need to go into that, but I'm just wondering what the reasoning is behind why Windows uses both CR and LF to indicate a line break. It seems like the Linux method (just using LF) makes a lot more sense, saves space, and is easier to parse.

Haunch answered 29/6, 2011 at 13:47 Comment(7)
See Wikipedia: Newline#History.Gravettian
It may be worth noting that CRLF on Windows is mostly just a convention/default. Most programs support either (though you might have to mess with the settings). I personally almost never use CRLF, opting instead for the UNIX-style LF; only a handful of programs still have problems with files that just use LF.Bonham
CR+LF is the correct way to do it (it is the standard), so the question isn't why Windows does it correctly but why Mac and Unix/Linux do it incorrectly. Standalone LF's legacy is laziness and taking a shortcut. I always use CR+LF, except for certain Linux things that balk at CR+LF, so I switch to LF mode for those. IMO, misinterpreting CR+LF is a lot worse than misinterpreting a standalone LF.Gustav
That Newline#History article seems to suggest that CR+LF is the standard according to ASA. The ISO standard seems to support both LF and CR+LF. So I guess life is more nuanced @Gustav :)Disappear
@Disappear All the standards are CR+LF, pretty much. See https://mcmap.net/q/11428/-why-does-windows-use-cr-lf/… - LF was a bodged shortcut that was never officially a standard. The fact remains that it doesn't play well with CR+LF. Thus, all these years later, it would be correct to blame *nix, not Windows, for the newline miseries caused by using LF. People tend to think Windows is at fault, simply because Windows can tolerate varying line endings better than Unix can.Gustav
@Gustav Frankly, I think this is a practicality vs. purity question, and I tend to defer to The Zen of Python whenever relevant: "practicality beats purity". Do we ever still use CR without LF (obviously the opposite is true, but apparently for the "wrong" reason), and is it interpreted by any programs as "go to the beginning of the line but don't go down"?Being
@TwistedCode Indeed, I do use CR without LF in some of my own programs. It's useful to go back to the beginning of the line without going to the next one. They usually go well together, but each can be used on its own. CR on its own is more useful than LF on its own thoughGustav
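
To make that last point concrete, here is a minimal C sketch of the classic progress-indicator trick that relies on a bare CR; the loop bounds and the output format are just illustrative:

    #include <stdio.h>

    int main(void)
    {
        for (int pct = 0; pct <= 100; pct += 10) {
            /* A bare CR returns the cursor to column 0 without advancing
               a line, so each iteration overwrites the previous one. */
            printf("\rProgress: %3d%%", pct);
            fflush(stdout);  /* stdout is usually line-buffered; force the redraw */
        }
        putchar('\n');       /* finish with a real line break */
        return 0;
    }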

Historically, when using teletypes, CR would return the carriage to the first position of the line, while LF would feed to the next line. Using CR+LF in the files themselves made it possible to send a file directly to the printer, without any kind of printer driver.

Thanks @zaph for pointing out that it was teletypes and not dot-matrix printers.

Eiland answered 29/6, 2011 at 13:50 Comment(6)
Very common annoyance for very little benefit.Orta
@Anders Actually it was teletypes that were the reason: CR returned the print head to the left and LF advanced the paper. Teletypes preceded dot-matrix printers.Marroquin
@Marroquin This is why I love Stack Overflow. 2 years later and I get a correction and learnt something new.Eiland
As Windows followed Unix by so many years it's puzzling that they didn't follow the Unix model of just LF.Almena
@Almena It's just as puzzling why Unix didn't follow DEC or the ASA (American Standards Association), both of which predated Unix. DEC used CR/LF, I believe. The IBM/360 I used at college also used CRLF, but EBCDIC apparently didn't. Also, check out RFC 0821 (SMTP), RFC 1939 (POP), RFC 2060 (IMAP), or RFC 2616 (HTTP). They use CR/LF.Tengler
Typically dot-matrix printers had a dip switch to control the LF behaviour. Likewise, glass-terminal behaviour (e.g. on the VT100) could be controlled using ANSI codes, and different terminals had different defaults.Teferi

@sshannin posted a URL from Raymond Chen's blog, but it doesn't work anymore. The blog has changed its internal software, so the URLs changed.

After crawling through the old posts in the new blog, I've found it here.

Quote from the blog:

Why is the line terminator CR+LF?

This protocol dates back to the days of teletypewriters. CR stands for “carriage return” – the CR control character returned the print head (“carriage”) to column 0 without advancing the paper. LF stands for “linefeed” – the LF control character advanced the paper one line without moving the print head. So if you wanted to return the print head to column zero (ready to print the next line) and advance the paper (so it prints on fresh paper), you need both CR and LF.

If you go to the various internet protocol documents, such as RFC 0821 (SMTP), RFC 1939 (POP), RFC 2060 (IMAP), or RFC 2616 (HTTP), you’ll see that they all specify CR+LF as the line termination sequence. So the real question is not “Why do CP/M, MS-DOS, and Win32 use CR+LF as the line terminator?” but rather “Why did other people choose to differ from these standards documents and use some other line terminator?”

Unix adopted plain LF as the line termination sequence. If you look at the stty options, you’ll see that the onlcr option specifies whether an LF should be changed into CR+LF. If you get this setting wrong, you get stairstep text, where

each
    line
        begins 

where the previous line left off. So even unix, when left in raw mode, requires CR+LF to terminate lines. The implicit CR before LF is a unix invention, probably as an economy, since it saves one byte per line.

The unix ancestry of the C language carried this convention into the C language standard, which requires only “\n” (which encodes LF) to terminate lines, putting the burden on the runtime libraries to convert raw file data into logical lines.

The C language also introduced the term “newline” to express the concept of “generic line terminator”. I’m told that the ASCII committee changed the name of character 0x0A to “newline” around 1996, so the confusion level has been raised even higher.

Here’s another discussion of the subject, from a unix perspective

I've changed this second link to a snapshot in The Wayback Machine, since the actual page is not available anymore.

I hope this answers your question.
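
As a concrete illustration of the translation the quoted post describes, here is a minimal C sketch. It assumes a C runtime that follows the Windows convention, where text-mode streams map '\n' to CR+LF on output; on POSIX systems text and binary mode behave identically, so the dump would end in 0A alone. The file name is made up for the example:

    #include <stdio.h>

    int main(void)
    {
        /* Text mode: the runtime library translates '\n' into the
           platform's line terminator (CR+LF on Windows) as it writes. */
        FILE *f = fopen("demo.txt", "w");
        if (!f) return 1;
        fputs("hello\n", f);
        fclose(f);

        /* Binary mode: no translation, so we can inspect the raw bytes. */
        f = fopen("demo.txt", "rb");
        if (!f) return 1;
        int c;
        while ((c = fgetc(f)) != EOF)
            printf("%02X ", c);  /* ends with 0D 0A on Windows, 0A elsewhere */
        printf("\n");
        fclose(f);
        return 0;
    }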

Brookbrooke answered 14/1, 2016 at 19:2 Comment(13)
Since you are not really answering the question, just correcting a link that has become stale, this should really be a comment. Anyway, thanks for the correct link. Please add it as a comment; this answer may be deleted.Tangible
OK, I've added here the text from the blog, so if the link goes bad again the text is still available here. I think this should be kept as an answer, not just a comment, since this information actually answers the question originally asked.Brookbrooke
This answer is more detailed than the accepted one and answers not only the question asked but also the guessed reason behind it; IMHO it's better.Postbox
It is very dumb to suggest that SMTP, POP, IMAP and HTTP somehow define the standard for what '\n' means!!! Those define how one should communicate using those very OLD protocols. All those protocols made the same choice, probably based on the first and older choice. I don't think *nixes use CR or LF. They use "new line". Machines were very low level and needed you to tell them to LF and CR. It is really pointless to keep using it just because when my browser communicates with Apache it does use CRLF.Lemaster
Unix does not use CRLF. The terminal protocol, which is not a "txt" file protocol, allows you to move down without moving left. It is, again, very dumb to suggest that a text file should follow a protocol for controlling a terminal device. Back in the old days, text files had "c^H," (c backspace comma) instead of "ç". I am very happy we have evolved.Lemaster
@AndréCaldas: Raymond Chen, in his article, written way back in 2004, just talked about those old (though still current even today) protocols as an example of usage of CR+LF line terminators (maybe as a "de-facto standard"?), but he didn't suggest that those protocol documents actually defined "the standard for what '\n' means". And he also talked about Unix in "raw mode", not just any *nixes. Finally, he didn't say that a text file "should" follow one protocol or the other: it's just an explanation of how line terminators came to be, also confirmed by the Unix discussion linked at the end.Brookbrooke
@OMA: No, it is not. He suggests that CRLF is the "right" choice because of the TERMINAL protocol and because of the SMTP, POP, HTTP, etc. protocols. 2004 is not such a long time ago. SMTP, for instance, dates back to 1981. UTF-8, for example, was created in the '90s. He DOES suggest that CRLF is the "right" choice and argues that people should instead take the Mac and *nix decisions as incoherent. It is just DISINFORMATION as always. You are wrong that he is talking about "unix raw mode". He is talking about the TERMINAL protocol. Unix is much more than the terminal protocol.Lemaster
In Windows, when you read a file in "text mode", the OS converts all CRLF to '\n'. It is quite unreasonable to have a "text mode" and a "binary mode". In Windows, you use '\n', but the files have CRLF, because of this aberration.Lemaster
@OMA: If you want to talk about *nix processing text, you should not talk about the TERMINAL program. It makes more sense to talk about AWK, SED, etc.Lemaster
@AndréCaldas when Multics and therefore Unix moved to a single character newline translated to CRLF by the IO subsystem you think that's good; when Microsoft extended ANSI fopen to have a 'text mode' so the IO subsystem translates CRLF to a single character newline ... you think that's bad. An optional one you can ignore and use just LF. With a background of years of MS DOS and CP/M software and files using CRLF to keep compatibility with. Windows is more than how programs store text. Also s/DISINFORMATION/one author's OPINION/.Trafficator
@TessellatingHeckler: LOL... Yes, it is a very good idea to put in some useless characters and have the standard file-reading library magically filter them in and out. Very intelligent, indeed. I am very happy we have evolved...Lemaster
@AndréCaldas there are decades of files which already had those characters in them, and software which read and wrote those characters. Breaking that for some ideal would be a much worse idea.Trafficator
@TessellatingHeckler: Yes... for 30-year-old files, you keep producing garbage that is MAGICALLY erased and written back. That is a very silly idea. So, if you consider CRLF a problem, as you are stating, you are just perpetuating this problem. What one has to do is very simple: STOP producing those files. Deal with the problematic ones, as we have been doing. You talk about the "ideal" world as if mocking me gave you the authority of a "wise person". It is silly to advocate for CRLF and at the same time open files in text mode and have the CR magically removed. Do you want it or not???Lemaster

It comes from the teletype machines (and typewriters) from the days of yore.

It used to be that when you were done typing a line, you had to move the typewriter's carriage (which held the paper and slid to the left as you typed) back to the start of the line (CR). You then had to advance the paper down a line (LF) to move to the next line.

There are cases where you might not have wanted to line-feed when returning the carriage, such as if you were going to strike through a character with a dash (you'd just overwrite it).

But basically, it boils down to convention. DOS used the full CR/LF convention, and UNIX shortened it a bit. Now we're stuck!

Advised answered 29/6, 2011 at 13:52 Comment(0)

From Wikipedia:

The sequence CR+LF was in common use on many early computer systems that had adopted teletype machines, typically an ASR33, as a console device, because this sequence was required to position those printers at the start of a new line.

Falco answered 29/6, 2011 at 13:51 Comment(0)

I have seen more than one account to the effect that the reason to send two characters (and sometimes more) instead of one was in order to better match the data transfer rate to the physical printing rate (this was a long time ago). Moving the print-head took longer than printing a single character and sending extra characters was a way of preventing the data transfer from getting ahead of the printing device. So the reason we have multiple characters for end-of-line in Windows is basically the same as the reason we have QWERTY keyboards -- it was intended to slow things down.

Obviously the reason this practice continues in Windows to this day is based on some notion of ongoing backwards compatibility, and ultimately, just simple inertia.

Of note however, this convention is not strictly enforced by Windows at the operating system level. Any Windows application is free to ignore the convention, depending on what other applications it is trying to be compatible with.

Interestingly, the Wikipedia article about "Newline" claims that Windows 8 may introduce a change to using only LF.

Knife answered 9/1, 2012 at 20:2 Comment(10)
"Intended to slow things down" - citation needed.Mears
Actually, the entire first paragraph - citation needed.Mears
Here's one citation regarding the timing rationale. See "the print head could not return from the far right to the beginning of the next line in one-character time". The Wikipedia article also includes a citation (involving a reference book for the Vim text editor), although it isn't clear how authoritative that source is.Knife
Here's a closely related Jeff Atwood article that references the same Wikipedia content: The Great Newline Schism. There are lots of intelligent user comments there as well -- including some substantiation of my point that this is not an operating-system-level concern and that a majority of Windows apps will work just fine with LF-only text files. There is also the fun comment: "Windows 10 uses CR/LF to maintain compatibility with the 1963 Model 33 teletype machine".Knife
@RenéG I don't need a citation, I was there and saw it for myself. Some early dot matrix printers required even a few extra NULs thrown in for good measure, because as the baud rate of the interface increased the head couldn't keep up even with two characters worth of time. That problem went away as buffering and flow control entered the picture, but the early printers didn't have that. Finally as printers became output-only they went to a parallel interface that had built-in handshaking.Cecillececily
“Contrary to popular belief, the QWERTY layout was not designed to slow the typist down, …” – Properties | QWERTY - WikipediaRandy
@JasonSparc: Yeah, it's probably a myth. Unfortunately, I am unable to read the source material (Japanese) for the "true story".Knife
@JasonSparc: I find widespacer.blogspot.com/2015/11/… pretty convincing, though it's ironic that QWERTY is only a good layout for typewriters that use a semi-circle of type bars for the bottom two rows, rather than for all four rows. Unfortunately, QWERTY makes no effort to avoid troublesome digraphs between the home row and the row above. The ED/DE digraph is especially common, but many if not most of the adjacent-type-bar digraphs between those rows probably occur more often than the only remotely common one on a dual-semi-circle design (ZA/AZ).Littell
@JasonSparc: If a two-finger typist tried typing "SWeet katHY JUst LOst mIKE's ouTGoing AQuifer DEsigns FRiday", (but without the capitalizations shown), a typical manual typewriter would be likely to jam, since the inter-key time required to avoid jams on the capitalized digraphs would be much longer than for almost any other key pairs.Littell
Before Mac OS X, classic Mac OS used CR alone for a newline. So the transition was from CR to LF rather than from CRLF to LF. There was a time when I had to deal with all three possibilities—LF, CR, or CRLF—in text files I had.Macmahon
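
For anyone who, like the commenter above, has to cope with all three conventions at once, here is a minimal, purely illustrative C sketch (the function name is made up) that normalizes CR, LF, and CR+LF to a single LF while copying a stream:

    #include <stdio.h>

    /* Copy 'in' to 'out', mapping CR, LF, and CR+LF each to a single LF.
       The streams should be in binary mode so that no platform translation
       happens before we see the bytes (on Windows that would mean calling
       _setmode on the standard streams first). */
    static void normalize_newlines(FILE *in, FILE *out)
    {
        int c;
        while ((c = fgetc(in)) != EOF) {
            if (c == '\r') {
                int next = fgetc(in);
                if (next != '\n' && next != EOF)
                    ungetc(next, in);  /* bare CR: keep the byte for the next round */
                fputc('\n', out);      /* CR and CR+LF both become LF */
            } else {
                fputc(c, out);         /* LF and everything else pass through */
            }
        }
    }

    int main(void)
    {
        normalize_newlines(stdin, stdout);
        return 0;
    }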
