Portable end of line (newline)

It's been an unpleasant surprise that '\n' is replaced with "\r\n" on Windows, I did not know that. (I am guessing it is also replaced on Mac...)

Is there an easy way to ensure that Linux, Mac and Windows users can easily exchange text files?

By easy way I mean: without writing the file in binary mode or testing and replacing the end-of-line chars myself (or with some third party program/code). This issue affects my C++ program doing the text file I/O.

Dad answered 31/12, 2011 at 16:56 Comment(7)
What editor are you using? What source control are you using?Capillarity
@AtesGoral These are irrelevant to the executable doing the text-based I/O.Dad
"without writing the file in binary mode". This would be the "easy" way, why do you want to avoid it?Halden
Sorry, but line feeds are not "secretly" replaced. This behavior is well documented. From an online tutorial on files: "Non-binary files are known as text files, and some translations may occur due to formatting of some special characters (like newline and carriage return characters)."Dudleyduds
@CharlesBailey I did not know that you can use operator<< in binary mode :) I have only used write in binary mode. I expected problems on reading but it seems to work fine. Still testing...Dad
@Dad Ah, without reading the question fully, I assumed the problem was with the source code :/Capillarity
@CharlesBailey As it turns out, binary mode is the solution. It was my lack of knowledge...Dad
D
15

Apologies for the partial overlap with other answers, but for the sake of completeness:

Myth: endl is 'more portable' since it writes the line ending depending on the platform convention.

Truth: endl is defined to write \n to the stream and then call flush. So in fact you almost never want to use it. On Windows, every \n written to a text-mode stream is implicitly converted to \r\n by the CRT behind the scenes, whether you use os<<endl, os<<'\n', or fputs("\n",file).
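To make the point about endl concrete, here is a minimal sketch that writes the same line both ways into an in-memory stream and compares the results. The helper names line_with_endl and line_with_newline are made up for this illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes one line terminated by std::endl
// (a '\n' followed by a flush of the stream).
std::string line_with_endl(const std::string& text) {
    std::ostringstream os;
    os << text << std::endl;
    return os.str();
}

// Hypothetical helper: writes one line terminated by a plain '\n' (no flush).
std::string line_with_newline(const std::string& text) {
    std::ostringstream os;
    os << text << '\n';
    return os.str();
}
```

Both helpers produce the same characters; the only difference is the extra flush, which matters for performance on file streams but not for the line ending.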

Myth: You should open files in text mode to write text and in binary mode to write binary data.

Truth: Text mode exists in the first place because some time ago there were file-systems that distinguished between text files and binary files. It's no longer true on any sane platform I know. You can write text to binary-opened files just as well, you just lose the automatic \n -> \r\n conversion on Windows. However, this conversion causes more harm than good. Among other things, it makes your code behave differently on different platforms, and tell/seek become problematic to use. Therefore it's best to avoid this automatic conversion. Note that POSIX does not distinguish between binary and text mode.

How to do text: Open everything in binary mode and use the plain-old \n. You'll also need to worry about the encoding. Standardize on UTF-8 for Unicode-correctness. Use UTF-8 encoded narrow-strings internally, instead of wchar_t which is different on different platforms. Your code will become easier to port.
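A minimal sketch of this advice, assuming standard fstreams: open the file with std::ios::binary, write plain \n, and read the raw bytes back to confirm that nothing was translated. The helper names write_text_binary and read_raw_bytes are invented for this example.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: writes text to path in binary mode, so '\n' is
// stored as a single LF byte on every platform.
bool write_text_binary(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary);
    out << text;
    out.flush();
    return out.good();  // false if the open or the write failed
}

// Hypothetical helper: returns the file's bytes exactly as stored on disk.
std::string read_raw_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Writing "one\ntwo\n" with write_text_binary and reading it back with read_raw_bytes should return the identical string, with no \r inserted, on Windows as well as on Linux and Mac.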

Tip: You can force MSVC to open all files in binary mode by default. It should work as follows:

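#include <fcntl.h>   // _O_BINARY (MSVC-specific)
#include <stdlib.h>  // _fmode (MSVC-specific)
#include <fstream>   // std::ofstream
int main() {
    _fmode = _O_BINARY;       // make binary the default translation mode
    std::ofstream f("a.txt"); // opens in binary mode
}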

EDIT: As of 2021, Windows 10 Notepad understands UNIX line endings.

Delinquent answered 31/12, 2011 at 17:35 Comment(7)
@LokiAstari: I'm not advocating fopen, it was just the simplest and the most explicit example. You may like the edited version more.Delinquent
@ybungalobill: Using '\n' in binary mode yields Unix line endings. On Windows, this breaks crappy text editors like notepad and almost any textbox you paste such content into (even when copied from an editor that handles Unix line-endings). Is this really what you are advocating, or have I completely misread you?Quantize
@MarceloCantos: Notepad is an excuse for a text editor. When copying&pasting some editors convert '\n' into '\r\n' (e.g. Wordpad or web browsers I checked), although I believe it's the receiver responsibility to understand '\n'. That said, I admit that the guideline is not acceptable if the text file is intended for a non-technical end-user, since she won't care how 'correct' your program is.Delinquent
@ybungalobill: This isn't about non-technical users. I know of no text editor running on Windows that adheres to the policy you are advocating. Even emacs and vim emit CRLF by default. Sane or not, Windows does distinguish between text and binary, and to ignore this is just asking for trouble. Note that I don't object to your advice as an answer to this question, which is about cross-platform portability. What concerns me is the sense I get that you are advocating the use of binary I/O under all circumstances. If that wasn't your intent, then I apologise for drawing the wrong conclusion.Quantize
@MarceloCantos: Notepad++ is configurable to emit LF. But it is not a problem if some editor emits CRLF, since when you read text files you usually ignore the whitespace, and CR is just a whitespace character, so you do not have any trouble reading these files from C++. The discussion was about reading the output of your program. Also, would you care to back up your claim that "Windows does distinguish between text and binary"? I do not see any text/binary related flags in CreateFile...Delinquent
@ybungalobill: Not all library calls ignore whitespace. fgets() reads CR into the buffer, causing different behaviour depending on the line endings in the input file. Whether this matters or not is contingent on the programmer's intent; it should not be subject to a non-negotiable rule of the kind you recommend. WRT my "claims": Most C programs (especially portable ones) use fopen(), which treats text and binary files differently on Windows (on every runtime library I've used, at least).Quantize
@MarceloCantos: I haven't said that the library ignores it, it's the programmer who usually wants to ignore it. Even in the fgets case you will find yourself trimming the whitespace at the end. As for fopen, it is not a Windows function, so one cannot infer anything about Windows per se from it. It does do the conversion by default on the MSVC runtime, but the default can be overridden as I've shown to provide POSIX-like behavior. In fact, imagine that you are porting something to Windows. In such a case it is simpler to change the default than to hunt the bugs.Delinquent
K
12

The issue isn’t with endl at all, it’s that text streams reformat line breaks depending on the system’s standard.

If you don’t want that, simply don’t use text streams – use binary streams. That is, open your files with the ios::binary flag.

That said, if the only issue is that users can exchange files, I wouldn’t bother with the output mode at all, I’d rather make sure that your program can read different formats without choking. That is, it should accept different line endings.
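As a rough sketch of such a tolerant reader, one option is a thin wrapper around std::getline that drops a trailing \r, so LF and CRLF input read the same way. The names getline_any and split_lines are hypothetical, and this version does not handle the lone-CR endings of classic Mac OS.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one line, accepting both "\n" and "\r\n"
// as terminators by stripping a trailing '\r' if present.
std::istream& getline_any(std::istream& in, std::string& line) {
    if (std::getline(in, line) && !line.empty() && line.back() == '\r') {
        line.pop_back();
    }
    return in;
}

// Hypothetical helper: splits text into lines using getline_any.
std::vector<std::string> split_lines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (getline_any(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

The same getline_any call works on a std::ifstream, whether the file was opened in text or binary mode.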

This is by the way what any decent text editor does (but then again, the default notepad.exe on Windows is not a decent text editor, and won’t correctly handle Unix line breaks).

Koh answered 31/12, 2011 at 17:0 Comment(0)
M
7

If you really just want an ASCII LF, the easiest way is to open the file in binary mode: in non-binary mode \n is replaced by a platform specific end of line sequence (e.g. it may be replaced by a LF/CR or a CR/LF sequence; on UNIXes it typically is just LF). In binary mode this is not done. Turning off the replacement is also the only effect of the binary mode.

BTW, using endl is equivalent to writing a \n followed by flushing the stream. The typically unintended flush can become a major performance problem. Thus, endl should be used rarely and only when the flush is intended.

Martamartaban answered 31/12, 2011 at 17:8 Comment(1)
+1: I would just say the '\n' (in text mode) is replaced by a platform specific ELS (End of line sequence).Skin
