What is the point of using std::ios_base::binary?

I had an issue reading a file created under Linux on Windows. The issue is discussed here: Using fstream::seekg under windows on a file created under Unix.

The issue was worked around by opening the text file with std::ios_base::binary specified.

But what's the actual point of this mode? If it is specified, you can still work with your file as a text file (writing with mystream << "Hello World" << std::endl and reading with std::getline).

Under Windows, the only difference I could notice is that mystream << "Hello World" << std::endl uses:

  • 0x0D 0x0A as line separator if std::ios_base::binary was not specified (carriage return followed by line feed)
  • 0x0A as line separator if std::ios_base::binary was specified (line feed only)

Notepad does not display the line breaks properly when opening files generated with std::ios_base::binary. Better editors like vi or WordPad do show them.

Is that really the only difference between files generated with and without std::ios_base::binary? The documentation says "Consider stream as binary rather than text." What does this mean in the end?

Is it safe to always set std::ios_base::binary if I don't care about opening the file in Notepad and want fstream::seekg to always work?
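
A minimal sketch of the tellg/seekg pattern in question, assuming a file opened in binary mode. The helper name `second_line_is_repeatable` is made up for illustration. On common implementations, positions reported by tellg in binary mode are plain byte offsets, so seeking back to them should not depend on how lines are terminated.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: read the second line of a file twice, using
// tellg/seekg to return to its start. The file is opened in binary mode,
// so no line-ending translation is involved in the position arithmetic.
bool second_line_is_repeatable(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary);
    std::string first, second, again;
    if (!std::getline(in, first))
        return false;
    const std::ifstream::pos_type mark = in.tellg();  // start of line 2
    if (!std::getline(in, second))
        return false;
    in.clear();       // reset eofbit in case line 2 ended at end of file
    in.seekg(mark);   // jump back to the remembered position
    return std::getline(in, again) && again == second;
}
```

The function returns true when the second read of line 2 yields the same text as the first read.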

Winstonwinstonn answered 18/11, 2014 at 11:22 Comment(2)
I would also test with \0 characters in the strings too. In binary those are probably just outputted, while for non-binary they might be interpreted as a string terminator.Campanile
@GáborBakos If the only systems concerned are Unix and Windows, then the only differences are the line endings and 0x1A, which Windows treats as an end of file character, at least on input.Jimmiejimmy

The differences between binary and text modes are implementation-defined, but only concern the lowest level: they do not change the meaning of things like << and >> (which insert and extract textual data). Also, formally, outputting non-printable characters, apart from a few such as '\n', is undefined behavior if the file is in text mode.

For the most common OSs: under Unix, there is no distinction; both modes are identical. Under Windows, an internal '\n' is mapped to the two-character sequence CR, LF (0x0D, 0x0A) externally, and 0x1A is interpreted as end of file when reading. In more exotic (and mostly extinct) OSs, however, text and binary files could be represented by entirely different file types at the OS level, and it could be impossible to read a file in text mode if it was written in binary mode, and vice versa. Or you could see something different: extra white space at the end of a line, or no '\n' in binary mode.
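
The Windows mapping described above can be observed by writing the same text in both modes and counting the bytes that end up on disk. The following is a minimal sketch; the helper names `write_text` and `byte_count` are made up for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: write `text` to `path` using the given open mode.
void write_text(const std::string& path, const std::string& text,
                std::ios_base::openmode mode)
{
    std::ofstream out(path, mode | std::ios_base::trunc);
    out << text;
}

// Hypothetical helper: count the bytes stored on disk. The file is opened
// in binary mode so that nothing is translated on the way in.
std::size_t byte_count(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary);
    return static_cast<std::size_t>(std::distance(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>()));
}
```

Writing "a\nb\n" with `std::ios_base::out | std::ios_base::binary` should give a byte count of 4 on every platform. Writing it with `std::ios_base::out` alone is expected to give 6 on Windows and 4 on Unix.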

With regard to always setting std::ios_base::binary: my policy for portable files is to decide exactly how I want them formatted, set binary, and output what I want. That is often CR, LF rather than just LF, since that's the network standard. On the other hand, most Windows programs have no problems with just LF, but I've encountered more than a few Unix programs which have problems with CR, LF, which argues for systematically using just LF (which is easier, too). Doing things this way means that I get the same results regardless of whether I'm running under Unix or under Windows.
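
The policy above could be sketched roughly as follows: open the file in binary mode and write the chosen line terminator explicitly. The helper name `write_lines` and its parameters are assumptions made for this example, not part of any library.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write each line followed by an explicitly chosen
// terminator. Binary mode keeps the library from translating '\n'.
bool write_lines(const std::string& path,
                 const std::vector<std::string>& lines,
                 const std::string& eol)   // e.g. "\n" or "\r\n"
{
    std::ofstream out(path, std::ios_base::binary | std::ios_base::trunc);
    for (const std::string& line : lines)
        out << line << eol;
    out.flush();
    return static_cast<bool>(out);  // false if opening or writing failed
}
```

Calling it with `"\n"` or with `"\r\n"` should produce the same bytes under Unix and under Windows, because the terminator is no longer left to the text-mode translation.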

Jimmiejimmy answered 18/11, 2014 at 11:29 Comment(3)
Do I understand correctly that setting std::ios_base::binary or not for file reading makes no difference (except fixing the bug referenced above), and that setting std::ios_base::binary or not for file writing may lead to differences depending on the platform?Winstonwinstonn
@Winstonwinstonn No. The choice between binary and text affects both reading and writing: under Windows, when reading, a CR,LF will be mapped to an LF, and a 0x1A will cause reading to stop. And on some exotic systems, an open in binary mode may fail if the file was written in text, or vice versa.Jimmiejimmy
@JamesKanze -- I think others have commented that (at least on MacOS) the stream operators IGNORE binary mode if you have it set: that means if you use ">>" (the extract formatting operator) to read in binary data from a stream, you will see CR,LF expansion/conversion even though you might NOT have expected this with binary mode! I chased down complex binary file format bugs introduced from the use of ">>". Those issues were easily solved by using a plain read().Pyrrho
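
One effect that does not depend on the open mode may explain problems of this kind: the formatted extraction operator `>>` skips whitespace by default, and bytes such as 0x0A, 0x0D and 0x20 count as whitespace. Whether this is what the commenter actually ran into is an assumption. A minimal sketch, with made-up helper names, comparing `>>` and `read()` on the same file:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: dump raw bytes to a file in binary mode.
void write_bytes(const std::string& path, const std::vector<char>& bytes)
{
    std::ofstream out(path, std::ios_base::binary | std::ios_base::trunc);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Reads with the formatted operator: whitespace bytes are silently
// skipped, even though the stream is opened in binary mode.
std::vector<char> read_with_extraction(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary);
    std::vector<char> result;
    char c;
    while (in >> c)
        result.push_back(c);
    return result;
}

// Reads with unformatted input: every byte is returned as stored.
std::vector<char> read_with_read(const std::string& path, std::size_t count)
{
    std::ifstream in(path, std::ios_base::binary);
    std::vector<char> result(count);
    in.read(result.data(), static_cast<std::streamsize>(count));
    result.resize(static_cast<std::size_t>(in.gcount()));
    return result;
}
```

For a file holding the bytes 0x01, 0x0A, 0x20, 0x02, the `>>` version returns only 0x01 and 0x02, while the `read()` version returns all four bytes.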

I found (by losing two hours of work trying to understand what was going on) a situation where specifying std::ios_base::binary does make a huge difference.

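#include <fstream>
#include <vector>

int main()
{
    std::vector<char> data{ 0x01, 0x02, 0x0A, 0x0B };
    {
        // Binary mode: the bytes are written to the file unchanged.
        std::fstream tfat;
        tfat.open( "binary", std::ios_base::out | std::ios_base::binary );
        tfat.write( data.data(), data.size() );
        tfat.close();
    }
    {
        // Text mode (the default): on Windows, 0x0A is written as 0x0D 0x0A.
        std::fstream tfat;
        tfat.open( "not_binary", std::ios_base::out );
        tfat.write( data.data(), data.size() );
        tfat.close();
    }
}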

The "binary" file then contains 4 bytes: 0x01, 0x02, 0x0A, 0x0B. But the "not_binary" file contains 5 bytes: 0x01, 0x02, 0x0D, 0x0A, 0x0B.

0x0D (\r) was inserted before 0x0A (\n). Since I wrote 4 bytes, I expected the file to contain 4 bytes in the end.

This made me realize why std::ios_base::binary must be used when writing data to a file, even when not using the << operator.
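
To check what actually ends up on disk, the file can be read back in binary mode and printed as hex. The following is a minimal sketch; the helper name `hex_dump` is made up for illustration.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: return the bytes of a file as a hex string such as
// "01 02 0a 0b". The file is opened in binary mode so that no translation
// hides what is actually stored.
std::string hex_dump(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary);
    std::string result;
    char c;
    while (in.get(c)) {
        char buf[4];
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(c)));
        if (!result.empty())
            result += ' ';   // separate bytes with a single space
        result += buf;
    }
    return result;
}
```

With the two files from the snippet above, `hex_dump("binary")` should give `01 02 0a 0b` on every platform, while `hex_dump("not_binary")` is expected to give `01 02 0d 0a 0b` on Windows.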

Winstonwinstonn answered 10/4, 2020 at 13:6 Comment(2)
I tried on my Linux machine and I cannot reproduce the result. Both files contain the 4 bytes as expected, with no extra byte. I understand that this is somehow related to particular text formatting on Windows systems, but the fact that you may get random stuff in the output if you forget ios::binary is super scary.Balduin
@DarioP: I confirm I was using Windows when I tested thatWinstonwinstonn

The meaning of text stream vs binary stream is platform-specific and somewhat unpredictable.

But as far as popular platforms go, it's easy: On Linux and MacOS X, there is no difference. On Windows, the only difference is that internal \n is translated to \r\n in the external stream.
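
When a file is read in binary mode, no translation happens in the other direction either, so a file with Windows line endings keeps a trailing \r on every line returned by std::getline. A minimal sketch of a reader that accepts both kinds of line endings; the helper name `read_lines` is made up for illustration.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read all lines from a file opened in binary mode,
// accepting both LF and CR LF line endings.
std::vector<std::string> read_lines(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        // In binary mode nothing strips the CR of a CR LF pair, so do it here.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

Because the CR is removed by hand, the same code should return identical lines for a Unix file and a Windows file, on either platform.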

Dichotomize answered 18/11, 2014 at 11:30 Comment(1)
Under Windows, a 0x1A will be treated as end of file in text mode.Jimmiejimmy
