Carriage return + newline in raw string literals?

Consider a C++ file that has UNIX line endings (i.e. "\x0a" instead of "\x0d\x0a") and contains the following raw string literal:

const char foo[] = R"(hello^M
)";

(where ^M is the actual byte 0x0d (i.e. carriage return)).

What should be the result of the following string comparison (taking the standard's definition of raw string literals into account)?

strcmp("hello\r\n", foo);

Should the strings compare equal or not (i.e. 0 or != 0)?

With GCC 4.8 (on Fedora 19) they compare unequal.

Is this a bug or feature in GCC?

Sibel answered 5/4, 2014 at 7:46 Comment(2)
Now consider a file with DOS line endings: the string literal would look the same as in your example. From a language designer's perspective, I wouldn't want line endings to alter the behaviour of my code, even if it's a raw string. But I might be wrong. - Psychophysics
When I printed the characters of foo in decimal, I got 104, 101, 108, 108, 111, 10. The ^M somehow didn't make it into the string. - Combo

As far as the standard is concerned, you can only use members of the basic source character set in the string literals (and elsewhere in the program). How the physical representation of the program is mapped to the basic source character set is implementation-defined.

g++ apparently treats ASCII \x0A, ASCII \x0D, and the sequence ASCII \x0D\x0A as all being valid representations of the member of the basic source character set called "newline". That is entirely reasonable, given that it is desirable for source code transferred between Windows, Unix and classic Mac OS machines to keep its meaning.

Lubricator answered 5/4, 2014 at 8:16 Comment(0)

Raw string literals are not completely raw, because they reach your program via the compiler, which reads and interprets the C++ source files. Before strcmp'ing the two strings, you can check the size of your raw string: it will differ from the expected size by the number of ^M (\x0d) characters.

You can either resort to reading the data from a file in binary mode, e.g.:

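// Needs <fstream> and <vector>. Binary mode keeps '\r' bytes untranslated.
std::ifstream infile("test.txt", std::ifstream::binary);
infile.seekg(0, infile.end);             // jump to the end to learn the size
std::streamsize size = infile.tellg();
infile.seekg(0);                         // rewind before reading
std::vector<char> buffer(size);          // owns its memory, unlike new char[]
infile.read(buffer.data(), size);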

Or you can insist on using raw literals, with a trick: replace each 'bad' character in the literal with some other placeholder character, then do the reverse replacement before you use the literal, e.g.:

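#include <algorithm>
#include <cstring>
#include <iostream>
#include <string>

// '|' is a placeholder standing in for the carriage return.
std::string str = R"(hello|
)";

int main()
{
  // Swap the placeholder back to '\r' (octal 015) before using the string.
  std::replace(str.begin(), str.end(), '|', '\015');
  std::cout << std::strcmp("hello\r\n", str.c_str()) << std::endl;
}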
Czernowitz answered 23/9, 2019 at 12:0 Comment(0)
