In C++11 what should happen first: raw string expansion or macros?

This code works in Visual C++ 2013 but not in gcc/clang:

#if 0
R"foo(
#else
int dostuff () { return 23; }
// )foo";
#endif
dostuff();

Visual C++ processes the #if 0 first. Clang tokenizes the R raw string first (and so never defines dostuff). Who is right, and why?

Veneration answered 23/6, 2015 at 8:2 Comment(7)
This translation phase reference will tell you. - Mansfield
As per Joachim's link, "Phase 3" (the tokenization) happens before "Phase 4" (the preprocessor). IOW, the code is invalid. - Scorch
I don't want to add another question just now, but any ideas on how to make clang behave like vcc there? To me the preproc first would actually be more useful. - Veneration
@starmole: how is any of it useful? Why not just #if <whatever> ... #else ... #endif, without the string literal? - Iroquois
@TonyD: The idea is to include code as both (or either) code and a string. One fun use case would be a tutorial that can both execute and show the source. In my case it was for GLSL shaders that need to be a string in GPU mode to send to the graphics driver but should compile as cpp in software emulation. - Veneration
@starmole: you do know about the # operator for macros? You can stringify macro arguments and have them create string literals and/or code. Admittedly, it can get tricky when there are comma separated values to pass, but __VA_ARGS__ sometimes helps. - Iroquois
@TonyD: Or you could put redundant () around the single argument, then stringify it and remove the first and last character from the stringified result. - Eelpout

[Update: Adrian McCarthy comments below saying MSVC++ 2017 fixes this]

GCC and clang are right, VC++ is wrong.

2.2 Phases of translation [lex.phases]:

[...]

  3. The source file is decomposed into preprocessing tokens (2.5) and sequences of white-space characters (including comments).

  4. Preprocessing directives are executed, [...]

And 2.5 Preprocessing tokens [lex.pptoken] lists string-literals amongst the tokens.

Consequently, parsing is required to tokenise the string literal first, "consuming" the #else and dostuff function definition.

Iroquois answered 23/6, 2015 at 8:12 Comment(5)
Got it! Thanks to Joachim Pileborg above for the link too. So visual C is wrong! - Veneration
@starmole: lots of older compilers used to do lots of things like this wrong, but MS VC++ in particular stands out for rarely fixing things like this, as they don't want to risk breaking the masses of code written for their compilers that might depend somehow on existing behaviours. - Iroquois
To add to @TonyD: especially in weird edge cases that people are unlikely to ever run into. - Regnant
MSVC++ 2017 seems to do the right thing. Raw string literals were brand new in MSVC++ 2013. The MSVC preprocessor has long had some non-standard behavior that was kept as-is for backward compatibility. But recently even the preprocessor is getting the standards compliance treatment. blogs.msdn.microsoft.com/vcblog/2018/07/06/… - Xanthic
@AdrianMcCarthy: oh cool - I've edited my answer correspondingly. Cheers - Iroquois

I thought it was worth reiterating the interesting "quirk" of the lexing phase. The contents of a #if 0 ... #else block are not ignored like you might naively imagine (I was naive until I tested it): they are still tokenized before the directive is evaluated. Here are two examples; the only difference is an extra space between the R and the " in the raw string declaration inside the #if 0 block.

#include <iostream>
using namespace std;

#if 0 
const char* s = R"(
#else
int foo() { return 3; }
// )";
#endif

int main() {
    std::cout << foo() << std::endl;
    return 0;
}

Results in (gcc 6.3, C++14)

prog.cpp: In function ‘int main()’:
prog.cpp:12:19: error: ‘foo’ was not declared in this scope
  std::cout << foo() << std::endl;

Adding a space character (in the code that is supposedly ignored by the compiler!) lets it compile, because R followed by a space is lexed as a plain identifier rather than as a raw string prefix:

#include <iostream>
using namespace std;

#if 0 
const char* s = R "(
#else
int foo() { return 3; }
// )";
#endif

int main() {
    std::cout << foo() << std::endl;
    return 0;
}

Compiles and runs with

3

Note that a traditional non-raw string literal does not have this problem. A non-raw string cannot span a newline, so the unterminated quote cannot swallow the following lines, and the #else is still seen as a directive. So if you got rid of the R, it compiles just fine.

Obviously, the safe thing to do is not let your raw-string cross a preprocessor boundary.

Ducharme answered 28/1, 2019 at 19:9 Comment(0)
