C99 printf formatters vs C++11 user-defined-literals

This code:

#define __STDC_FORMAT_MACROS
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(int argc,char **argv)
{
   uint64_t val=1234567890;
   printf("%"PRId64"\n",val);
   exit(0);
}

It compiles as C99, C++03 and C++11 with GCC 4.5, but fails as C++11 with GCC 4.7.1. Adding a space before PRId64 lets GCC 4.7.1 compile it.

Which one is correct?

Plumb asked 8/8, 2012 at 17:5

Actually, you need PRIu64, not PRId64, to print an unsigned value (in general, PRI{o,u,x,X}N for unsigned types and PRI{i,d}N for signed ones). - Annam

GCC 4.7.1 is correct. According to the C++11 standard:

2.2 Phases of translation [lex.phases]

1 - The precedence among the syntax rules of translation is specified by the following phases. [...]
3. The source file is decomposed into preprocessing tokens (2.5) and sequences of white-space characters (including comments). [...]
4. Preprocessing directives are executed, macro invocations are expanded, [...]

And per 2.5 Preprocessing tokens [lex.pptoken], user-defined-string-literal is a preprocessing token production:

2.14.8 User-defined literals [lex.ext]

user-defined-string-literal:
    string-literal ud-suffix
ud-suffix:
    identifier

So the phase-4 macro expansion of PRId64 is irrelevant, because "%"PRId64 has already been parsed as a single user-defined-string-literal preprocessing token consisting of string-literal "%" and ud-suffix PRId64.

Oh, this is going to be awesome; everyone will have to change

printf("%"PRId64"\n", val);

to

printf("%" PRId64"\n", val);     // note extra space

However! GCC and Clang have agreed to treat a string literal followed by a suffix that does not start with an underscore as two separate tokens (such suffixes are reserved, so a program relying on them as a UDL would not be well-formed anyway); see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52538. So in future versions of GCC (the 4.8 branch, I think), existing code will work again.

Brieta answered 8/8, 2012 at 17:9
But how on earth can a user-defined literal be defined/parsed before platform #ifdefs are processed? A UDL can do all the constexpr stuff, right? - Plumb
@Plumb that's fine; constexpr is a phase-7 process. - Brieta
@rubenvb: but the UDL will be gone by then, due to preprocessor macro replacement in phase 4. - Plumb
@Plumb if it helps you understand, "lit"_udl is treated as operator "" _udl("lit", 3); this happens early in phase 7. - Brieta
But by the time that happens, in this example, your _udl has already been replaced by the preprocessor... - Plumb
@Plumb ah, right: there's nothing there for macro replacement to hit, because "%"PRId64 is a single token. - Brieta
Ick. IMHO writing it with spaces (printf("%" PRId64 "\n", val);) is better style anyway, but this is still a change that breaks existing code. Incidentally, ud-suffixes that do not start with an underscore are reserved for future standardization, so the program is "ill-formed, no diagnostic required". Reference: N3337 2.14.8p10 and 17.6.4.3.5. - Chuvash
@Plumb this has caused some discussion and there is a fix in the works; see my latest edit. - Brieta
Good for GCC and Clang, but the code is still ill-formed as far as the language standard is concerned, and other compilers won't necessarily do the same thing. The bug report mentions that the committee is considering this issue. Also, <inttypes.h> is probably the most common source of this problem, but other code could also be broken by the change. - Chuvash
"lit"_udl is a single token in C++11, so it is formed in phase 3, before the preprocessor performs macro replacement in phase 4. - Microstructure
Sorry if I'm repeating, but is adding the space the solution for this? Is it guaranteed not to change the behaviour? - Actiniform
@HannaKhalil yes, adding the space is the solution and will work on all compilers (since compile-time string concatenation works however much whitespace there is). Omitting the space was only ever a stylistic matter. - Brieta
