Trying to understand the C preprocessor

Why do these blocks of code yield different results?

Some common code:

#define PART1PART2 works
#define STRINGAFY0(s) #s
#define STRINGAFY1(s) STRINGAFY0(s)

case 1:

#define GLUE(a,b,c) a##b##c  
STRINGAFY1(GLUE(PART1,PART2,*))
//yields
"PART1PART2*"

case 2:

#define GLUE(a,b) a##b##*
STRINGAFY1(GLUE(PART1,PART2))
//yields
"works*"

case 3:

#define GLUE(a,b) a##b
STRINGAFY1(GLUE(PART1,PART2*))
//yields
"PART1PART2*"

I am using MSVC++ from VS.NET 2005 SP1.

Edit: my current belief is that the preprocessor works like this when expanding macros:

Step 1:
  • take the body
  • remove any whitespace around ## operators
  • parse the string; when an identifier is found that matches the name of a parameter:
      - if it is next to a ## operator, replace the identifier with the literal value of the parameter (i.e. the string passed in)
      - if it is NOT next to a ## operator, run this whole process on the value of the parameter first, then replace the identifier with that result (ignoring the stringify single '#' case for now)
  • remove all ## operators

Step 2:
  • take the resulting string and parse it for any macros

Now, from that, I believe that all 3 cases should produce exactly the same resulting string:

PART1PART2*

and hence after step 2, should result in

works*

but at the very least, all three should result in the same thing.
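
The part of this model that the C standard does pin down can be checked in isolation: a parameter that is an operand of # (or ##) is substituted as written, while any other parameter has its argument fully macro-expanded first. The following is a minimal sketch of that rule using the macros from the question; the function names stringify_direct, stringify_indirect and show_stringify are made up for illustration.

```c
#include <stdio.h>

#define PART1PART2 works
#define STRINGAFY0(s) #s
#define STRINGAFY1(s) STRINGAFY0(s)

/* s is an operand of '#', so the argument is stringified as written. */
const char *stringify_direct(void) { return STRINGAFY0(PART1PART2); }

/* In STRINGAFY1, s is not an operand of '#' or '##', so the argument is
   macro-expanded (PART1PART2 -> works) before STRINGAFY0 sees it. */
const char *stringify_indirect(void) { return STRINGAFY1(PART1PART2); }

/* Hypothetical helper that prints both results side by side. */
void show_stringify(void)
{
    printf("direct:   %s\n", stringify_direct());
    printf("indirect: %s\n", stringify_indirect());
}
```

Calling show_stringify() from main should show the unexpanded name for the direct case and the expanded replacement for the indirect one, which is why the question needs the two-level STRINGAFY1 wrapper at all.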

Hyohyoid answered 19/7, 2010 at 7:2 Comment(0)

Cases 1 and 2 have no defined behavior, since you are attempting to paste a * onto another token to form a single preprocessing token. Depending on the associativity your preprocessor uses for ##, this tries to glue either the token PART1PART2 or just PART2 together with *. In your case this probably fails silently, which is one of the possible outcomes when behavior is undefined. The token PART1PART2 followed by * will then not be considered for macro expansion again. Stringification then produces the result you see.

My gcc behaves differently on your examples:

/usr/bin/gcc -O0 -g -std=c89 -pedantic   -E test-prepro.c
test-prepro.c:16:1: error: pasting "PART1PART2" and "*" does not give a valid preprocessing token
"works*"

So, to summarize, your case 1 has two problems:

  • Pasting two tokens that don't result in a valid preprocessing token.
  • The evaluation order of the ## operators.

In case 3, your compiler is giving the wrong result. It should

  1. evaluate the arguments to STRINGAFY1
  2. to do that it has to expand GLUE
  3. GLUE results in PART1PART2*
  4. which must be expanded again
  5. the result is works*
  6. which then is passed to STRINGAFY1
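
The steps above can be sketched as a small C file. This is only an illustration, assuming a standard-conforming preprocessor (the GCC run quoted above is the reference point); the function name case3 is made up here. The argument b is the two-token sequence PART2 followed by *, and ## joins only the tokens directly next to it.

```c
#define PART1PART2 works
#define STRINGAFY0(s) #s
#define STRINGAFY1(s) STRINGAFY0(s)
#define GLUE(a,b) a##b

/* GLUE(PART1,PART2*) pastes PART1 and PART2 into PART1PART2 and leaves the
   * as a separate token. The result is rescanned, so PART1PART2 is replaced
   by works before STRINGAFY0 stringifies the argument. */
const char *case3(void) { return STRINGAFY1(GLUE(PART1,PART2*)); }
```

With a conforming preprocessor case3() should return "works*"; the MSVC version from the question reportedly produces "PART1PART2*" instead.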
Papyraceous answered 19/7, 2010 at 8:9 Comment(10)
Shouldn't case 1 and case 2 result in the same thing then? - Hyohyoid
What is a "valid preprocessor token"? Maybe an identifier? In that case I can't see how any of them would work. I suppose case 2 (the only one that works) is the only one that passes legal identifiers to all parameters... - Hyohyoid
@matt, no, case 1 is simply undefined, so you can't know how your compiler chooses to resolve that problem. - Papyraceous
The preprocessor is supposed to split the input into tokens. Valid tokens are (among others) identifiers, punctuation characters by themselves, all two- or three-character operators such as <<= or ..., and a somewhat odd notion of what counts as a number. The ## operator can only glue two tokens together into another valid token. E.g. * ## = should be possible whereas = ## * should not, since there is no operator =*. - Papyraceous
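
A small sketch of the valid pastes described in the comment above, for illustration only. The macro name PASTE and the function paste_operators are assumptions, not part of the thread.

```c
#define PASTE(a,b) a##b

int paste_operators(void)
{
    int x = 3;
    x PASTE(*, =) 2;    /* * ## = forms the single token *=, so x is now 6 */
    x PASTE(<<, =) 1;   /* << ## = forms the single token <<=, so x is now 12 */
    /* PASTE(=, *) is deliberately not used: "=*" is not a single
       preprocessing token, so that paste has undefined behavior. */
    return x;
}
```
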
I'm still a little confused as to the difference between case 1 and case 2. Both attempt the same operation; it's just that in one case the '*' has come from a parameter, and in the other it is a literal. Seeing as how literal##parameter and parameter##parameter should work exactly the same, I'm a little confused as to how they could yield different results. Perhaps in case 2 it notices early that the '*' is not going to create an identifier and aborts the paste, whereas, hiding in a parameter, the compiler gets caught up? - Hyohyoid
@ your last comment: orly? That's interesting! So, going by that, case 3 should work (the * is taken as a separate token), but it doesn't :( - Hyohyoid
@matt, yes, I had copied your case 2 wrongly into my example code, sorry. Case 2 is not valid either, and gives me the same error. - Papyraceous
Yes, case 3 is valid. From C99 (and I don't think this has changed): "each instance of a ## preprocessing token in the replacement list (not from an argument) is deleted and the preceding preprocessing token is concatenated with the following preprocessing token." So this clearly requires gluing the PART2 and not the *. - Papyraceous
OK, I got myself a copy of the C standard (a draft, but whatever). You're absolutely right; I have no idea what the VS compiler is doing. Surely if you were going to be lax with the standard in the pursuit of user friendliness, you would make all 3 of these cases pass: in cases 1 and 2, OK, sure, you can't glue b and c, but I don't see why you wouldn't try to glue a and b, especially if you've already decided to just leave the tokens separate anyway (the standard doesn't specify the order of operations). And yeah, case 3 should just work. - Hyohyoid
@matt: if you are still interested, you could have a look at the Boost preprocessor macros. IIRC they have some sophisticated code to adapt to preprocessor deficiencies. - Papyraceous

It's doing exactly what you are telling it to do. The first and second take the symbol names passed in and paste them together into a new symbol. The third takes 2 symbols and pastes them, then you are placing the * in the string yourself (which will eventually evaluate into something else.)

What exactly is the question with the results? What did you expect to get? It all seems to be working as I would expect it to.

Then of course there is the question of why you are playing with the dark arts of symbol munging like this anyway. :)

Wylma answered 19/7, 2010 at 7:17 Comment(5)
As far as I can tell, after the macro expander has pasted/substituted the parameters in and sorted out all the paste '##' operators, all three should yield exactly the same string, 'PART1PART2*'. Seeing as how this should be done before the expanded body is then parsed for sub-macros, I would expect the same result for all 3. - Hyohyoid
Oh yeah, and I am only doing this to try to understand exactly how the preprocessor works; I would never write horrible code like this :) - Hyohyoid
The construct a ## b doesn't expand a and b, and #c doesn't expand c. See boost.org/doc/libs/1_43_0/libs/preprocessor/doc/ref/cat.html - Kayser
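
The point of the comment above can be sketched for ## as follows. The names INDEX, CAT_DIRECT, CAT and the two functions are hypothetical, chosen only for this illustration; the extra level of indirection is the same idea the linked Boost macro relies on.

```c
#define INDEX 1
#define CAT_DIRECT(a,b) a##b
#define CAT(a,b) CAT_DIRECT(a,b)

/* b is an operand of ##, so INDEX is pasted unexpanded: value ## INDEX
   gives the identifier valueINDEX. */
int paste_direct(void)
{
    int valueINDEX = 10;
    return CAT_DIRECT(value, INDEX);
}

/* CAT's parameters are not operands of ##, so INDEX is expanded to 1
   before CAT_DIRECT pastes: value ## 1 gives the identifier value1. */
int paste_indirect(void)
{
    int value1 = 20;
    return CAT(value, INDEX);
}
```
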
Not quite sure what you are saying there, Philipp. That states that the ## operator does not allow the individual parameters to be expanded BEFORE they are pasted; mine wouldn't expand to anything individually anyway. What I am expecting is that the RESULT string AFTER pasting be expanded. It is my belief that the pasting operations and the parse over the result are entirely independent operations. That is why, seeing as how the pasting operation results in the same string in all 3 cases, I find it strange that the answer is not the same each time. - Hyohyoid
The first 2 cases are undefined behaviour: the result of ## must be a preprocessing-token (see section 6.4 of the C standard for a list of what constitutes a preprocessing-token). - Troth
