Parsing an int(x) parameter

Asked 28/2, 2015 at 19:29 Answered 28/2, 2015 at 19:36

Solved c++c function parsing ambiguous-grammar

Here is a simple function with one int parameter:

void f(int x) {}

f(42);

And here is another function with one int parameter:

void g(int(x)) {}

g(42);

Now let us define x to be a type:

typedef int x;
void h(int(x)) {}

h(42);
// warning: passing argument 1 of ‘h’ makes pointer from integer without a cast

(This is the behavior I observe with gcc 4.8.2)

How do parser writers deal with this situation?

It seems the classic pipeline Lexer -> Parser -> Semantic Checker -> ... does not work here.

Gilbreath answered 28/2, 2015 at 19:29 Comment(14)

Oh Fred, you're into the most vexing parse again. Yep, it's the same mechanism. And you're right, there is knowledge flowing back in the pipeline. – Immateriality 28/2, 2015 at 19:35

Someone removed the C++ tag. I put it back. Do you want this to be C or C++ @FredOverflow. – Immateriality 28/2, 2015 at 19:42

The lexer sends a symbol x -> the parser knowing the meaning of the sybols finds the potential gramatical rules (and in doubt, applies the most vexing parse) -> the semantic checker looks for potential type conversions. I don't know what should not work in the classic pippeline...? – Mostly 28/2, 2015 at 19:44

Is your question more about what kind of token the lexical analyzer interprets x as or about the grammar rules that interpret this as a function pointer rather than another parse? – Southwesterly 28/2, 2015 at 20:2

@Southwesterly My question is, as I wrote: How do parser writers deal with this situation? There must be some standard idioms, patterns, whatever, because I bet there are dozens of these exceptions to deal with in real world parsers. – Gilbreath 28/2, 2015 at 20:31

Perhaps you could clarify the question by explaining how a naive parser would get this wrong? (It may be obvious to anyone with a shot at actually figuring out the answer, but making it clearer to the rest of us wouldn't hurt. At a minimum you might get fewer answers that miss your point.) – Rolfe 1/3, 2015 at 1:25

@FredOverflow Not sure if you know all this already or not, but here's a lexical analyzer and grammar for C11: quut.com/c/ANSI-C-grammar-l-2011.html Before the typedef, x will be interpreted as an IDENTIFIER while after, it will be interpreted as a TYPEDEF_NAME because the analyzer is being informed through the symbol table that x is now a type. In this particular case, the parsing is pretty straightforward then. The feedback occurs through the symbol table in this case. – Southwesterly 1/3, 2015 at 4:37

@Southwesterly But wouldn't that make void foo() { typedef int x; { int x; } } illegal, because int x; would now be parsed as INT TYPEDEF_NAME? – Gilbreath 1/3, 2015 at 8:51

@FredOverflow It does make typedef int x; int x; at the same scope illegal, for example. I believe your example is legal because the symbol table changes the type of symbol x is depending on its scope. Again, that is feedback occurring from the semantic level down to the lexer through the symbol table. – Southwesterly 1/3, 2015 at 14:57

Actually, I'm not so sure now because typedef int x; int main() { x x = 5; return x; } looks legal too. – Southwesterly 1/3, 2015 at 14:59

Unless there is a grammar rule that matches IDENTIFIER IDENTIFIER ; that I'm not able to see ... – Southwesterly 1/3, 2015 at 15:15

@Southwesterly In the meantime, I discovered the article How Clang handles the type / variable name ambiguity of C/C++ and found it to be very helpful. TL;DR: The lexer does not need to be hacked with artificial type identifiers. All ambiguities can be resolved inside the parser. – Gilbreath 1/3, 2015 at 15:24

@FredOverflow That's a great series of articles on this topic. The first couple of articles eli.thegreenplace.net/2007/11/24/… confirmed that most lex-yacc approaches use feedback from the higher levels to try to distinguish typedef-name's from regular identifiers in the lexer. – Southwesterly 1/3, 2015 at 19:35

Answer is given here: https://mcmap.net/q/1323796/-do-these-syntactically-correct-c-statements-carry-any-meaning/103167 Most importantly, "Syntactic and semantic analysis are performed together, inseparably. Semantic errors are caught during the syntactic phase of the parse, phase 7." – Whinny 2/3, 2015 at 2:10

You've effectively defined h as:

void h(int(int)) {}

The parameter is interpreted as an unnamed function pointer that takes an int and returns an int. When you try to pass 42 to it, the compiler complains that you are trying to make a function pointer from an integer.

I think what you are asking for is how do compilers handle (unnamed) function pointer types and their possibly ambiguous parses. Your question is related to the the most vexing parse in C++.

There they decided that whenever there was ambiguity between a function pointer type and another way to parse, then it would be interpreted as a function pointer. They did that because there are other ways to disambiguate when you don't want it to be a function pointer (e.g. - surround it in parentheses, use {} initializer syntax, etc.).

Getting into the specifics of how a parser writer might deal with this parse, here's a lexical analyzer and grammar for C11: http://quut.com/c/ANSI-C-grammar-l-2011.html In your example, before the typedef, x will be an IDENTIFIER token while after, it will be a TYPEDEF_NAME token because the analyzer is being informed through the symbol table that x is now a type. In this particular case, the parsing is unambiguous then. The "pipeline feedback" that you seem to be referring to occurs through the symbol table in this case, where the lexical analyzer is informed about context by the higher levels that affects its output as the compilation progresses.

EDIT: These three articles, found by the OP, describe this problem and how it is solved by some C parsers / compilers very nicely. Basically, a context free grammar (CFG) that only accepts / generates legal C syntax can almost be specified. With the introduction of a scoped lookup table that allows the lexical analyzer to distinguish between identifiers and typedef-names appropriately, then a CFG [and more importantly a LALR(1) parser (e.g. - yacc generated)] that only accepts / generates legal C syntax can be specified.

Here's an even scarier example than the OP's:

typedef int x;

int main() { x x = 5; return x; }  /* crazily enough this is legal C syntax and a well formed C program */

Southwesterly answered 28/2, 2015 at 19:35 Comment(8)

It should not issue a warning but an error because there is not implicit conversion to int(*)(). – Mostly 28/2, 2015 at 19:38

@Christophe, I think C lets you do it without any explicit cast. – Gelb 28/2, 2015 at 19:39

Ok ! I looked ad it with the C++ glasses ! – Mostly 28/2, 2015 at 19:45

All true but you didn't answer the question. – Introduce 28/2, 2015 at 19:54

@Christophe: Unless Fred (the OP) has two SO accounts, one with the name of "G. Samaras", it wasn't him who changed that tag from C++ to C. He has evidently compiled the code as C. But tagged the question as C++. – Immateriality 28/2, 2015 at 19:56

@jschultz410: Sorry but I think you're still missing the point here. – Introduce 28/2, 2015 at 20:2

BTW, regarding "This isn't valid C code at all because the parameter name is missing": [C99: 6.7.5.3/6]: "A parameter type list specifies the types of, and may declare identifiers for, the parameters of the function." and coliru.stacked-crooked.com/a/eb9108e21d07c2f3 – Introduce 28/2, 2015 at 20:46

@Gelb C doesn't, either. It's a constraint violation that's required to be diagnosed. It just happens that GCC is more lenient for C than C++, issuing a warning rather an error. – Decency 28/2, 2015 at 21:31

After introducing typedef

typedef int x;

the function has the following definition

void h(int( int ) ) {}

that is its parameter is declared as having type of function int( int ) that is adjusted to pointer to function.

You call the function supplying an integer:

h(42);

There is no implicit conversion from integer to function pointer.

I do not see a problem with

It seems the classic pipeline Lexer -> Parser -> Semantic Checker -> ... does not work here.

The parameter is substituted for the typedef.
x has a compiler attribute of a type. So it considers the record like

type-specifier h(type-specifier( type-name ) ) {}

Cari answered 28/2, 2015 at 19:36 Comment(8)

I think Fred is asking about the use of non-syntactical information in the parse, i.e. how knowledge of the nature of x influences the parsing. – Immateriality 28/2, 2015 at 19:37

Thanks Vlad. I think I disagree with your reasoning about substitution, though. At the least I think, if it is correct, then a standards quote would be nice. – Immateriality 28/2, 2015 at 19:45

@Cheers and hth. - Alf The compiler simply builds a table of names. And each name in the table has its attributes that describe the name. – Cari 28/2, 2015 at 19:47

Thanks, and yes I have implemented a compiler (as a student, I think that was in 1984), and quite a few interpreters, so I'm familiar with how they work internally. Although it's been so long that if you throw some grammar names at me I would have to google it. :) – Immateriality 28/2, 2015 at 19:50

Vlad, you didn't answer the question. @Cheersandhth.-Alf, what on earth do quotes from the standard have to do with it?? – Introduce 28/2, 2015 at 19:54

@Cheers and hth. - Alf Where to find time that to read all that exists in programming? – Cari 28/2, 2015 at 19:55

@LightnessRacesinOrbit: The precise mechanism involved at the formal level, constraining what guaranteed behavior one can rely one, is sometimes quite different from the practical details of how an actual compiler deals with the issue. And I think Fred is asking about the formal level. I.e., language lawyer stuff, how, in the formal, information flows back from semantic analysis to syntactical parsing. – Immateriality 28/2, 2015 at 20:0

@Cheers: And the standard discusses precisely zero of those things. At least, it doesn't discuss any further than what the phases of translation are, but Fred already knows those. What he's asking for is the details at the formal level of implementations, not of the C++ abstract machine. – Introduce 28/2, 2015 at 20:1

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags