I've recently been writing a parser for a language based on C. I'm using CUP (Yacc for Java).
I want to implement "the lexer hack" (http://eli.thegreenplace.net/2011/05/02/the-context-sensitivity-of-c%E2%80%99s-grammar-revisited/ or https://en.wikipedia.org/wiki/The_lexer_hack) to distinguish typedef names from variable/function names etc. To allow declaring a variable with the same name as a type declared earlier (example from the first link):
typedef int AA;
void foo() {
AA aa; /* OK - define variable aa of type AA */
float AA; /* OK - define variable AA of type float */
}
we have to introduce some new productions in which a variable/function name can be either IDENTIFIER or TYPENAME. And this is where the difficulties start: conflicts in the grammar.
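To make the lexer side of the hack concrete, here is a minimal sketch in Java (since CUP is Java-based). It is only an illustration under my own assumptions: the class `LexerHackSketch`, the `Kind` enum and the methods `declare`/`classify` are hypothetical names, not part of CUP or gcc. The idea is that parser actions record declared names in a scoped table, and the lexer asks that table whether an identifier-like lexeme should be returned as TYPENAME or IDENTIFIER.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical scoped table shared between the parser actions and the lexer. */
class LexerHackSketch {
    enum Kind { IDENTIFIER, TYPENAME }

    // One map per open scope; the innermost scope sits at the head of the deque.
    private final Deque<Map<String, Kind>> scopes = new ArrayDeque<>();

    LexerHackSketch() {
        enterScope(); // file scope
    }

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    void exitScope() {
        scopes.pop();
    }

    /** Assumed to be called from a parser action once a declaration is recognized. */
    void declare(String name, Kind kind) {
        scopes.peek().put(name, kind);
    }

    /** Assumed to be called by the lexer for every identifier-like lexeme. */
    Kind classify(String name) {
        // ArrayDeque iterates from the head, i.e. innermost scope first.
        for (Map<String, Kind> scope : scopes) {
            Kind kind = scope.get(name);
            if (kind != null) {
                return kind;
            }
        }
        return Kind.IDENTIFIER; // unknown names are plain identifiers
    }
}
```

With the example above, `typedef int AA;` would call `declare("AA", Kind.TYPENAME)` at file scope, and `float AA;` inside `foo` would call `declare("AA", Kind.IDENTIFIER)` in the inner scope, so `AA` is a TYPENAME again once that scope is closed. The table alone does not remove the grammar conflicts; it only decides which token the parser sees.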
I was trying not to use this messy Yacc grammar for gcc 3.4 (http://yaxx.googlecode.com/svn-history/r2/trunk/gcc-3.4.0/gcc/c-parse.y), but this time I have no idea how to resolve the conflicts on my own. I took a look at the Yacc grammar:
declarator:
after_type_declarator
| notype_declarator
;
after_type_declarator:
...
| TYPENAME
;
notype_declarator:
...
| IDENTIFIER
;
fndef:
declspecs_ts setspecs declarator
// some action code
// the rest of production
...
setspecs: /* empty */
// some action code
declspecs_ts means declaration_specifiers, where the suffix records "Whether a type specifier has been seen; after a type specifier, a typedef name is an identifier to redeclare (_ts or _nots)."
From declspecs_ts we can reach
typespec_nonreserved_nonattr:
TYPENAME
...
;
At first glance I can't see why shift/reduce conflicts do not appear! setspecs is empty, so we have declspecs_ts followed by declarator, and we would expect the parser to be confused about whether TYPENAME comes from declspecs_ts or from declarator.
Can anyone explain this briefly (or even precisely)? Thanks in advance!
EDIT: Useful link: http://www.gnu.org/software/bison/manual/bison.html#Semantic-Tokens
The setspecs definition is in the link above the code snippet. I looked a bit deeper into this code. declspecs_ts contains EXACTLY ONE TYPENAME and often some other specifiers (qualifiers like INLINE etc.). There are also other variants like declspecs_nots, and we have to combine the different declspecs with after_type_declarator and notype_declarator to support all possible combinations. This is even more complicated (because of attributes); nevertheless it is organized/split smartly enough to prevent conflicts. – Tachometer
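The `_ts`/`_nots` split described in the comment can be pictured as a one-bit state: "has a type specifier been seen yet?". Below is a rough Java sketch of that idea only. It is not gcc's grammar and not a parser; `DeclSpecsSketch`, the `Tok` enum and `declaratorIndex` are hypothetical names, and the token categories are heavily simplified (no attributes, no storage classes).

```java
import java.util.List;

/** Hypothetical illustration of the "type specifier seen" state. */
class DeclSpecsSketch {
    enum Tok { TYPE_KEYWORD, QUALIFIER, TYPENAME, IDENTIFIER }

    /** Returns the index of the token acting as the declarator name, or -1 if there is none. */
    static int declaratorIndex(List<Tok> tokens) {
        boolean typeSeen = false; // false ~ declspecs_nots, true ~ declspecs_ts
        for (int i = 0; i < tokens.size(); i++) {
            switch (tokens.get(i)) {
                case QUALIFIER:
                    break; // qualifiers do not change the state
                case TYPE_KEYWORD:
                    typeSeen = true;
                    break;
                case TYPENAME:
                    if (typeSeen) {
                        return i; // type already seen: the typedef name is being redeclared
                    }
                    typeSeen = true; // first TYPENAME acts as the type specifier
                    break;
                case IDENTIFIER:
                    return i; // an ordinary identifier is always the declarator name
            }
        }
        return -1;
    }
}
```

For `float AA` (TYPE_KEYWORD, TYPENAME) and for `AA aa` (TYPENAME, IDENTIFIER) the declarator is the second token in both cases. The grammar appears to get the same effect without a flag, by encoding the state in which nonterminal (declspecs_ts or declspecs_nots) has been reduced, so a TYPENAME arriving after declspecs_ts can only belong to after_type_declarator.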