"FOLLOW_set_in_"... is undefined in generated parser

Asked 18/11, 2013 at 20:20 Answered 16/9, 2014 at 16:4

Solved c antlr grammar antlr3 antlrworks

I have written a grammar for a vaguely Java-like DSL. While there are still some issues with it (it doesn't recognize all the inputs the way I want it to), what concerns me most is that the generated C code does not compile.

I use ANTLRWorks 1.5 with ANTLR 3.5 (ANTLR 4 apparently does not support the C target).

The problem is with the expression rules. I have rules prio14Expression down to prio0Expression, which handle operator precedence. The problem is at priority 2, which evaluates prefix and postfix operators:

...
prio3Expression: prio2Expression (('*' | '/' | '%') prio2Expression)*;

prio2Expression: ('++' | '--' | '!' | '+' | '-')* prio1Expression ('++' | '--')*;  

prio1Expression:
    prio0Expression (
        ('.' prio0Expression) |
        ('(' (expression (',' expression)*)? ')') |
        ('[' expression (',' expression)* ']')
    )*;

prio0Expression: 
    /*('(') => */('(' expression ')') |
    IDENTIFIER |
    //collectionLiteral |
    coordinateLiteral |
    'true' |
    'false' |
    NUMBER |
    STRING 
    ;
...

Expression is a label for prio14Expression. You can see the full grammar here.

The code generation itself succeeds (without any errors or serious warnings). It generates the following code:

CONSTRUCTEX();
EXCEPTION->type         = ANTLR3_MISMATCHED_SET_EXCEPTION;
EXCEPTION->name         = (void *)ANTLR3_MISMATCHED_SET_NAME;
EXCEPTION->expectingSet = &FOLLOW_set_in_prio2Expression962;

RECOVERFROMMISMATCHEDSET(&FOLLOW_set_in_prio2Expression962);
goto ruleprio2ExpressionEx;

This fails to build with the error "Error 5 error C2065: 'FOLLOW_set_in_prio2Expression962' : undeclared identifier".

Did I do something wrong in the grammar? No other rule causes this error, and if I reformulate the rule concerned, the generated code is valid (but then the grammar no longer does what I want). What can I do to fix this issue?

Thanks for any help.

Elisabetta answered 18/11, 2013 at 20:20 Comment(7)

To me it looks like a code generation problem. There are several problems in the C target that can lead to compiler errors. Try reformulating your rule, e.g. by extracting the operators into a rule of their own and using explicit tokens for the string literals (i.e. use token definitions instead of specifying tokens ad hoc like '++'). That also makes it simpler to parse the resulting AST (if needed). – Neusatz 19/11, 2013 at 8:22

@MikeLischke I have tried many different variants, but I can't find a rule that both keeps the functionality and compiles. – Lesialesion 22/11, 2013 at 10:39

Maybe off-topic, but did you try to generate C++ source from this grammar? (You need the latest ANTLR3 git checkout for this.) The C++ target is quite mature now, although it still does not support AST generation. – Quarry 29/11, 2013 at 10:12

@Ivan I did try the C++ target (although not with the latest git version) and it had some issues as well (I don't remember exactly what they were; I stopped investigating them when I found out AST is not supported). – Lesialesion 29/11, 2013 at 10:25

Strange, it works for me. Look at this link: github.com/ibre5041/antlr3/tree/t101/runtime/Cpp/tests . Look at test101. It compiles, but fails to parse your input file. BTW you can create an AST even without AST support, either by using rule actions or by using rule return values. – Quarry 29/11, 2013 at 11:35

@Ivan So if I understand it correctly, the C++ target supports tree grammars but not AST construction using the tree construction operators, right? How complicated is manual AST construction? I have just finished adding tree operators to the grammar (I have removed the broken rule for now) :( It would be unfortunate if I had to replace that with hundreds of lines of repetitive hand-written code (or execute the compilation actions straight from the parser grammar). – Lesialesion 29/11, 2013 at 13:6

@Matěj Zábský - manual AST construction is not as easy as output=AST. It requires some initial setup and the code is not so brief. On the other hand, you can subclass the Token, Lexer and Parser classes and then store additional information directly in these instances. For example, for an "identifier" Token you can distinguish whether the identifier is being declared, written or read. If AST construction already works for you, leave it as it is. But you should really use the latest git trunk - Mike's pull request has been applied to it. – Quarry 29/11, 2013 at 13:59

I encountered the same problem.

I think it happens when a parser rule consists only of simple OR-ed tokens, like this:

problem_case: problematic_rule;
problematic_rule: 'A' | 'B' ;

This doesn't happen if it is a lexer rule.

workaround1: As_lexer_rule;
As_lexer_rule: 'A' | 'B' ;

Nor does it happen if the rule is more complicated (not just simple OR-ed tokens).

workaround2: make_it_complicated_needlessly;
make_it_complicated_needlessly: 'A' | 'B' | {false}? NeverUsedRule;
NeverUsedRule: /* don't care*/ ;

(I used the semantic predicate "{false}?" for this modification. I believe it doesn't change the language the grammar accepts.)

Traveller answered 10/1, 2014 at 8:59 Comment(0)

This seems to be an old post, but maybe it's still useful for someone (as it was for me).

I encountered the same problem with the C runtime of ANTLR 3.5.

Another easy workaround, which does not change the grammar:

problem_case: problematic_rule;
problematic_rule: a='A' | b='B' ;
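
Applied to the prio2Expression rule from the question, the same labelling trick might look like the sketch below. This is untested: the label names (opInc, opDec, opNot, opPlus, opMinus, postInc, postDec) are made up for illustration, and it is only an assumption that labelling each alternative also avoids the undeclared FOLLOW_set_in_... reference when the alternatives sit inside a (...)* subrule rather than in a rule of their own.

prio2Expression:
    (opInc='++' | opDec='--' | opNot='!' | opPlus='+' | opMinus='-')*
    prio1Expression
    (postInc='++' | postDec='--')*;

The set of accepted inputs should stay the same as in the original rule; the labels only give each token alternative a name.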

Battiste answered 16/9, 2014 at 16:4 Comment(0)
