ANTLR4 strange behavior in precedence

Asked 31/8, 2015 at 21:32 Answered 2/9, 2015 at 2:26

I have a very simple test grammar, as follows:

grammar Test;

statement: expression EOF;

expression
    :   Identifier
        |   expression binary_op expression
        |   expression assignment_operator expression
        |   expression '.' Identifier 
    ;

binary_op: '+';
assignment_operator : '='  ;

Identifier : [a-zA-Z]+ ;
WS : [ \n\r\t]+ -> channel(HIDDEN) ;

With this version of the grammar I get the expected behavior when I parse the following input:

b.x + b.y

The resulting tree is (+ (. b x) (. b y))

However, if I replace expression binary_op expression with expression '+' expression, I get a very different tree: (. (+ (. b x) b) y)

Is there any explanation for this?

Thanks

Ollieollis answered 31/8, 2015 at 21:32 Comment(0)

You have to set the precedence explicitly, using something like this:

// Lowest precedence first; each rule defers to the next, tighter-binding one.
expr  : expr2 (assignment_operator expr)?   # Equals
      ;
expr2 : expr1 (binary_op expr2)?            # Add
      ;
expr1 : Identifier
      | expr1 '.' Identifier
      ;

This removes all ambiguity in operator precedence.
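To see why binding strength alone decides which of the two trees from the question you get, here is a minimal standalone sketch in Java. It is not ANTLR-generated code and does not use the ANTLR runtime; it is a hand-written precedence-climbing loop, and the class name PrecedenceSketch, the tree helper, and the numeric binding strengths are all assumptions made up for illustration. As in the question's grammar, '.' is treated as a suffix whose right-hand side is a single identifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: a tiny precedence-climbing parser for
// inputs such as "b.x + b.y". It prints trees in the (op left right) notation
// used in the question.
class PrecedenceSketch {
    private final List<String> tokens;
    private final Map<String, Integer> binding; // operator -> binding strength (assumed values)
    private int pos = 0;

    PrecedenceSketch(String input, Map<String, Integer> binding) {
        this.tokens = tokenize(input);
        this.binding = binding;
    }

    // Splits the input into identifiers ([a-zA-Z]+) and single-character symbols,
    // skipping whitespace.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) {
                    i++;
                }
                out.add(input.substring(start, i));
            } else {
                out.add(String.valueOf(c));
                i++;
            }
        }
        return out;
    }

    // Parses an expression whose operators all bind at least as tightly as minBinding.
    String parse(int minBinding) {
        String left = tokens.get(pos++); // an identifier
        while (pos < tokens.size()) {
            String op = tokens.get(pos);
            Integer strength = binding.get(op);
            if (strength == null || strength < minBinding) {
                break; // operator binds too loosely; leave it for the caller
            }
            pos++;
            // '.' takes a bare identifier on its right; other operators take
            // a sub-expression made of strictly tighter-binding operators.
            String right = op.equals(".") ? tokens.get(pos++) : parse(strength + 1);
            left = "(" + op + " " + left + " " + right + ")";
        }
        return left;
    }

    static String tree(String input, Map<String, Integer> binding) {
        return new PrecedenceSketch(input, binding).parse(0);
    }

    public static void main(String[] args) {
        // '.' binds tightest: prints (+ (. b x) (. b y))
        System.out.println(tree("b.x + b.y", Map.of(".", 3, "+", 2, "=", 1)));
        // '.' binds loosest: prints (. (+ (. b x) b) y)
        System.out.println(tree("b.x + b.y", Map.of("+", 3, "=", 2, ".", 1)));
    }
}
```

The only thing that differs between the two calls is the relative strength given to '.', which is the property the layered expr / expr2 / expr1 rules pin down explicitly.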

Upwards answered 2/9, 2015 at 2:26 Comment(0)

Literals in the parser can confuse matters. Check and fix any errors or warnings reported when generating the parser. You will likely need to move the literals from parser rules to lexer rules.

You can verify that the lexer is operating as intended by dumping the token stream. That will provide a clear basis for understanding the path that the parser is taking.

Update

Neither of the parse tree representations you list looks like a proper ANTLR4 parse tree. Nonetheless, I tried both variants of your grammar and I consistently get:

Token dump:

Identifier: [@0,0:0='b',<4>,1:0]
Dot: [@1,1:1='.',<3>,1:1]
Identifier: [@2,2:2='x',<4>,1:2]
null: [@4,4:4='+',<1>,1:4]
Identifier: [@6,6:6='b',<4>,1:6]
Dot: [@7,7:7='.',<3>,1:7]
Identifier: [@8,8:8='y',<4>,1:8]

Tree dump:

(statement (expression (expression (expression (expression b) . x) + (expression b)) . y) <EOF>)

using

ParseTree tree = parser.statement();
System.out.print(tree.toStringTree(parser));

The null token names in this particular token dump appear because those symbols are first defined in the parser rather than in a lexer rule.

Disloyal answered 1/9, 2015 at 2:17 Comment(1)

Moving the literals from parser rules to lexer rules doesn't make a difference. I tested. :) – Ollieollis 1/9, 2015 at 13:6
