Antlr parser operator priority
Asked Answered
F

1

11

Consider the following grammar. I have issues with the operator priority, for instance: res=2*a+b has a similar parse tree as res=2*(a+b). I know where the problem is, but no "beautiful" solution without mutual left recursion comes to my mind. Can you please help me out a little? The grammar is used with a custom visitor.

grammar Math;

expression: expression add=('+'|'-') expression # expressionAddExpression
            | expression mult='*' expression    # expressionMultExpression
            |'(' expression ')'  # bracketExpression
            | number                            # numberExpression
            ;
    number: INT                                                                 #int
            | '(' number ')'                                                    #bracketNumber
            | VARIABLE                                                          #var

            ;
    VARIABLE: [A-Za-z][A-Za-z0-9]*;



INT: [0-9]+;
Foliated answered 31/10, 2017 at 9:10 Comment(0)
S
23

From The Definitive ANTLR 4 Reference, 5.4 Dealing with Precedence, Left Recursion, and Associativity :

expr : expr '*' expr // match subexpressions joined with '*' operator
     | expr '+' expr // match subexpressions joined with '+' operator
     | INT // matches simple integer atom
     ;

The problem is that this rule is ambiguous for some input phrases. ...

This is a question of operator precedence, and conventional grammars simply have no way to specify precedence. Most grammar tools, such as Bison, use extra notation to specify the operator precedence.

Instead, ANTLR resolves ambiguities in favor of the alternative given first, implicitly allowing us to specify operator precedence.

So simply put multiplication before addition.

File Question.g4 :

grammar Question;

question
@init {System.out.println("Question last update 1213");}
    :   line+ EOF
    ;

line
    :   expression NL
        {System.out.println("Expression found : " + $expression.text); }
    ;

expression
    :   expression mult='*' expression          # expressionMultExpression
    |   expression add=( '+' | '-' ) expression # expressionAddExpression
    |   VARIABLE '=' expression                 # expressionAssign
    |   '(' expression ')'                      # parenthesisedExpression
    |   atom                                    # atomExpression
    ;

atom
    :   INT                                     #int
    |   VARIABLE                                #var
    ;

VARIABLE : LETTER ( LETTER | DIGIT )*;
INT      : DIGIT+;

NL      : [\r\n] ;
WS      : [ \t] -> channel(HIDDEN) ; // -> skip ;

fragment LETTER : [a-zA-Z] ;
fragment DIGIT  : [0-9] ;

File input.txt :

res = 2 * a + b
res = 2 * ( a + b )

Execution :

$ grun Question question -tokens -diagnostics input.txt 
[@0,0:2='res',<VARIABLE>,1:0]
[@1,3:3=' ',<WS>,channel=1,1:3]
[@2,4:4='=',<'='>,1:4]
[@3,5:5=' ',<WS>,channel=1,1:5]
[@4,6:6='2',<INT>,1:6]
[@5,7:7=' ',<WS>,channel=1,1:7]
[@6,8:8='*',<'*'>,1:8]
[@7,9:9=' ',<WS>,channel=1,1:9]
[@8,10:10='a',<VARIABLE>,1:10]
...
[@32,36:35='<EOF>',<EOF>,3:0]
Question last update 1213
Expression found : res = 2 * a + b
Expression found : res = 2 * ( a + b )

and

$ grun Question question -gui input.txt

enter image description here

Spongin answered 31/10, 2017 at 11:20 Comment(3)
Thank you very much. I was familiar with this section of the antlr reference but it did not work as expected. Due to a broken build script, changing the order of the rules did not make any difference, now its working flawlesslyFoliated
I see you placed fragments at the end. Since they're not tokens, I'm assuming it doesn't matter where they appear in the lexer; is that correct?Augite
@Augite Fragments are helper rules for the lexer. If you have several rules using [0-9], it's easier, and more readable, to say DIGIT. It's convenient to group fragments at the end of the grammar, they are easier to find, like footnotes at the bottom of a page. However you can write them anywhere in the grammar.Spongin

© 2022 - 2024 — McMap. All rights reserved.