ANTLR: Syntax Errors are ignored when running parser programmatically
Asked Answered
I

1

6

I am currently creating a more or less simple expression evaluator using ANTLR.

My grammar is straightforward (at least i hope so) and looks like this:

grammar SXLGrammar;

options {
  language = Java;
  output   = AST;
}

tokens {
  OR  = 'OR';
  AND = 'AND';
  NOT = 'NOT';
  GT  = '>'; //greater then
  GE  = '>='; //greater then or equal
  LT  = '<'; //lower then
  LE  = '<='; //lower then or equal
  EQ  = '=';
  NEQ = '!='; //Not equal
  PLUS = '+';
  MINUS = '-';
  MULTIPLY = '*';
  DIVISION = '/';
  CALL;
}

@header {
package somepackage;
}

@members {

}


@lexer::header {
package rise.spics.sxl;
}

rule
  :  ('='|':')! expression 
  ;

expression
    : booleanOrExpression
    ;

booleanOrExpression
    :
    booleanAndExpression ('OR'^ booleanAndExpression)*
    ;

booleanAndExpression
  :
  booleanNotExpression ('AND'^ booleanNotExpression)*
  ;

booleanNotExpression
  :
  ('NOT'^)? booleanAtom
  ;

booleanAtom
  :
  | compareExpression
  ;

compareExpression
    :
    commonExpression (('<' | '>' | '=' | '<=' | '>=' | '!=' )^ commonExpression)?
    ;

commonExpression
  :
  multExpr
  (
    (
      '+'^
      | '-'^
    )
    multExpr
  )*
  | DATE
  ;

multExpr
  :
  atom (('*'|'/')^ atom)*
  | '-'^ atom
  ;

atom
  :
  INTEGER
  | DECIMAL
  | BOOLEAN
  | ID
  | '(' expression ')' -> expression
  | functionCall
  ;

functionCall
  :
  ID '(' arguments ')' -> ^(CALL ID arguments?)
  ;

arguments
  :
  (expression) (','! expression)*
  |  WS
  ;

BOOLEAN
  :
  'true'
  | 'false'
  ;

ID
  :
  (
    'a'..'z'
    | 'A'..'Z'
  )+
  ;

INTEGER
  :
  ('0'..'9')+ 
  ;

DECIMAL
  :
  ('0'..'9')+ ('.' ('0'..'9')*)?
  ;

DATE
  :
  '!' '0'..'9' '0'..'9' '0'..'9' '0'..'9' '-' '0'..'9' '0'..'9' '-' '0'..'9' '0'..'9' (' ' '0'..'9' '0'..'9' ':''0'..'9' '0'..'9' (':''0'..'9' '0'..'9')?)?
  ;

WS
  :  (' '|'\t' | '\n' | '\r' | '\f')+ { $channel = HIDDEN; };

Now if i try to parse an invalid Expression like "= true NOT true", the graphical test-tool of the eclipse plugin throws an NoViableAltException: line 1:6 no viable alternative at input 'NOT', which is correct and supposed.

Now if i try to parse the expression in a Java Program, nothing happens. The Program

    String expression = "=true NOT false";

    CharStream input = new ANTLRStringStream(expression);
    SXLGrammarLexer lexer = new SXLGrammarLexer(input);
    TokenStream tokenStream = new CommonTokenStream(lexer);
    SXLGrammarParser parser = new SXLGrammarParser(tokenStream);
    CommonTree tree = (CommonTree) parser.rule().getTree();
    System.out.println(tree.toStringTree());
    System.out.println(parser.getNumberOfSyntaxErrors());

would output:

true
0

that means, the AST created by the parser exists of one node and ignores the rest. I'd like to handle syntax errors in my application, but its not possible if the generated parser doesn't find any error.

I also tried to alter the parser by overwriting the displayRecognitionError() method with something like this:

public void displayRecognitionError(String[] tokenNames,
                                    RecognitionException e) {
    String msg = getErrorMessage(e, tokenNames);
    throw new RuntimeException("Error at position "+e.index+" " + msg);
} 

but displayRecognitionError gets never called.

If i try something like "=1+", a error gets displayed. I guess theres something wrong with my grammar, but why does the eclipse plugin throw that error while the generated parser does not?

Ichthyornis answered 13/7, 2011 at 11:39 Comment(0)
C
4

If you want rule to consume the entire token-stream, you have to specify where you expect the end of your input. Like this:

rule
  :  ('='|':')! expression EOF
  ;

Without the EOF your parser reads the true as boolean an ignores the rest.

Chuck answered 13/7, 2011 at 11:51 Comment(3)
Thanks, that was it. Wonder why i don't find this information in "the definitive ANTLR reference"...Ichthyornis
@huzi, err, the EOF token is explained in the definitive ANTLR reference many times. If you don't specify that your tokens should be consumed all the way to EOF, the parser will simply try to consume a single expression (with a = or : in front of it), which your input "=true" is. After you invoke your rule() method on your parser, you could invoke another method to consume the rest of the tokens (the input: " NOT false"): that is why you don't see an error (what your Eclipse plugin does is not relevant, since it is simply a tool around ANTLR, it is not ANTLR itself).Morez
@Arne, I edited your answer slightly by removing your modest "I think" :) and rephrased it slightly. Feel free to revert back if you're not happy with it, I won't meddle with it then.Morez

© 2022 - 2024 — McMap. All rights reserved.