Is it possible to throw an exception if the input isn't valid?

I have a simple ANTLR grammar and an accompanying Visitor. Everything works great unless the input is invalid. If the input is invalid, the errors get swallowed and my calculator produces the wrong output.

I've tried implementing an error listener, overriding the Recover method of the lexer, and... well... half a dozen other things today. Can someone show me how to simply throw an error instead of swallowing bad "tokens"? (I use quotes because they're not tokens at all. The characters are undefined in my grammar.)

Valid Input:

1 + 2 * 3 - 4

Invalid Input:

1 + 2 + 3(4)

I want to throw an ArgumentException if the parser/lexer comes across parentheses (or any other undefined character). Currently, the invalid characters seem to just disappear into the ether and the parser plods along as if nothing is wrong.

If I run it in the console with the grun command, I get the following output, so it recognizes the invalid tokens on some level.

line 1:9 token recognition error at: '('

line 1:11 token recognition error at: ')'

and this resulting parse tree.

[image: resulting parse tree]

BasicMath.g4

grammar BasicMath;

/*
 * Parser Rules
 */

compileUnit : expression+ EOF;

expression :
    expression MULTIPLY expression #Multiplication
    | expression DIVIDE expression #Division
    | expression ADD expression #Addition
    | expression SUBTRACT expression #Subtraction
    | NUMBER #Number
    ; 

/*
 * Lexer Rules
 */

NUMBER : INT; //Leave room to extend what kind of math we can do.

INT : ('0'..'9')+;
MULTIPLY : '*';
DIVIDE : '/';
SUBTRACT : '-';
ADD : '+';

WS : [ \t\r\n] -> channel(HIDDEN);

Calculator:

public static class Calculator
{
    public static int Evaluate(string expression)
    {
        var lexer = new BasicMathLexer(new AntlrInputStream(expression));
        var tokens = new CommonTokenStream(lexer);
        var parser = new BasicMathParser(tokens);
        
        var tree = parser.compileUnit();

        var visitor = new IntegerMathVisitor();

        return visitor.Visit(tree);
    }
}
Briannebriano answered 23/4, 2015 at 21:17 Comment(3)
Have a look at this answer from the Antlr4cs author: https://mcmap.net/q/749991/-how-to-collect-errors-during-run-time-given-by-a-parser-in-antlr4 - Pomeroy
Yup. Tried that @Alex. I inherited from the BaseErrorListener and attached it to my parser, but none of those methods ever get called. - Briannebriano
Note to self: overriding something in here might help. It seems the runtime goes to great lengths to ensure parsing completes when I need it to stop. github.com/antlr/antlr4/blob/master/runtime/Java/src/org/antlr/… - Briannebriano

@CoronA was right. The error happens in the lexer. So, while I still think that creating an ErrorStrategy would be better, this is what actually worked for me and my goal of throwing an exception for undefined input.

First, I created a class that derives from BaseErrorListener and also implements IAntlrErrorListener<T>. The second part was my problem all along, it seems. Because my visitor inherited from FooBarBaseVisitor<int>, my error listener also needed to be of type IAntlrErrorListener<int> to register it with my lexer.

class ThrowExceptionErrorListener : BaseErrorListener, IAntlrErrorListener<int>
{
    //BaseErrorListener implementation; not called in my test, but left it just in case

    public override void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
    {
        // Format the message explicitly; ArgumentException does not expand "{0}" itself.
        throw new ArgumentException(string.Format("Invalid Expression: {0}", msg), e);
    }

    //IAntlrErrorListener<int> implementation; this one actually gets called.

    public void SyntaxError(IRecognizer recognizer, int offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
    {
        // Format the message explicitly; ArgumentException does not expand "{0}" itself.
        throw new ArgumentException(string.Format("Invalid Expression: {0}", msg), e);
    }
}

Then I changed my Calculator class to attach my custom error listener to the lexer. Note that you don't have to remove the default ConsoleErrorListener, as I did, for the exception to be thrown. Since I'm not really using it, I figured it best to go ahead and remove it.

public static class Calculator
{
    public static int Evaluate(string expression)
    {
        var lexer = new BasicMathLexer(new AntlrInputStream(expression));
        lexer.RemoveErrorListeners(); //removes the default console listener
        lexer.AddErrorListener(new ThrowExceptionErrorListener());

        var tokens = new CommonTokenStream(lexer);
        var parser = new BasicMathParser(tokens);

        var tree = parser.compileUnit();

        var visitor = new IntegerMathVisitor();

        return visitor.Visit(tree);
    }
}

And that's it. An ArgumentException is thrown and this test now passes.

    [TestMethod]
    [ExpectedException(typeof(ArgumentException))]
    public void BadInput()
    {
        var expr = "1 + 5 + 2(3)";
        int value = Calculator.Evaluate(expr);
    }

One last note: if you throw a RecognitionException here, it will just get swallowed again. ParseCancellationException (named ParseCanceledException in the C# runtime) is recommended because it does not derive from RecognitionException, but I chose ArgumentException because I felt it made the most sense to the client C# code.
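
Why the exception type matters can be illustrated with a simplified model. The sketch below is not ANTLR source code: `RecognitionLikeException` and `runRule` are hypothetical names, and the only assumption is the behaviour described in this thread, namely that the recovery code catches the recognition exception type and carries on, so anything outside that hierarchy reaches the caller.

```java
// Simplified model of the behaviour described above. NOT ANTLR source code:
// every name in this block is hypothetical.
final class SwallowedExceptionSketch {

    /** Stands in for the library's recognition exception type. */
    static class RecognitionLikeException extends RuntimeException {
        RecognitionLikeException(String message) {
            super(message);
        }
    }

    /**
     * Stands in for a rule that recovers from recognition errors only.
     * The listener runs while the rule is active, as an error listener would.
     */
    static String runRule(Runnable listener) {
        try {
            listener.run();
            return "no error";
        } catch (RecognitionLikeException e) {
            // Recovery path: the error is absorbed and processing continues.
            return "recovered from: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A recognition-type exception is swallowed by the recovery code.
        System.out.println(runRule(() -> {
            throw new RecognitionLikeException("bad character");
        }));

        // An unrelated exception type propagates to the caller.
        try {
            runRule(() -> {
                throw new IllegalArgumentException("bad character");
            });
        } catch (IllegalArgumentException e) {
            System.out.println("escaped: " + e.getMessage());
        }
    }
}
```

In this model only the second call surfaces an exception to the caller, which is the same reason an exception unrelated to RecognitionException works in the listener.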

Briannebriano answered 24/4, 2015 at 19:50 Comment(0)

Each error message is actually caused by an exception. That exception is caught and the parser tries to recover. The parse tree you see is the result of that recovery.

Since the error occurs in the lexer (the lexer just does not know the characters ( or )), the error handling must be attached to the lexer. In Java this would look like:

    lexer.addErrorListener(new BaseErrorListener()  {
        @Override
        public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
            throw new RuntimeException(e);
        }
    });

The C# syntax should not be far from that. Still, I recommend not throwing an exception. It is better to collect the errors into a list, report them after the lexer has finished, and not start parsing if the list is not empty.
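
The collect-then-check pattern can be sketched without the ANTLR runtime. The block below is a hand-written stand-in, not ANTLR's API: `CollectingLexerSketch`, `LexError`, `collectErrors` and `requireCleanInput` are hypothetical names, and the scanner only knows the characters of the BasicMath grammar from the question. It records each unknown character instead of throwing, and refuses to continue if anything was recorded.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a lexer with a collecting error listener.
// NOT the ANTLR runtime API: all names in this block are hypothetical.
final class CollectingLexerSketch {

    /** One recorded error, using the same fields a syntaxError callback receives. */
    static final class LexError {
        final int line;
        final int charPositionInLine;
        final String message;

        LexError(int line, int charPositionInLine, String message) {
            this.line = line;
            this.charPositionInLine = charPositionInLine;
            this.message = message;
        }

        @Override
        public String toString() {
            return "line " + line + ":" + charPositionInLine + " " + message;
        }
    }

    /** Scans the whole input and records unknown characters instead of throwing. */
    static List<LexError> collectErrors(String input) {
        List<LexError> errors = new ArrayList<>();
        int line = 1;
        int column = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\n') {
                line++;
                column = 0;
                continue;
            }
            // Characters the BasicMath grammar defines: digits, operators, whitespace.
            boolean known = (c >= '0' && c <= '9') || "+-*/ \t\r".indexOf(c) >= 0;
            if (!known) {
                errors.add(new LexError(line, column,
                        "token recognition error at: '" + c + "'"));
            }
            column++;
        }
        return errors;
    }

    /** Reports all collected errors at once and refuses to go on to parsing. */
    static void requireCleanInput(String input) {
        List<LexError> errors = collectErrors(input);
        if (!errors.isEmpty()) {
            throw new IllegalArgumentException("Invalid expression: " + errors);
        }
    }

    public static void main(String[] args) {
        for (LexError error : collectErrors("1 + 2 + 3(4)")) {
            System.out.println(error);
        }
    }
}
```

With the real runtime, the same list would be filled from syntaxError in a listener attached to the lexer. Because the token stream normally pulls tokens on demand, the whole input has to be lexed first (for example with CommonTokenStream.fill() in the Java runtime) before the list is checked.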

Sandblind answered 24/4, 2015 at 2:39 Comment(4)
The BailErrorStrategy fails to raise any exceptions as well. I get the same results with it as I do with the DefaultErrorStrategy. - Briannebriano
I was mistaken. Parser and lexer are strictly separated in ANTLR, so my first solution of using an ErrorStrategy on the parser would not work. Attaching a listener to the lexer will do it, though. I corrected my answer to describe the solution. - Sandblind
Solved it thanks to your push in the right direction. Thank you very much. - Briannebriano
This fixed my issue, which was identical to that stated in the OP. Thanks! - Sension

While upgrading from ANTLR 4.6 to 4.9.2, we noticed a change in parser behavior: some text that previously was not matched now gets matched, with no change to the grammar.

Some negative input cases are now accepted by the lexer, for example:

title eq "Employee" 1234

I have overridden syntaxError like this:

lexer.addErrorListener(new BaseErrorListener()  {
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
        throw new RuntimeException(e);
    }
});

While debugging, I found that the lexer does not fail with a RuntimeException for the bad input.

We use Java for this implementation.

Insistent answered 9/8, 2021 at 17:59 Comment(0)