Handling errors in ANTLR4
F

5

97

The default behavior when the parser doesn't know what to do is to print messages to the terminal like:

line 1:23 missing DECIMAL at '}'

This is a good message, but in the wrong place. I'd rather receive this as an exception.

I've tried using the BailErrorStrategy, but this throws a ParseCancellationException without a message (caused by an InputMismatchException, also without a message).

Is there a way I can get it to report errors via exceptions while retaining the useful info in the message?


Here's what I'm really after--I typically use actions in rules to build up an object:

dataspec returns [DataExtractor extractor]
    @init {
        DataExtractorBuilder builder = new DataExtractorBuilder(layout);
    }
    @after {
        $extractor = builder.create();
    }
    : first=expr { builder.addAll($first.values); } (COMMA next=expr { builder.addAll($next.values); })* EOF
    ;

expr returns [List<ValueExtractor> values]
    : a=atom { $values = Arrays.asList($a.val); }
    | fields=fieldrange { $values = values($fields.fields); }
    | '%' { $values = null; }
    | ASTERISK { $values = values(layout); }
    ;

Then when I invoke the parser I do something like this:

public static DataExtractor create(String dataspec) {
    CharStream stream = new ANTLRInputStream(dataspec);
    DataSpecificationLexer lexer = new DataSpecificationLexer(stream);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    DataSpecificationParser parser = new DataSpecificationParser(tokens);

    return parser.dataspec().extractor;
}

All I really want is

  • for the dataspec() call to throw an exception (ideally a checked one) when the input can't be parsed
  • for that exception to have a useful message and provide access to the line number and position where the problem was found

Then I'll let that exception bubble up the call stack to wherever is best suited to present a useful message to the user--the same way I'd handle a dropped network connection, reading a corrupt file, etc.
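
As a rough illustration of the kind of exception described above, here is a minimal sketch of a checked exception that keeps the message together with the line and position. The class name DataSpecSyntaxException and its accessors are hypothetical and not part of ANTLR; the message layout simply mirrors the console output quoted at the top of the question.

```java
// Hypothetical checked exception; the name and fields are illustrative, not ANTLR API.
class DataSpecSyntaxException extends Exception {
    private final int line;
    private final int charPositionInLine;

    DataSpecSyntaxException(int line, int charPositionInLine, String detail, Throwable cause) {
        // Same layout as ANTLR's console output, e.g. "line 1:23 missing DECIMAL at '}'"
        super("line " + line + ":" + charPositionInLine + " " + detail, cause);
        this.line = line;
        this.charPositionInLine = charPositionInLine;
    }

    // 1-based line number, as ANTLR reports it
    int getLine() {
        return line;
    }

    // 0-based character offset within the line, as ANTLR reports it
    int getCharPositionInLine() {
        return charPositionInLine;
    }
}
```

A wrapper such as create(String) could catch whatever unchecked exception the parser ends up throwing and rethrow it as this type, so the compiler forces callers to deal with bad input.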

I did see that actions are now considered "advanced" in ANTLR4, so maybe I'm going about things in a strange way, but I haven't looked into what the "non-advanced" way to do this would be since this way has been working well for our needs.

Finegrained answered 8/8, 2013 at 17:16 Comment(0)
E
116

Since I've had a little bit of a struggle with the two existing answers, I'd like to share the solution I ended up with.

First of all I created my own version of an ErrorListener like Sam Harwell suggested:

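import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.misc.ParseCancellationException;

public class ThrowingErrorListener extends BaseErrorListener {

   public static final ThrowingErrorListener INSTANCE = new ThrowingErrorListener();

   // Rethrow every syntax error as an unchecked exception carrying ANTLR's own message.
   @Override
   public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                           int line, int charPositionInLine, String msg, RecognitionException e)
         throws ParseCancellationException {
      throw new ParseCancellationException("line " + line + ":" + charPositionInLine + " " + msg);
   }
}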

Note the use of a ParseCancellationException instead of a RecognitionException: the generated rule methods catch RecognitionException and hand it to the error strategy for recovery, so it would never reach your own code.

Creating a whole new ErrorStrategy like Brad Mace suggested is not necessary since the DefaultErrorStrategy produces pretty good error messages by default.

I then use the custom ErrorListener in my parsing function:

public static String parse(String text) throws ParseCancellationException {
   MyLexer lexer = new MyLexer(new ANTLRInputStream(text));
   lexer.removeErrorListeners();
   lexer.addErrorListener(ThrowingErrorListener.INSTANCE);

   CommonTokenStream tokens = new CommonTokenStream(lexer);

   MyParser parser = new MyParser(tokens);
   parser.removeErrorListeners();
   parser.addErrorListener(ThrowingErrorListener.INSTANCE);

   ParserRuleContext tree = parser.expr();
   MyParseRules extractor = new MyParseRules();

   return extractor.visit(tree);
}

(For more information on what MyParseRules does, see here.)

This will give you the same error messages as would be printed to the console by default, only in the form of proper exceptions.
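
If code further up the call stack only has the exception message, the line and position can still be recovered, because the listener above always formats it as "line L:C message". The following is only a sketch under that assumption: SyntaxErrorMessage is a hypothetical helper using nothing but java.util.regex, and storing the numbers as fields on a custom exception would be more robust than parsing text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a "line L:C message" string back into its parts.
final class SyntaxErrorMessage {
    // DOTALL so that a detail text containing line breaks is still captured whole.
    private static final Pattern FORMAT =
            Pattern.compile("line (\\d+):(\\d+) (.*)", Pattern.DOTALL);

    final int line;
    final int charPositionInLine;
    final String detail;

    private SyntaxErrorMessage(int line, int charPositionInLine, String detail) {
        this.line = line;
        this.charPositionInLine = charPositionInLine;
        this.detail = detail;
    }

    // Throws IllegalArgumentException when the text does not follow the expected layout.
    static SyntaxErrorMessage parse(String message) {
        Matcher m = FORMAT.matcher(message);
        if (!m.matches()) {
            throw new IllegalArgumentException("not in 'line L:C message' form: " + message);
        }
        return new SyntaxErrorMessage(
                Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), m.group(3));
    }
}
```

Calling SyntaxErrorMessage.parse(ex.getMessage()) in a catch block would then give access to line, charPositionInLine and detail separately.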

Edirne answered 26/10, 2014 at 13:0 Comment(5)
I tried this and I confirm that it worked well. I think this is the easiest of the 3 proposed solutions. - Willable
This is the right way to go. Simplest way to go. The "problem" happens in the lexer and it makes sense to report it right then and there if it's important that the input be valid before attempting to parse. ++ - Knighterrant
Is there a particular reason to use the ThrowingErrorListener class as a Singleton? - Kazmirci
@Kazmirci No, this is just an adaptation of Sam Harwell's code. - Edirne
This solution worked for me with one caveat - we're trying to parse using SLL and then falling back to LL, and it turns out that doing so caused no error to come out when doing the fallback parsing. The workaround was to construct a whole new parser for the second attempt instead of resetting the parser - apparently resetting the parser fails to reset some important state. - Nicole
S
55

When you use the DefaultErrorStrategy or the BailErrorStrategy, the ParserRuleContext.exception field is set for any parse tree node in the resulting parse tree where an error occurred. The documentation for this field reads (for people that don't want to click an extra link):

The exception which forced this rule to return. If the rule successfully completed, this is null.

Edit: If you use DefaultErrorStrategy, the parse context exception will not be propagated all the way out to the calling code, so you'll be able to examine the exception field directly. If you use BailErrorStrategy, the ParseCancellationException thrown by it will include a RecognitionException if you call getCause().

if (pce.getCause() instanceof RecognitionException) {
    RecognitionException re = (RecognitionException)pce.getCause();
    ParserRuleContext context = (ParserRuleContext)re.getCtx();
}
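
If the exception has been wrapped more than once on its way up the stack, the RecognitionException may not be the direct cause. As a sketch only, a small helper can walk the cause chain looking for a given type. Causes.find is a hypothetical name, and the code uses only java.lang, so nothing in it is ANTLR-specific.

```java
// Hypothetical utility for searching an exception's cause chain.
final class Causes {
    private Causes() {
    }

    // Returns the first throwable in the chain (starting with t itself) that is
    // an instance of the requested type, or null if there is none.
    static <T extends Throwable> T find(Throwable t, Class<T> type) {
        // The depth limit guards against a cyclic cause chain.
        for (int depth = 0; t != null && depth < 64; depth++, t = t.getCause()) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
        }
        return null;
    }
}
```

With the ANTLR runtime on the classpath, Causes.find(pce, RecognitionException.class) would play the same role as the instanceof check and cast above.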

Edit 2: Based on your other answer, it appears that you don't actually want an exception, but what you want is a different way to report the errors. In that case, you'll be more interested in the ANTLRErrorListener interface. You want to call parser.removeErrorListeners() to remove the default listener that writes to the console, and then call parser.addErrorListener(listener) for your own special listener. I often use the following listener as a starting point, as it includes the name of the source file with the messages.

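import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

public class DescriptiveErrorListener extends BaseErrorListener {
    public static final DescriptiveErrorListener INSTANCE = new DescriptiveErrorListener();

    // Assumed switch for turning reporting off; define it however suits your project.
    private static final boolean REPORT_SYNTAX_ERRORS = true;

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e)
    {
        if (!REPORT_SYNTAX_ERRORS) {
            return;
        }

        // Prefix the message with "source:line:column: " when the stream has a name.
        String sourceName = recognizer.getInputStream().getSourceName();
        if (!sourceName.isEmpty()) {
            sourceName = String.format("%s:%d:%d: ", sourceName, line, charPositionInLine);
        }

        System.err.println(sourceName+"line "+line+":"+charPositionInLine+" "+msg);
    }
}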

With this class available, register it on both the lexer and the parser:

lexer.removeErrorListeners();
lexer.addErrorListener(DescriptiveErrorListener.INSTANCE);
parser.removeErrorListeners();
parser.addErrorListener(DescriptiveErrorListener.INSTANCE);

A much more complicated example of an error listener that I use to identify ambiguities which render a grammar non-SLL is the SummarizingDiagnosticErrorListener class in TestPerformance.

Sequence answered 8/8, 2013 at 22:31 Comment(5)
Ok...how do I make use of that though? Am I supposed to use something like ((InputMismatchException) pce.getCause()).getCtx().exception to get at the useful error message? - Finegrained
I experimented a little with throwing the exception from the error listener, but the exception never seems to show up. I just ended up with NPEs from the actions in the grammar due to the failed matches. I've added some backstory to the question since it appears I might be swimming against the current. - Finegrained
You should just write a utility class to return the "line", "column", and "message" from a RecognitionException. The information you want is available in the exception that already gets thrown. - Sequence
Gentle Reader, if you are like me, you're wondering what REPORT_SYNTAX_ERRORS is all about. Here's the answer: https://mcmap.net/q/219021/-handling-errors-in-antlr-4 - Delaryd
This example is really useful. I think it should be somewhere in the official documentation, it seems to lack a page for error handling. At least mentioning error listeners would be good. - Drusilladrusus
F
10

What I've come up with so far is based on extending DefaultErrorStrategy and overriding its reportXXX methods (though it's entirely possible I'm making things more complicated than necessary):

public class ExceptionErrorStrategy extends DefaultErrorStrategy {

    @Override
    public void recover(Parser recognizer, RecognitionException e) {
        throw e;
    }

    @Override
    public void reportInputMismatch(Parser recognizer, InputMismatchException e) throws RecognitionException {
        String msg = "mismatched input " + getTokenErrorDisplay(e.getOffendingToken());
        msg += " expecting one of "+e.getExpectedTokens().toString(recognizer.getTokenNames());
        RecognitionException ex = new RecognitionException(msg, recognizer, recognizer.getInputStream(), recognizer.getContext());
        ex.initCause(e);
        throw ex;
    }

    @Override
    public void reportMissingToken(Parser recognizer) {
        beginErrorCondition(recognizer);
        Token t = recognizer.getCurrentToken();
        IntervalSet expecting = getExpectedTokens(recognizer);
        String msg = "missing "+expecting.toString(recognizer.getTokenNames()) + " at " + getTokenErrorDisplay(t);
        throw new RecognitionException(msg, recognizer, recognizer.getInputStream(), recognizer.getContext());
    }
}

This throws exceptions with useful messages, and the line and position of the problem can be obtained from either the offending token or, if that's not set, from the current token by calling ((Parser) re.getRecognizer()).getCurrentToken() on the RecognitionException.

I'm fairly happy with how this is working, though having six reportX methods to override makes me think there's a better way.

Finegrained answered 9/8, 2013 at 2:31 Comment(1)
Works better for C#; the accepted and top-voted answer had compilation errors in C#, some incompatibility of the generics argument IToken vs int. - Aplacental
D
1

For anyone interested, here's the ANTLR4 C# equivalent of Sam Harwell's answer:

using System;
using System.IO;
using Antlr4.Runtime;

public class DescriptiveErrorListener : BaseErrorListener, IAntlrErrorListener<int>
{
  // Set to false to silence reporting.
  static readonly bool REPORT_SYNTAX_ERRORS = true;

  public static DescriptiveErrorListener Instance { get; } = new DescriptiveErrorListener();

  // Lexer errors: the offending symbol is a character code (int).
  public void SyntaxError(TextWriter output, IRecognizer recognizer, int offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e) {
    if (!REPORT_SYNTAX_ERRORS) return;
    string sourceName = recognizer.InputStream.SourceName;
    // never ""; might be "<unknown>" == IntStreamConstants.UnknownSourceName
    sourceName = $"{sourceName}:{line}:{charPositionInLine}";
    Console.Error.WriteLine($"{sourceName}: line {line}:{charPositionInLine} {msg}");
  }

  // Parser errors: the offending symbol is a token; delegate to the overload above.
  public override void SyntaxError(TextWriter output, IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e) {
    this.SyntaxError(output, recognizer, 0, line, charPositionInLine, msg, e);
  }
}

// Usage:
lexer.RemoveErrorListeners();
lexer.AddErrorListener(DescriptiveErrorListener.Instance);
parser.RemoveErrorListeners();
parser.AddErrorListener(DescriptiveErrorListener.Instance);
Drusilladrusus answered 10/12, 2020 at 20:58 Comment(0)
H
1

For people who use Python, here is the solution in Python 3 based on Mouagip's answer.

First, define a custom error listener:

from antlr4.error.ErrorListener import ErrorListener
from antlr4.error.Errors import ParseCancellationException

class ThrowingErrorListener(ErrorListener):
    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        ex = ParseCancellationException(f'line {line}:{column} {msg}')
        ex.line = line
        ex.column = column
        raise ex

Then attach it to both the lexer and the parser:

lexer = MyScriptLexer(script)
lexer.removeErrorListeners()
lexer.addErrorListener(ThrowingErrorListener())

token_stream = CommonTokenStream(lexer)

parser = MyScriptParser(token_stream)
parser.removeErrorListeners()
parser.addErrorListener(ThrowingErrorListener())

tree = parser.script()
Hadria answered 19/3, 2021 at 16:12 Comment(0)
