I'm developing a small IDE for some language using ANTLR4 and need to underline erroneous characters when the lexer fails to match them. The built in org.antlr.v4.runtime.ANTLRErrorListener
implementation outputs a message to stderr in such cases, similar to this:
line 35:25 token recognition error at: 'foo\n'
I have no problem understanding how information about line and column of the error is obtained (passed as arguments to syntaxError
callback), but how do I get the 'foo\n'
string inside the callback?
When a parser is the source of the error, it passes the offending token as the second argument of syntaxError
callback, so it becomes trivial to extract information about the start and stop offsets of the erroneous input and this is also explained in the reference book. But what about the case when the source is a lexer? The second argument in the callback is null in this case, presumably since the lexer failed to form a token.
I need the length of unmatched characters to know how much to underline, but while debugging my listener implementation I could not find this information anywhere in the supplied callback arguments (other than extracting it from the supplied error message though string manipulation, which would be just wrong). The 'foo\n'
string may clearly be obtained somehow, so what am I missing?
I suspect that I might be looking in the wrong place and that I should be looking at extending DefaultErrorStrategy where error messages get formed.