Antlr4 discards remaining tokens instead of bailing out

I am using Antlr4, and here is a simplified grammar I wrote:

grammar BooleanExpression;

/*******************************
 *      Parser Rules
 *******************************/
booleanTerm
    : booleanLiteral (KW_OR booleanLiteral)+
    | booleanLiteral
    ;

id
    : IDENTIFIER
    ;

booleanLiteral
    : KW_TRUE
    | KW_FALSE
    ;

/*******************************
 *         Lexer Rules
 *******************************/
KW_TRUE
    : 'true'
    ;

KW_FALSE
    : 'false'
    ;

KW_OR
    : 'or'
    ;   

IDENTIFIER
    : (SIMPLE_LATIN)+
    ;

fragment 
SIMPLE_LATIN
    : 'A' .. 'Z'
    | 'a' .. 'z'
    ;

WHITESPACE
    : [ \t\n\r]+ -> skip
    ;

I used a BailErrorStategy and BailLexer like below:

public class BailErrorStrategy extends DefaultErrorStrategy {
    /**
     * Instead of recovering from exception e, rethrow it wrapped in a generic
     * IllegalArgumentException so it is not caught by the rule function catches.
     * Exception e is the "cause" of the IllegalArgumentException.
     */

    @Override
    public void recover(Parser recognizer, RecognitionException e) {
        throw new IllegalArgumentException(e);
    }

    /**
     * Make sure we don't attempt to recover inline; if the parser successfully
     * recovers, it won't throw an exception.
     */
    @Override
    public Token recoverInline(Parser recognizer) throws RecognitionException {
        throw new IllegalArgumentException(new InputMismatchException(recognizer));
    }

    /** Make sure we don't attempt to recover from problems in subrules. */
    @Override
    public void sync(Parser recognizer) {
    }

    @Override
    protected Token getMissingSymbol(Parser recognizer) {
        throw new IllegalArgumentException(new InputMismatchException(recognizer));
    }
}



 public class BailLexer extends BooleanExpressionLexer {
    public BailLexer(CharStream input) {
        super(input);
        //removeErrorListeners();
        //addErrorListener(new ConsoleErrorListener());
    }

    @Override
    public void recover(LexerNoViableAltException e) {
        throw new IllegalArgumentException(e); // Bail out
    }

    @Override
    public void recover(RecognitionException re) {
        throw new IllegalArgumentException(re); // Bail out
    }
}

Everything works okay except one case. I tried the following expression:

true OR false

I expect this expression to be rejected and an IllegalArgumentException is thrown because the 'or' token should be lower case instead of upper case. But it turned out Antlr4 didn't reject this expression and the expression is tokenized into "KW_TRUE IDENTIFIER KW_FALSE" (which is expected, upper case 'OR' will be considered as an IDENTIFIER), but the parser didn't throw an error during processing this token stream and parsed it into a tree containing only "true" and discarded the remaining "IDENTIFIER KW_FALSE" tokens. I tried different prediction modes but all of them worked like above. I have no idea why it works like this and did some debugging, and it eventually led to to this piece of code in Antlr:

ATNConfigSet reach = computeReachSet(previous, t, false);

if ( reach==null ) {
    // if any configs in previous dipped into outer context, that
    // means that input up to t actually finished entry rule
    // at least for SLL decision. Full LL doesn't dip into outer
    // so don't need special case.
    // We will get an error no matter what so delay until after
    // decision; better error message. Also, no reachable target
    // ATN states in SLL implies LL will also get nowhere.
    // If conflict in states that dip out, choose min since we
    // will get error no matter what.
    int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);
    if ( alt!=ATN.INVALID_ALT_NUMBER ) {
        // return w/o altering DFA
        return alt;
    }
    throw noViableAlt(input, outerContext, previous, startIndex);
}

The code "int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);" returned the second alternative in booleanTerm (because "true" matches the second alternative "booleanLiteral") but since it is not equal to ATN.INVALID_ALT_NUMBER, noViableAlt is not thrown immediately. According to the Java comments there, "We will get an error no matter what, so delay until after decision" but it seems no error was thrown eventually.

I really have no idea how to make Antlr reports an error in this case, could some one shed me some light on this? Any help is appreciated, thanks.

Recommended topics

Hot tags