ANTLR 4 lexer tokens inside other tokens

I have the following grammar for ANTLR 4:

grammar Pattern;

//parser rules
parse   : string LBRACK CHAR DASH CHAR RBRACK ;
string  : (CHAR | DASH)+ ;

//lexer rules
DASH    : '-' ;
LBRACK  : '[' ;
RBRACK  : ']' ;
CHAR    : [A-Za-z0-9] ;

And I'm trying to parse the following string

ab-cd[0-9]

The grammar parses out the ab-cd on the left, which my application treats as a literal string. It then parses [0-9] as a character set, which in this case translates to any digit. The grammar works, but I don't like having (CHAR | DASH)+ as a parser rule when it is really just being treated as a token. I would rather have the lexer create a STRING token and give me the following tokens:

"ab-cd" "[" "0" "-" "9" "]"

instead of these

"ab" "-" "cd" "[" "0" "-" "9" "]"

I have looked at other examples, but haven't been able to figure it out. Usually other examples have quotes around such string literals or they have whitespace to help delimit the input. I'd like to avoid both. Can this be accomplished with lexer rules or do I need to continue to handle it in the parser rules like I'm doing?

Chamois answered 10/5, 2013 at 15:39 Comment(0)

In ANTLR 4, you can use lexer modes for this.

STRING : [a-z-]+;
LBRACK : '[' -> pushMode(CharSet);

mode CharSet;

DASH : '-';
NUMBER : [0-9]+;
RBRACK : ']' -> popMode;

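After matching a [ character, the lexer switches to mode CharSet, where only the rules declared under that mode apply. It stays in that mode until a ] character is matched and the popMode command is executed, which returns it to the default mode.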

Cage answered 10/5, 2013 at 16:26 Comment(3)
Thanks for this insight. Setting up subtokenizers like this sounds like the perfect solution. I am getting an error though stating lexical modes are only allowed in lexer grammars. I can declare my grammar as lexer grammar IdPattern;, but then I can't use parser rules. What am I missing? - Chamois

You'll need to use a lexer grammar for your lexer, and a separate parser grammar (in a separate file) for your parser. - Cage

Here is a link that can help others: meri-stuff.blogspot.co.za/2011/09/… - Tybie
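
The split into two files described in the comments above might look roughly like the sketch below. The grammar and file names (PatternLexer, PatternParser) are assumptions chosen for illustration, and the character classes are widened to match the CHAR rule from the question rather than the narrower ones in the answer. The parser grammar picks up the lexer's token names through the tokenVocab option.

```antlr
// ---- PatternLexer.g4 (hypothetical file name) ----
lexer grammar PatternLexer;

// Default mode: a run of letters, digits and dashes becomes one STRING token.
STRING : ([A-Za-z0-9] | '-')+ ;
// '[' switches the lexer into the CharSet mode.
LBRACK : '[' -> pushMode(CharSet) ;

mode CharSet;

// Inside brackets, every character is its own token.
DASH   : '-' ;
CHAR   : [A-Za-z0-9] ;
// ']' returns the lexer to the default mode.
RBRACK : ']' -> popMode ;

// ---- PatternParser.g4 (hypothetical file name) ----
parser grammar PatternParser;

// Import the token types generated from the lexer grammar.
options { tokenVocab = PatternLexer; }

parse : STRING LBRACK CHAR DASH CHAR RBRACK EOF ;
```

With these rules, the input ab-cd[0-9] should be tokenized as "ab-cd" "[" "0" "-" "9" "]", because the dash inside the brackets is only ever seen by the CharSet mode. Run the ANTLR tool on the lexer grammar first so that the tokens file exists when the parser grammar is processed.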
