I am learning ANTLR4 and was trying to play with lexical modes. How can I have the same token appear in multiple lexical modes? As a very simple example, say my grammar has two modes and I want to match whitespace and end-of-lines in both of them. How can I do that without ending up with WS_MODE1 and WS_MODE2, for example? Is there a way to reuse the same definition in both cases? I am hoping to get WS tokens in the output stream for all whitespace, irrespective of the mode. The same applies to EOL and to other keywords that can appear in both modes.
How to define tokens that can appear in multiple lexical modes in ANTLR4?
The rules have to have different names, but you can use the -> type(...) lexer command to give them the same type.
WS : [ \t]+;
mode Mode1;
Mode1_WS : WS -> type(WS);
mode Mode2;
Mode2_WS : WS -> type(WS);
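The same idea extends to end-of-line and keyword tokens. The following is a minimal sketch, not a definitive grammar: the grammar name ModesLexer, the '<' and '>' delimiters that enter and leave Mode1, and the rule names TEXT and NAME are all hypothetical. The shared patterns are written once as fragments, and each mode-specific rule maps its match back to the default-mode token type with type(...). The whitespace tokens stay on the default channel, so they appear in the token stream.

```antlr
// Hypothetical grammar name; `mode` declarations are only allowed in a lexer grammar.
lexer grammar ModesLexer;

// Shared patterns, written once. Fragments never produce tokens on their own;
// they are inlined wherever they are referenced, including from other modes.
fragment SPACES  : [ \t]+ ;
fragment NEWLINE : '\r'? '\n' ;

// Default mode: these rules define the token types WS, EOL and IF.
WS   : SPACES ;
EOL  : NEWLINE ;
IF   : 'if' ;                      // listed before TEXT so it wins equal-length matches
LT   : '<' -> pushMode(Mode1) ;    // hypothetical delimiter that enters Mode1
TEXT : ~[< \t\r\n]+ ;              // hypothetical catch-all for other text

mode Mode1;
// New rule names, but each one is re-typed to an existing token type.
Mode1_WS  : SPACES  -> type(WS) ;
Mode1_EOL : NEWLINE -> type(EOL) ;
Mode1_IF  : 'if'    -> type(IF) ;  // listed before NAME so it wins equal-length matches
GT        : '>'     -> popMode ;   // hypothetical delimiter that leaves Mode1
NAME      : ~[> \t\r\n]+ ;         // hypothetical catch-all inside Mode1
```

With this sketch, the input `if <if x>` should lex as IF WS LT IF WS NAME GT: the second `if` and the space after it are matched by Mode1_IF and Mode1_WS but reach the parser with the types IF and WS.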
Even though Mode1_WS and Mode2_WS are not fragment rules, the code generator will see the type command and know that you reassigned their types, so it will not define tokens for them.
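On the parser side, a minimal sketch, assuming the rules above live in a lexer grammar given the hypothetical name ModesLexer (the parser grammar name ModesParser and the rule name spaces are also hypothetical): the parser imports the lexer's token types through the tokenVocab option and refers only to WS.

```antlr
// Hypothetical parser grammar for the lexer rules above.
parser grammar ModesParser;

// Hypothetical name of the lexer grammar containing WS, Mode1_WS and Mode2_WS.
options { tokenVocab = ModesLexer; }

// Only WS is referenced: tokens matched by WS, Mode1_WS and Mode2_WS all
// carry the type WS, since the latter two are re-typed with type(WS).
spaces : WS* EOF ;
```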
Short question about the usage of these lexer rules: in the parser rules, do you refer to WS or to Mode1_WS and Mode2_WS? I tried both, but it seems you only define the lexer rules without referring to them directly in the parser rules. In that sense it's more of an 'import statement' than an 'alias'. –
Aberdare
The type command explicitly assigns the token type, which is the type the parser will see. In this case, WS would be used to reference tokens created by any of these 3 rules. –
Hierolatry
@SamHarwell What terminates the final mode spec? I noticed that some lexer docs have fragment definitions following the final mode spec, and the way those fragments are used shows that they are available to all modes, including the default mode. –
Sadomasochism
Tokens that can be matched in all modes would be a very welcome lexer grammar feature. I find myself "aliasing" tokens in the absence of such a feature. –
Cardinal