Token Aliases in Antlr

I have rules that look something like this:

INTEGER           : [0-9]+;
field3     : INTEGER COMMA INTEGER;

In the parse tree I get a List called INTEGER with two elements.

I would rather find a way for each of the elements to be named.

But if I do this:

INTEGER  : [0-9]+;
DOS      : INTEGER;
UNO      : INTEGER;
field3     : UNO COMMA DOS;

I still get the array of INTEGERs.

Am I doing it right and I just need to dig deeper to figure out what is wrong?

Is there some kind of syntax to alias INTEGER as UNO just for this command (that is actually what I would prefer)?

Lon answered 8/5, 2016 at 23:29 Comment(0)

Just use labeling to identify the subterms:

field     : a=INTEGER COMMA b=INTEGER;

The FieldContext class will be generated with two additional class fields:

Token a;
Token b;

The corresponding INTEGER instances will be assigned to these fields. So, no aliasing is actually required in most cases.
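
To make the labeling concrete, here is a minimal sketch of a complete combined grammar. The grammar name Fields and the COMMA and WS rules are assumptions added for illustration; only the field rule and the INTEGER rule come from the question and answer.

grammar Fields;   // hypothetical grammar name

// Labels a and b name the two INTEGER tokens individually.
field   : a=INTEGER COMMA b=INTEGER ;

INTEGER : [0-9]+ ;
COMMA   : ',' ;                    // assumed definition, not shown in the question
WS      : [ \t\r\n]+ -> skip ;     // assumed: whitespace is ignored

With the Java target, a listener or visitor method that receives the FieldContext should be able to read the two values as ctx.a.getText() and ctx.b.getText(). The list-style accessor ctx.INTEGER() is still generated alongside the labels, so existing code that uses it keeps working.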

However, there can be valid reasons to change the type of a token. This is typically handled in the lexer through modes, actions, and predicates. For example, using modes, if the integers alternate between the UNO and DOS types:

lexer grammar UD ;

UNO : INT -> mode(two);

mode two;
    DOS : INT -> mode(DEFAULT_MODE);

fragment INT : [0-9]+ ;

When to do the mode switch and whether a different specific approach might be more appropriate will depend on details not provided yet.
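
As a rough sketch of how the mode-based lexer could be wired to the rule from the question: after UNO is matched the lexer is in mode two, so the comma has to be recognized there. The COMMA, WS and WS2 rules and the parser grammar name UDParser are assumptions for illustration; DEFAULT_MODE is the ANTLR 4 name for the initial mode.

lexer grammar UD ;

UNO : INT -> mode(two) ;
WS  : [ \t\r\n]+ -> skip ;              // assumed: skip whitespace in the default mode

mode two;
    COMMA : ',' ;                       // assumed: the separator arrives while in mode two
    DOS   : INT -> mode(DEFAULT_MODE) ;
    WS2   : [ \t\r\n]+ -> skip ;        // assumed: rule names must be unique across modes

fragment INT : [0-9]+ ;

parser grammar UDParser ;               // hypothetical name, kept in a separate file
options { tokenVocab = UD ; }

field3 : UNO COMMA DOS ;

This only works while the input strictly alternates first integer, comma, second integer. For anything less regular, the labels shown earlier are likely the simpler choice.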

Leninism answered 9/5, 2016 at 3:48 Comment(1)
Great answer. Here is how it could be improved: comment on what makes good names for the aliases. - Lon
