How can I differentiate between reserved words and variables using ANTLR?

I'm using ANTLR to tokenize a simple grammar, and need to differentiate between an ID:

ID              : LETTER (LETTER | DIGIT)* ;

fragment DIGIT  : '0'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;

and a RESERVED_WORD:

RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ;

Say I run the lexer on the input:

class abc

I receive two ID tokens for "class" and "abc", while I want "class" to be recognized as a RESERVED_WORD. How can I accomplish this?

Competency answered 15/3, 2012 at 19:18 Comment(0)

The lexer prefers the rule that matches the most characters. Whenever two (or more) rules match the same number of characters, the one defined first in the grammar wins. So, if you define RESERVED_WORD before ID, like this:

RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ;

ID              : LETTER (LETTER | DIGIT)* ;

fragment DIGIT  : '0'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;

The input "class" will be tokenized as a RESERVED_WORD.

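Note that it doesn't make a lot of sense to create a single token type that matches every reserved word, because the parser then cannot tell the keywords apart by token type. Usually each reserved word gets its own rule, all defined before ID: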

// ...

NULL  : 'null';
TRUE  : 'true';
FALSE : 'false';

// ...

ID              : LETTER (LETTER | DIGIT)* ;

fragment DIGIT  : '0'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;

Now "false" will become a FALSE token, and "falser" an ID.
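
To make the tie-breaking rule concrete, here is a minimal Python sketch that imitates the behaviour described above: longest match wins, and on equal length the earlier rule wins. It is only an illustration, not how ANTLR is implemented. The `RULES` table, the `tokenize` helper and the `WS` rule are assumptions added for the example; they are not part of the grammar in the question.

```python
import re

# Hypothetical rule table in grammar order: keyword rules come before ID.
# WS is an assumed whitespace rule, added so the demo can skip spaces.
RULES = [
    ("NULL", r"null"),
    ("TRUE", r"true"),
    ("FALSE", r"false"),
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("WS", r"[ \t\r\n]+"),
]


def tokenize(text, rules=RULES):
    """Return (token_name, lexeme) pairs using longest-match, first-rule-wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule listed earlier is kept.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # drop whitespace tokens
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    print(tokenize("false falser"))
```

With the keyword rules listed first, "false" ties between FALSE and ID and goes to FALSE, while "falser" is matched in full only by ID. Moving the ID entry to the front of the table reproduces the problem from the question: every keyword comes out as an ID.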

Overhead answered 15/3, 2012 at 19:24 Comment(0)
