ANTLR4: how to build a grammar that allows keywords as identifiers

Here is some demo code:

label:
var id
let id = 10
goto label

If keywords are allowed as identifiers, it becomes:

let:
var var
let var = 10
goto let

This is perfectly legal code, but it seems very hard to handle in ANTLR.

AFAIK, once ANTLR matches a let token, it never falls back to an ID token, so ANTLR will see:

LET_TOKEN :
VAR_TOKEN <missing ID_TOKEN>VAR_TOKEN
LET_TOKEN <missing ID_TOKEN>VAR_TOKEN = 10
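The lexer behaviour behind this can be modelled in a few lines. This is a simplified Go sketch, not ANTLR's implementation: ANTLR's lexer prefers the longest match and, when two rules match input of the same length, picks the rule defined first in the grammar. It does not know which token the parser expects at that point, so "var" always becomes VAR. The names `rule`, `rules` and `nextToken` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is one simplified lexer rule: a token name and an anchored pattern.
type rule struct {
	name string
	re   *regexp.Regexp
}

// rules are listed in the same order as VAR, LET and ID in the question's grammar.
var rules = []rule{
	{"VAR", regexp.MustCompile(`^var`)},
	{"LET", regexp.MustCompile(`^let`)},
	{"ID", regexp.MustCompile(`^[a-zA-Z][a-zA-Z0-9]*`)},
}

// nextToken returns the rule with the longest match at the start of input.
// On a tie the earlier rule wins, because only a strictly longer match
// replaces the current best. No parser context is consulted.
func nextToken(input string) (name, text string) {
	for _, r := range rules {
		if m := r.re.FindString(input); len(m) > len(text) {
			name, text = r.name, m
		}
	}
	return name, text
}

func main() {
	for _, s := range []string{"var", "let", "variable", "x1"} {
		name, text := nextToken(s)
		fmt.Printf("%q lexes as %s(%q)\n", s, name, text)
	}
}
```

"variable" still lexes as ID because ID produces a longer match, but "var" on its own ties with ID and VAR wins by rule order.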

Although ANTLR allows semantic predicates, I would have to control every token match, which is problematic. The grammar becomes:

grammar Demo;
options {
  language = Go;
}
@parser::members{
    var _need = map[string]bool{}
    func skip(name string,v bool){
        _need[name] = !v
        fmt.Println("SKIP",name,v)
    }
    func need(name string)bool{
        fmt.Println("NEED",name,_need[name])
        return _need[name]
    }
}

proj@init{skip("inst",false)}: (line? NL)* EOF;
line
    : VAR ID
    | LET ID EQ? Integer
    ;

NL: '\n';
VAR: {need("inst")}? 'var' {skip("inst",true)};
LET: {need("inst")}? 'let' {skip("inst",true)};
EQ: '=';

ID: ([a-zA-Z] [a-zA-Z0-9]*);
Integer: [0-9]+;

WS: [ \t] -> skip;

It looks terrible.

But this is easy with a PEG; test this in PEG.js:

Expression = (Line? _ '\n')* ;

Line
  = 'var' _ ID
  / 'let' _ ID _ "=" _ Integer

Integer "integer"
  = [0-9]+ { return parseInt(text(), 10); }

ID = [a-zA-Z] [a-zA-Z0-9]*

_ "whitespace"
  = [ \t]*
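Why the PEG version works: a PEG parser has no separate lexer, so 'var' and ID are just patterns tried against the characters at the current position, and ID happily matches the text "var" right after the keyword. The following hand-written Go sketch mimics the Line rule above with ordered choice and backtracking. It is only an illustration, not what PEG.js generates, and `cursor` and `parseLine` are made-up names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	idRe  = regexp.MustCompile(`^[a-zA-Z][a-zA-Z0-9]*`)
	intRe = regexp.MustCompile(`^[0-9]+`)
)

// cursor is the unconsumed remainder of the input line.
type cursor struct{ rest string }

// lit matches a string literal such as 'var' directly against the characters.
func (c *cursor) lit(s string) bool {
	if strings.HasPrefix(c.rest, s) {
		c.rest = c.rest[len(s):]
		return true
	}
	return false
}

// class matches a character-class pattern such as ID or Integer.
func (c *cursor) class(re *regexp.Regexp) (string, bool) {
	m := re.FindString(c.rest)
	c.rest = c.rest[len(m):]
	return m, m != ""
}

// ws implements _ = [ \t]* and always succeeds.
func (c *cursor) ws() { c.rest = strings.TrimLeft(c.rest, " \t") }

// parseLine mirrors: Line = 'var' _ ID / 'let' _ ID _ "=" _ Integer
// and returns the declared name. Each alternative starts from a fresh
// cursor, which is the backtracking step of PEG ordered choice.
func parseLine(line string) (string, bool) {
	c := &cursor{line}
	if c.lit("var") {
		c.ws()
		id, ok := c.class(idRe)
		c.ws()
		if ok && c.rest == "" {
			return id, true
		}
	}
	c = &cursor{line}
	if c.lit("let") {
		c.ws()
		id, ok := c.class(idRe)
		c.ws()
		if ok && c.lit("=") {
			c.ws()
			_, isInt := c.class(intRe)
			c.ws()
			if isInt && c.rest == "" {
				return id, true
			}
		}
	}
	return "", false
}

func main() {
	for _, l := range []string{"var var", "let var = 10", "let let = 1", "var"} {
		id, ok := parseLine(l)
		fmt.Printf("%q -> %q %v\n", l, id, ok)
	}
}
```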

I have actually done this in peggo and JavaCC.

My question is how to handle grammars like this in ANTLR 4.6. I was very excited about the ANTLR 4.6 Go target, but it seems I chose the wrong tool for my grammar?

Jemine asked 2/1, 2017 at 5:13

The simplest way is to define a parser rule for identifiers:

id: ID | VAR | LET;

VAR: 'var';
LET: 'let';
ID: [a-zA-Z] [a-zA-Z0-9]*;

And then use id instead of ID in your parser rules.
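A minimal Go sketch of this idea, assuming a toy hand-written lexer and parser in place of ANTLR-generated code (all names here are hypothetical): the lexer still turns var and let into keyword tokens, but the parser's identifier check accepts ID, VAR or LET, so "var var" and "let let = 1" both parse. The let alternative requires "=", following the PEG version rather than the optional EQ? of the question's grammar.

```go
package main

import "fmt"

type token struct{ kind, text string }

// keywords stands in for VAR and LET being defined before ID: a whole word
// equal to a keyword lexes as that keyword, any other word lexes as ID.
var keywords = map[string]string{"var": "VAR", "let": "LET"}

func isAlpha(c byte) bool { return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' }
func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// lex is a simplified ASCII lexer for this sketch only.
func lex(src string) []token {
	var toks []token
	for i := 0; i < len(src); {
		c := src[i]
		j := i + 1
		switch {
		case c == ' ' || c == '\t':
			// skip whitespace, like WS -> skip
		case isAlpha(c):
			for j < len(src) && (isAlpha(src[j]) || isDigit(src[j])) {
				j++
			}
			kind, ok := keywords[src[i:j]]
			if !ok {
				kind = "ID"
			}
			toks = append(toks, token{kind, src[i:j]})
		case isDigit(c):
			for j < len(src) && isDigit(src[j]) {
				j++
			}
			toks = append(toks, token{"INT", src[i:j]})
		default:
			toks = append(toks, token{src[i:j], src[i:j]}) // single-char token such as "="
		}
		i = j
	}
	return toks
}

// isID mirrors the answer's parser rule: id : ID | VAR | LET ;
func isID(t token) bool {
	return t.kind == "ID" || t.kind == "VAR" || t.kind == "LET"
}

// parseLine mirrors: line : VAR id | LET id '=' INT ; and returns the declared name.
func parseLine(toks []token) (string, bool) {
	switch {
	case len(toks) == 2 && toks[0].kind == "VAR" && isID(toks[1]):
		return toks[1].text, true
	case len(toks) == 4 && toks[0].kind == "LET" && isID(toks[1]) &&
		toks[2].kind == "=" && toks[3].kind == "INT":
		return toks[1].text, true
	}
	return "", false
}

func main() {
	for _, src := range []string{"var var", "let var = 10", "let let = 1", "var = 3"} {
		name, ok := parseLine(lex(src))
		fmt.Printf("%q -> %q %v\n", src, name, ok)
	}
}
```

In the ANTLR grammar itself, this corresponds to replacing ID with id in the line rule.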

A different way is to use ID for identifiers and keywords, and use predicates for disambiguation. But it's less readable, so I'd use the first way instead.
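For comparison, a rough Go sketch of the predicate-based alternative, again with toy hand-written code rather than ANTLR output: the lexer has no keyword rules, and the parser checks the token text wherever a keyword is expected, which is the job a semantic predicate would do in the grammar. The varKw rule in the comment is pseudo-grammar, and the sketch assumes tokens are separated by whitespace.

```go
package main

import (
	"fmt"
	"strings"
)

type token struct{ kind, text string }

// lex has no keyword rules at all: every word is an ID.
// Simplification: tokens must be separated by whitespace.
func lex(src string) []token {
	var toks []token
	for _, f := range strings.Fields(src) {
		kind := "ID"
		switch {
		case f == "=":
			kind = "="
		case strings.Trim(f, "0123456789") == "":
			kind = "INT"
		}
		toks = append(toks, token{kind, f})
	}
	return toks
}

// kw plays the part of a semantic predicate: the token is an ID whose text
// happens to be the keyword, as in pseudo-grammar  varKw : {text is "var"}? ID ;
func kw(t token, word string) bool { return t.kind == "ID" && t.text == word }

// parseLine mirrors: line : varKw ID | letKw ID '=' INT ; returning the name.
func parseLine(toks []token) (string, bool) {
	switch {
	case len(toks) == 2 && kw(toks[0], "var") && toks[1].kind == "ID":
		return toks[1].text, true
	case len(toks) == 4 && kw(toks[0], "let") && toks[1].kind == "ID" &&
		toks[2].kind == "=" && toks[3].kind == "INT":
		return toks[1].text, true
	}
	return "", false
}

func main() {
	for _, src := range []string{"var var", "let var = 10", "goto x"} {
		name, ok := parseLine(lex(src))
		fmt.Printf("%q -> %q %v\n", src, name, ok)
	}
}
```

The keyword checks end up scattered through the parser, which is the readability cost mentioned above.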

Lello answered 2/1, 2017 at 13:42
Unfortunately, in this situation, when ANTLR4 generates an error message it tells the user that ID, var or let is expected, which is confusing. Is there a way to get ANTLR4 to tell the user that only ID is expected? – Douma
@Douma Good point. I never really tried to customize error messages, but I suppose you could supply your own IParserErrorListener to tweak the generated errors. I don't know if there's a better way. – Lello
Extending DefaultErrorStrategy lets me customize the error message. – Douma
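One way to approach the error-message problem, sketched in Go as a plain helper: before formatting the "expecting ..." text, fold every token that the id rule accepts into a single "identifier" entry. Wiring it into ANTLR (for example from an overridden error-reporting method such as reportInputMismatch in the Java DefaultErrorStrategy, after turning the expected token set into names) is not shown, and describeExpected is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// identifierTokens are the token names the answer's id rule accepts.
var identifierTokens = map[string]bool{"ID": true, "VAR": true, "LET": true}

// describeExpected builds the "expecting ..." part of an error message,
// folding every identifier-capable token into a single "identifier" entry.
func describeExpected(expected []string) string {
	seen := map[string]bool{}
	var out []string
	for _, name := range expected {
		if identifierTokens[name] {
			name = "identifier"
		}
		if !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}
	sort.Strings(out) // deterministic order for the message
	return "expecting " + strings.Join(out, ", ")
}

func main() {
	fmt.Println(describeExpected([]string{"ID", "VAR", "LET"}))
	fmt.Println(describeExpected([]string{"'='", "ID"}))
}
```

With this helper, an expected set of ID, VAR and LET is reported as "expecting identifier".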
