Understanding the context data structure in Antlr4

I'm trying to write a code translator in Java with the help of Antlr4 and had great success with the grammar part so far. However I'm now banging my head against a wall wrapping my mind around the parse tree data structure that I need to work on after my input has been parsed.

I'm trying to use the visitor template to go over my parse tree. I'll show you an example to illustrate the points of my confusion.

My grammar:

grammar pqlc;

// Lexer

//Schlüsselwörter
EXISTS: 'exists';
REDUCE: 'reduce';
QUERY: 'query';
INT: 'int';
DOUBLE: 'double';
CONST: 'const';
STDVECTOR: 'std::vector';
STDMAP: 'std::map';
STDSET: 'std::set';
C_EXPR: 'c_expr';

INTEGER_LITERAL  : (DIGIT)+ ;
fragment DIGIT: '0'..'9';
DOUBLE_LITERAL : DIGIT '.' DIGIT+;

LPAREN          : '(';
RPAREN          : ')';
LBRACK          : '[';
RBRACK          : ']';
DOT             : '.';
EQUAL           : '==';
LE              : '<=';
GE              : '>=';
GT              : '>';
LT              : '<';
ADD             : '+';
MUL             : '*';
AND             : '&&';
COLON           : ':';

IDENTIFIER    :   JavaLetter JavaLetterOrDigit*;
fragment JavaLetter    :   [a-zA-Z$_]; // these are the "java letters" below 0xFF
fragment JavaLetterOrDigit    :   [a-zA-Z0-9$_]; // these are the "java letters or digits" below 0xFF
WS  
    :  [ \t\r\n\u000C]+ -> skip  
    ;
COMMENT
    :   '/*' .*? '*/' -> skip
    ;

LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;


// Parser

//start_rule: query;

query :
      quant_expr
      | qexpr+
      | IDENTIFIER // order IDENTIFIER and qexpr+?
      | numeral
      | c_expr //TODO

      ;

c_type : INT | DOUBLE | CONST;
bin_op: AND | ADD | MUL | EQUAL | LT | GT | LE| GE;


qexpr:
         LPAREN query RPAREN bin_op_query? 
         // query bin_op query
         | IDENTIFIER  bin_op_query? // copied from query to resolve left recursion problem
         | numeral bin_op_query?  // ^
         | quant_expr bin_op_query? // ^
           |c_expr bin_op_query?
           // query.find(query)
         | IDENTIFIER  find_query? // copied from query to resolve left recursion problem
         | numeral find_query?  // ^
         | quant_expr find_query?
           |c_expr find_query?
           // query[query]
          | IDENTIFIER  array_query? // copied from query to resolve left recursion problem
         | numeral array_query?  // ^
         | quant_expr array_query?
           |c_expr array_query?

     // | qexpr bin_op_query // bad, resolved by quexpr+ in query 
     ;

bin_op_query: bin_op query bin_op_query?; // resolve left recursion of query bin_op query

find_query: '.''find' LPAREN query RPAREN;
array_query: LBRACK query RBRACK;

quant_expr:
    quant id ':' query
          | QUERY LPAREN match RPAREN ':' query
          | REDUCE LPAREN IDENTIFIER RPAREN id ':' query
    ;

match:
         STDVECTOR LBRACK id RBRACK EQUAL cm
     | STDMAP '.''find' LPAREN cm RPAREN EQUAL cm
     | STDSET '.''find' LPAREN cm RPAREN
     ;

cm:
    IDENTIFIER
  | numeral
   | c_expr //TODO
  ;

quant :
          EXISTS;

id :
     c_type IDENTIFIER
     | IDENTIFIER // Nach Seite 2 aber nicht der Übersicht. Laut übersicht id -> aber dann wäre Regel 1 ohne +
   ;

numeral :
            INTEGER_LITERAL
        | DOUBLE_LITERAL
        ;
c_expr:
          C_EXPR
      ;

Now let's parse the following string:

double x: x >= c_expr

Visually I'll get this tree: tree

Let's say my visitor is in the visitQexpr(@NotNull pqlcParser.QexprContext ctx) routine when it hits the branch Qexpr(x bin_op_query).

My question is, how can I tell that the left children ("x" in the tree) is a terminal node, or more specifically an "IDENTIFIER"? There are no visiting rules for Terminal nodes since they aren't rules. ctx.getChild(0) has no RuleIndex. I guess I could use that to check if I'm in a terminal or not, but that still wouldn't tell me if I was in IDENTIFIER or another kind of terminal token. I need to be able to tell the difference somehow.

I had more questions but in the time it took me to write the explanation I forgot them :< Thanks in advance.

You can add labels to tokens and access them/check if they exist in the surrounding context:

id :
     c_type labelA = IDENTIFIER
     | labelB = IDENTIFIER 
   ;

You could also do this to create different visits:

id :
     c_type IDENTIFIER    #idType1 //choose more appropriate names!
     | IDENTIFIER         #idType2
   ;

This will create different visitors for the two alternatives and I suppose (i.e. have not verified) that the visitor for id will not be called.

I prefer the following approach though:

id :
        typeDef
     |  otherId
     ;
typeDef: c_type IDENTIFIER;
otherId : IDENTIFIER ;

This is a more heavily typed system. But you can very specifically visit nodes. Some rules of thumb I use:

Use | only when all alternatives are parser rules.
Wrap each Token in a parser rule (like otherId) to give them "more meaning".
It's ok to mix parser rules and tokens, if the tokens are not really important (like ;) and therefore not needed in the parse tree.

Recommended topics

Hot tags