ANTLR - trouble getting AST hierarchy setup

grammar FlexByteArray_HexGrammar; options { //language = 'Java'; language = 'CSharp2'; output=AST; } expr : array_length remaining_data //the amount of remaining data must be equal to the array_length (times 2 since 2 hex characters per byte) // need to check if the array length is zero first to avoid checking $remaining_data.text (null reference) in that situation. //{ ($array_length.value == 0 && $remaining_data.text == null) || ($remaining_data.text != null && $array_length.value*2 == $remaining_data.text.Length) }? ; array_length //returns [UInt16 value] : uint16_little //{ $value = $uint16_little.value; } ; hex_byte1 //needed just so I can distinguish between two bytes in a uint16 when doing a semantic predicate (or whatever you call it where I write in the target language in curly brackets) : hex_byte ; uint16_big //returns [UInt16 value] : hex_byte1 hex_byte //{ $value = Convert.ToUInt16($hex_byte.text + $hex_byte1.text); } ; uint16_little //returns [UInt16 value] : hex_byte1 hex_byte //{ $value = Convert.ToUInt16($hex_byte1.text + $hex_byte.text); } ; remaining_data : hex_byte* ; hex_byte : HEX_DIGIT HEX_DIGIT ; HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

namespace FlexByteArray_Hex { using System; using Antlr.Runtime; using Antlr.Runtime.Tree; using Antlr.Utility.Tree; public class Program { public static void Main(string[] args) { ICharStream input = new ANTLRStringStream("0001ff"); FlexByteArray_HexGrammarLexer lex = new FlexByteArray_HexGrammarLexer(input); CommonTokenStream tokens = new CommonTokenStream(lex); FlexByteArray_HexGrammarParser parser = new FlexByteArray_HexGrammarParser(tokens); Console.WriteLine("Parser created."); CommonTree tree = parser.expr().Tree as CommonTree; Console.WriteLine("------Input parsed-------"); if (tree == null) { Console.WriteLine("Tree is null."); } else { DOTTreeGenerator treegen = new DOTTreeGenerator(); Console.WriteLine(treegen.ToDOT(tree)); } } } }

import org.antlr.*; import org.antlr.runtime.*; import org.antlr.runtime.tree.*; public class Program { public static void main(String[] args) throws Exception { FlexByteArray_HexGrammarLexer lex = new FlexByteArray_HexGrammarLexer(new ANTLRStringStream("0001ff")); CommonTokenStream tokens = new CommonTokenStream(lex); FlexByteArray_HexGrammarParser parser = new FlexByteArray_HexGrammarParser(tokens); System.out.println("Parser created."); CommonTree tree = (CommonTree)parser.expr().tree; System.out.println("------Input parsed-------"); if (tree == null) { System.out.println("Tree is null."); } else { DOTTreeGenerator treegen = new DOTTreeGenerator(); System.out.println(treegen.toDOT(tree)); } } }

Anssssss wrote:

As soon as I try to change how the tree would be made using a ^ character, I get an exception of the form:

When trying to make the the parser rule a the root of the tree inside p like this:

p : a^ b;
a : A A;
b : B B;

ANTLR does not know which of the A's is the root of rule a. And there can't be two roots, of course.

The inline tree operators are handy in certain cases, but in this case, they're not up to the task. You can't assign a root inside a production rule that can possible have no contents, like your remaining_data rule. In that case you need to create "imaginary tokens" in the tokens { ... } section of your grammar and use rewrite rule (-> ^( ... )) to create your AST.

A demo

The following grammar:

grammar FlexByteArray_HexGrammar;

options {
  output=AST;
}

tokens {
  ROOT;
  ARRAY;
  LENGTH;
  DATA;
}

expr
  :  array* EOF -> ^(ROOT array*)
  ;

array
@init { int index = 0; }
  :  array_length array_data[$array_length.value] -> ^(ARRAY array_length array_data)
  ;

array_length returns [int value]
  :  a=hex_byte b=hex_byte {$value = $a.value*16*16 + $b.value;} -> ^(LENGTH hex_byte hex_byte)
  ;

array_data [int length]
  :  ({length > 0}?=> hex_byte {length--;})* {length == 0}? -> ^(DATA hex_byte*)
  ;

hex_byte returns [int value]
  :  a=HEX_DIGIT b=HEX_DIGIT {$value = Integer.parseInt($a.text+$b.text, 16);}
  ;

HEX_DIGIT
  :  '0'..'9' | 'a'..'f' | 'A'..'F'
  ;

will parse the following input:

0001ff0002abcd

into the following AST:

enter image description here

as you can see using the following main class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
    public static void main(String[] args) throws Exception {
        FlexByteArray_HexGrammarLexer lexer = new FlexByteArray_HexGrammarLexer(new ANTLRStringStream("0001ff0002abcd"));
        FlexByteArray_HexGrammarParser parser = new FlexByteArray_HexGrammarParser(new CommonTokenStream(lexer));
        CommonTree tree = (CommonTree)parser.expr().getTree();
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
    }
}

More info

if you want to know more about the predicates I used in the array_data parser rule, see this previous Q&A: What is a 'semantic predicate' in ANTLR?
to get more info about the tree rewrite operator -> ^( ... ), see this previous Q&A: How to output the AST built using ANTLR?

EDIT

To give a short explanation of the array_data rule:

array_data [int length]
  :  ({length > 0}?=> hex_byte {length--;})* {length == 0}? -> ^(DATA hex_byte*)
  ;

As you already mentioned in the comments, you can pass one or more parameters to rules by adding [TYPE IDENTIFIER] after the rule.

The first (gated) semantic predicate, {length > 0}?=>, checks if length is more than zero. If this is the case, the parser tries to match a hex_byte after which the length variable is decreased by one. This all stops when length is zero, or when the parser has no more hex_byte to be parsed which happens when the EOF is next in line. Because it could parse less than the mandatory number of hex_bytes, there is a (validating) semantic predicate at the very end of the rule, {length == 0}?, that ensures that the correct amount of hex_bytes have been parsed (no more, and no less!).

Hope that clarifies it a bit more.

A demo

More info

EDIT

Recommended topics

Hot tags