Using ANTLR 3.3?

Asked 9/12, 2010 at 8:5 Answered 31/7, 2012 at 5:37

I'm trying to get started with ANTLR and C# but I'm finding it extraordinarily difficult due to the lack of documentation/tutorials. I've found a couple half-hearted tutorials for older versions, but it seems there have been some major changes to the API since.

Can anyone give me a simple example of how to create a grammar and use it in a short program?

I've finally managed to get my grammar file compiling into a lexer and parser, and I can get those compiled and running in Visual Studio (after having to recompile the ANTLR source because the C# binaries seem to be out of date too! -- not to mention the source doesn't compile without some fixes), but I still have no idea what to do with my parser/lexer classes. Supposedly it can produce an AST given some input...and then I should be able to do something fancy with that.

Sacristan answered 9/12, 2010 at 8:5 Comment(0)

132

Let's say you want to parse simple expressions consisting of the following tokens:

- subtraction (also unary);
+ addition;
* multiplication;
/ division;
(...) grouping (sub) expressions;
integer and decimal numbers.

An ANTLR grammar could look like this:

grammar Expression;

options {
  language=CSharp2;
}

parse
  :  exp EOF 
  ;

exp
  :  addExp
  ;

addExp
  :  mulExp (('+' | '-') mulExp)*
  ;

mulExp
  :  unaryExp (('*' | '/') unaryExp)*
  ;

unaryExp
  :  '-' atom 
  |  atom
  ;

atom
  :  Number
  |  '(' exp ')' 
  ;

Number
  :  ('0'..'9')+ ('.' ('0'..'9')+)?
  ;

Now to create a proper AST, you add output=AST; in your options { ... } section, and you mix some "tree operators" in your grammar defining which tokens should be the root of a tree. There are two ways to do this:

add ^ and ! after your tokens. The ^ causes the token to become a root and the ! excludes the token from the ast;
by using "rewrite rules": ... -> ^(Root Child Child ...).

Take the rule foo for example:

foo
  :  TokenA TokenB TokenC TokenD
  ;

and let's say you want TokenB to become the root and TokenA and TokenC to become its children, and you want to exclude TokenD from the tree. Here's how to do that using option 1:

foo
  :  TokenA TokenB^ TokenC TokenD!
  ;

and here's how to do that using option 2:

foo
  :  TokenA TokenB TokenC TokenD -> ^(TokenB TokenA TokenC)
  ;

So, here's the grammar with the tree operators in it:

grammar Expression;

options {
  language=CSharp2;
  output=AST;
}

tokens {
  ROOT;
  UNARY_MIN;
}

@parser::namespace { Demo.Antlr }
@lexer::namespace { Demo.Antlr }

parse
  :  exp EOF -> ^(ROOT exp)
  ;

exp
  :  addExp
  ;

addExp
  :  mulExp (('+' | '-')^ mulExp)*
  ;

mulExp
  :  unaryExp (('*' | '/')^ unaryExp)*
  ;

unaryExp
  :  '-' atom -> ^(UNARY_MIN atom)
  |  atom
  ;

atom
  :  Number
  |  '(' exp ')' -> exp
  ;

Number
  :  ('0'..'9')+ ('.' ('0'..'9')+)?
  ;

Space 
  :  (' ' | '\t' | '\r' | '\n'){Skip();}
  ;

I also added a Space rule to ignore any white spaces in the source file and added some extra tokens and namespaces for the lexer and parser. Note that the order is important (options { ... } first, then tokens { ... } and finally the @... {}-namespace declarations).

That's it.

Now generate a lexer and parser from your grammar file:

java -cp antlr-3.2.jar org.antlr.Tool Expression.g

and put the .cs files in your project together with the C# runtime DLL's.

You can test it using the following class:

using System;
using Antlr.Runtime;
using Antlr.Runtime.Tree;
using Antlr.StringTemplate;

namespace Demo.Antlr
{
  class MainClass
  {
    public static void Preorder(ITree Tree, int Depth) 
    {
      if(Tree == null)
      {
        return;
      }

      for (int i = 0; i < Depth; i++)
      {
        Console.Write("  ");
      }

      Console.WriteLine(Tree);

      Preorder(Tree.GetChild(0), Depth + 1);
      Preorder(Tree.GetChild(1), Depth + 1);
    }

    public static void Main (string[] args)
    {
      ANTLRStringStream Input = new ANTLRStringStream("(12.5 + 56 / -7) * 0.5"); 
      ExpressionLexer Lexer = new ExpressionLexer(Input);
      CommonTokenStream Tokens = new CommonTokenStream(Lexer);
      ExpressionParser Parser = new ExpressionParser(Tokens);
      ExpressionParser.parse_return ParseReturn = Parser.parse();
      CommonTree Tree = (CommonTree)ParseReturn.Tree;
      Preorder(Tree, 0);
    }
  }
}

which produces the following output:

ROOT
  *
    +
      12.5
      /
        56
        UNARY_MIN
          7
    0.5

which corresponds to the following AST:

alt text

(diagram created using graph.gafol.net)

Note that ANTLR 3.3 has just been released and the CSharp target is "in beta". That's why I used ANTLR 3.2 in my example.

In case of rather simple languages (like my example above), you could also evaluate the result on the fly without creating an AST. You can do that by embedding plain C# code inside your grammar file, and letting your parser rules return a specific value.

Here's an example:

grammar Expression;

options {
  language=CSharp2;
}

@parser::namespace { Demo.Antlr }
@lexer::namespace { Demo.Antlr }

parse returns [double value]
  :  exp EOF {$value = $exp.value;}
  ;

exp returns [double value]
  :  addExp {$value = $addExp.value;}
  ;

addExp returns [double value]
  :  a=mulExp       {$value = $a.value;}
     ( '+' b=mulExp {$value += $b.value;}
     | '-' b=mulExp {$value -= $b.value;}
     )*
  ;

mulExp returns [double value]
  :  a=unaryExp       {$value = $a.value;}
     ( '*' b=unaryExp {$value *= $b.value;}
     | '/' b=unaryExp {$value /= $b.value;}
     )*
  ;

unaryExp returns [double value]
  :  '-' atom {$value = -1.0 * $atom.value;}
  |  atom     {$value = $atom.value;}
  ;

atom returns [double value]
  :  Number      {$value = Double.Parse($Number.Text, CultureInfo.InvariantCulture);}
  |  '(' exp ')' {$value = $exp.value;}
  ;

Number
  :  ('0'..'9')+ ('.' ('0'..'9')+)?
  ;

Space 
  :  (' ' | '\t' | '\r' | '\n'){Skip();}
  ;

which can be tested with the class:

using System;
using Antlr.Runtime;
using Antlr.Runtime.Tree;
using Antlr.StringTemplate;

namespace Demo.Antlr
{
  class MainClass
  {
    public static void Main (string[] args)
    {
      string expression = "(12.5 + 56 / -7) * 0.5";
      ANTLRStringStream Input = new ANTLRStringStream(expression);  
      ExpressionLexer Lexer = new ExpressionLexer(Input);
      CommonTokenStream Tokens = new CommonTokenStream(Lexer);
      ExpressionParser Parser = new ExpressionParser(Tokens);
      Console.WriteLine(expression + " = " + Parser.parse());
    }
  }
}

and produces the following output:

(12.5 + 56 / -7) * 0.5 = 2.25

EDIT

In the comments, Ralph wrote:

Tip for those using Visual Studio: you can put something like java -cp "$(ProjectDir)antlr-3.2.jar" org.antlr.Tool "$(ProjectDir)Expression.g" in the pre-build events, then you can just modify your grammar and run the project without having to worry about rebuilding the lexer/parser.

Mullens answered 9/12, 2010 at 11:40 Comment(15)

Nice! This is a big help. I think a large part of the problem was that 3.3 simply doesn't work. It makes parse() private, and skip() isn't available, and the C# runtimes don't work with it. This should help me get started, thanks a ton! – Sacristan 9/12, 2010 at 19:40

Also... what program did you use to make the diagram? Looks nice :) – Sacristan 9/12, 2010 at 19:44

@Ralph, note that the parse method is the first parser rule I defined, so it's there because I put it in there. Also, it's correct that skip() isn't available, but Skip() is . And you're welcome, of course! (will edit my answer to post a link to the site where I created the diagram) – Mullens 9/12, 2010 at 19:48

@Bart: Yes, 3.3 puts it there too (if you defined it), but declares it private. I guess skip() (lowercase) is for Java? – Sacristan 9/12, 2010 at 20:29

@Ralph, that's odd: I'll have a look later on at version 3.3. And yes, the lower case skip is for the Java target. – Mullens 9/12, 2010 at 20:38

Tip for those using Visual Studio: you can put something like java -cp "$(ProjectDir)antlr-3.2.jar" org.antlr.Tool "$(ProjectDir)Expression.g" in the pre-build events :) Then you can just modify your grammar and run the project without having to worry about rebuilding the lexer/parser. – Sacristan 9/12, 2010 at 20:56

awesome! I'd be interested in how you'd go about evaluating your AST for your expression parser? (so that it would compute the answer somewhat like your second grammar) – Mussolini 20/12, 2010 at 4:21

public static double Preorder(ITree Tree, int Depth) { if (Tree == null) { return 0.0; } var l = Preorder(Tree.GetChild(0), Depth + 1); var r = Preorder(Tree.GetChild(1), Depth + 1); if (Tree.Type == CatalogParser.Number) return Double.Parse(Tree.Text); if(Tree.Type == 10) return lr; if(Tree.Type == 11) return l/r; if(Tree.Type == 8) return l+r; if(Tree.Type == 9) return l-r; if (Tree.Type == CatalogParser.UNARY_MIN) return l-1.0; return l; } – Mussolini 20/12, 2010 at 4:56

You really should use InvariantCulture in double.Parse. – Kolk 4/11, 2011 at 23:56

@luiscubal, I'm not really familiar with C#, so I Googled what InvariantCulture is, and yeah, you're right. But the grammar demands the '.' will always be used in floating point literals, is it still needed then? – Mullens 5/11, 2011 at 6:18

@Bart Kiers Yes. If Parse expects the separator to be , but finds a ., it may throw an exception. – Kolk 5/11, 2011 at 14:49

I had problems running the example with the latest antlr runtime (v3.4). Using language=CSharp3; seems to fix it. – Transience 27/3, 2012 at 14:20

Yeah, language=CSharp3. Also, "parse" doesn't seem to show up as a method, unless it's preceded by "public" in the .g file, like "startProg" is, here: programming-pages.com/2012/07/01/… – Orthostichy 19/7, 2012 at 16:41

@WordsLikeJared, my demo works using CSharp2 and ANTLR 3.2 (the versions I mentioned in my answer). The newer runtime (CSharp3) makes all parser rules private by default, but CSharp2 doesn't. – Mullens 19/7, 2012 at 17:13

It makes sense to mark all rules as private, and then you can easily create you partial Parser class to expose only the functions you want. (Tip 3 in lextm.com/2012/07/how-to-use-antlr-on-net-part-iv.html) – Horgan 31/7, 2012 at 5:41

Have you looked at Irony.net? It's aimed at .Net and therefore works really well, has proper tooling, proper examples and just works. The only problem is that it is still a bit 'alpha-ish' so documentation and versions seem to change a bit, but if you just stick with a version, you can do nifty things.

p.s. sorry for the bad answer where you ask a problem about X and someone suggests something different using Y ;^)

Fill answered 9/12, 2010 at 8:55 Comment(7)

Hrm...generally I'd be offended that you ignored the question, but this Irony actually seems quite nice during my preliminary poking :) – Sacristan 9/12, 2010 at 9:30

@Ralph, are you really "offended" if someone posts an answer that not addresses your question 100%? – Mullens 9/12, 2010 at 10:37

@Bart: I'm not sure "offended" is exactly the word I'm looking for...but [usually] I find it a bit annoying when someone posts about something completely different, especially when I've already considered it, and decided against it. – Sacristan 9/12, 2010 at 19:17

@Ralph, yes, I can imagine your sentiment, but personally, I wouldn't have used the word "offended", that's all. It's not that Toad's answer was completely off-topic. – Mullens 9/12, 2010 at 19:45

@Bart: No, not completely :) Anyway, I gave him a +1. I do appreciate the answer. – Sacristan 9/12, 2010 at 21:21

@Toad: I already apologized for the 'bad answer'. But I needed to give it to him anyway since it might be worthwile. Maybe I'll just post it as a comment next time ;^) – Fill 10/12, 2010 at 9:31

As of 2014 Irony has taken a back seat to the developer's paying job. – Radiograph 10/2, 2015 at 10:47

My personal experience is that before learning ANTLR on C#/.NET, you should spare enough time to learn ANTLR on Java. That gives you knowledge on all the building blocks and later you can apply on C#/.NET.

I wrote a few blog posts recently,

The assumption is that you are familiar with ANTLR on Java and is ready to migrate your grammar file to C#/.NET.

Horgan answered 31/7, 2012 at 5:37 Comment(2)

what's the advantage of learning on java first? how would it be any easier (aside from having more tutorials)? particularly if my java isn't as strong as my C#? – Sacristan 31/7, 2012 at 9:32

You don't need to be a Java expert so as to finish the Java tutorials. For me, it taught me the ABCs of ANTLR, and then I could quickly know how ANTLR works and switch back to C#. – Horgan 31/7, 2012 at 12:5

There is a great article on how to use antlr and C# together here:

http://www.codeproject.com/KB/recipes/sota_expression_evaluator.aspx

it's a "how it was done" article by the creator of NCalc which is a mathematical expression evaluator for C# - http://ncalc.codeplex.com

You can also download the grammar for NCalc here: http://ncalc.codeplex.com/SourceControl/changeset/view/914d819f2865#Grammar%2fNCalc.g

example of how NCalc works:

Expression e = new Expression("Round(Pow(Pi, 2) + Pow([Pi2], 2) + X, 2)"); 

  e.Parameters["Pi2"] = new Expression("Pi * Pi"); 
  e.Parameters["X"] = 10; 

  e.EvaluateParameter += delegate(string name, ParameterArgs args) 
    { 
      if (name == "Pi") 
      args.Result = 3.14; 
    }; 

  Debug.Assert(117.07 == e.Evaluate());

hope its helpful

Leverage answered 9/12, 2010 at 9:45 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

EDIT

Recommended topics

Hot tags