Getting ANTLR to generate a script interpreter?

Say I have the following Java API, all packaged up as blocks.jar:

public class Block {
    private String name;
    private int xCoord;
    private int yCoord;

    // Getters, setters, ctors, etc.

    public void setCoords(int x, int y) {
        setXCoord(x);
        setYCoord(y);
    }
}

public class BlockController {
    public static void moveBlock(Block block, int newXCoord, int newYCoord) {
        block.setCoords(newXCoord, newYCoord);
    }

    public static void stackBlocks(Block under, Block onTop) {
        // Stack "onTop" on top of "under".
        // Don't worry about the math here, this is just for an example.
        onTop.setCoords(under.getXCoord() + onTop.getXCoord(), under.getYCoord());
    }
}

Again, don't worry about the math and the fact that (x,y) coordinates don't accurately represent blocks in 3D space. The point is that we have Java code, compiled as a JAR, that performs operations on blocks. I now want to build a lightweight scripting language that allows a non-programmer to invoke the various block API methods and manipulate blocks, and I want to implement its interpreter with ANTLR (latest version is 4.3).

The scripting language, we'll call it BlockSpeak, might look like this:

block A at (0, 10)   # Create block "A" at coordinates (0, 10)
block B at (0, 20)   # Create block "B" at coordinates (0, 20)
stack A on B         # Stack block A on top of block B

This might be equivalent to the following Java code:

Block A, B;
A = new Block(0, 10);
B = new Block(0, 20);
BlockController.stackBlocks(B, A);

So the idea is that the ANTLR-generated interpreter would take a *.blockspeak script as input, and use the commands in this script to invoke blocks.jar API operations. I read the excellent Simple Example which creates a simple calculator using ANTLR. However in that link, there is an ExpParser class with an eval() method:

ExpParser parser = new ExpParser(tokens);
parser.eval();

The problem here is that, in the case of the calculator, the tokens represent a mathematical expression to evaluate, and eval() returns the evaluation of the expression. In the case of an interpreter, the tokens would represent my BlockSpeak script, but calling eval() shouldn't evaluate anything, it should know how to map the various BlockSpeak commands to Java code:

BlockSpeak Command:             Java code:
==========================================
block A at (0, 10)      ==>     Block A = new Block(0, 10);
block B at (0, 20)      ==>     Block B = new Block(0, 20);
stack A on B            ==>     BlockController.stackBlocks(B, A);

So my question is, where do I perform this "mapping"? In other words, how do I instruct ANTLR to call various pieces of code (packaged inside blocks.jar) when it encounters particular grammars in the BlockSpeak script? More importantly, can someone give me a pseudo-code example?

Represent answered 15/7, 2014 at 18:55 Comment(3)
Have you considered Xtext? That would give you a nice editor, and so on. In the Implementing Domain-Specific Languages with Xtext and Xtend the author shows how to implement interpreter in Chapter 8. (Though if you are not ok with using eclipse as a dependency, you can also create standalone application.)Contiguity
Thanks @GáborBakos (+1) - I appreciate the suggestion, but I'm looking for an ANTLR-based solution for reasons outside the scope of this question!Represent
No problems. (Though Xtext is also based on antlr.)Contiguity
L
17

I would simply evaluate the script on the fly, not generate Java source files which need to be compiled themselves again.

With ANTLR 4 it is highly recommended to keep the grammar and target-specific code separate from each other, and to put any target-specific code inside a tree listener or visitor.

I will give a quick demo of how to use a listener.

A grammar for your example input could look like this:

File: blockspeak/BlockSpeak.g4

grammar BlockSpeak;

parse
 : instruction* EOF
 ;

instruction
 : create_block
 | stack_block
 ;

create_block
 : 'block' NAME 'at' position
 ;

stack_block
 : 'stack' top=NAME 'on' bottom=NAME
 ;

position
 : '(' x=INT ',' y=INT ')'
 ;

COMMENT
 : '#' ~[\r\n]* -> skip
 ;

INT
 : [0-9]+
 ;

NAME
 : [a-zA-Z]+
 ;

SPACES
 : [ \t\r\n] -> skip
 ;

Some supporting Java classes:

File: blockspeak/Main.java

package blockspeak;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

import java.util.Scanner;

public class Main {

    public static void main(String[] args) throws Exception {

        Scanner keyboard = new Scanner(System.in);

        // Some initial input to let the parser have a go at.
        String input = "block A at (0, 10)   # Create block \"A\" at coordinates (0, 10)\n" +
                "block B at (0, 20)   # Create block \"B\" at coordinates (0, 20)\n" +
                "stack A on B         # Stack block A on top of block B";

        EvalBlockSpeakListener listener = new EvalBlockSpeakListener();

        // Keep asking for input until the user presses 'q'.
        while(!input.equals("q")) {

            // Create a lexer and parser for `input`.
            BlockSpeakLexer lexer = new BlockSpeakLexer(new ANTLRInputStream(input));
            BlockSpeakParser parser = new BlockSpeakParser(new CommonTokenStream(lexer));

            // Now parse the `input` and attach our listener to it. We want to reuse
            // the same listener because it will hold our Blocks map.
            ParseTreeWalker.DEFAULT.walk(listener, parser.parse());

            // Let's see if the user wants to continue.
            System.out.print("Type a command and press return (q to quit) $ ");
            input = keyboard.nextLine();
        }

        System.out.println("Bye!");
    }
}

// You can place this Block class inside Main.java as well.
class Block {

    final String name;
    int x;
    int y;

    Block(String name, int x, int y) {
        this.name = name;
        this.x = x;
        this.y = y;
    }

    void onTopOf(Block that) {
        // TODO
    }
}

This main class is pretty self-explanatory with the inline comments. The tricky part is what the listener is supposed to look like. Well, here it is:

File: blockspeak/EvalBlockSpeakListener.java

package blockspeak;

import org.antlr.v4.runtime.misc.NotNull;

import java.util.HashMap;
import java.util.Map;

/**
 * A class extending the `BlockSpeakBaseListener` (which will be generated
 * by ANTLR) in which we override the methods in which to create blocks, and
 * in which to stack blocks.
 */
public class EvalBlockSpeakListener extends BlockSpeakBaseListener {

    // A map that keeps track of our Blocks.
    private final Map<String, Block> blocks = new HashMap<String, Block>();

    @Override
    public void enterCreate_block(@NotNull BlockSpeakParser.Create_blockContext ctx) {

        String name = ctx.NAME().getText();
        Integer x = Integer.valueOf(ctx.position().x.getText());
        Integer y = Integer.valueOf(ctx.position().y.getText());

        Block block = new Block(name, x, y);

        System.out.printf("creating block: %s\n", name);

        blocks.put(block.name, block);
    }

    @Override
    public void enterStack_block(@NotNull BlockSpeakParser.Stack_blockContext ctx) {

        Block bottom = this.blocks.get(ctx.bottom.getText());
        Block top = this.blocks.get(ctx.top.getText());

        if (bottom == null) {
            System.out.printf("no such block: %s\n", ctx.bottom.getText());
        }
        else if (top == null) {
            System.out.printf("no such block: %s\n", ctx.top.getText());
        }
        else {
            System.out.printf("putting %s on top of %s\n", top.name, bottom.name);
            top.onTopOf(bottom);
        }
    }
}

The listener above has 2 methods defined that map to the following parser rules:

create_block
 : 'block' NAME 'at' position
 ;

stack_block
 : 'stack' top=NAME 'on' bottom=NAME
 ;

Whenever the tree walker "enters" such a parser rule, the corresponding method inside the listener will be called. So, whenever enterCreate_block is called (the create_block rule is entered), we create (and save) a block, and when enterStack_block is called, we retrieve the 2 blocks involved in the operation and stack one on top of the other.
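To make the "enter" callbacks a bit more concrete, here is a minimal, self-contained sketch of the idea behind a tree walk. It does not use ANTLR at all: Node, RuleListener and TreeWalkSketch are hypothetical stand-ins I made up for the parse tree, the generated listener and the walker, and terminal tokens are left out for brevity. It only illustrates that a depth-first walk fires an enter callback before a rule's children and an exit callback after them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse-tree node: a rule name plus child nodes.
class Node {
    final String rule;
    final List<Node> children = new ArrayList<>();

    Node(String rule, Node... kids) {
        this.rule = rule;
        for (Node kid : kids) {
            children.add(kid);
        }
    }
}

// Hypothetical stand-in for a generated listener interface.
interface RuleListener {
    void enter(Node node);
    void exit(Node node);
}

class TreeWalkSketch {

    // Depth-first walk: enter the node, walk its children, then exit it.
    static void walk(RuleListener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    // Records the order in which the callbacks fire for a given tree.
    static List<String> trace(Node root) {
        final List<String> events = new ArrayList<>();
        walk(new RuleListener() {
            @Override public void enter(Node n) { events.add("enter " + n.rule); }
            @Override public void exit(Node n)  { events.add("exit " + n.rule); }
        }, root);
        return events;
    }

    public static void main(String[] args) {
        // Rough shape of the rule nodes for: block A at (0, 10)
        Node tree = new Node("parse",
                new Node("instruction",
                        new Node("create_block", new Node("position"))));
        for (String event : trace(tree)) {
            System.out.println(event);
        }
    }
}
```

In the real listener you only override the callbacks you care about (here enterCreate_block and enterStack_block); the base listener leaves the rest as empty methods.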

To see the 3 classes above in action, download the ANTLR 4.4 jar (antlr-4.4-complete.jar) into the directory that holds the blockspeak/ directory with the .g4 and .java files.

Open a console and perform the following 3 steps:

1. Generate the ANTLR files:

java -cp antlr-4.4-complete.jar org.antlr.v4.Tool blockspeak/BlockSpeak.g4 -package blockspeak

2. Compile all Java source files:

javac -cp ./antlr-4.4-complete.jar blockspeak/*.java

3. Run the main class:

3.1. Linux/Mac
java -cp .:antlr-4.4-complete.jar blockspeak.Main
3.2. Windows
java -cp .;antlr-4.4-complete.jar blockspeak.Main

Here is an example session of running the Main class:

bart@hades:~/Temp/demo$ java -cp .:antlr-4.4-complete.jar blockspeak.Main
creating block: A
creating block: B
putting A on top of B
Type a command and press return (q to quit) $ block X at (0,0)
creating block: X
Type a command and press return (q to quit) $ stack Y on X
no such block: Y
Type a command and press return (q to quit) $ stack A on X 
putting A on top of X
Type a command and press return (q to quit) $ q
Bye!
bart@hades:~/Temp/demo$ 

More info on tree listeners: https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Parse+Tree+Listeners

Luettaluevano answered 20/7, 2014 at 18:51 Comment(0)
G
4

I would personally write a grammar to generate a Java program for each script that you could then compile (along with your jar) and run independently... i.e., a 2-step process.

For example, with something like the following simple grammar (which I haven't tested and I am sure you would need to extend and adapt), you could replace the parser.eval() statement in that example with parser.program(); (also substituting "BlockSpeak" for "Exp" throughout) and it should spit out Java code that matches the script to stdout, which you could redirect into a .java file, compile (together with the jar) and run.

BlockSpeak.g:

grammar BlockSpeak;

program 
    @init { System.out.println("//import com.whatever.stuff;\n\npublic class BlockProgram {\n    public static void main(String[] args) {\n\n"); }
    @after { System.out.println("\n    } // main()\n} // class BlockProgram\n\n"); }
    : inss=instructions                         { if (null != $inss.insList) for (String ins : $inss.insList) { System.out.println(ins); } }
    ;

instructions returns [ArrayList<String> insList]
    @init { $insList = new ArrayList<String>(); }
    : (instruction { $insList.add($instruction.ins); })* 
    ;

instruction returns [String ins]
    :  ( create { $ins = $create.ins; } | move  { $ins = $move.ins; } | stack { $ins = $stack.ins; } ) ';' 
    ;

create returns [String ins]
    :  'block' id=BlockId 'at' c=coordinates    { $ins = "        Block " + $id.text + " = new Block(" + $c.coords + ");\n"; }
    ;

move returns [String ins]
    :  'move' id=BlockId 'to' c=coordinates     { $ins = "        BlockController.moveBlock(" + $id.text + ", " + $c.coords + ");\n"; }
    ;

stack returns [String ins]
    :  'stack' id1=BlockId 'on' id2=BlockId     { $ins = "        BlockController.stackBlocks(" + $id1.text + ", " + $id2.text + ");\n"; }
    ;

coordinates returns [String coords]
    :    '(' x=PosInt ',' y=PosInt ')'          { $coords = $x.text + ", " + $y.text; }
    ;

BlockId
    :    ('A'..'Z')+
    ;

PosInt
    :    ('0'..'9') ('0'..'9')* 
    ;

WS  
    :   (' ' | '\t' | '\r'| '\n')               -> channel(HIDDEN)
    ;

(Note that for simplicity this grammar requires semi-colons to separate each instruction.)

There are of course other ways to do this sort of thing, but this seems like the simplest to me.

Good luck!


Update

So I went ahead and "finished" my original post (fixing a few bugs in the above grammar) and tested it on a simple script.

Here is the .java file I used to test the above grammar (taken from the code stubs you posted above). Note that in your situation, you would probably want to make the script filename (in my code "script.blockspeak") into a command line parameter. Also, of course the Block and BlockController classes would instead come from your jar.

BlocksTest.java:

import org.antlr.v4.runtime.*;

class Block {
    private String name;
    private int xCoord;
    private int yCoord;

    // Other Getters, setters, ctors, etc.
    public Block(int x, int y) { xCoord = x; yCoord = y; }

    public int getXCoord() { return xCoord; }
    public int getYCoord() { return yCoord; }

    public void setXCoord(int x) { xCoord = x; }
    public void setYCoord(int y) { yCoord = y; }

    public void setCoords(int x, int y) {
        setXCoord(x);
        setYCoord(y);
    }
}

class BlockController {
    public static void moveBlock(Block block, int newXCoord, int newYCoord) {
        block.setCoords(newXCoord, newYCoord);
    }

    public static void stackBlocks(Block under, Block onTop) {
        // Stack "onTop" on top of "under".
        // Don't worry about the math here, this is just for an example.
        onTop.setCoords(under.getXCoord() + onTop.getXCoord(), under.getYCoord());
    }
}

public class BlocksTest {
    public static void main(String[] args) throws Exception {
        ANTLRFileStream in = new ANTLRFileStream("script.blockspeak");
        BlockSpeakLexer lexer = new BlockSpeakLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        BlockSpeakParser parser = new BlockSpeakParser(tokens);
        parser.program();
    }
}

And here are the command lines I used (on my MacBook Pro):

> java -jar antlr-4.4-complete.jar BlockSpeak.g
> javac -cp .:antlr-4.4-complete.jar *.java
> java -cp .:antlr-4.4-complete.jar BlocksTest > BlockProgram.java

This was the input script:

script.blockspeak:

block A at (0, 10);
block B at (0, 20);
stack A on B;

And this was the output:

BlockProgram.java:

//import com.whatever.stuff;

public class BlockProgram {
    public static void main(String[] args) {


        Block A = new Block(0, 10);

        Block B = new Block(0, 20);

        BlockController.stackBlocks(A, B);


    } // main()
} // class BlockProgram

You would of course then have to compile and run BlockProgram.java for each script.


In answer to one of the questions in your comment (#3), there are a couple more complex options I first contemplated that might streamline your "user experience".

(A) Instead of using the grammar to generate a java program that you then have to compile and run, you could embed the calls to the BlockController directly into the ANTLR actions. Where I created strings and passed them up from one non-terminal to the next, you could have java code there directly doing your Block commands whenever an instruction rule is recognized. This would require a bit more complexity with respect to the ANTLR grammar and imports, but it's technically doable.

(B) If you were to do option A, you could then go a step further and create an interactive interpreter ("shell"), where the user is presented with a prompt and just types in "blockspeak" commands at the prompt, which are then parsed and executed directly, displaying the results back to the user.
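As a rough illustration of what options (A) and (B) would need on the Java side, here is a minimal sketch of a helper that grammar actions (or a shell loop) could call. BlockInterpreter and its method names are hypothetical, and the Block and BlockController classes below are cut-down stand-ins for the ones that would really come from blocks.jar. The stack method follows the stackBlocks(under, onTop) parameter order from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-ins for the classes that would really come from blocks.jar.
class Block {
    private int xCoord;
    private int yCoord;

    Block(int x, int y) { xCoord = x; yCoord = y; }

    int getXCoord() { return xCoord; }
    int getYCoord() { return yCoord; }

    void setCoords(int x, int y) { xCoord = x; yCoord = y; }
}

class BlockController {
    static void moveBlock(Block block, int newXCoord, int newYCoord) {
        block.setCoords(newXCoord, newYCoord);
    }

    static void stackBlocks(Block under, Block onTop) {
        onTop.setCoords(under.getXCoord() + onTop.getXCoord(), under.getYCoord());
    }
}

// Hypothetical helper: keeps a symbol table of named blocks and executes
// each instruction immediately instead of emitting Java source for it.
class BlockInterpreter {
    private final Map<String, Block> blocks = new HashMap<>();

    void create(String name, int x, int y) {
        blocks.put(name, new Block(x, y));
    }

    void move(String name, int x, int y) {
        BlockController.moveBlock(lookup(name), x, y);
    }

    // "stack TOP on BOTTOM" maps to stackBlocks(under = BOTTOM, onTop = TOP).
    void stack(String top, String bottom) {
        BlockController.stackBlocks(lookup(bottom), lookup(top));
    }

    // Validation: fail loudly when a script refers to an unknown block.
    Block lookup(String name) {
        Block block = blocks.get(name);
        if (block == null) {
            throw new IllegalArgumentException("no such block: " + name);
        }
        return block;
    }

    public static void main(String[] args) {
        BlockInterpreter interpreter = new BlockInterpreter();
        interpreter.create("A", 0, 10);
        interpreter.create("B", 0, 20);
        interpreter.stack("A", "B");
        Block a = interpreter.lookup("A");
        System.out.println("A is at (" + a.getXCoord() + ", " + a.getYCoord() + ")");
    }
}
```

With something like this in place, an action for the create rule would call interpreter.create(...) rather than building a string, and a shell loop would catch the IllegalArgumentException and show its message to the user.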

Neither of these options is all that much harder to accomplish in terms of complexity, but each requires a lot more coding than fits within the scope of a Stack Overflow answer. That's why I opted to present a "simpler" solution here.

Granite answered 17/7, 2014 at 23:32 Comment(10)
Excellent answer @Granite (+1) - a few followup questions though! (1) In your example grammar, you make references to the System, Block and BlockController class...how does ANTLR link to these at runtime, since they aren't "imported" with fully-qualified package names? For example, how does it know you're invoking java.lang.System and not com.foo.bar.System? (2) What class type is parser? I looked in the ANTLR 4.3 javadocs and can't find anything like Parser or AbstractParser that has eval() or program() methods on it.Represent
And (3) Your answer has me curious here; you start out by stating "I would personally write a grammar to generate a Java program...", and then later on you say "There are of course other ways to do this sort of thing...". This makes me think that this is just one of many strategies to accomplish what I need. If so, can you elaborate on them a bit; are there any links you can provide so I can read up on them? It's not that I don't like your approach, and it may absolutely turn out to be the best, I just want to be as informed as possible. Thanks again!Represent
@Represent I just updated my post with more details. (It all works as presented.) Re qn (1), ANTLR is pretty smart about this. For System nothing need be imported, but for the ArrayList I used, ANTLR knows to import java.util.ArrayList. If you wanted to do my option (A), then you would need to add a @header block at the beginning of your grammar to include your custom imports. Re qn (2), it's a BlockSpeakParser, generated by ANTLR to extend its parser class. program() matches the name of the top non-terminal in the grammar. Re qn (3), I added that to my update. Good luck!Granite
Thanks again (+1) - I'm going to wait a few days just in case anyone has any other ideas to contribute, but your help here is enormously appreciated! Thanks again!Represent
@IamYourFaja No problem. Let me know if you did decide to go the route of option (A) or (B). Neither is all that hard. (A) requires knowing a bit more about what your BlockController will do with each instruction (e.g., update a database model?), in particular with respect to validation of the instructions (e.g., are you trying to move an unknown block? is the operation legal? etc.) and what should happen if it fails. (B) would require an input loop that repeatedly calls parser.instruction() for each line of user input and handles any (error) response from the BlockController.Granite
Thanks again @Turix, I'll get back to you within 24 hours once I have more time to vet out these options. In the meantime, I know I already asked this, but I can't find this ANTLR Parser class in their Javadocs. Can you give me the fully-qualified package name for it, or post a link to its Javadoc? I think seeing this class's methods would tie a lot of stuff together for me, mentally. Thanks again!Represent
@Represent I think it's org.antlr.runtime.Parser or org.antlr.v4.runtime.Parser. I think this is it: antlr.org/api/Java/org/antlr/v4/runtime/Parser.html. ANTLR automatically generates BlockSpeakParser.java, which extends that and, for example, adds methods for the non-terminals in your grammar. I've found it's very helpful to look at that (at least once) once you've generated it, or if you're having issues parsing something.Granite
IMHO this all seems a bit too much: generating Java source files which need to be compiled... Also, in ANTLR 4, you have visitor and listener classes to keep your grammar clean from target specific code. Imagine that the language will be extended: this would mean that the grammar and Java code will have to be edited in the same file (very error prone!). And what if there needs to be support for C# as well: this would mean stripping all Java code from the grammar and adding the C# code. Now you have 2 grammar files you need to keep in sync. In short: don't mix target code with the grammar.Luettaluevano
Sorry to sound a bit harsh, I know you mean well. But the suggested solution is not something I'd recommend implementing (and maintaining).Luettaluevano
@BartKiers You're right that option C is more user friendly (and I see you advocating that below). I'm "old school" so I prefer having the java code in the grammar itself (I consider that less error prone and see absolutely no need for a redundant C# version), but I do see your point about it being a bit much to compile generated Java code. I envisioned doing that automatically with a script allowing for batch processing of a bunch of "blockspeak" scripts at once, with the potential for repeating them (or composing "higher-order operations"). However interactivity is probably more desirable.Granite
D
1

The eval() in ExpParser is implemented through method calls; it's just that the calls have shortcut syntax in the form of operators.

As an exercise, change ExpParser by adding a Calculator class with (unimplemented) methods for the mathematical operators, add(), multiply(), divide(), and so on, and then change the rules to use those methods instead of the operators. That way, you'll understand the basis of what you need to do for your BlockSpeak interpreter.

additionExp returns [double value]
    :    m1=multiplyExp       {$value =  $m1.value;} 
         ( '+' m2=multiplyExp {$value = Calculator.add($value, $m2.value);} 
         | '-' m2=multiplyExp {$value = Calculator.subtract($value, $m2.value);}
         )* 
    ;
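
For reference, a minimal sketch of the Calculator class that the rule above assumes could look like the following. The class is hypothetical (it is not part of ANTLR or of the linked calculator example), and the method bodies simply delegate back to the Java operators. If it lived in a package, the grammar would presumably need a matching import in a @header block.

```java
// Hypothetical Calculator that the grammar actions above would call.
class Calculator {

    static double add(double a, double b) {
        return a + b;
    }

    static double subtract(double a, double b) {
        return a - b;
    }

    static double multiply(double a, double b) {
        return a * b;
    }

    static double divide(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        // Mimics the calls the actions would make for "1 + 2 * 3 - 4":
        // multiplyExp yields 2 * 3 first, then additionExp folds left to right.
        double value = 1;
        value = Calculator.add(value, Calculator.multiply(2, 3));
        value = Calculator.subtract(value, 4);
        System.out.println(value);
    }
}
```

The point of the exercise is that the actions stop caring how the work is done. Swapping Calculator.add for a call into blocks.jar is the same kind of change.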
Duong answered 17/7, 2014 at 17:51 Comment(5)
Thanks @Duong (+1) - but a few questions here! (1) Are you saying I should modify the ExpParser class, or not use it at all? If you are saying the former, this doesn't seem like a very maintainable process, since every time the grammar rules change for any reason, I'll need to modify the outputted ExpParser that ANTLR gives me. (2) How do I "connect" the Calculator in the grammar file with, say, com.me.myorg.Calculator?Represent
And (3) can you post a complete code example with your recommended changes to ExpParser and Calculator? I guess I'm still having trouble seeing the forest through the trees here, thanks again!Represent
You can use ExpParser as a template. I think you would be passing a BlockSpeakBuilder bsc or Block block parameters to the rules in your grammar, so your rules can execute things like {$value = bsc.newBlock();} or {block.setSomeFeature($var);}.Duong
To give you a more specific example I'd need a grammar for BlockSpeak, which should be your starting point any way. Try writing and testing the grammar with no semantic actions, and update your question with the result.Duong
Thanks @Duong (+2 for both) - please give me about 24 hours and I'll update my question with specific grammars for BlockSpeak! Thanks again!Represent
