Once grammar is complete, what's the best way to walk an ANTLR v4 tree?
Asked Answered
A

1

38

Goal

I'm working on a project to create a Varscoper for Coldfusion CFscript. Basically, this means checking through source code files to ensure that developers have properly var'd their variables.

After a couple of days of working with ANTLR V4 I have a grammar which generates a very nice parse tree in the GUI view. Now, using that tree I need a way to crawl up and down the nodes programmatically looking for variable declarations and ensure that if they are inside functions they have the proper scoping. If possible I would rather NOT do this in the grammar file as that would require mixing the definition of the language with this specific task.

What I've tried

My latest attempt was using the ParserRuleContext and attempting to go through it's children via getPayload(). After checking the class of getPayLoad() I would either have a ParserRuleContext object or a Token object. Unfortunately, using that I was never able to find a way to get the actual rule type for a specific node, only it's containing text. The rule type for each node is neccessary because it matters whether that text node is an ignored right-hand expression, a variable assignment or a function declaration.

Questions

  1. I am very new to ANTLR, is this even the right approach, or is there a better way to traverse the tree?

Here's my sample java code:

Cfscript.java

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.Trees;

public class Cfscript {
    public static void main(String[] args) throws Exception {
        ANTLRInputStream input = new ANTLRFileStream(args[0]);
        CfscriptLexer lexer = new CfscriptLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        CfscriptParser parser = new CfscriptParser(tokens);
        parser.setBuildParseTree(true);
        ParserRuleContext tree = parser.component();
        tree.inspect(parser); // show in gui
        /*
            Recursively go though tree finding function declarations and ensuring all variableDeclarations are varred
            but how?
        */
    }
}

Cfscript.g4

grammar Cfscript;

component
    : 'component' keyValue* '{' componentBody '}'
    ;

componentBody
    : (componentElement)*
    ;

componentElement
    : statement
    | functionDeclaration
    ;

functionDeclaration
    : Identifier? Identifier? 'function' Identifier argumentsDefinition '{' functionBody '}'
    ;

argumentsDefinition
    : '(' argumentDefinition (',' argumentDefinition)* ')'
    | '()'
    ;

argumentDefinition
    : Identifier? Identifier? argumentName ('=' expression)?
    ; 

argumentName
    : Identifier
    ;

functionBody
    : (statement)*
    ;

statement
    : variableStatement
    | nonVarVariableStatement
    | expressionStatement
    ;

variableStatement
    : 'var' variableName '=' expression ';'
    ;

nonVarVariableStatement
    : variableName '=' expression ';'
    ;

expressionStatement
    : expression ';'
    ;

expression
    : assignmentExpression
    | arrayLiteral
    | objectLiteral
    | StringLiteral
    | incrementExpression
    | decrementExpression
    | 'true' 
    | 'false'
    | Identifier
    ;

incrementExpression
    : variableName '++'
    ;

decrementExpression
    : variableName '--'
    ;

assignmentExpression
    : Identifier (assignmentExpressionSuffix)*
    | assignmentExpression (('+'|'-'|'/'|'*') assignmentExpression)+
    ;

assignmentExpressionSuffix
    : '.' assignmentExpression
    | ArrayIndex
    | ('()' | '(' expression (',' expression)* ')' )
    ;

methodCall
    : Identifier ('()' | '(' expression (',' expression)* ')' )
    ;

variableName
    : Identifier (variableSuffix)*
    ;

variableSuffix
    : ArrayIndex
    | '.' variableName
    ;

arrayLiteral
    : '[' expression (',' expression)* ']'
    ;

objectLiteral
    : '{' (Identifier '=' expression (',' Identifier '=' expression)*)? '}'
    ;

keyValue
    : Identifier '=' StringLiteral
    ;

StringLiteral
    :  '"' (~('\\'|'"'))* '"'
    ;

 ArrayIndex
    : '[' [1-9] [0-9]* ']'
    | '[' StringLiteral ']'
    ;

Identifier
    : [a-zA-Z0-9]+
    ;

WS
    : [ \t\r\n]+ -> skip 
    ;

COMMENT 
    : '/*' .*? '*/'  -> skip
    ;

Test.cfc (testing code file)

component something = "foo" another = "more" persistent = "true" datasource = "#application.env.dsn#" {
    var method = something.foo.test1;
    testing = something.foo[10];
    testingagain = something.foo["this is a test"];
    nuts["testing"]++;
    blah.test().test3["test"]();

    var math = 1 + 2 - blah.test().test4["test"];

    var test = something;
    var testing = somethingelse;
    var testing = { 
        test = more, 
        mystuff = { 
            interior = test 
        },
        third = "third key"
    };
    other = "Idunno homie";
    methodCall(interiorMethod());

    public function bar() {
        var new = "somebody i used to know";
        something = [1, 2, 3];
    }

    function nuts(required string test1 = "first", string test = "second", test3 = "third") {

    }

    private boolean function baz() {
        var this = "something else";
    }
}
Achromatism answered 24/2, 2013 at 8:52 Comment(5)
Btw, instead of '()', you probably want to do: '(' ')' since the first will not match parenthesis with spaces in between them.Gibeon
In case you're not aware, Railo 4 has a more general scope checking tool as part of its debugging options ... doesn't entirely make varscoper redundant, but it does overlap in the problem it solves.Grantinaid
Also, there's a more complete/general ANTLR parser available - have a look at the stuff denstar has done - might provide some shortcuts.Grantinaid
@BartKiers if I use '(' ')' on the argumentsDefinition rule I get an error that says no viable alternative at input '()'. Try it using antlr4 and the files I have provided.Achromatism
@Nucleon, ah, okay, that means your grammar isn't correct because it would not parse input like ... function baz( ) { ... properly.Gibeon
G
50

I wouldn't walk this manually if I were you. After generating a lexer and parser, ANTLR would also have generated a file called CfscriptBaseListener that has empty methods for all of your parser rules. You can let ANTLR walk your tree and attach a custom tree-listener in which you override only those methods/rules you're interested in.

In your case, you probably want to be notified whenever a new function is created (to create a new scope) and you'll probably be interested in variable assignments (variableStatement and nonVarVariableStatement). Your listener, let's call is VarListener will keep track of all scopes as ANTLR walks the tree.

I did change 1 rule slightly (I added objectLiteralEntry):

objectLiteral
    : '{' (objectLiteralEntry (',' objectLiteralEntry)*)? '}'
    ;

objectLiteralEntry
    : Identifier '=' expression
    ;
    

which makes life easier in the following demo:

VarListener.java

public class VarListener extends CfscriptBaseListener {

    private Stack<Scope> scopes;

    public VarListener() {
        scopes = new Stack<Scope>();
        scopes.push(new Scope(null));
    } 

    @Override
    public void enterVariableStatement(CfscriptParser.VariableStatementContext ctx) {
        String varName = ctx.variableName().getText();
        Scope scope = scopes.peek();
        scope.add(varName);
    }

    @Override
    public void enterNonVarVariableStatement(CfscriptParser.NonVarVariableStatementContext ctx) {
        String varName = ctx.variableName().getText();
        checkVarName(varName);
    }

    @Override
    public void enterObjectLiteralEntry(CfscriptParser.ObjectLiteralEntryContext ctx) {
        String varName = ctx.Identifier().getText();
        checkVarName(varName);
    }

    @Override
    public void enterFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {
        scopes.push(new Scope(scopes.peek()));
    }

    @Override
    public void exitFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {
        scopes.pop();        
    }

    private void checkVarName(String varName) {
        Scope scope = scopes.peek();
        if(scope.inScope(varName)) {
            System.out.println("OK   : " + varName);
        }
        else {
            System.out.println("Oops : " + varName);
        }
    }
}

A Scope object could be as simple as:

Scope.java

class Scope extends HashSet<String> {

    final Scope parent;

    public Scope(Scope parent) {
        this.parent = parent;
    }

    boolean inScope(String varName) {
        if(super.contains(varName)) {
            return true;
        }
        return parent == null ? false : parent.inScope(varName);
    }
}

Now, to test this all, here's a small main class:

Main.java

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class Main {

    public static void main(String[] args) throws Exception {

        CfscriptLexer lexer = new CfscriptLexer(new ANTLRFileStream("Test.cfc"));
        CfscriptParser parser = new CfscriptParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.component();
        ParseTreeWalker.DEFAULT.walk(new VarListener(), tree);
    }
}

If you run this Main class, the following will be printed:

Oops : testing
Oops : testingagain
OK   : test
Oops : mystuff
Oops : interior
Oops : third
Oops : other
Oops : something

Without a doubt, this is not exactly what you want and I probably goofed up some scoping rules of Coldfusion. But I think this will give you some insight in how to solve your problem properly. I think the code is pretty self explanatory, but if this is not the case, don't hesitate to ask for clarification.

HTH

Gibeon answered 24/2, 2013 at 11:15 Comment(11)
This is an incredibly thorough answer but is a lot to digest. I'll be back in a few days =p.Achromatism
No problem @Nucleon. Again: feel free to ask for clarification if I passed over certain aspects too quickly.Gibeon
Hint: ParseTreeWalker.DEFAULT.walk(...)Immoderation
This method works beautifully. I made some tweaks to it but your example was invaluable. You can view the files I derived from this at the github at github.com/owenallenaz/CfscriptVarscoper/tree/master/java. Thanks again.Achromatism
@BartKiers Hey there. I'm a beginner to ANTLR as well. I'm trying to use something like your main code to walk my parsetree, but my generated gardenParser object does not have a component() class. Do you have any idea why this might be?Cyclopentane
@ZachH, the parser has a component method because the grammar has a component rule. Your grammar probably does not have such a rule.Gibeon
Thaat would be why. I was wondering why my resulting parsers had none of the functions available in tutorials. Thanks!Cyclopentane
I'm not using ANTLR but this answer provided insight that's been invaluable in writing methods for my own custom parser. Thanks!Shotton
For me this doesn't work well because left-recursive rules do weird stuff... What can I do for this?Christalchristalle
Best you create a question of your own @paulGibeon
@BartKiers I have already found out how, figured that it doesn't work well with the listener but the visitor did what I needed.Christalchristalle

© 2022 - 2024 — McMap. All rights reserved.