How do I get to the bottom of an AST Expression

Asked 24/4, 2016 at 16:55 Answered 2/5, 2016 at 18:5

Solved java eclipse abstract-syntax-tree eclipse-jdt

I'm new to AST (my first time writing a plugin). Expressions can be quite complex in real life. I wanted to know how to resolve the left and right side of an asignment, for instance.

class Visitor extends ASTVisitor
{
    @Override
    public boolean visit(Assignment node)
    {
        //here, how do I get the final name to each each side of the assignment resolves?
    }
}

I have also another doubt, how do I get the instance used to invoke a method?

public boolean visit(MethodInvocation node)
{
    //how do I get to know the object used to invoke this method?
    //like, for example, MyClass is a class, and it has a field called myField
    //the type of myField has a method called myMethod.
    //how do I find myField? or for that matter some myLocalVariable used in the same way.
}

suppose the following assignment

SomeType.someStaticMethod(params).someInstanceMethod(moreParams).someField =
     [another expression with arbitrary complexity]

how do I get to someField from the Assigment node?

And also, what property of a MethodInvocation gives me the instance used to invoke the method?

EDIT 1: My question was apparently unclear, given the answer I received. I don't want to solve this particular expression. I want to be able to, given any assignment, find out the name it's being assigned to, and the name (if not an rvalue) that assigns to the first.

So, for instance, the parameters of the method invocations could be field accesses or previously declared local variables.

SomeType.someStaticMethod(instance.field).someInstanceMethod(type.staticField, localVariable, localField).Field.destinationField

So, here is the hopefully objective question: Given any assignment statement with both the left side and right side having arbitrary complexity, how to get the ultimate field/variable that gets assigned to, and the ultimate (if any) field/variable that assigns to it.

EDIT 2: To be more specific, what I want to achieve is immutability, via an annotation @Const:

/**
* When Applied to a method, ensures the method doesn't change in any
* way the state of the object used to invoke it, i.e., all the fields
* of the object must remain the same, and no field may be returned,
* unless the field itself is marked as {@code @Const} or the field is
* a primitive non-array type. A method  annotated with {@code @Const} 
* can only invoke other {@code @Const} methods of its class, can only 
* use the class's fields to invoke {@code @Const} methods of the fields 
* classes and can only pass fields as parameters to methods that 
* annotate that formal parameter as {@code @Const}.
*
* When applied to a formal parameter, ensures the method will not
* modify the value referenced by the formal parameter. A formal   
* parameter annotated as {@code @Const} will not be aliased inside the
* body of the method. The method is not allowed to invoke another 
* method and pass the annotated parameter, save if the other method 
* also annotates the formal parameter as {@code @Const}. The method is 
* not allowed to use the parameter to invoke any of its type's methods,
* unless the method being invoked is also annotated as {@code @Const}
* 
* When applied to a field, ensures the field cannot be aliased and that
* no code can alter the state of that field, either from inside the   
* class that owns the field or from outside it. Any constructor in any
* derived class is allowed to set the value of the field and invoke any
* methods using it. As for methods, only those annotated as
* {@code @Const} may be invoked using the field. The field may only be
* passed as a parameter to a method if the method annotates the 
* corresponding formal parameter as {@code @Const}
* 
* When applied to a local variable, ensures neither the block where the
* variable is declared or any nested block will alter the value of that 
* local variable. The local variable may be defined only once, at any
* point where it is in scope and cannot be aliased. Only methods
* annotated as {@code @Const} may be invoked using this variable, and 
* the variable  may only be passed as a parameter to another method if 
* said method annotates its corresponding formal parameter as
* {@code @Const}
*
*/
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.METHOD, ElementType.PARAMETER, ElementType.FIELD,
ElementType.LOCAL_VARIABLE})
@Inherited
public @interface Const
{

}

To achieve this, First thing I have to do is cases when the left side of an assignment is marked as @Const (easy enough). I must also detect when the right side of and expression is a field marked as @Const, in which case it can only be assigned at the definition of a @Const variable of the same type.

The problem is I'm having a real hard time finding the ultimate field on the right side of the expression, to avoid aliasing the field and rendering the @Const annotation useless.

Chaconne answered 24/4, 2016 at 16:55 Comment(7)

What library are you using here? (PMD or something like that?) – Triviality 26/4, 2016 at 18:39

eclipse jdt for plugin development. – Chaconne 26/4, 2016 at 18:40

ok, so you want to reach the leaf of the AST formed? – Triviality 26/4, 2016 at 18:42

I want to reach the Name of the fields (or local variables or parameters) that are the left and right resolved expressions of the assignment. so I can resolve it and get its annotations. – Chaconne 26/4, 2016 at 18:44

@Chaconne Having a look at some other questions that you have posted recently, I get the feeling that you might be interested in having a look at Annotation Processors and/or the LombokProject. Both are intended to perform custom processing on the AST of Java source files as they are being parsed. Annotation Processors get a read only access to the AST tree (but still can add warning and errors to it), while LombokProject gets a read and write access to the AST tree. Maybe that can help you. – Gorgoneion 29/4, 2016 at 22:30

Is there much difference of your @Const annotation of fields and local variables to the final keyword? – Chamorro 11/5, 2016 at 5:30

@sevenforce yes, many differences, unless the field or local is of a primitive type. – Chaconne 11/5, 2016 at 16:53

Visitors are a really great tool, but the proper solution to a specific problem is not always to have a single visitor patiently wait until its visit methods get invoked... The question you are asking is an example of such a situation.

Let's rephrase what you are trying to do:

You want to identify to identify every assignment (that is leftSide = rightSide)
For each assignment, you want to determine the nature of the left hand side (that is, either it is a local variable or a field access), and if it is indeed a field access, you want to build a "path" corresponding to that field (that is the source object, followed by a series of either method call or field access, and ending with a field access).
For each assignment, you want to determine a similar "path" corresponding to the right hand side.

I think that you have already resolved point number 1: you simply create a class that extends org.eclipse.jdt.core.dom.ASTVisitor; there, you override the #visit(Assignment) method. Finally, wherever it is appropriate, you instantiate your visitor class, and have it visit an AST tree, starting at which ever node that is match for your needs (most likely an instance of CompilationUnit, TypeDeclaration or MethodDeclaration).

Then what? The #visit(Assignment) method indeed receives an Assignment node. Directly on that object, you can obtain both the left hand side and right hand side expressions (assignment.getLeftHandSide() and assignment.getRightHandSide()). As you mentioned, both are Expressions, which can turn out being pretty complicated, so how can we extract a clean, linear "path" out of these subtrees? A visitor is certainly the best way to do so, but here's the catch, it should be done using distinct visitors, rather than having your first visitor (the one catching Assignments) continue its descend either side expressions. It is technically possible to do it all using a single visitor, but that would involve significant state management inside that visitor. I'm pretty pretty much convinced anyway that the complexity of such management would be so high that such implementation would actually be less efficient that the distinct visitors approach.

So we could image something like this:

class MyAssignmentListVisitor extends ASTVisitor {
    @Override
    public boolean visit(Assignment assignment) {
        FieldAccessLineralizationVisitor leftHandSideVisitor = new FieldAccessLineralizationVisitor();
        assignment.getLeftHandSide().accept(leftHandSideVisitor);
        LinearFieldAccess leftHandSidePath = leftHandSideVisitor.asLinearFieldAccess();

        FieldAccessLineralizationVisitor rightHandSideVisitor = new FieldAccessLineralizationVisitor();
        assignment.getRightHandSide().accept(rightHandSideVisitor);
        LinearFieldAccess rightHandSidePath = rightHandSideVisitor.asLinearFieldAccess();

        processAssigment(leftHandSidePath, rightHandSidePath);

        return true;
    }
}

class FieldAccessLineralizationVisitor extends ASTVisitor {

    List<?> significantFieldAccessParts = [...];

    // ... various visit method expecting concrete subtypes of Expression ...

    @Override
    public boolean visit(Assignment assignment) {
        // Found an assignment inside an assignment; ignore its
        // left hand side, as it does not affect the "path" for 
        // the assignment currently being investigated

        assignment.getRightHandSide().accept(this);

        return false;
    }
}

Note in this code that MyAssignmentListVisitor.visit(Assignment) returns true, to indicate that children of the assignment should recursively be inspected. This may sounds unnecessary at first, the Java language indeed support several constructions where an assignment may contain other assignments; consider for example the following extreme case:

(varA = someObject).someField = varB = (varC = new SomeClass(varD = "string").someField);

For the same reason, only the right hand side of an assignment is visited during linearization of an expression, given that the "resulting value" of an assignment is its right hand side. The left hand side is, in this situation, a mere side effect that can be safely ignored.

I will not go any further in prototyping how paths are actually modelled, given that I don't know the nature of informations that are needed for your specific situation. It may also be more appropriate for you to create distinct visitor classes respectively for the left hand side expression and the right hand side expression, for example to better handle the fact that the right hand side might actually involve multiple variables/fields/method invocations combined through binary operators. That will have to be your decision.

There is still some major concerns regarding visitor traversal of the AST tree to be discussed, namely that, by relying on the default node traversal order, you loose the opportunity to get information on the relationship between each nodes. For example, given expression this.someMethod(this.fieldA).fieldB, you will see something similar to the following sequence:

FieldAccess      => corresponding to the whole expression
MethodInvovation => corresponding to this.someMethod(this.fieldA)
ThisExpression
SimpleName ("someMethod")
FieldAccess      => corresponding to this.fieldA
ThisExpression
SimpleName ("fieldA")
SimpleName ("fieldB")

There simply no way you can actually deduce the linearized expression from this sequence of events. You will instead want to explicitly intercept every nodes, and explicitly recurse on a node's children only if appropriate, and in appropriated order. For example, we could have done the following:

    @Override
    public boolean visit(FieldAccess fieldAccess) {
        // FieldAccess :: <expression>.<name>

        // First descend on the "subject" of the field access
        fieldAccess.getExpression().accept(this);

        // Then append the name of the accessed field itself
        this.path.append(fieldAccess.getName().getIdentifier());

        return false;
    }

    @Override
    public boolean visit(MethodInvocation methodInvocation) {
        // MethodInvocation :: <expression>.<methodName><<typeArguments>>(arguments)

        // First descend on the "subject" of the method invocation
        methodInvocation.getExpression().accept(this);

        // Then append the name of the accessed field itself
        this.path.append(methodAccess.getName().getIdentifier() + "()");

        return false;
    }

    @Override
    public boolean visit(ThisExpression thisExpression) {
        // ThisExpression :: [<qualifier>.] this

        // I will ignore the qualifier part for now, it will be up
        // to you to determine if it is pertinent
        this.path.append("this");

        return false;
    }

These methods would, given the preceding example, collect in path the following sequence: this, someMethod(), fieldB. This is, I believe, pretty close to what you are looking for. Should you want to collect all field access/method invocations sequences (for example, you would like your visitor to return both this,someMethod(),fieldB and this,fieldA), then you could rewrite the visit(MethodInvocation) method roughly similar to this:

    @Override
    public boolean visit(MethodInvocation methodInvocation) {
        // MethodInvocation :: <expression>.<methodName><<typeArguments>>(arguments)

        // First descend on the "subject" of the method invocation
        methodInvocation.getExpression().accept(this);

        // Then append the name of the accessed field itself
        this.path.append(methodAccess.getName().getIdentifier() + "()");

        // Now deal with method arguments, each within its own, distinct access chain
        for (Expression arg : methodInvocation.getArguments()) {
            LinearPath orginalPath = this.path;
            this.path = new LinearPath();

            arg.accept(this);

            this.collectedPaths.append(this.path);
            this.path = originalPath;
        }

        return false;
    }

Finally, should you be interested in knowing the type of values at the each step in the path, you will have to have a look at the binding objects associated with each node, for example: methodInvocation.resolveMethodBinding().getDeclaringClass(). Note however that binding resolution must have been explicitly requested at the construction of the AST tree.

There are many more language constructs that will not be correctly handled by the above code; still, I believe that you should be able to work out these remaining issues yourself. Should you have need for a reference implementation to look at, check out class org.eclipse.jdt.internal.core.dom.rewrite.ASTRewriteFlattener, which basically reconstruct Java source code from an existing AST tree; though this specific visitor is much larger than most other ASTVisitors, it is somewhat much easier to understand.

UPDATE IN RESPONSE TO OP'S EDIT #2

Here's an update starting point following your most recent edit. There are still many cases to be handled, but that is more in line with your specific problem. Note also that though I used numerous instanceof checks (because that's easier for me at current time, given that I'm writing the code in a simple text editor, and don't have code completion on ASTNode constants), you may opt instead for a switch statement on node.getNodeType(), which will usually be more efficient.

class ConstCheckVisitor extends ASTVisitor {

    @Override
    public boolean visit(MethodInvocation methodInvocation) {    
        if (isConst(methodInvocation.getExpression())) {
            if (isConst(methodInvocation.resolveMethodBinding().getMethodDeclaration()))
                reportInvokingNonConstMethodOnConstSubject(methodInvocation);
        }

        return true;
    }

    @Override
    public boolean visit(Assignment assignment) {
        if (isConst(assignment.getLeftHandSide())) {
            if ( /* assignment to @Const value is not acceptable in the current situation */ )
                reportAssignmentToConst(assignment.getLeftHandSide());

            // FIXME: I assume here that aliasing a @Const value to
            //        another @Const value is acceptable. Is that right?

        } else if (isImplicitelyConst(assigment.getLeftHandSide())) {
            reportAssignmentToImplicitConst(assignment.getLeftHandSide());        

        } else if (isConst(assignment.getRightHandSide())) {
            reportAliasing(assignment.getRightHandSide());
        }

        return true;
    }

    private boolean isConst(Expression expression) {
        if (expression instanceof FieldAccess)
            return (isConst(((FieldAccess) expression).resolveFieldBinding()));

        if (expression instanceof SuperFieldAccess)
            return isConst(((SuperFieldAccess) expression).resolveFieldBinding());

        if (expression instanceof Name)
            return isConst(((Name) expression).resolveBinding());

        if (expression instanceof ArrayAccess)
            return isConst(((ArrayAccess) expression).getArray());

        if (expression instanceof Assignment)
            return isConst(((Assignment) expression).getRightHandSide());

        return false;
    }

    private boolean isImplicitConst(Expression expression) {
        // Check if field is actually accessed through a @Const chain
        if (expression instanceof FieldAccess)
            return isConst((FieldAccess expression).getExpression()) ||
                   isimplicitConst((FieldAccess expression).getExpression());

        // FIXME: Not sure about the effect of MethodInvocation, assuming
        //        that its subject is const or implicitly const

        return false;
    }

    private boolean isConst(IBinding binding) {
        if ((binding instanceof IVariableBinding) || (binding instanceof IMethodBinding))
            return containsConstAnnotation(binding.getAnnotations());

        return false;
    }
}

Hope that helps.

Gorgoneion answered 29/4, 2016 at 22:15 Comment(9)

All that class depends on me having a RewriteEventStore instance. I have no idea how to get one and I suspect this is not the way to go. I'll update my question with more details. – Chaconne 30/4, 2016 at 14:51

There shouldn't be such requirement... How exactly are you injecting your code into JDT? – Gorgoneion 30/4, 2016 at 21:35

@Chaconne Thank you for your update, that helps a lot indeed. Adding extra compilation warning or errors into JDT is essentially done by implementing and registering a CompilationParticipant. You can get an example starting point for doing so here: #24195642. In reconcile(), you can obtain the current file's AST using reconcileContext.getAST8(); you then navigate it through a visitor. Use reconcileContext.putProblems() to add problem markers that will appears in the editor. – Gorgoneion 2/5, 2016 at 14:3

I should however warn you that the contract that you are trying to implement is pretty complex to implement properly and efficiently. My main concern is that your code will have to systematically navigate all fields bindings in a source file to determine if the field has or has not the @Const annotation. This will probably slow down the compilation process significantly. – Gorgoneion 2/5, 2016 at 14:15

Your objective appears to be pretty similar to that of checking null-related annotations; yet, as far as I know, all proper implementations of that feature in JDT (there has been a few propositions over time) relies on the addition of new flags directly inside JDT's Java model and binding model, in order to avoid the significant overhead of systematically reanalyze other classes. These extra flags also make it much more easier to implement Quick Fix and other contributions related to such complex contracts. – Gorgoneion 2/5, 2016 at 14:23

You should definitely have a look at the following presentation: eclipsecon.org/2012/sites/eclipsecon.org.2012/files/…. It discuss several points that, I believe, are pertinent to what you are working on. – Gorgoneion 2/5, 2016 at 14:50

Reading your problem description a second time, you should not only look for Assignments, but also for MethodInvocation where the subject is marked as @Const, in order to determine if the invoked method is indeed acceptable (that is, it is marked as @Const). The same goes with FieldAccess nodes. – Gorgoneion 2/5, 2016 at 20:12

I plan to check on method invocations too, ehre I can work with @const methods,parameters, fields and locals. This problem has been so consuming and since I'm failing to actually solve it I'm become sad and disillusioned. – Chaconne 3/5, 2016 at 10:18

Static validation is often much more difficult to implement than it first appears; your @Const annotation is one of the hard one. But I doubt that you really need it to be rock solid from day one… Take it one step at a time, adding new checks as you discover remaining problems. Take the time to explore JDT's Java AST models, so that you get insights on potential corner cases that you will want to test. It is predictable though that this process will most certainly lead to bloated and inefficient code. A full rewrite will be highly desirable once you have gained a more complete understanding. – Gorgoneion 3/5, 2016 at 13:44

First quoting from an answer I posted:

You will have to work with bindings. To have bindings available, that means resolveBinding() not returning null, possibly additional steps I have posted are necessary.

The following visitor should help you doing what you want:

class AssignmentVisitor extends ASTVisitor {

    public boolean visit(Assignment node) {
        ensureConstAnnotationNotViolated(node);
        return super.visit(node);
    }

    private void ensureConstAnnotationNotViolated(Assignment node) {
        Expression leftHandSide = node.getLeftHandSide();
        if (leftHandSide.getNodeType() == ASTNode.FIELD_ACCESS) {
            FieldAccess fieldAccess = (FieldAccess) leftHandSide;
            // access field IVariableBinding
            fieldAccess.resolveFieldBinding();
            // access IAnnotationBindings e.g. your @const
            fieldAccess.resolveFieldBinding().getAnnotations();
            // access field ITypeBinding
            fieldAccess.getExpression().resolveTypeBinding();
        } else {
            // TODO: check possible other cases
        }

    }
}

Chamorro answered 2/5, 2016 at 18:5 Comment(0)

You can further visit node.getLeftHandSide(). I think a good example can be found in the Sharpen (Java2C# translation) code:

https://github.com/mono/sharpen/blob/master/src/main/sharpen/core/CSharpBuilder.java#L2848

A simple sample project here: https://github.com/revaultch/jdt-sample

Junit-Test here: https://github.com/revaultch/jdt-sample/blob/master/src/test/java/ch/revault/jdt/test/VisitorTest.java

Superheterodyne answered 26/4, 2016 at 18:50 Comment(5)

if I do that wouldn't I get the whole expression? I want the final leaf of the sub tree. – Chaconne 26/4, 2016 at 19:2

If you visit the LHS further down you will eventially find a visit(FieldAccess) or a visit(QualifiedName) – Superheterodyne 26/4, 2016 at 19:32

and how do I do that? there's no visitor for Expression. Ok but even if I manage to visit a field access, how do I know it's the leaf node of the expression? – Chaconne 26/4, 2016 at 19:50

could you write a method that finds the leaf node on both subtrees on either side of an assignment? – Chaconne 27/4, 2016 at 0:8

My example was a poor example, let me clarify what I really meant: what if there's a FieldAccess in between, meaning it's not the leaf node of the expression. So what I meant is: How do I find out the final name the expression resolves to? How do I find the name that gets assigned to and the right hand side name that assigns to it? And there could be no final name in the left hand side if it happened to be an rvalue. So how to write a visitor that works for all those cases? – Chaconne 28/4, 2016 at 10:20