Abstract syntax tree construction and traversal

Asked 27/5, 2011 at 18:15 Answered 27/5, 2011 at 18:44

Solved. Tags: language-agnostic, traversal, abstract-syntax-tree, construction

I am unclear on the structure of abstract syntax trees. To go "down (forward)" in the source of the program that the AST represents, do you go right on the very top node, or do you go down? For instance, would the example program

a = 1
b = 2
c = 3
d = 4
e = 5

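result in an AST that looks like this: [image: a single "main" node whose children are the five assignment statements, in order]

or this: [image: the five assignment statements chained together, each node holding a next pointer to the following statement]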

In the first one, going "right" across the main node's children advances you through the program; in the second one, simply following the next pointer on each node does the same.

It seems like the second one would be more correct, since you don't need something like a special node type with a potentially extremely long array of pointers for the very first node. Although I can see the second one becoming more complicated than the first when you get into for loops, if branches and other more complicated constructs.
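To make the two shapes concrete, here is a minimal sketch in Java (chosen because it is close to the C# used in one of the answers). The class names `Statement`, `MainNode`, `LinkedStatement` and `TwoShapes` are hypothetical, and each statement is reduced to a name and an integer. Both traversals visit the five assignments in the same order; the only difference is where the ordering lives: in the parent's child list, or in each node's `next` field.

```java
import java.util.ArrayList;
import java.util.List;

// First shape: statements carry no links; a "main" node owns them in order.
class Statement {
    final String name;
    final int value;
    Statement(String name, int value) { this.name = name; this.value = value; }
}

class MainNode {
    final List<Statement> statements = new ArrayList<>();
}

// Second shape (hypothetical): each statement points at the one that follows it.
class LinkedStatement {
    final String name;
    final int value;
    LinkedStatement next; // null on the last statement
    LinkedStatement(String name, int value) { this.name = name; this.value = value; }
}

class TwoShapes {
    static final String[] NAMES = {"a", "b", "c", "d", "e"};

    // a = 1 ... e = 5 as the ordered children of one main node.
    static MainNode buildMain() {
        MainNode main = new MainNode();
        for (int i = 0; i < NAMES.length; i++) {
            main.statements.add(new Statement(NAMES[i], i + 1));
        }
        return main;
    }

    // The same program as a chain; construction has to track the current tail.
    static LinkedStatement buildLinked() {
        LinkedStatement head = null, tail = null;
        for (int i = 0; i < NAMES.length; i++) {
            LinkedStatement s = new LinkedStatement(NAMES[i], i + 1);
            if (head == null) head = s; else tail.next = s;
            tail = s;
        }
        return head;
    }

    // Traversal of the first shape: iterate over the main node's children.
    static List<String> walkChildren(MainNode main) {
        List<String> order = new ArrayList<>();
        for (Statement s : main.statements) order.add(s.name + " = " + s.value);
        return order;
    }

    // Traversal of the second shape: follow next pointers until null.
    static List<String> walkNextPointers(LinkedStatement first) {
        List<String> order = new ArrayList<>();
        for (LinkedStatement s = first; s != null; s = s.next) order.add(s.name + " = " + s.value);
        return order;
    }
}
```

Both walks produce the same sequence, from `a = 1` to `e = 5`. Building the linked version requires the extra `tail` bookkeeping, while the list version only appends.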

Pernas asked 27/5, 2011 at 18:15

The first representation is the more typical one, though the second is compatible with the construction of a tree as a recursive data structure, as may be used when the implementation platform is functional rather than imperative.

Consider:

[image: the first tree, shortened, with the root node labelled "block" and the assignment statements as its ordered children]

This is your first example, except shortened and with the "main" node (a conceptual straw man) more appropriately named "block," to reflect the common construct of a "block" containing a sequence of statements in an imperative programming language. Different kinds of nodes have different kinds of children, and sometimes those children include collections of subsidiary nodes whose order is important, as is the case with "block." The same might arise from, say, an array initialization:

int[] arr = {1, 2};

Consider how this might be represented in a syntax tree:

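[image: syntax tree for the declaration, in which an array-literal node has the integer literals 1 and 2 as its ordered children]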

Here, the array-literal-type node also has multiple children of the same type whose order is important.
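A minimal Java sketch of that declaration as a tree, assuming hypothetical node classes `VarDecl`, `ArrayLit` and `IntLit`, with the declared type kept as a plain string for brevity. The declaration node has a fixed set of named children, while the array literal holds a list of children whose order matters, just as a block's statements do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: every node can print itself back as source text.
abstract class Node {
    abstract String show();
}

// Integer literal such as 1 or 2.
class IntLit extends Node {
    final int value;
    IntLit(int value) { this.value = value; }
    String show() { return Integer.toString(value); }
}

// Array literal: an ordered list of element nodes.
class ArrayLit extends Node {
    final List<Node> elements;
    ArrayLit(List<Node> elements) { this.elements = elements; }
    String show() {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < elements.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(elements.get(i).show()); // elements are emitted in source order
        }
        return sb.append("}").toString();
    }
}

// Declaration: fixed, named children (type, name, initializer).
class VarDecl extends Node {
    final String type; // simplification: a real AST would use a type node
    final String name;
    final Node init;
    VarDecl(String type, String name, Node init) {
        this.type = type;
        this.name = name;
        this.init = init;
    }
    String show() { return type + " " + name + " = " + init.show() + ";"; }
}

class ArrayLitDemo {
    // Builds the tree for: int[] arr = {1, 2};
    static VarDecl build() {
        List<Node> elements = new ArrayList<>();
        elements.add(new IntLit(1));
        elements.add(new IntLit(2));
        return new VarDecl("int[]", "arr", new ArrayLit(elements));
    }
}
```

Printing the tree back with `show()` reproduces the declaration, which only works because `ArrayLit` keeps its elements in order.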

Mersey answered 27/5, 2011 at 18:30

> Where in the first one, going "right" on the main node will advance you through the program, but in the second one simply following the next pointer on each node will do the same.
>
> It seems like the second one would be more correct since you don't need something like a special node type with a potentially extremely long array of pointers for the very first node.

I'd nearly always prefer the first approach, and I think you'll find it much easier to construct your AST when you don't need to maintain a pointer to the next node.
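As a rough illustration of that point, here is a hypothetical Java parser (`TinyParser`, `BlockNode` and `AssignNode` are made-up names) that handles only lines of the form `name = integer`. Each parsed statement is simply appended to the block's list; there is no previous node whose `next` field needs patching.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node types, just enough for lines of the form "name = integer".
class AssignNode {
    final String variable;
    final int value;
    AssignNode(String variable, int value) { this.variable = variable; this.value = value; }
}

class BlockNode {
    final List<AssignNode> statements = new ArrayList<>();
}

class TinyParser {
    // One statement per line; the parser only ever appends to the block,
    // so it never has to remember the previously built node.
    static BlockNode parseBlock(String source) {
        BlockNode block = new BlockNode();
        for (String raw : source.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue; // skip blank lines
            block.statements.add(parseAssign(line));
        }
        return block;
    }

    // Parses "name = integer" into an AssignNode, rejecting anything else.
    static AssignNode parseAssign(String line) {
        String[] parts = line.split("=");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'name = integer': " + line);
        }
        return new AssignNode(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }
}
```

Parsing the question's five lines yields a `BlockNode` with five statements in source order.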

I think it's generally easier to have all node types descend from a common base class, similar to this:

abstract class Expr { }

class Block : Expr
{
    public Expr[] Statements { get; set; }
    public Block(Expr[] statements) { Statements = statements; }
}

class Assign : Expr
{
    public Var Variable { get; set; }
    public Expr Expression { get; set; }
    public Assign(Var variable, Expr expression)
    {
        Variable = variable;
        Expression = expression;
    }
}

class Var : Expr
{
    public string Name { get; set; }
    public Var(string name) { Name = name; }
}

class Int : Expr
{
    public int Value { get; set; }
    public Int(int value) { Value = value; }
}

The resulting AST is as follows:

Expr program =
    new Block(new Expr[]
        {
            new Assign(new Var("a"), new Int(1)),
            new Assign(new Var("b"), new Int(2)),
            new Assign(new Var("c"), new Int(3)),
            new Assign(new Var("d"), new Int(4)),
            new Assign(new Var("e"), new Int(5)),
        });
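One way to traverse this AST is a recursive evaluator in which each node class knows how to evaluate itself. The sketch below is a Java transcription of the C# classes above; the `eval` method, the environment map and the `RunProgram` class are hypothetical additions, not part of the original answer. Treating a block's value as the value of its last statement is likewise an assumption made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Java transcription of the C# classes above, plus a hypothetical eval method.
abstract class Expr {
    // Evaluates this node, reading and writing variables in env.
    abstract int eval(Map<String, Integer> env);
}

class Int extends Expr {
    final int value;
    Int(int value) { this.value = value; }
    int eval(Map<String, Integer> env) { return value; }
}

class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    int eval(Map<String, Integer> env) {
        Integer v = env.get(name);
        if (v == null) throw new IllegalStateException("undefined variable " + name);
        return v;
    }
}

class Assign extends Expr {
    final Var variable;
    final Expr expression;
    Assign(Var variable, Expr expression) {
        this.variable = variable;
        this.expression = expression;
    }
    int eval(Map<String, Integer> env) {
        int v = expression.eval(env);  // evaluate the right-hand side first
        env.put(variable.name, v);     // then bind it to the variable
        return v;
    }
}

class Block extends Expr {
    final Expr[] statements;
    Block(Expr[] statements) { this.statements = statements; }
    // Visiting the children in array order is "advancing through the program".
    int eval(Map<String, Integer> env) {
        int last = 0;
        for (Expr s : statements) last = s.eval(env);
        return last;
    }
}

class RunProgram {
    // Builds the same AST as above and runs it against an empty environment.
    static Map<String, Integer> run() {
        Expr program = new Block(new Expr[] {
            new Assign(new Var("a"), new Int(1)),
            new Assign(new Var("b"), new Int(2)),
            new Assign(new Var("c"), new Int(3)),
            new Assign(new Var("d"), new Int(4)),
            new Assign(new Var("e"), new Int(5)),
        });
        Map<String, Integer> env = new LinkedHashMap<>(); // keeps insertion order
        program.eval(env);
        return env;
    }
}
```

After `run()`, the environment holds `a` through `e` bound to 1 through 5, in the order the block visited them.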

Virg answered 27/5, 2011 at 18:44

It depends on the language. In C, you'd have to use the first form to capture the notion of a block, since a block has a variable scope:

{
    {
        int a = 1;
    }
    // a doesn't exist here
}

The variable scope would be an attribute of what you call the "main node".
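To show how the block node carries scope, here is a minimal Java sketch of a scope checker, assuming hypothetical node classes `CBlock`, `CDecl` (a declaration) and `CUse` (a read of a variable). Entering a `CBlock` pushes a fresh scope and leaving it pops that scope, so a name declared in the inner block is no longer visible after it, mirroring the C example. This is a simplified resolver for illustration, not how any particular C compiler works.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical statement nodes for a C-like language with block scope.
abstract class CStmt { }

// Declaration such as "int a = 1;" (only the name matters here).
class CDecl extends CStmt {
    final String name;
    CDecl(String name) { this.name = name; }
}

// Any statement that reads a variable.
class CUse extends CStmt {
    final String name;
    CUse(String name) { this.name = name; }
}

// "{ ... }": the node that owns a scope.
class CBlock extends CStmt {
    final List<CStmt> body;
    CBlock(CStmt... body) { this.body = Arrays.asList(body); }
}

class ScopeChecker {
    private final Deque<Set<String>> scopes = new ArrayDeque<>();
    final List<String> errors = new ArrayList<>();

    void check(CStmt s) {
        if (s instanceof CBlock) {
            scopes.push(new HashSet<>());            // entering a block opens a scope
            for (CStmt child : ((CBlock) s).body) check(child);
            scopes.pop();                            // leaving it discards its names
        } else if (s instanceof CDecl) {
            if (scopes.isEmpty()) throw new IllegalStateException("declaration outside any block");
            scopes.peek().add(((CDecl) s).name);     // declare in the innermost scope
        } else if (s instanceof CUse) {
            String name = ((CUse) s).name;
            boolean visible = false;
            for (Set<String> scope : scopes) {       // innermost to outermost
                if (scope.contains(name)) { visible = true; break; }
            }
            if (!visible) errors.add(name + " is not in scope");
        }
    }

    // The C example: a is declared in the inner block and read after it closes.
    static List<String> checkExample() {
        CStmt program = new CBlock(
            new CBlock(new CDecl("a"), new CUse("a")), // a is visible here
            new CUse("a"));                            // a doesn't exist here
        ScopeChecker checker = new ScopeChecker();
        checker.check(program);
        return checker.errors;
    }
}
```

Only the read after the inner block is reported; the read inside the inner block resolves normally.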

Leighleigha answered 27/5, 2011 at 18:21

Sorry, I don't understand "binary main node", and I can't really understand it from your Lisp example (probably because I've never used Lisp). Could you elaborate a little more? – Pernas 27/5, 2011 at 18:24

@Seth: then never mind the Lisp-like example. I've posted a C-like example of why you'd need the first form. – Leighleigha 27/5, 2011 at 18:26

So you'd have to use a node with n child nodes for the root node (to hold the main program and stuff), and also for what? I didn't know ASTs were used to show block scope or whatever. – Pernas 27/5, 2011 at 18:31

@Seth: ASTs represent the syntactic structure of a program. Syntactic/lexical scoping is part of that. – Leighleigha 27/5, 2011 at 19:13

I believe your first version makes more sense, for a couple of reasons.

Firstly, the first version more clearly demonstrates the "nestedness" of the program, and it is clearly implemented as a rooted tree (which is the usual concept of a tree).

The second, and more important, reason is that your "main node" could really have been a "branch node" (for example), which can simply be another node within a larger AST. This way, your AST can be viewed recursively: each AST is a node with other ASTs as its children. This makes the design of the first much simpler, more general, and very homogeneous.
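A minimal Java sketch of that recursive view, with hypothetical classes `AstNode`, `StmtBlock`, `IfNode`, `Leaf` and `TreeOps`, where node contents are reduced to labels. Because every node exposes its children the same way, a block nested as the branch of an `if` is handled by exactly the same functions as the top-level program.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical uniform node: a label plus an ordered list of child subtrees.
abstract class AstNode {
    final String label;
    final List<AstNode> children;

    AstNode(String label, AstNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }
}

// A block of statements: the old "main node", usable at any depth.
class StmtBlock extends AstNode {
    StmtBlock(AstNode... statements) { super("block", statements); }
}

// An if statement whose then-branch is itself a subtree (for example a block).
class IfNode extends AstNode {
    IfNode(AstNode condition, AstNode thenBranch) { super("if", condition, thenBranch); }
}

// A leaf standing in for any statement or expression without children.
class Leaf extends AstNode {
    Leaf(String text) { super(text); }
}

class TreeOps {
    // Counts every node in the subtree rooted at n.
    static int size(AstNode n) {
        int total = 1;
        for (AstNode child : n.children) total += size(child);
        return total;
    }

    // Number of nodes on the longest root-to-leaf path.
    static int depth(AstNode n) {
        int deepest = 0;
        for (AstNode child : n.children) deepest = Math.max(deepest, depth(child));
        return deepest + 1;
    }
}
```

For the program `a = 1; if (a > 0) { b = 2; c = 3; }` built from these classes, `size` returns 7 and `depth` returns 4 on the whole tree, while the same calls on the nested block alone return 3 and 2.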

Bedabble answered 27/5, 2011 at 18:29

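Suggestion: When dealing with tree data structures, whether a compiler-related AST or some other kind, always use a single "root" node; it can help you perform operations and gives you more control:

using System;
using System.Collections.Generic;

class ASTTreeNode
{
    public string Label { get; }
    public ASTTreeNode Parent { get; private set; }
    public List<ASTTreeNode> Children { get; } = new List<ASTTreeNode>();

    public ASTTreeNode(string label) { Label = label; }
    public bool IsRoot() => Parent == null;
    public void Add(ASTTreeNode child) { child.Parent = this; Children.Add(child); }

    // prints this node, then each subnode recursively, indented by depth
    public void Display(int depth = 0)
    {
        Console.WriteLine(new string(' ', 2 * depth) + Label);
        foreach (var child in Children) child.Display(depth + 1);
    }
}

class Program
{
    static void Main()
    {
        var myRoot = new ASTTreeNode("program");
        // ...
        // prints the root node, plus each subnode recursively
        myRoot.Display();
    }
}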

Cheers.

Spherule answered 27/5, 2011 at 18:24