What's the difference between parse trees and abstract syntax trees (ASTs)?

Asked 17/2, 2011 at 8:19 Answered 19/5, 2018 at 11:16

Solved compiler-construction terminology compiler-theory abstract-syntax-tree parse-tree

128

Are they generated by different phases of a compiling process? Or are they just different names for the same thing?

Enlistee answered 17/2, 2011 at 8:19 Comment(1)

Parse Tree is the result of your grammar with its artifacts (you can write an infinity of grammars for the same language), an AST reduce the Parse Tree the closest possible to the language. Several grammars for the same language will give different parse trees but should result to the same AST. (you can also reduce different scripts (different parse trees from the same grammar) to the same AST) – Maffa 29/8, 2012 at 14:38

120

This is based on the Expression Evaluator grammar by Terrence Parr.

The grammar for this example:

grammar Expr002;

options 
{
    output=AST;
    ASTLabelType=CommonTree; // type of $stat.tree ref etc...
}

prog    :   ( stat )+ ;

stat    :   expr NEWLINE        -> expr
        |   ID '=' expr NEWLINE -> ^('=' ID expr)
        |   NEWLINE             ->
        ;

expr    :   multExpr (( '+'^ | '-'^ ) multExpr)*
        ; 

multExpr
        :   atom ('*'^ atom)*
        ; 

atom    :   INT 
        |   ID
        |   '('! expr ')'!
        ;

ID      : ('a'..'z' | 'A'..'Z' )+ ;
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : ( ' ' | '\t' )+ { skip(); } ;

Input

x=1
y=2
3*(x+y)

Parse Tree

The parse tree is a concrete representation of the input. The parse tree retains all of the information of the input. The empty boxes represent whitespace, i.e. end of line.

Parse Tree

AST

The AST is an abstract representation of the input. Notice that parens are not present in the AST because the associations are derivable from the tree structure.

AST

For a more through explanation see Compilers and Compiler Generators pg. 23
or Abstract Syntax Trees on pg. 21 in Syntax and Semantics of Programming Languages

Lagena answered 25/3, 2012 at 22:14 Comment(6)

How do you derive the AST from the parse tree? What's the method of simplifying a parse tree into an AST? – Baeza 15/2, 2015 at 8:54

There is no specific algorithm to derive the AST from the parse tree. What goes into the AST is more of a personal preference but must contain enough info to accomplish the task. I excluded the parens from the AST by using the ANTLR ! operator in the grammar since they are not needed, but by default ANTLR would have included them. I think of the parse tree as giving you everything whether you need it or not, and the AST as giving you the bare minimum. Remember that you will traverse the trees a lot, so size matters. – Lagena 5/1, 2016 at 15:48

You mean like CST (concrete syntax tree) vs AST (abstract syntax tree)? – Baeza 7/1, 2016 at 13:36

Semantic actions/rules embedded in a parser or parser generator’s syntax files are the usual way of semantic analysis and creating an AST, while the parse tree is rarely, if ever constructed or used by user code, except perhaps for parser correctness verification. – Asgard 15/6, 2018 at 22:30

Of interest: Abstract semantic graph – Lagena 25/6, 2018 at 13:20

Of interest: Abstract and Concrete Syntax – Lagena 20/12, 2021 at 16:46

Here's an explanation of parse trees (concrete syntax trees, CSTs) and abstract syntax trees (ASTs), in the context of compiler construction. They're similar data structures, but they're constructed differently and used for different tasks.

Parse trees

Parse trees are usually generated as the next step after lexical analysis (which turns the source code into a series of tokens that can be viewed as meaningful units, as opposed to just a sequence of characters).

They are tree-like data structures that shows how an input string of terminals (source code tokens) has been generated by the grammar of the language in question. The root of the parse tree is the most general symbol of the grammar - the start symbol (for example, statement), and the interior nodes represent nonterminal symbols that the start symbol expands to (can include the start symbol itself), such as expression, statement, term, function call. The leaves are the terminals of the grammar, the actual symbols which appear as identifiers, keywords, and constants in the language / input string, e.g. for, 9, if, etc.

While parsing the compiler also performs various checks to ensure the correctness of syntax - and and syntax error reports can be imbedded into parser code.

They can be used for syntax-directed translation via syntax-directed definitions or translation schemes, for simple tasks such as converting an infix expression to a postfix one.

Here's a graphical representation of a parse tree for the expression 9 - 5 + 2 (note the placement of the terminals in the tree and the actual symbols from the expression string):

enter image description here

Abstract syntax trees

ASTs represent the syntactic structure of the some code. The trees of programming constructs such as expressions, flow control statements, etc - grouped into operators (interior nodes) and operands (leaves). For example, the syntax tree for the expression i + 9 would have the operator + as root, the variable i as the operator's left child, and the number 9 as the right child.

The difference here is that nonterminals and terminals don't play a role, as ASTs don't deal with grammars and string generation, but programming constructs, and thus they represent relationships between such constructs, and not the ways they are generated by a grammar.

Note that the operators themselves are programming constructs in a given language, and don't have to be actual computational operators (like + is): for loops would also be treated in this way. For example, you could have a syntax tree such as for [ expr, expr, expr, stmnt ] (represented inline), where for is an operator, and the elements inside the square brackets are its children (representing C's for syntax) - also composed out of operators etc.

ASTs are usually generated by compilers in the syntax analysis (parsing) phase as well, and are used later for semantic analysis, intermediate representation, code generation, etc.

Here's a graphical representation of an AST:

enter image description here

Datcha answered 12/10, 2014 at 22:23 Comment(3)

I wish your answer was an accepted one. It is much more detailed, and better explained. – Unmoved 5/9, 2017 at 4:0

@Unmoved thanks!:) I wrote about these things on my blog as well: flowing.systems/tag/mcd – Datcha 5/9, 2017 at 12:56

Is there any author who defined parse tree for the first time? – Lumberman 24/7, 2019 at 9:1

From what I understand, the AST focuses more on the abstract relationships between the components of source code, while the parse tree focuses on the actual implementation of the grammar utilized by the language, including the nitpicky details. They are definitely not the same, since another term for "parse tree" is "concrete syntax tree".

Higley answered 17/2, 2011 at 8:30 Comment(3)

The link is not pointing to correct information – Saffier 17/2, 2022 at 3:6

Thanks @HrishikeshDevhare. I've just removed it since there's no point keeping it around anymore. – Higley 17/2, 2022 at 22:4

It can still be accessed via Wayback Machine. web.archive.org/web/20020803201420/http://www.jguru.com/faq/… I gave it a read. @KenWayneVanderLinde – Falcate 12/7, 2023 at 20:58

The DSL book from Martin Fowler explains this nicely. The AST only contains all 'useful' elements that will be used for further processing, while the parse tree contains all the artifacts (spaces, brackets, ...) from the original document you parse

Acidulous answered 17/2, 2011 at 8:36 Comment(0)

An AST describes the source code conceptually, it doesn't need to contain all the syntactical elements required to parse some source code (curly braces, keywords, parenthesis etc.).

A Parse tree represents the source code more closely.

In an AST the node for an IF statement could contain just three children:

Condition
If Case
Else Case

For a C-like language the Parse Tree would need to contain nodes for the 'if' keyword, parenthesis, curly braces also.

Cecillececily answered 11/5, 2011 at 17:54 Comment(0)

Wikipedia says

Parse trees concretely reflect the syntax of the input language, making them distinct from the abstract syntax trees used in computer programming.

An answer on Quora says

A parse tree is a record of the rules (and tokens) used to match some input text whereas a syntax tree records the structure of the input and is insensitive to the grammar that produced it.

Combining the above two definitions,

An Abstract Syntax Tree describes the parse tree logically. It does not need to contain all the syntactical constructs required to parse some source code (white spaces, braces, keywords, parenthesis etc). That's why Parse Tree is also called Concrete Syntax Tree while the AST is called Syntax Tree. The output of syntax analyser is, thus, syntax tree actually.

Algeciras answered 19/5, 2018 at 11:16 Comment(0)

Take the pascal assignment Age:= 42;

The syntax tree would look just like the source code. Below I am putting brackets around the nodes. [Age][:=][42][;]

An abstract tree would look like this [=][Age][42]

The assignment becomes a node with 2 elements, Age and 42. The idea is that you can execute the assignment.

Also note that the pascal syntax disappears. Thus it is possible to have more than one language generate the same AST. This is useful for cross language script engines.

Ciceronian answered 4/8, 2015 at 14:26 Comment(0)

In parse tree interior nodes are non terminal, leaves are terminal. In syntax tree interior nodes are operator, leaves are operands.

Ban answered 25/11, 2014 at 12:56 Comment(0)

Parse trees

Abstract syntax trees

Recommended topics

Hot tags