Unparse AST < O(exp(n))?
Asked Answered
E

1

12

Abstract problem description:

The way I see it, unparsing means to create a token stream from an AST, which when parsed again produces an equal AST.

So parse(unparse(AST)) = AST holds.

This is the equal to finding a valid parse tree which would produce the same AST.

The language is described by a context free S-attributed grammar using a eBNF variant.

So the unparser has to find a valid 'path' through the traversed nodes in which all grammar constraints hold. This bascially means to find a valid allocation of AST nodes to grammar production rules. This is a constraint satisfaction problem (CSP) in general and could be solved, like parsing, by backtracking in O(exp(n)).

Fortunately for parsing, this can be done in O(n³) using GLR (or better restricting the grammar). Because the AST structure is so close to the grammar production rule structure, I was really surprised seeing an implementation where the runtime is worse than parsing: XText uses ANTLR for parsing and backtracking for unparsing.

Questions

  1. Is a context free S-attribute grammar everything a parser and unparser need to share or are there further constraints, e.g. on the parsing technique / parser implementation?
  2. I've got the feeling this problem isn't O(exp(n)) in general - could some genius help me with this?
  3. Is this basically a context-sensitive grammar?

Example1:

Area    returns AnyObject   -> Pedestrian | Highway
Highway returns AnyObject   -> "Foo" Car
Pedestrian  returns AnyObject   -> "Bar" Bike
Car     returns Vehicle     -> anyObjectInstance.name="Car"
Bike    returns Vehicle     -> anyObjectInstance.name="Bike" 

So if I have an AST containing

AnyObject -> AnyObject -> Vehicle [name="Car"] and I know the root can be Area, I could resolve it to

Area    -> Highway  -> Car

So the (Highway | Pedestrian) decision depends on the subtree decisions. The problem get's worse when a leaf might be, at first sight, one of several types, but has to be a specific one to form a valid path later on.

Example2:

So if I have S-attribute rules returning untyped objects, just assigning some attributes, e.g.

A -> B C   {Obj, Obj}
X -> Y Z   {Obj, Obj}
B -> "somekeyword"  {0}
Y -> "otherkeyword" {0}
C -> "C" {C}
Z -> "Z" {Z}

So if an AST contains

   Obj
  /  \
"0"  "C"

I can unparse it to

   A
  / \
 B   C

just after I could resolve "C" to C.

While traversing the AST, all constraints I can generate from the grammar are satisfied for both rules, A and X, until I hit "C". This means that for

A -> B | C 
B -> "map"  {MagicNumber_42}
C -> "foreach" {MagicNumber_42}

both solutions for the tree

     Obj
      |
 MagicNumber_42

are valid and it is considered that they have equal semantics ,e.g. syntactic sugar.

Further Information:

Ellie answered 12/8, 2012 at 1:10 Comment(5)
I guess I don't understand. A depth-first in-fix traversal of a parse tree should visit the token leaves in their original order. Is the AST too different from the parse tree to do such a traversal?Expectorant
yes, the parse tree is 'strong typed', so you basically know which particular grammar rule were used to produce a certain node. With a general AST, this information is lost and needs to be reconstructed. So for unparsing the AST, it is sufficient to construct a valid parse tree which produces this AST. Note, there might be several parse trees, but this doesn't matter because any valid of them has the same meaning (syntactic sugar). The problem lies not in traversing the AST, but in tagging the visited nodes with a valid sequence of grammar production rules.Ellie
The transform taking a parse tree to an AST is application-dependent; since it sounds like this is the step you want to invert, you're going to have to tell us about the specific application (language.)Expectorant
the grammar used is a specific eBNF grammar, like ANTLR uses, e.g. r[int a, String b] returns [int c, String d] : a='class' b=ID ... {$c=$a; $d=$b;}. So AST construction is automated and constrained by synthesized attributes.Ellie
If you don't get an acceptable answer at SE:SO try SE:CSTorgerson
D
1

Question 1: no, the grammar itself may not be enough. Take the example of an ambiguous grammar. If you ended up with a unique leftmost (rightmost) derivation (the AST) for a given string, you would somehow have to know how the parser eliminated the ambiguity. Just think of the string 'a+b*c' with the naive grammar for expressions 'E:=E+E|E*E|...'.

Question 3: none of the grammar examples you give is context sensitive. The lefthand-side of the productions are a single non-terminal, there is no context.

Decongestant answered 14/9, 2012 at 16:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.