Pretty Printing AST with Minimal Parentheses

Asked 4/12, 2012 at 17:46 Answered 15/2, 2017 at 13:23

javascript compiler-construction abstract-syntax-tree pretty-print parentheses

I'm implementing a pretty-printer for a JavaScript AST and I wanted to ask if someone is aware of a "proper" algorithm to automatically parenthesize expressions with minimal parentheses based on operator precedence and associativity. I haven't found any useful material on the google.

What seems obvious is that an operator whose parent has a higher precedence should be parenthesized, e.g.:

(x + y) * z // x + y has lower precedence

However, there are also some operators which are not associative, in which case parentheses are still are needed, e.g.:

x - (y - z) // both operators have the same precedence

I'm wondering what would be the best rule for this latter case. Whether it's sufficient to say that for division and subtraction, the rhs sub-expression should be parenthesized if it has less than or equal precedence.

Ballistic answered 4/12, 2012 at 17:46 Comment(0)

I stumbled on your question in search of the answer myself. While I haven't found a canonical algorithm, I have found that, like you say, operator precedence alone is not enough to minimally parenthesize expressions. I took a shot at writing a JavaScript pretty printer in Haskell, though I found it tedious to write a robust parser so I changed the concrete syntax: https://gist.github.com/kputnam/5625856

In addition to precedence, you must take operator associativity into account. Binary operations like / and - are parsed as left associative. However, assignment =, exponentiation ^, and equality == are right associative. This means the expression Div (Div a b) c can be written a / b / c without parentheses, but Exp (Exp a b) c must be parenthesized as (a ^ b) ^ c.

Your intuition is correct: for left-associative operators, if the left operand's expression binds less tightly than its parent, it should be parenthesized. If the right operand's expression binds as tightly or less tightly than its parent, it should be parenthesized. So Div (Div a b) (Div c d) wouldn't require parentheses around the left subexpression, but the right subexpression would: a / b / (c / d).

Next, unary operators, specifically operators which can either be binary or unary, like negation and subtraction -, coercion and addition +, etc might need to be handled on a case-by-case basis. For example Sub a (Neg b) should be printed as a - (-b), even though unary negation binds more tightly than subtraction. I guess it depends on your parser, a - -b may not be ambiguous, just ugly.

I'm not sure how unary operators which can be both prefix and postfix should work. In expressions like ++ (a ++) and (++ a) ++, one of the operators must bind more tightly than the other, or ++ a ++ would be ambiguous. But I suspect even if parentheses aren't needed in one of those, for the sake of readability, you may want to add parentheses anyway.

Sigurd answered 22/5, 2013 at 7:31 Comment(0)

It depends on the rules for the specific grammar. I think you have it right for operators with different precedence, and right for subtraction and division.

Exponentiation, however, is often treated differently, in that its right hand operand is evaluated first. So you need

 (a ** b) ** c

when c is the right child of the root.

Which way the parenthesization goes is determined by what the grammar rules define. If your grammar is of the form of

exp = sub1exp ;
exp = sub1exp op exp ;
sub1exp = sub1exp ;  
sub1exp = sub1exp op1 sub2exp ;
sub2exp = sub3exp ;
sub2exp = sub3exp op2 sub2exp ;
sub3exp = ....
subNexp = '(' exp ')' ;

with op1 and op2 being non-associative, then you want to parenthesize the right subtree of op1 if the subtree root is also op1, and you want to parenthesize the left subtree of op2 if the left subtree has root op2.

Muriah answered 13/12, 2012 at 4:21 Comment(0)

There is a generic approach to pretty printing expressions with minimal parentheses. Begin by defining an unambiguous grammar for your expression language which encodes precedence and associativity rules. For example, say I have a language with three binary operators (*, +, @) and a unary operator (~), then my grammar might look like

E -> E0

E0 -> E1 '+' E0       (+ right associative, lowest precedence)
E0 -> E1

E1 -> E1 '*' E2       (* left associative; @ non-associative; same precedence)
E1 -> E2 '@' E2
E1 -> E2

E2 -> '~' E2          (~ binds the tightest)
E2 -> E3

E3 -> Num             (atomic expressions are numbers and parenthesized expressions)
E3 -> '(' E0 ')'

Parse trees for the grammar contain all necessary (and unnecessary) parentheses, and it is impossible to construct a parse tree whose flattening results in an ambiguous expression. For example, there is no parse tree for the string

1 @ 2 @ 3

because '@' is non-associative and always requires parentheses. On the other hand, the string

1 @ (2 @ 3)

has parse tree

E(E0(E1( E2(E3(Num(1)))
         '@'
         E2(E3( '('
                E0(E1(E2(E3(Num(2)))
                      '@'
                      E2(E3(Num(3)))))
                ')')))

The problem is thus reduced to the problem of coercing an abstract syntax tree to a parse tree. The minimal number of parentheses is obtained by avoiding coercing an AST node to an atomic expression whenever possible. This is easy to do in a systematic way:

Maintain a pair consisting of a pointer to the current node in the AST and the current production being expanded. Initialize the pair with the root AST node and the 'E' production. In each case for the possible forms of the AST node, expand the grammar as much as necessary to encode the AST node. This will leave an unexpanded grammar production for each AST subtree. Apply the method recursively on each (subtree, production) pair.

For example, if the AST is (* (+ 1 2) 3), then proceed as follows:

expand[ (* (+ 1 2) 3); E ]  -->  E( E0( E1( expand[(+ 1 2) ; E1]
                                            '*'
                                            expand[3 ; E2] ) ) )

expand[ (+ 1 2) ; E1 ] --> E1(E2(E3( '('
                                     E0( expand[ 1 ; E1 ]
                                         '+'
                                         expand[ 2 ; E0 ] )
                                     ')' )))

...

The algorithm can of course be implemented in a much less explicit way, but the method can be used to guide an implementation without going insane :).

Beem answered 15/2, 2017 at 13:23 Comment(0)

Recommended topics

Hot tags