Combined unparser/parser generator
Asked Answered
R

5

18

Is there a parser generator that also implements the inverse direction, i.e. unparsing domain objects (a.k.a. pretty-printing) from the same grammar specification? As far as I know, ANTLR does not support this.

Robertoroberts answered 20/9, 2012 at 17:31 Comment(2)
Seems kinda hard when you have arbitrary actions, but with an attribute grammar it seems easy enough... I can't wait to be humbled by a smart answer :DAceydeucy
@delnan: And in fact, with an attribute grammar it "is easy enough". See my answer; the prettyprinter specification is an attribute grammer with funny syntax.Phyllisphylloclade
C
3

I have implemented a set of Invertible Parser Combinators in Java and Kotlin. A parser is written pretty much in LL-1 style and it provides a parse- and a print-method where the latter provides the pretty printer.

You can find the project here: https://github.com/searles/parsing Here is a tutorial: https://github.com/searles/parsing/blob/master/tutorial.md And here is a parser/pretty printer for mathematical expressions: https://github.com/searles/parsing/blob/master/src/main/java/at/searles/demo/DemoInvert.kt

Cotter answered 10/7, 2019 at 22:30 Comment(0)
R
3

Take a look at Invertible syntax descriptions: Unifying parsing and pretty printing.

Raposa answered 29/12, 2014 at 12:29 Comment(0)
C
3

I have implemented a set of Invertible Parser Combinators in Java and Kotlin. A parser is written pretty much in LL-1 style and it provides a parse- and a print-method where the latter provides the pretty printer.

You can find the project here: https://github.com/searles/parsing Here is a tutorial: https://github.com/searles/parsing/blob/master/tutorial.md And here is a parser/pretty printer for mathematical expressions: https://github.com/searles/parsing/blob/master/src/main/java/at/searles/demo/DemoInvert.kt

Cotter answered 10/7, 2019 at 22:30 Comment(0)
E
2

There are several parser generators that include an implementation of an unparser. One of them is the nearley parser generator for context-free grammars.

It is also possible to implement bidirectional transformations of source code using definite clause grammars. In SWI-Prolog, the phrase/2 predicate can convert an input text into a parse tree and vice-versa.

Erasion answered 3/6, 2017 at 18:34 Comment(0)
P
1

Our DMS Software Reengineering Toolkit does precisely this (and provides a lot of additional support for analyzing/transforming code). It does this by decorating a language grammar with additional attributes, producing what is called an attribute grammar. We use a special DSL to write these rules to make them convenient to write.

It helps to know that DMS produces a tree based directly on the grammar.

Each DMS grammar rule is paired with with so-called "prettyprinting" rule. Each prettyprinting rule describes how to "prettyprint" the syntactic element and sub-elements recognized by its corresponding grammar rule. The prettyprinting process essentially manufactures or combines rectangular boxes of text horizontally or vertically (with optional indentation), with leaves producing unit-height boxes containing the literal value of the leaf (keyword, operator, identifier, constant, etc.

As an example, one might write the following DMS grammar rule and matching prettyprinting rule:

statement = 'for' '(' assignment ';' assignment ';' conditional_expression ')'
            '{' sequence_of_statements '}' ;
<<PrettyPrinter>>: 
    { V(H('for','(',assignment[1],';','assignment[2],';',conditional_expression,')'),
        H('{', I(sequence_of_statements)),
        '}');

This will parse the following:

    for ( i=x*2;
       i--;  i>-2*x ) {  a[x]+=3; 
      b[x]=a[x]-1; }

(using additional grammar rules for statements and expressions) and prettyprint it (using additional prettyprinting rules for those additional grammar rules) as follows:

    for (i=x*2;i--;i>-2*x)
    {   a[x]+=3;
        b[x]=a[x]-1;
    }

DMS also captures comments, attaches them to AST nodes, and regenerates them on output. The implementation is a bit exotic because most parsers don't handle comments, but utilization is easy, even "free"; comments will be automatically inserted in the prettyprinted result in their original places.

DMS can also print in "fidelity" mode. In this form, it tries to preserve the shape of the toke (e.g., number radix, identifier character capitalization, which keyword spelling was used) the column offset (into the line) of a parsed token. This would cause the original text (or something so close that you don't think it is different) to get regenerated.

More details about what prettyprinters must do are provided in my SO answer on Compiling an AST back to source code. DMS addresses all of those topics cleanly.

This capability has been used by DMS on some 40+ real languages, including full IBM COBOL, PL/SQL, Java 1.8, C# 5.0, C (many dialects) and C++14.

By writing a sufficiently interesting set of prettyprinter rules, you can build things like JavaDoc extended to include hyperlinked source code.

Phyllisphylloclade answered 29/12, 2014 at 15:48 Comment(0)
P
-2

It is not possible in general.

What makes a print pretty? A print is pretty, if spaces, tabs or newlines are at those positions, which make the print looking nicely.

But most grammars ignore white spaces, because in most languages white spaces are not significant. There are exceptions like Python but in general the question, whether it is a good idea to use white spaces as syntax, is still controversial. And therefor most grammars do not use white spaces as syntax.

And if the abstract syntax tree does not contain white spaces, because the parser has thrown them away, no generator can use them to pretty print an AST.

Previous answered 11/10, 2012 at 11:58 Comment(1)
I was thinking this. Some grammars (like the one for C) allow a lot of flexibility in things like white space. But some grammars don't, so any printer would automatically be a pretty printer. But it seems like you could probably annotate the grammar with information as to which of the choices is "pretty", and then you could generate a pretty printer from that.Vinitavinn

© 2022 - 2024 — McMap. All rights reserved.