Parser Combinators, separating grammar and AST construction
Asked Answered
P

4

22

I'm writing a simple functional programming language in Scala using the parser-combinators library.

The syntax is specified here: https://github.com/hejfelix/Frase/blob/master/src/main/scala/it/vigtig/lambda/ParserLike.scala

There is one thing which I haven't been able to solve with the implementation: how do I separate grammar definition from the transformation into AST nodes?

Would be really cool to have a close-to-human-readable grammar directly the parser source, especially considering that I'm the only programmer on the project ATM and it could serve as documentation.

How do I separate the grammar and AST-specific code?

Pelag answered 31/10, 2015 at 10:14 Comment(0)
H
11

This is a great question, and one that I struggled with for a long time before coming up with a solution that I feel works pretty well for me.

When building a parser, I use two different syntax trees:

  • an Concrete Syntax Tree, or CST: this is a tree form of the text, and has a 1:1 correspondence with the text. Everything that appears in the text will also appear in the CST.

  • an Abstract Syntax Tree, or AST: this doesn't necessarily have a 1:1 correspondence with the text, as unnecessary textual details (such as braces, punctuation, etc.) have been removed and don't appear in the AST.

Thus, getting from the input text to the AST has two steps: the first step is to parse the input string into a CST; the second step is to convert the CST to an AST, throwing away unnecessary details.

  1. String -> CST This is where I use parser combinators. I don't do any manipulation of the tree structure in this stage. The structure of the CST is entirely determined by the combinators used. Each combinator produces a sub-tree of a certain shape, which I never change in this stage. There aren't any actions attached to combinators, so the grammar definition is clean and free of any AST information.

  2. CST -> AST This is where I massage the parse tree, extracting the important stuff, ignoring the rest. This is also where I often perform context-sensitive checks (for example: checking that a function definition doesn't have duplicate parameter names), keeping these details out of the actual parsing stage.


Example: here's a JSON parser I built using this method:

Hackle answered 5/11, 2015 at 13:45 Comment(5)
This sounds exactly like what I want, but I am unsure how you accomplish 1. Do you have code examples, or could you clarify how you would change my code from the link in the question?Pelag
@Pelag -- I added an example (it's in Javascript instead of Scala, unfortunately, but the principle is the same). Please let me know if I should explain it more -- I'm happy to do so!Hackle
@Pelag -- in order to do this in your code, I'd recommend 1. removing actions like ^^ { case id ~ _ ~ term => Abstr(id, term) } from the parser, 2. creating a new module for converting a CST to an AST.Hackle
Won't I need to be explicit in the types to actually get PackratParsers then?Pelag
@Pelag that's a great point, and one that I hadn't considered yet. I need to think about that a bit more -- thanks for pointing it out!Hackle
C
2

Well, in principle, all your AST transformation have a specific type. You could define them all elsewhere and use them from your grammar definition. It would clear up things somewhat. Alternatively, you could define your grammar definitions as "pass by name" functions that are evaluated on call, then use them from your transformation.

Basically any language allows you to break complexity by defining things somewhere and referring to them anywhere else. Since scala allows you to have functions as values this is even easier.

Cutting answered 2/11, 2015 at 20:15 Comment(2)
This solution, however, forces the grammar-definition to make explicit support for different AST-transformations. Do you think it is possible to do something similar, while providing the grammer without any transformations, i.e. all parsers should simply be Parser[String]?Pelag
Depends on what references what, as I stated. In the other direction, there is no such dependency. Even in the first direction, if you are able to keep generic types, you can use a trait that defines the transformations without specifying them (yet). Not an expert in the field though, last time I did something of this kind I used LEX & YACC to keep these pieces separate. You on the other hand will have to do with standard Scala language features...Cutting
V
1

Would be really cool to have a close-to-human-readable grammar directly "building" the parser source...

I wonder what that "close-to-human-readable grammar" is?

How do I separate grammar definition from the transformation into AST nodes?

What you have is a handwritten Packrat Parser.

I could be wrong, but i understand this question as a request to use a standalone grammar definition to build a parser. And then use that parser to get the syntax tree of the parsed source.

So, the grammer could be an EBNF or PEG or CFG or "your own" grammar, right?

Anyway...

  • Lets start with a "separate grammar definition", e.g. EBNF.

  • Then you need a parser for the grammar, e.g. an EBNFParser.

  • Parsing the grammar with that parser results is an internal structure of that grammar: a syntax tree.

  • Given a syntax tree for a valid grammar, you may return an association list with the keys (as the meta identifiers) and attach grammar rules to them.

    foreach grammar key add matching grammar rule

  • That means that you need to pick a Grammar Rule identified by RuleName and add its Rule to a "Constructed Parser".

  • At the end: you have a "Constructed Parser" assembled from individual "Grammar Rules" able to parse Source defined by the given Grammar.

  • Parsing the Source, gives you a Syntax Tree for the source.

Pass 1

Grammar -> GrammarParser -> GrammarTree -> GrammarRules -> ConstructedParserForGrammar

Pass 2

Source -> ConstructedParserForGrammar -> Syntax Tree -> Transformations...

In other words: its quite a puzzle to go from BNF to an automatically constructed Packrat parser.

Vishinsky answered 6/11, 2015 at 18:31 Comment(0)
P
0

Since this commit

https://github.com/scala/scala-parser-combinators/commit/33792d3380791ddafb41760604857d2fc43e54e1

Parser combinators links to a post that accurately solves my problem. This is, imho, the most precise answer to my question.

The post here https://enear.github.io/2016/03/31/parser-combinators/ lexes into a concrete syntax tree first (lexing) and then produces an AST afterwards.

I leave it here because it adds an example to the accepted answer.

Pelag answered 23/3, 2017 at 16:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.