Specifying a dynamic priority and precedence for an operator in Menhir/Ocamlyacc
Asked Answered
V

1

7

I'm trying to parse a language where the operators have a dynamic attributes (priority and precedence) using the Menhir parser (similar to Ocamlyacc). During the lexing phase, all the operators fill a OP:string token (so "+" turns into (OP "+"), etc).

The operator attributes are determined at parse time and fill a table associating operators and their attributes. Given this table, how can I instruct Menhir to dynamically change the priority of the rule parsing the operators based on this table's data ?

Thanks, CharlieP.

Vaginismus answered 1/7, 2012 at 16:8 Comment(0)
M
15

I'm sorry for answering with a "you're doing it wrong" kind of comment. I have three objections I hope are constructive, in decreasing order of relevance:

  1. Menhir is not meant for dynamic grammar updates; if you insist on changing your grammar at parse-time, you should use a tool that provides this feature, such as the GLR parser Dypgen. The Dypgen manual mentions the possibility of dynamically updating operator priorities, in a constrained way (it seems you can add new operators and corresponding priorities, but not change priority of existing ones) that may or may not match your needs. See section 6.6 of the Dypgen manual (PDF), page... 42.

  2. Dynamically updating a CFG grammar is, I think, not the best way to handle user-defined operator precedences. Agda has very general user-defined mixfix operators, and their solution is roughly the following: use your CFG parser to parse the grammatical structure that is statically known, but for expression that possibly use fancy precendences and associativities, just parse them into a list of tokens. For example, let x = if foo then x + y * z else bar would be parsed into something like Let(x, If(foo, Expr(x, +, y, *, z), bar). A later specialized pass can gather the required information to post-parse those into Expr nodes into their specialized structure. Use parser generators for what they're good for (statically-known rich CFG), and use a post-processing pass for the complex, ill-defined, dynamic stuff. The Agda guys have some literature on the topic, for example Parsing Mixfix Operators, Danielsson and Norell, 2009.

    From a design point of view, I strongly urge you to separate your lexing and parsing in several different passes, each of them well-defined and using only information gathered on the previous structure, instead of trying to dynamically change its own behavior. You'll have something much simpler and much more robust.

  3. Dynamic or user-defined precedence and priorities are, in my opinion, a bit evil. OCaml has a different system where operator precedence priorities is determined by their first few characters (eg. @, @@ and @+ are all right-associative). It is a bit restrictive for the people choosing an infix operator, but makes code reader lives much more comfortable, as they have only one set of grammar rules to learn, instead of having to dynamically adapt their eyes to any new piece of code. If you want to allow for insertion of wild, foreign pieces of code with an entirely different syntax, quotations mechanisms (eg. camlp4 <:foo< ... >>) are much more robust than fiddling with operator-level associativities and priorities, and also much simpler to parse.

    That said, projects have different needs and I would completely understand if you insisted on having dynamically changing operators precedence and associativities for some application I don't know about. Just keep in mind that it's not the only way around, and sometimes consistency and simplicity are better than absolute flexibility.

Maliamalice answered 2/7, 2012 at 8:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.