Combine similar constructs in recursive rules
Asked Answered
O

1

1

This is for a parser in Jison but I guess the same applies for Bison.

I have a rule that has a definition for an expression.

expr
    : NUMBER -> { type: "number", value: $1 }
    | "(" expr ")" -> $2
    | expr "+" expr -> { type: "+", left: $1, right: $3 }
    | expr "-" expr -> { type: "-", left: $1, right: $3 }
    | expr "*" expr -> { type: "*", left: $1, right: $3 }
    | expr "/" expr -> { type: "/", left: $1, right: $3 }
    ;

I the same grammar I also have a rule for a "filter expression" that also supports "parameters".

filterExpr
    : NUMBER -> { type: "number", value: $1 }
    | PARAM -> { type: "param", name: $1 } /* parameter */
    | "(" filterExpr ")" -> $2
    | filterExpr "+" filterExpr -> { type: "+", left: $1, right: $3 }
    | filterExpr "-" filterExpr -> { type: "-", left: $1, right: $3 }
    | filterExpr "*" filterExpr -> { type: "*", left: $1, right: $3 }
    | filterExpr "/" filterExpr -> { type: "/", left: $1, right: $3 }
    ;

This works but when I add operators I have to change both definitions. Is there a way to combine the common part of both "expr" and "filterExpr" in the grammar?

Octahedrite answered 29/6, 2017 at 7:43 Comment(0)
R
3

Javascript itself (officially ECMAScript, defined by ECMA-262) is described using an extension to BNF which allows rules to be augmented with boolean qualifiers ("parameters" in the language of the standard). This has precisely the effect you are looking for and it clearly simplifies the presentation of the language's somewhat intricate grammar. A full explanation of the BNF extensions can be found in section 5.1.5 (Grammar Notation) of the standard; in summary, parameters can be passed through from the left-hand side to non-terminals in a right-hand side or they can be explicitly set or unset for RHS terminals; furthermore, they can be used to filter possible productions based on either the presence or absence of the parameter. (There's an example at the end of this post.)

This particular BNF extension does not add any generative power to BNF; all uses of it can be mechanically eliminated by simply enumerating the possibilities. Sadly, I know of no grammar generator which implements this formalism (although it is certainly possible that some Javascript implementation contains a custom parser generator).

For your purposes, it would be easy to preprocess your jison grammar to implement something very similar. Indeed, it would be relatively easy to preprocess a bison grammar file, but it is easier with jison because you can compute the grammar programmatically and pass it to jison as a JSON object. This feature is not well-documented but the jison manual contains enough examples that it should be straight-forward to use. See, for example, the CommonJS section.


As promised, here's an excerpt from the ECMA-262 grammar which shows the use of this BNF extension:

IdentifierReference can be qualified with two possible boolean qualifiers (Yield and Await) giving rise to four possibilities. It can always be an Identifier; it can be the keyword yield only if not qualified with the Yield attribute or the keyword await only if not qualified with Await.

IdentifierReference[Yield, Await]:
    Identifier
    [~Yield]yield
    [~Await]await

So this single stanza is equivalent to four non-terminals, which could be produced mechanically:

IdentifierReference: Identfier | yield | await
IdentifierReference_Yield: Identifier | await
IdentifierReference_Await: Identifier | yield
IdentifierReference_Yield_Await: Identifier

Here's how it is applied: An Expression can be qualified with three attributes, all of which are passed through (the ? in ?Yield) to the non-terminals on the right-hand side.

Expression[In, Yield, Await]:
    AssignmentExpression[?In, ?Yield, ?Await]
    Expression[?In, ?Yield, ?Await] , AssignmentExpression[?In, ?Yield, ?Await]

A yield expression is only allowed in the variants of AssignmentExpression qualified with Yield:

AssignmentExpression[In, Yield, Await]:
    ConditionalExpression[?In, ?Yield, ?Await]
    [+Yield]YieldExpression[?In, ?Await]

Finally, an example with explicit parameters. In the production for GeneratorMethod, Yield is explicitly specified for the PropertyName production (which prevents yield from being recognized as an identifier in the list of parameters) and GeneratorBody is defined as a FunctionBody with Yield (allowing yield expressions and forbidding yield as an identifier) and without Await (not allowing await expressions but allowing await to be an identifier).

GeneratorMethod[Yield, Await]:
    * PropertyName[?Yield, ?Await] ( UniqueFormalParameters[+Yield, ~Await] ) { GeneratorBody }

GeneratorBody:
    FunctionBody[+Yield, ~Await]

Much of the above complexity is required by the insistence on backwards-compatibility: since programs written for earlier JS versions may have used yield and await as variable names, those keywords are only reserved in syntactic contexts which were not available in the earlier versions. (That's an oversimplification but the details are well out of scope for this question.)

Restrainer answered 29/6, 2017 at 15:44 Comment(2)
So long story short, it isn't possible with plain Jison and would require custom pre-processing of the grammar?Octahedrite
@rveerd: yes, that's it. But for bison, the custom preprocessing might be simple enough to make it worthwhile.Restrainer

© 2022 - 2024 — McMap. All rights reserved.