Solving parsing conflicts in a tiny Lemon grammar
Asked Answered
A

1

6

I'm trying to learn basics of the Lemon parser generator, but I stuck quickly.

Here's a tiny grammar:

%right PLUS_PLUS.
%left DOT.

program ::= expr.

member_expr ::= expr DOT IDENTIFIER.

lhs_expr ::= member_expr.

expr ::= lhs_expr.
expr ::= PLUS_PLUS lhs_expr.

It causes 1 parsing conflict:

State 3:
      (3) expr ::= lhs_expr *
      (4) expr ::= PLUS_PLUS lhs_expr *

                           DOT reduce       3      expr ::= lhs_expr
                           DOT reduce       4       ** Parsing conflict **
                     {default} reduce       4      expr ::= PLUS_PLUS lhs_expr

Whereas, if I rewrite the last rule as follows:

expr ::= PLUS_PLUS expr DOT IDENTIFIER.

Then it causes no conflicts. But I don't think that this is the right way to go.

I'd appreciate if someone could explain what's the right way, and why.

Arnone answered 6/1, 2017 at 22:25 Comment(2)
I suggest you think of this without the extra names for things. Everywhere you have lhs_expr you could simply write expr DOT IDENTIFIER to see clearly what is really being asked for. If everything is in terms of expr you can see the conflict more clearly.Shortridge
Thing is, lhs_expr can be something other than expr DOT IDENTIFIER. This particular grammar does not contain other rules for it, because I wanted it to be very minimal, but it can also be e.g. IDENTIFIER, or expr LBRACKET expr RBRACKET, etc. And I want to write the rule PLUS_PLUS lhs_expr just once, and cover all possible left-hand side expressions being pre-incremented.Arnone
M
5

So you wrote an ambiguous grammar, which says to accept:

 ++ x . y

with two interpretations:

 [++ x ] . y

and

 ++ [x . y]

where the [ ] are just my way to show to the groupings.

Lemon is an L(AL)R parser, and such parsers simply don't handle ambiguities (multiple interpretations). The reduce-reduce conflict reported is what happens when the parser hits that middle dot; does it group "++ x" as "[++ x] ." or as "++ [ x .]"? Both choices are valid, and it can't choose safely.

If you stick with Lemon (or another LALR parser generator), you have to get rid of the problem by changing the grammar. [You could use a GLR parser generator; it would accept and give you both parses. But all you've done then is to push the problem of resolving the ambiguity to the post-parsing phrase. As you don't want an ambiguity, you might as well avoid it during parsing if you can. In this case I think you can.]

I think you are trying to build a C-like language. So you want something like this:

primitive_target ::= IDENTIFIER ;
primitive_target ::= IDENTIFIER '[' expr ']' ;
access_path ::= primitive_target ;
access_path ::= access_path '.' primitive_target ;

lhs ::= access_path ;
lhs ::= PLUS_PLUS access_path ;
lhs ::= access_path PLUS_PLUS ;

program ::= expr ;

expr ::= term ;
expr ::= expr '+' term ;
term :::= '(' expr ')' ;
term ::= lhs ;
term ::= lhs '=' expr ;
term ::= constant ;
Mcmath answered 12/1, 2017 at 14:48 Comment(7)
Thanks for the answer. But, shouldn't the ambiguity of ++ x . y be solved by the priorities of PLUS_PLUS and DOT? Given my grammar, DOT has a higher priority (because it appears after PLUS_PLUS).Arnone
I'm don't know ANTLR's %right and %left declarations. What you in essence want is something that forces a shift when you encounter dot with left context containing ++. It isn't clear to me that %right and %left force shifting; my guess is they force associativity on binary operators, but PLUS_PLUS isn't a binary operator so it might be meaningless. My cure was to avoid those special declarations and force the grammar to do what I (well, you) wanted directly.Mcmath
It's not ANTLR, but Lemon. Anyway, thank you so much!Arnone
Oh, yes, I knew that when I wrote the answer, lost in fog over last several days. You're welcome.Mcmath
Still, this grammar won't parse expressions like foo.bar.baz.Arnone
Why not? program -> expr -> term -> lhs -> accesspath, and access path allows a sequence of dot-separated primitive targets.Mcmath
Hmm yeah, my bad, sorry.Arnone

© 2022 - 2024 — McMap. All rights reserved.