Solving parsing conflicts in a tiny Lemon grammar - McMap

About

Solving parsing conflicts in a tiny Lemon grammar

Asked 6/1, 2017 at 22:25 Answered 12/1, 2017 at 14:48

Solved parser-generator shift-reduce-conflict lemon shift-reduce

A

1

6

I'm trying to learn basics of the Lemon parser generator, but I stuck quickly.

Here's a tiny grammar:

%right PLUS_PLUS.
%left DOT.

program ::= expr.

member_expr ::= expr DOT IDENTIFIER.

lhs_expr ::= member_expr.

expr ::= lhs_expr.
expr ::= PLUS_PLUS lhs_expr.

It causes 1 parsing conflict:

State 3:
      (3) expr ::= lhs_expr *
      (4) expr ::= PLUS_PLUS lhs_expr *

                           DOT reduce       3      expr ::= lhs_expr
                           DOT reduce       4       ** Parsing conflict **
                     {default} reduce       4      expr ::= PLUS_PLUS lhs_expr

Whereas, if I rewrite the last rule as follows:

expr ::= PLUS_PLUS expr DOT IDENTIFIER.

Then it causes no conflicts. But I don't think that this is the right way to go.

I'd appreciate if someone could explain what's the right way, and why.

Arnone answered 6/1, 2017 at 22:25 Comment(2)

I suggest you think of this without the extra names for things. Everywhere you have lhs_expr you could simply write expr DOT IDENTIFIER to see clearly what is really being asked for. If everything is in terms of expr you can see the conflict more clearly. – Shortridge 6/1, 2017 at 22:48

Thing is, lhs_expr can be something other than expr DOT IDENTIFIER. This particular grammar does not contain other rules for it, because I wanted it to be very minimal, but it can also be e.g. IDENTIFIER, or expr LBRACKET expr RBRACKET, etc. And I want to write the rule PLUS_PLUS lhs_expr just once, and cover all possible left-hand side expressions being pre-incremented. – Arnone 6/1, 2017 at 22:52

M

5

So you wrote an ambiguous grammar, which says to accept:

 ++ x . y

with two interpretations:

 [++ x ] . y

and

 ++ [x . y]

where the [ ] are just my way to show to the groupings.

Lemon is an L(AL)R parser, and such parsers simply don't handle ambiguities (multiple interpretations). The reduce-reduce conflict reported is what happens when the parser hits that middle dot; does it group "++ x" as "[++ x] ." or as "++ [ x .]"? Both choices are valid, and it can't choose safely.

If you stick with Lemon (or another LALR parser generator), you have to get rid of the problem by changing the grammar. [You could use a GLR parser generator; it would accept and give you both parses. But all you've done then is to push the problem of resolving the ambiguity to the post-parsing phrase. As you don't want an ambiguity, you might as well avoid it during parsing if you can. In this case I think you can.]

I think you are trying to build a C-like language. So you want something like this:

primitive_target ::= IDENTIFIER ;
primitive_target ::= IDENTIFIER '[' expr ']' ;
access_path ::= primitive_target ;
access_path ::= access_path '.' primitive_target ;

lhs ::= access_path ;
lhs ::= PLUS_PLUS access_path ;
lhs ::= access_path PLUS_PLUS ;

program ::= expr ;

expr ::= term ;
expr ::= expr '+' term ;
term :::= '(' expr ')' ;
term ::= lhs ;
term ::= lhs '=' expr ;
term ::= constant ;

Mcmath answered 12/1, 2017 at 14:48 Comment(7)

Thanks for the answer. But, shouldn't the ambiguity of ++ x . y be solved by the priorities of PLUS_PLUS and DOT? Given my grammar, DOT has a higher priority (because it appears after PLUS_PLUS). – Arnone 16/1, 2017 at 8:15

I'm don't know ANTLR's %right and %left declarations. What you in essence want is something that forces a shift when you encounter dot with left context containing ++. It isn't clear to me that %right and %left force shifting; my guess is they force associativity on binary operators, but PLUS_PLUS isn't a binary operator so it might be meaningless. My cure was to avoid those special declarations and force the grammar to do what I (well, you) wanted directly. – Mcmath 16/1, 2017 at 9:26

It's not ANTLR, but Lemon. Anyway, thank you so much! – Arnone 16/1, 2017 at 14:30

Oh, yes, I knew that when I wrote the answer, lost in fog over last several days. You're welcome. – Mcmath 16/1, 2017 at 14:44

Still, this grammar won't parse expressions like foo.bar.baz. – Arnone 19/1, 2017 at 21:16

Why not? program -> expr -> term -> lhs -> accesspath, and access path allows a sequence of dot-separated primitive targets. – Mcmath 19/1, 2017 at 22:0

Hmm yeah, my bad, sorry. – Arnone 20/1, 2017 at 14:16

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.