Grammar spec resolving Shift/Reduce conflicts
Asked Answered
A

1

6

I'm using Jison (Bison) to create a simple markup language. I'm clearly new to this, but slight variations are working very well. I just don't understand the source of the S/R conflict.

It doesn't seem matter that 'Text' is returned by two lexer actions (with different Start Conditions) and I like this because it seems to allow the grammar to have fewer rules and because the error messages to the user are consistent. I've tried making the 'Text' rule common regardless of context and I've also tried giving each token a different name, but it doesn't seem to have any effect on the S/R Conflicts when it's all together.

The parser is SUPPOSED to create a json-object with plain-text, sub-arrays, and various special nodes.

Specification:

/* lexical grammar */
%lex

%s bracketed

%%

<bracketed>(\\.|[^\\\,\[\]])+       { yytext = yytext.replace(/\\(.)/g, '$1'); return 'Text'; }
<INITIAL>(\\.|[^\\\[])+             { yytext = yytext.replace(/\\(.)/g, '$1'); return 'Text'; }
"["                                 { this.begin('bracketed'); return '['; }
"]"                                 { this.popState(); return ']'; }
","                                 return ','
<<EOF>>                             return 'END'

/lex

%start template

%%    

template
    : sentence END
    ;

sentence
    : /* empty */
    | sentence Text
    | sentence '[' ']'
    | sentence '[' dynamic ']'
    ;

dynamic
    : sentence
    /*| dynamic ',' sentence*/
    ;

Warnings:

Conflict in grammar: multiple actions possible when lookahead token is ] in state 5
- reduce by rule: sentence ->
- shift token (then go to state 6)

States with conflicts:
State 5
  sentence -> sentence [ .] #lookaheads= END Text [ ]
  sentence -> sentence [ .dynamic ] #lookaheads= END Text [ ]
  dynamic -> .sentence #lookaheads= ]
  sentence -> . #lookaheads= ] Text [
  sentence -> .sentence Text
  sentence -> .sentence [ ]
  sentence -> .sentence [ dynamic ]

Different generator algorithms have more or less trouble, but they all seem to have trouble.

Thanks!

Appoggiatura answered 3/10, 2012 at 20:1 Comment(0)
T
18

The conflict comes fundamentally from these two rules:

sentence: sentence '[' Text ']'
        | sentence '[' sentenceList ']'

The reason is that after seeing a sentence and a [ and looking at the next token being Text, the parser doesn't known whether to shift the Text, matching the first rule, or to treat that Text as the beginning of a sentenceList going towards matching the second rule.

Now if you had a parser generator that use 2-token lookahead, this wouldn't be a problem, but bison is LALR(1) (the 1 being one token lookahead).

There are a couple of things you could try:

  • do extra lookahead in the lexer to differentiate Text-followed-by-] from Text-not-followed-by-] as two distinct tokens then rewrite the rules to use both of these tokens.

  • Use bison's %glr-parser feature to use GLR parser. This will parse the sentence both ways and later throw away the one that doesn't match

  • refactor the grammar to not need 2-token lookahead.

One refactoring that works in your case would be to rewrite the sentence rules to make them all right-recursive instead of left-recursive:

sentence: /* empty */
        | Text sentence 
        | '[' ']' sentence
        | '[' Text ']' sentence
        | '[' sentenceList ']' sentence
        ;

This avoids having sentence (or any other rule that starts with sentence such as sentenceList) start with a null reduction of the sentence: /*empty*/ rule. So the parser can freely shift a Text in the problematic case deferring the reduction until it sees the next token. It does have memory use implications, however, as it results in a parser that will essentially shift the entire input on to the parser stack and then reduce it one sentence at a time.

Another refactor you could do would be to subsume the [Text] and [] constructs into the [sentenceList]:

sentence: /* empty */
        | sentence Text 
        | sentence '[' sentenceList ']'
        ;

sentenceList: sentence
            | sentenceList ',' sentence

So now a sentenceList is one or more sentences separated by commas (instead of two or more), and in the action for the sentence '[' sentenceList ']' rule, you'd check the sentenceList to see if it was two or more sentences and act appropriately.

Tabethatabib answered 3/10, 2012 at 22:22 Comment(3)
Great answer. And I like the suggestion you added - that opened my mind to greater processing in those actions, hadn't thought of that. I'm still working on making it all work though. Do the order of rule appearances matter?Appoggiatura
You also helped me realize that the actions aren't really necessary for Conflict resolution discussions.Appoggiatura
In the updated grammar, you have an ambiguity because dynamic -> sentence -> /*empty*/ means that '[' dynamic ']' matches the same input as '[' ']'. That's a problem with my second suggested refactor, so I've fixed that.Tabethatabib

© 2022 - 2024 — McMap. All rights reserved.