Grammar spec resolving Shift/Reduce conflicts

/* lexical grammar */
%lex

%s bracketed

%%

<bracketed>(\\.|[^\\\,\[\]])+    { yytext = yytext.replace(/\\(.)/g, '$1'); return 'Text'; }
<INITIAL>(\\.|[^\\\[])+          { yytext = yytext.replace(/\\(.)/g, '$1'); return 'Text'; }
"["                              { this.begin('bracketed'); return '['; }
"]"                              { this.popState(); return ']'; }
","                              return ','
<<EOF>>                          return 'END'

/lex

%start template

%%

template
    : sentence END
    ;

sentence
    : /* empty */
    | sentence Text
    | sentence '[' ']'
    | sentence '[' dynamic ']'
    ;

dynamic
    : sentence
    /*| dynamic ',' sentence*/
    ;

Conflict in grammar: multiple actions possible when lookahead token is ] in state 5
- reduce by rule: sentence ->
- shift token (then go to state 6)

States with conflicts:
State 5
  sentence -> sentence [ .] #lookaheads= END Text [ ]
  sentence -> sentence [ .dynamic ] #lookaheads= END Text [ ]
  dynamic -> .sentence #lookaheads= ]
  sentence -> . #lookaheads= ] Text [
  sentence -> .sentence Text
  sentence -> .sentence [ ]
  sentence -> .sentence [ dynamic ]

The conflict comes fundamentally from these two rules:

sentence: sentence '[' Text ']'
        | sentence '[' sentenceList ']'

The reason is that after seeing a sentence and a [, with Text as the next token, the parser doesn't know whether to shift the Text, matching the first rule, or to treat that Text as the beginning of a sentenceList, heading towards a match of the second rule.

Now if you had a parser generator that used 2-token lookahead, this wouldn't be a problem, but bison is LALR(1) (the 1 meaning one token of lookahead).

There are a couple of things you could try:

Do extra lookahead in the lexer to distinguish Text-followed-by-] from Text-not-followed-by-] as two distinct tokens, then rewrite the rules to use both of these tokens.
Use bison's %glr-parser feature to generate a GLR parser. This will parse the sentence both ways and later throw away the parse that doesn't match.
Refactor the grammar so that it does not need 2-token lookahead.
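
To make the first option concrete, here is a rough sketch of a tokenizer that peeks one character past each run of text. It is a hand-rolled JavaScript function, not actual jison or flex lexer rules, and the token name TextClose (for Text immediately followed by ]) is made up for illustration. The grammar would still need to be rewritten to accept both Text and TextClose wherever they can appear.

```javascript
// Sketch only: a standalone tokenizer, not jison/flex lexer rules.
// 'TextClose' is a hypothetical token name for "Text immediately followed by ']'".
function tokenize(input) {
  const tokens = [];
  let depth = 0; // > 0 while inside brackets, where ',' and ']' are special
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch === '[') { depth++; tokens.push({ type: '[' }); i++; continue; }
    if (ch === ']' && depth > 0) { depth--; tokens.push({ type: ']' }); i++; continue; }
    if (ch === ',' && depth > 0) { tokens.push({ type: ',' }); i++; continue; }

    // Accumulate a run of text, unescaping backslash sequences as we go.
    let text = '';
    while (i < input.length) {
      const c = input[i];
      if (c === '\\' && i + 1 < input.length) { text += input[i + 1]; i += 2; continue; }
      if (c === '[' || (depth > 0 && (c === ']' || c === ','))) break;
      text += c;
      i++;
    }

    // The extra lookahead: inspect the character that follows the text run.
    const type = depth > 0 && input[i] === ']' ? 'TextClose' : 'Text';
    tokens.push({ type, value: text });
  }
  tokens.push({ type: 'END' });
  return tokens;
}

function tokenTypes(input) {
  return tokenize(input).map((t) => t.type).join(' ');
}
```

With this sketch, tokenTypes('a[b]c') gives "Text [ TextClose ] Text END", while in tokenTypes('[a,b]') only the b before the closing bracket becomes TextClose.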

One refactoring that works in your case would be to rewrite the sentence rules to make them all right-recursive instead of left-recursive:

sentence: /* empty */
        | Text sentence 
        | '[' ']' sentence
        | '[' Text ']' sentence
        | '[' sentenceList ']' sentence
        ;

This avoids having sentence (or any other rule that starts with sentence, such as sentenceList) start with a null reduction of the sentence: /*empty*/ rule. So the parser can freely shift a Text in the problematic case, deferring the reduction until it sees the next token. It does have memory use implications, however, as it results in a parser that will essentially shift the entire input onto the parser stack and then reduce it one sentence at a time.

Another refactor you could do would be to subsume the [Text] and [] constructs into the [sentenceList]:

sentence: /* empty */
        | sentence Text 
        | sentence '[' sentenceList ']'
        ;

sentenceList: sentence
            | sentenceList ',' sentence
            ;

So now a sentenceList is one or more sentences separated by commas (instead of two or more), and in the action for the sentence '[' sentenceList ']' rule, you'd check the sentenceList to see if it was two or more sentences and act appropriately.
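
A minimal sketch of what that check might look like, written as a plain JavaScript helper that a jison action could call. The helper name buildBracketNode and the node shapes are assumptions for illustration; the original grammar does not define any semantic values. A sentence is modeled here as an array of parts and a sentenceList as an array of sentences.

```javascript
// Hypothetical helper and AST shapes, not part of the original grammar.
// Assumed wiring in a jison action (also an assumption):
//   sentence : sentence '[' sentenceList ']'
//       { $$ = $1.concat([buildBracketNode($3)]); }
function buildBracketNode(sentenceList) {
  if (sentenceList.length >= 2) {
    // '[' s1 ',' s2 ... ']' : a genuine comma-separated list
    return { type: 'list', items: sentenceList };
  }
  const only = sentenceList[0];
  if (only.length === 0) {
    // '[' ']' : the list holds a single empty sentence
    return { type: 'empty' };
  }
  // '[' sentence ']' : one sentence and no commas,
  // which covers what the separate '[' Text ']' rule used to match
  return { type: 'single', body: only };
}
```

The point of the design is that the grammar no longer has to tell the three bracket forms apart while parsing; the distinction moves into the action, where the whole list is already available.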
