Grammar ambiguity: why? (problem is: "(a)" vs "(a-z)")

Asked 27/6, 2011 at 22:13 Answered 27/6, 2011 at 22:31

So I am trying to implement a pretty simple grammar for one-line statements:

# Grammar

   c         : Character c        [a-z0-9-]
  (v)        : Vowel              (= [a,e,u,i,o])
  (c)        : Consonant
  (?)        : Any character (incl. number)
  (l)        : Any alpha char     (= [a-z])
  (n)        : Any integer        (= [0-9])
  (c1-c2)    : Range from char c1 to char c2
  (c1,c2,c3) : List including chars c1, c2 and c3

  Examples:
  h(v)(c)no(l)(l)jj-k(n)
  h(v)(c)no(l)(l)(a)(a)(n)
  h(e-g)allo
  h(e,f,g)allo
  h(x,y,z)uul
  h(x,y,z)(x,y,z)(x,y,z)(x,y,z)uul

I am using the Happy parser generator (http://www.haskell.org/happy/) but for some reason there seems to be some ambiguity problem.

The error message is: "shift/reduce conflicts: 1"

I think the ambiguity is with these two lines:

  | lBracket char rBracket              { (\c -> case c of
                                                 'v' -> TVowel
                                                 'c' -> TConsonant
                                                 'l' -> TLetter
                                                 'n' -> TNumber) $2 }
  | lBracket char hyphen char rBracket  { TRange $2 $4              }

An example case is: "(a)" vs "(a-z)"

The lexer would give the following for the two cases:

(a)   : [CLBracket, CChar 'a', CRBracket]
(a-z) : [CLBracket, CChar 'a', CHyphen, CChar 'z', CRBracket]

What I don't understand is how this can be ambiguous with an LL[2] parser.

In case it helps here is the entire Happy grammar definition:

{

module XHappyParser where

import Data.Char
import Prelude   hiding (lex)
import XLexer
import XString

}

%name parse
%tokentype { Character  }
%error     { parseError }

%token
    lBracket                  { CLBracket   }
    rBracket                  { CRBracket   }
    hyphen                    { CHyphen     }
    question                  { CQuestion   }
    comma                     { CComma      }
    char                      { CChar $$    }

%%

xstring : tokens                            { XString (reverse $1)       }

tokens : token                              { [$1]                       }
       | tokens token                       { $2 : $1                    }

token : char                                { TLiteral $1                }
      | hyphen                              { TLiteral '-'               }
      | lBracket char rBracket              { (\c -> case c of
                                                     'v' -> TVowel
                                                     'c' -> TConsonant
                                                     'l' -> TLetter
                                                     'n' -> TNumber) $2 }
      | lBracket question rBracket          { TAny                      }
      | lBracket char hyphen char rBracket  { TRange $2 $4              }
      | lBracket listitems rBracket         { TList $2                  }

listitems : char                            { [$1]                      }
          | listitems comma char            { $1 ++ [$3]                }

{

parseError :: [Character] -> a
parseError _ = error "parse error"

}

Thank you!

Undercoating answered 27/6, 2011 at 22:13 Comment(0)

Here's the ambiguity:

token : [...]
      | lBracket char rBracket
      | [...] 
      | lBracket listitems rBracket

listitems : char
          | [...]

Your parser could accept (v) as both TString [TVowel] and TString [TList ['v']], not to mention the missing characters in that case expression.

One possible way of solving it would be to modify your grammar so lists are at least two items, or have some different notation for vowels, consonants, etc.

Titmouse answered 27/6, 2011 at 22:31 Comment(3)

Thanks... that solved the problem (lists with one item are useless in this case anyway, so I removed them). But what do you mean with the case statement? – Undercoating 27/6, 2011 at 22:35

@o1iver: Only that you probably want to add a default case to it to handle single characters that are not one of v, c, ?, l, n to signal a more meaningful error than "Non-exhaustive patterns in case expression". – Titmouse 27/6, 2011 at 22:38

yes I did have that before, but I wasn't sure. I think I will just throw an error if that happens... Although I guess I will have to have a better look at the Happy docs concerning error handling in general! Thanks again... – Undercoating 27/6, 2011 at 22:40

The problem seems to be:

| lBracket char rBracket
...
| lBracket listitems rBracket

or in cleaner syntax:

(c)

Can be a TVowel, TConsonant, TLetter, TNumber (as you know) or a singleton TList.

As the happy manual says, shift reduce usually isn't an issue. You can us precedence to force behavior/remove the warning if you'd like.

Lanielanier answered 27/6, 2011 at 22:30 Comment(1)

Thanks! I was looking in the wrong place. The problem really seems to be the singleton list vs the special chars ("(v)", "(n)", etc...). Thank you! – Undercoating 27/6, 2011 at 22:36

Recommended topics

Hot tags