I’m trying to parse C++ code, so I need a context-sensitive lexer. In C++, >> is either one token (>>) or two tokens (> >), depending on the context. To make it even more complex, there is also the token >>=, which is always a single token regardless of the context.

    punctuation :: Bool -> Parser Token
    punctuation expectDoubleGT = do
        c <- oneOf "{}[]#()<>%;:.+-*/^&|~!=,"
        case c of
            '>' ->
                (char '=' >> return TokGTEq) <|>
                if expectDoubleGT
                    then (string ">=" >> return TokRShiftEq) <|> return TokGT
                    else (char '>' >> ((char '=' >> return TokRShiftEq) <|> return TokRShift)) <|> return TokGT

When expectDoubleGT is False, this function works fine. However, when expectDoubleGT is True (the second-to-last line above), it gives an error when the input is >>.

    *Parse> parseTest (punctuation True) ">"
    TokGT
    *Parse> parseTest (punctuation True) ">>="
    TokRShiftEq
    *Parse> parseTest (punctuation True) ">>"
    parse error at (line 1, column 2):
    unexpected end of input
    expecting ">="

Why does the expression (string ">=" >> return TokRShiftEq) <|> return TokGT raise an error rather than returning TokGT when the remaining input is >? (The first > was already consumed.)