Why won't Parsec consider the right-hand side of my <|> alternative?

I’m trying to parse C++ code. Therefore, I need a context-sensitive lexer. In C++, >> is either one or two tokens (>> or > >), depending on the context. To make it even more complex, there is also a token >>= which is always the same regardless of the context.

punctuation :: Bool -> Parser Token
punctuation expectDoubleGT = do
    c <- oneOf "{}[]#()<>%;:.+-*/^&|~!=,"
    case c of
        '>' ->
            (char '=' >> return TokGTEq) <|>
            if expectDoubleGT
                then (string ">=" >> return TokRShiftEq) <|> return TokGT
                else (char '>' >> ((char '=' >> return TokRShiftEq) <|> return TokRShift)) <|> return TokGT

When expectDoubleGT is False, this function works fine. However, when expectDoubleGT is True (the second-to-last line above), it gives an error when the input is >>.

*Parse> parseTest (punctuation True) ">"
TokGT
*Parse> parseTest (punctuation True) ">>="
TokRShiftEq
*Parse> parseTest (punctuation True) ">>"
parse error at (line 1, column 2):
unexpected end of input
expecting ">="

Why does the expression (string ">=" >> return TokRShiftEq) <|> return TokGT raise an error rather than returning TokGT when the remaining input is >? (The first > was already consumed.)

Aldwin answered 9/12, 2012 at 14:29 Comment(0)
11

Parsec only tries the second parser in

p1 <|> p2

if p1 failed without consuming any input. On the input ">>", after the first '>' has been consumed,

string ">="

fails after consuming the remaining '>', so the second parser isn't used.
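
The rule can be reproduced in isolation. The following is only a sketch: the Token type is a stand-in for the one in the question, and afterFirstGT and run are hypothetical names introduced here. It models the input that is left once the first '>' has been read.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for the question's token type (assumed, not the original definition).
data Token = TokGT | TokRShiftEq
  deriving (Show, Eq)

-- Same shape as the failing branch: no try around the left alternative.
afterFirstGT :: Parser Token
afterFirstGT = (string ">=" >> return TokRShiftEq) <|> return TokGT

-- Run the parser on whatever input remains after the first '>'.
run :: String -> Either ParseError Token
run = parse afterFirstGT ""
```

With these definitions, run "" succeeds with TokGT because string ">=" fails before consuming anything, and run ">=" succeeds with TokRShiftEq. run ">" is a parse error: string ">=" has already consumed the '>' when it hits the end of input, so <|> never tries return TokGT.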

You need a try

try (string ">=" >> return TokRShiftEq)

there so that if string ">=" fails, no input is consumed and the alternative parser is used.
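
As a rough sketch of how the fix might look in context, here is a cut-down version of the question's parser that handles only tokens starting with '>'. The Token type and the name gtToken are assumptions made for this example; the only change relative to the question's logic is the added try.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed token type; the question does not show its definition.
data Token = TokGT | TokGTEq | TokRShift | TokRShiftEq
  deriving (Show, Eq)

-- Parse one token that starts with '>'.
-- When expectDoubleGT is True, ">>" is lexed as two separate '>' tokens.
gtToken :: Bool -> Parser Token
gtToken expectDoubleGT = do
    _ <- char '>'
    (char '=' >> return TokGTEq) <|>
        if expectDoubleGT
            -- try rewinds the input if string ">=" fails partway through,
            -- so the right-hand alternative still gets a chance to run.
            then try (string ">=" >> return TokRShiftEq) <|> return TokGT
            else (char '>' >> ((char '=' >> return TokRShiftEq) <|> return TokRShift)) <|> return TokGT
```

In GHCi, parseTest (gtToken True) ">>" should now yield TokGT and leave the second '>' unconsumed for the next token, while ">>=" still yields TokRShiftEq.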

Mosier answered 9/12, 2012 at 14:34 Comment(0)
-1

Use libclang. It can parse all of C++. No matter how hard you try, you won't be able to.

Antimasque answered 4/3, 2015 at 20:47 Comment(1)
While this isn't a good answer to the question, it is a useful comment. Parsing C and C++ means you should locally accept ambiguity, and I'm not sure whether Parsec can do that. - Lawson
