Parsing block comments with Megaparsec using symbols for start and end

I want to parse text similar to this in Haskell using Megaparsec.

# START SKIP
def foo(a,b):
    c = 2*a # Foo 
    return a + b
# END SKIP

where # START SKIP and # END SKIP mark the start and end of the block of text to parse.

Unlike skipBlockComment, I want the parser to return the lines between the start and end markers.

This is my parser.

skip :: Parser String
skip = s >> manyTill anyChar e
  where s = string "# START SKIP"
        e = string "# END SKIP"

The skip parser works as intended.

To allow a variable amount of whitespace within the start and end markers (for example, more than one space between the words of # START SKIP), I've tried the following:

skip' :: Parser String
skip' = s >> manyTill anyChar e
  where s = symbol "#" >> symbol "START" >> symbol "SKIP"
        e = symbol "#" >> symbol "END" >> symbol "SKIP"

Using skip' to parse the above text gives the following error.

3:15:
unexpected 'F'
expecting "END", space, or tab

I would like to understand the cause of this error and how I can fix it.

Gladine answered 13/11, 2016 at 23:58 Comment(1)
The problem is you have a common prefix for your parsers. Take a look at try. - Kristakristal

As Alec already commented, the problem is that e consumes input before it fails. On the line c = 2*a # Foo, symbol "#" succeeds and consumes the '#' (plus the space after it), and only then does symbol "END" fail at 'F'. The way parsec and its derivatives work is that as soon as a parser has consumed any characters, you're committed to that parsing branch, i.e. the anyChar alternative inside manyTill is not considered anymore, even though e ultimately fails here.

You can easily request backtracking though, by wrapping the end delimiter in try:

skip' :: Parser String
skip' = s >> manyTill anyChar e
  where s = symbol "#" >> symbol "START" >> symbol "SKIP"
        e = try $ symbol "#" >> symbol "END" >> symbol "SKIP"

try sets a “checkpoint” before '#' is consumed, and when e fails later on (in your example, at "Foo"), the parser acts as if no characters had been consumed at all, so manyTill can fall back to anyChar.
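For readers who want to try this, here is a minimal self-contained sketch of both variants. It rests on a few assumptions that are not in the question: the space consumer sc skips only spaces and tabs (guessed from the "space, or tab" in the error message), symbol is built with L.symbol, and anySingle stands in for anyChar, which was renamed in Megaparsec 7. The names skipNoTry, skipWithTry and sample are made up for illustration.

```haskell
module Main (main) where

import Control.Applicative (empty)
import Control.Monad (void)
import Data.List (intercalate)
import Data.Void (Void)
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Assumed space consumer: skips spaces and tabs only, no comment syntax.
sc :: Parser ()
sc = L.space (void (some (oneOf " \t"))) empty empty

-- Parses the given string, then any trailing spaces or tabs.
symbol :: String -> Parser String
symbol = L.symbol sc

-- Start marker, shared by both variants.
startMarker :: Parser String
startMarker = symbol "#" >> symbol "START" >> symbol "SKIP"

-- End marker without try: once "#" is consumed, a later failure is final.
skipNoTry :: Parser String
skipNoTry = startMarker >> manyTill anySingle e
  where e = symbol "#" >> symbol "END" >> symbol "SKIP"

-- End marker wrapped in try: a failure after "#" rewinds the input.
skipWithTry :: Parser String
skipWithTry = startMarker >> manyTill anySingle e
  where e = try (symbol "#" >> symbol "END" >> symbol "SKIP")

-- The text from the question, joined without a trailing newline.
sample :: String
sample = intercalate "\n"
  [ "# START SKIP"
  , "def foo(a,b):"
  , "    c = 2*a # Foo"
  , "    return a + b"
  , "# END SKIP"
  ]

main :: IO ()
main = do
  print (parseMaybe skipNoTry sample)
  print (parseMaybe skipWithTry sample)
```

With this setup the first call should yield Nothing, because the parser commits at the '#' in front of Foo, while the second should return the text between the markers, including the newline that follows the start marker.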

In fact, traditional parsec would give the same behaviour for skip as well. But because looking for a string and only succeeding if it matches entirely is such a common task, megaparsec's string is implemented like try . string, i.e. if the failure occurs within that fixed string, it always backtracks.

However, compound parsers still don't backtrack by default, unlike in attoparsec, where they do. The main reason is that if anything can backtrack to any point, you can't really get a clear point of failure to show in the error message.
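The difference between string and a compound parser can be seen in a small experiment. This is only a sketch, assuming a recent Megaparsec; the names viaString, viaChars and viaCharsTry are hypothetical, and the input "abd" is chosen so that the first alternative fails after two matching characters.

```haskell
module Main (main) where

import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, string)

type Parser = Parsec Void String

-- string matches or fails as a unit, so the second alternative is tried.
viaString :: Parser String
viaString = string "abc" <|> string "abd"

-- The same match built from single characters: after 'a' and 'b' are
-- consumed, the failure at 'c' commits and the second branch is skipped.
viaChars :: Parser String
viaChars = traverse char "abc" <|> traverse char "abd"

-- Wrapping the first branch in try restores the backtracking.
viaCharsTry :: Parser String
viaCharsTry = try (traverse char "abc") <|> traverse char "abd"

main :: IO ()
main = do
  print (parseMaybe viaString "abd")
  print (parseMaybe viaChars "abd")
  print (parseMaybe viaCharsTry "abd")
```

Here viaString and viaCharsTry should both succeed with "abd", whereas viaChars should fail, which mirrors the difference between skip and skip' in the question.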

Antependium answered 14/11, 2016 at 0:17 Comment(1)
Thank you @leftaroundabout! Very nice explanation. - Gladine
