FParsec failing on many

I have this test program:

open FParsec

let test p str =
    match run p str with
    | Success(result, _, _)   -> printfn "Success: %A" result
    | Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg

let str s = pstring s

let sepPart = skipNewline >>. pstring "-"

let part = manyChars (notFollowedBy sepPart >>. anyChar)

[<EntryPoint>]
let main argv = 
    let s = "AA 12345\nBB 6789\n-----\nCC 9876\nDD 54321\n-----"
    test part s
    test (many part) s

    0 // return an integer exit code

The line {test part s} works as expected, but the next line, {test (many part) s}, fails, and I don't understand what I am doing wrong.

EDIT:

To clarify, what I am trying to do is have {test (many part) s} return ["AA 12345\nBB 6789"; "CC 9876\nDD 54321"]. In words, the input string is composed of "parts" or "chunks" separated by lines consisting only of dashes. For output I want a list in which each element is one of the parts and the dash lines are simply discarded.

Aurochs answered 19/5, 2013 at 19:58 Comment(0)

When you execute your example, FParsec throws an exception with the following message:

Additional information: (Ln: 2, Col: 8): The combinator 'many' was applied to a parser that succeeds without consuming input and without changing the parser state in any other way. (If no exception had been raised, the combinator likely would have entered an infinite loop.)

The problem is that your part parser always succeeds, even if it can only parse an empty string. You can solve that problem by replacing manyChars in the definition of part with many1Chars.
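
To make the difference concrete, here is a minimal sketch, assuming an F# script run with dotnet fsi and FParsec pulled from NuGet. The helper name succeeds is only for illustration; it is not part of FParsec.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical helper: true if the parser succeeds on the given input.
let succeeds (p: Parser<string, unit>) input =
    match run p input with
    | Success (_, _, _) -> true
    | Failure (_, _, _) -> false

// manyChars accepts zero characters, so it succeeds on empty input
// without consuming anything. That is what 'many' complains about.
let manyCharsOnEmpty = succeeds (manyChars letter) ""

// many1Chars requires at least one character, so it fails on empty input.
let many1CharsOnEmpty = succeeds (many1Chars letter) ""

printfn "manyChars on empty input succeeds:  %b" manyCharsOnEmpty
printfn "many1Chars on empty input succeeds: %b" many1CharsOnEmpty
```

Because many keeps applying its argument until that argument fails, an argument that can succeed on nothing would never let the loop end.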

If you search for e.g. "applied to a parser that succeeds without consuming input" you'll find several discussions of similar errors on the internet, including one in FParsec's user guide: http://www.quanttec.com/fparsec/users-guide/parsing-sequences.html#the-many-parser

Update: Here's a straightforward parser definition that works:

let sepPart = skipNewline 
              >>? (skipMany1SatisfyL ((=) '-') "'-'" 
                    >>. (skipNewline <|> eof))

let part = many1CharsTill anyChar sepPart    
let parser = many part

Note that I'm using >>? in the definition of sepPart to allow this parser to backtrack to the beginning if a newline is not followed by a dash. Alternatively you could also use attempt (skipNewline >>. ...), which would also backtrack for errors after the initial dash. The documentation for many[Chars]Till p endp states an equivalence with many (notFollowedBy endp >>. p) .>> endp that is not strictly true, because many[Chars]Till does not backtrack like notFollowedBy. I will clarify the documentation.
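
As a quick check, the many1CharsTill definition can be run against the input string from the question. This is a sketch that assumes an F# script with FParsec loaded from NuGet; the explicit Parser<_, unit> annotations and the names input and parts are my additions, there only to pin down the user-state type and hold the result.

```fsharp
#r "nuget: FParsec"
open FParsec

// Separator: a newline, one or more dashes, then a newline or end of input.
// >>? backtracks to before the newline when no dash follows it.
let sepPart : Parser<unit, unit> =
    skipNewline
    >>? (skipMany1SatisfyL ((=) '-') "'-'" >>. (skipNewline <|> eof))

// A part is one or more characters up to (and excluding) the next separator.
let part : Parser<string, unit> = many1CharsTill anyChar sepPart
let parser = many part

let input = "AA 12345\nBB 6789\n-----\nCC 9876\nDD 54321\n-----"

let parts =
    match run parser input with
    | Success (result, _, _) -> result
    | Failure (errorMsg, _, _) -> failwith errorMsg

printfn "%A" parts
```

For this input the result should be the two-element list asked for in the question, with the dash lines dropped.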

For performance it's better, where possible, to avoid the backtracking that comes with many[Chars]Till or notFollowedBy. For example, you could also parse your chunks of lines as follows:

let id = manyMinMaxSatisfyL 2 2 isUpper "id (two capital letters)"

let line = id .>>. (pchar ' ' >>. restOfLine true)

let separator = many1SatisfyL ((=) '-') "dash separator"
                >>. (skipNewline <|> eof)

let chunk = many1 line     
let parser = sepEndBy1 chunk separator
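
Here is a sketch of the same line-based definition applied to the question's input, again assuming an F# script with FParsec from NuGet. I renamed id to lineId so it does not shadow the built-in id function, and the type annotations, input and chunks are illustrative additions.

```fsharp
#r "nuget: FParsec"
open FParsec

// Exactly two upper-case letters at the start of a line.
let lineId : Parser<string, unit> =
    manyMinMaxSatisfyL 2 2 isUpper "id (two capital letters)"

// An id, a space, then the rest of the line (the newline is skipped).
let line = lineId .>>. (pchar ' ' >>. restOfLine true)

// One or more dashes followed by a newline or end of input.
let separator : Parser<unit, unit> =
    many1SatisfyL ((=) '-') "dash separator" >>. (skipNewline <|> eof)

let chunk = many1 line
let parser = sepEndBy1 chunk separator

let input = "AA 12345\nBB 6789\n-----\nCC 9876\nDD 54321\n-----"

let chunks =
    match run parser input with
    | Success (result, _, _) -> result
    | Failure (errorMsg, _, _) -> failwith errorMsg

printfn "%A" chunks
```

Each chunk comes back as a list of (id, rest of line) pairs rather than one raw string, which is usually more convenient if the lines have structure you want to keep.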

Note that this implementation doesn't require the last chunk to be ended by a separator. If you want that, you could instead use:

let chunk = many line .>> separator
let parser = many chunk

If you want to allow empty chunks with the sepEndBy definition, you could use:

let chunk = many1 line <|> (notFollowedByEof >>% [])
let parser = sepEndBy1 chunk separator
Underling answered 19/5, 2013 at 21:13 Comment(5)
This does suppress the error, but it does not return the correct result. If you look at what is returned by the first parse {test part s}, you will see that the result changes when manyChars is changed to many1Chars in the part parser. - Aurochs
I don't see the difference, except for part returning a string and many part returning a string in a list. Could you please be more specific and explain what output you're expecting? Your parser never skips the sepPart, so maybe you want something like many (part .>> sepPart). If you want to match one or more dashes for the separator, you'll also have to use something like skipMany1 (pstring "-") or skipMany1SatisfyL ((=) '-') "'-'" in the definition of sepPart. - Underling
I edited the question to better describe the expected output. I apologise for not making it clearer to begin with. - Aurochs
I've updated the answer. Please ask if you find anything unclear. - Underling
Sorry, I had to update the answer again, because the proposed simpler solution had a bug: there was one many too many. - Underling
