FParsec: how to omit `many` parser failures from error messages

Consider this parser that converts digit strings to ints:

let toInt (s:string) = 
    match Int32.TryParse(s) with
    | (true, n) -> preturn n
    | _         -> fail "Number must be below 2147483648"

let naturalNum = many1Chars digit >>= toInt <?> "natural number"

When I run it on non-numeric strings like "abc" it shows the correct error message:

Error in Ln: 1 Col: 1
abc
^
Expecting: natural number

But when I give it a numeric string exceeding the int range it gives the following counter-productive message:

Error in Ln: 1 Col: 17
9999999999999999
                ^
Note: The error occurred at the end of the input stream.
Expecting: decimal digit
Other error messages:
  Number must be below 2147483648

The primary message "Expecting: decimal digit" makes no sense, because we have too many digits already.

Is there a way to get rid of it and only show "Number must be below 2147483648"?


Full example:

open System
open FParsec

[<EntryPoint>]
let main argv =
    let toInt (s:string) = 
        match Int32.TryParse(s) with
        | (true, n) -> preturn n
        | _         -> fail "Number must be below 2147483648"

    let naturalNum = many1Chars digit >>= toInt <?> "natural number"

    match run naturalNum "9999999999999999" with
    | Failure (msg, _, _) -> printfn "%s" msg
    | Success (a, _, _)   -> printfn "%A" a

    0
Bora answered 24/5, 2019 at 22:56 Comment(3)
I don't understand why the results differ between "abc" and "9999999999". Both of them should end up in the fail branch... - Duax
many1Chars digit fails on its own when we feed the parser "abc". Only in toInt do we have to fail ourselves. - Bora
The effect you see is caused by how FParsec handles sequenced parsers internally. Even if the first parser succeeds, it may generate an error message (here, because it cannot parse any more digits). If the second parser fails without consuming input (which is the case here, since you only convert the first parser's result), all existing error messages in the state are merged. This is described in FParsec's documentation. A solution could be a custom sequencing operator that drops any error messages of the first parser in the success case and then forwards the result. - Featherstone

I think the root of the problem here is that this is a non-syntactic concern, which doesn't fit well with the model of a lookahead parser. If you could express "too many digits" in a syntactic way, it would make sense for the parser too, but as it is it will instead go back and try to consume more input. I think the cleanest solution therefore would be to do the int conversion in a separate pass after the parsing.
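A minimal sketch of the separate-pass idea, assuming the code runs as an F# script (`dotnet fsi`) with the FParsec NuGet package available. The names `digits` and `parseNatural` are hypothetical and only serve the illustration. The parser handles the syntax alone, and the range check happens after `run` has returned:

```fsharp
// Assumption: executed as an .fsx script, so the package is pulled in via #r.
#r "nuget: FParsec"
open System
open FParsec

// Syntactic pass: recognise a run of digits, with no range check.
let digits : Parser<string, unit> = many1Chars digit <?> "natural number"

// Semantic pass (hypothetical helper): convert only after parsing has finished.
// Result.Ok / Result.Error are qualified because FParsec defines its own Ok and Error.
let parseNatural (input: string) : Result<int, string> =
    match run digits input with
    | Failure (msg, _, _) -> Result.Error msg
    | Success (s, _, _) ->
        match Int32.TryParse(s) with
        | true, n -> Result.Ok n
        | _ -> Result.Error "Number must be below 2147483648"
```

The trade-off is that the range error no longer carries FParsec's line and column information. If the position matters, it would have to be captured during parsing and reported separately.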

That said, FParsec seems flexible enough that you should still be able to hack it together. This does what you ask I think:

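let naturalNum: Parser<int, _> =
    fun stream ->
        let reply = many1Chars digit stream
        match reply.Status with
            | Ok ->
                // Digits were parsed; a failed conversion returns a fresh error
                // list, so the leftover "decimal digit" message is dropped.
                match Int32.TryParse(reply.Result) with
                | (true, n) -> Reply(n)
                | _         -> Reply(Error, messageError "Number must be below 2147483648")
            | _ ->
                // No digits at all: pass the original error messages through.
                Reply(Error, reply.Error)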

Or if you want the "natural number" error message instead of "decimal digit", replace the last line with:

Reply(Error, messageError "Expecting: natural number")
Lamanna answered 29/5, 2019 at 14:36 Comment(1)
You are right, I was trying to solve the problem on the wrong API level. Gonna verify your solution later. - Bora

The effect you see is that the first parser of your sequence succeeds, but also generates an error message (because it could have consumed even more digits). Your second parser consumes no further input, so if it fails, FParsec merges the error messages of the two sequenced parsers (see the FParsec manual on merging of error messages).

A solution would be to create a small wrapper for a parser that removes the error messages from a reply in the Ok case. Then, when it is sequenced with a second parser, only the message of the second parser remains.

Untested code off the top of my head:

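// Drops any error messages attached to a successful reply, so that a following
// parser that fails without consuming input reports only its own message.
// The type annotation is needed so that res.Status can be resolved.
let purify (p: Parser<'a, 'u>) : Parser<'a, 'u> =
    fun stream ->
        let res = p stream
        match res.Status with
            | Ok -> Reply(res.Result)
            | _ -> res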


let naturalNum = purify (many1Chars digit) >>= toInt <?> "natural number"
Featherstone answered 29/5, 2019 at 16:19 Comment(2)
Interesting solution! In this specific case I'd probably go with glennsl's answer, but purify should come in handy in more complex scenarios. - Bora
Perhaps you can combine both solutions. What is actually needed here is a map-like function for parsers, where the applied function may fail. In case of failure, a specific error can be created that overwrites the original errors of the parser. - Featherstone
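
The map-like function suggested in the last comment could look roughly like the sketch below. `tryMap` and `toIntResult` are hypothetical names and not part of FParsec. The sketch assumes an F# script run with `dotnet fsi` and the FParsec NuGet package. The fallible function returns an F# `Result`, whose cases are written as `Result.Ok` and `Result.Error` because FParsec defines its own `Ok` and `Error`.

```fsharp
// Assumption: executed as an .fsx script, so the package is pulled in via #r.
#r "nuget: FParsec"
open System
open FParsec

// Hypothetical combinator (not part of FParsec): map a fallible function over
// a parser's result. On failure the reply carries only the function's message,
// so error messages left over from a successful `p` are discarded.
let tryMap (f: 'a -> Result<'b, string>) (p: Parser<'a, 'u>) : Parser<'b, 'u> =
    fun stream ->
        let reply = p stream
        if reply.Status = Ok then
            match f reply.Result with
            | Result.Ok value -> Reply(value)
            | Result.Error msg -> Reply(Error, messageError msg)
        else
            // p itself failed: keep its status and error messages.
            Reply(reply.Status, reply.Error)

// The conversion from the question, rewritten to return a Result.
let toIntResult (s: string) : Result<int, string> =
    match Int32.TryParse(s) with
    | true, n -> Result.Ok n
    | _ -> Result.Error "Number must be below 2147483648"

let naturalNum : Parser<int, unit> = many1Chars digit |> tryMap toIntResult

match run naturalNum "9999999999999999" with
| Failure (msg, _, _) -> printfn "%s" msg
| Success (n, _, _)   -> printfn "%d" n
```

With this sketch the overflow case should report only the range message, while input such as "abc" still produces the error from `many1Chars digit`.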
