FParsec choice behaves in unexpected ways
Asked Answered
L

1

6

I plan to use FParsec for the prototype of a larger project of mine. So I decided to get my first experiences with this library by means of the test program listed below. But it seems that the combination of my basic parsers (which seem to work) by using the fparsec 'choice' function produces unexpected behavior.

Basically, the goal is that all this simple calculator parser code does always returns a sum of products of either numbers or sub expressions. Sub expressions in turn shall have the same structure as the whole expression.

As I understood from the documentation of 'choice', the alternatives are attempted from left to right as specified in the list of parsers given to 'choice'. I understood that if a parser further left in the list fails but consumed input, the subsequent parsers will not be attempted.

Yet, there seems to be more to it than I can understand right now, as if it were as I stated above, the code should work. But it does not work.

It would be much appreciated if someone could explain to me a) what is going wrong and why and b) how to fix it.

In my main project I plan to compute parsers from some input and so I need to understand exactly how to combine parsers in a reliable way free of surprises.

(*
    SimpleAOSCalculator

    Should implement the following grammar:

    SimpleAOSCalculator := SUM
    SUM := SUMMAND [ '+' SUMMAND ]*
    SUMMAND := PRODUCT | SUBEXPR
    PRODUCT := FACTOR [ '*' FACTOR ]*
    FACTOR := NUMBER | SUBEXPR
    SUBEXPR := '(' SUM ')'
    NUMBER := pfloat
*)

// NOTE: If you try this in fsi, you have to change the 2 lines below to point to the spot you have your fparsec dlls stored at.
#r @"C:\hgprojects\fparsec\Build\VS11\bin\Debug\FParsecCS.dll"
#r @"C:\hgprojects\fparsec\Build\VS11\bin\Debug\FParsec.dll"

open FParsec

let testParser p input =
    match run p input with
    | Success(result, _, _) -> printfn "Success: %A" result
    | Failure(errorMsg, _, _) -> printfn "Failure %s" errorMsg
    input

type Node = 
    | Sum of SumNode
    | Product of ProductNode
    | Number of NumberNode
    | SubExpression of SubExpressionNode
and SumNode = 
    {
        Summands : Node list
    }
and ProductNode = 
    {
        Factors : Node list
    }
and NumberNode =
    {
        Value : float
    }
and SubExpressionNode =
    {
        N : Node
    }

let CreateSubExpression (n : Node) : Node =
    let s : SubExpressionNode = { N = n }
    SubExpression  s

let (PrimitiveAOSCalculator : Parser<Node,unit>), (PrimitiveAOSCalculatorImpl : Parser<Node,unit> ref) = createParserForwardedToRef()

let SubExpression : Parser<Node,unit> =
    between (pchar '(') (pchar ')') PrimitiveAOSCalculator |>> CreateSubExpression

let Number : Parser<Node,unit> =
   pfloat |>> (fun v -> Number { Value = v })

let Product : Parser<Node,unit> = 
    let Factor : Parser<Node,unit> = choice [Number; SubExpression]
    let Mult = spaces >>. pchar '*' .>> spaces
    sepBy1 Factor Mult |>> (fun l -> Product { Factors = l})

let Summand : Parser<Node,unit> =
    choice [ attempt Product; attempt SubExpression ]

let Sum = 
    let Add = (spaces >>. pchar '+' .>> spaces)
    sepBy1 Summand Add |>> (fun l -> Sum { Summands = l })

do PrimitiveAOSCalculatorImpl :=
    Sum

let rec Eval (n : Node) : float =
    match n with
    | Number(v) -> v.Value
    | Product(p) -> List.map (fun n -> Eval n) p.Factors |> List.fold (fun a b -> a * b) 1.0
    | Sum(s) -> List.map (fun t -> Eval t) s.Summands |> List.fold (fun a b -> a + b) 0.0
    | SubExpression(x) -> Eval x.N


let Calculate (term : string) : float =
    let parseResult = run PrimitiveAOSCalculator term
    match parseResult with
    | Success(ast,_,_) -> Eval ast
    | Failure(errorMessage,_,_) -> failwith ("Parsing of the expression failed: " + errorMessage)

let Show (s : string) : string =
    printfn "%s" s
    s

let test p i =
    testParser p i |> Show |> Calculate |> printfn "result = %f"

do test Product "5.1 * 2" 
do test Product "5.1"
do test Product "5.1"
do test Sum "(4 * 3) + (5 * 2)"
do test Sum "4 * 3 + 5 * 2"

do test PrimitiveAOSCalculator "42"
do test PrimitiveAOSCalculator "42 * 42"
do test PrimitiveAOSCalculator "42 + 42"
do test PrimitiveAOSCalculator "42 * 42 + 47.11"
do test PrimitiveAOSCalculator "5.1 * (32 + 88 * 3) + 1.4"

Here, $do test Sum "4 * 3 + 5 * 2" fails with the following output:

Failure Error in Ln: 1 Col: 1
4 * 3 + 5 * 2
^
Expecting: '('

The parser backtracked after:
  Error in Ln: 1 Col: 7
  4 * 3 + 5 * 2
        ^
  Expecting: '*'

4 * 3 + 5 * 2
System.Exception: Parsing of the expression failed: Error in Ln: 1 Col: 1
4 * 3 + 5 * 2
^
Expecting: '('

The parser backtracked after:
  Error in Ln: 1 Col: 7
  4 * 3 + 5 * 2
        ^
  Expecting: '*'

And I have not even the foggiest idea, why it would expect '*' here.

Lapidate answered 25/5, 2014 at 23:52 Comment(2)
I haven't tested your (very large) code, but Product seems to be wrong as it uses sepBy1 and thus it does not allow a single Number to be processed. Yet another thing: why did you make parsing and syntax tree analyzing in a single code? Wouldn't it be easier to parse string into Element list where type Element = | Number of int | Operation of char and then running syntax analysis? Imagine how hard it would be to add numeric division into your existing codebase.Pernod
@bytebuster Do I misunderstand the documentation? sepBy1 Factor Mult would mean in EBNF: Factor [ Mult Factor ]* while sepBy Factor Mult would mean in EBNF: [Factor [ '*' Factor ]*]. The structure of this (only 150 lines ;) ) app is as you suggest. SimpleAOSParser produces AST (Node tree), Eval evaluates AST.Lapidate
U
10

The basic mistake, which is one that is done very often when starting with parser combinators, is that they are not directly equivalent to an EBNF. The fundamental difference is that when you give parsec a choice, it tries them in order, and as soon as one of the choices matches even a single character then it stays in this branch. It will only backtrack if you put your choice in an attempt, and you should do this as little as possible (for performance reasons, and also for error reporting reasons -- see my last paragraph).

More specifically in your code, the mistake is in your separators. Combinators such as sepBy1 are built from choices. When it has matched an element, it then tries to match a separator. In this case the separator is spaces >>. pchar '*' .>> spaces. Since spaces matches successfully and consumes a character, it will not backtrack, even if pchar '*' then fails; it will simply consider this parser as a whole a failure. This is a very common issue regarding whitespace with parser combinators. The usual way to fix this is to always parse whitespace as a suffix of another parser, and not as a prefix. In your case, you need to:

  • Replace pfloat in Number with pfloat .>> spaces.

  • Remove the prefix spaces >>. in your separators.

  • You probably also want to add a suffix .>> spaces to both the opening and closing paren parsers.

You can write intermediary functions that will prevent this from getting too verbose:

// ...

let sp parser = parser .>> spaces

let spchar c = sp (pchar c)

let SubExpression : Parser<Node,unit> =
    between (spchar '(') (spchar ')') PrimitiveAOSCalculator |>> CreateSubExpression

let Number : Parser<Node,unit> =
    sp pfloat |>> (fun v -> Number { Value = v })

let Product : Parser<Node,unit> = 
    let Factor : Parser<Node,unit> = choice [Number; SubExpression]
    let Mult = spchar '*'
    sepBy1 Factor Mult |>> (fun l -> Product { Factors = l})

let Summand : Parser<Node,unit> =
    choice [ Product; SubExpression ]

let Sum = 
    let Add = spchar '+'
    sepBy1 Summand Add |>> (fun l -> Sum { Summands = l })

// ...

I also removed the calls to attempt in Summand. They are the reason why your errors showed up in such weird places: when the separator parser failed, the error propagated up, until it reached the call to attempt Product; this attempt turned the error into a simple "no match and no input consumed", so the choice then tried SubExpression instead of failing altogether. This ultimately told you that it was expecting '(' even though the original error was actually somewhere else. As a rule, you should avoid attempt, and if you really need it, call it on the smallest parser possible.

Unsaid answered 26/5, 2014 at 10:30 Comment(1)
Thanks a lot for your good answer! I added the attempt expressions in Summand hoping that then, the choice would work as I expected it to work, hoping that if one of the choices fails, the system backtracks and as such choice continues with the next choice. But somehow, attempt does - as you point out - fail to do that job.Lapidate

© 2022 - 2024 — McMap. All rights reserved.