FParsec: backtracking `sepBy`

Asked 12/8, 2017 at 19:5 Answered 22/8, 2017 at 8:8

Consider the following toy grammar and parser:

(* in EBNF:
  ap = "a", { "ba" }
  bp = ap, "bc"
*)
let ap = sepBy1 (pstring "a") (pstring "b")
let bp = ap .>> (pstring "bc")
let test = run bp "abababc"

I get the following output:

Error in Ln: 1 Col: 7
abababc
      ^
Expecting: 'a'

Clearly sepBy1 sees the last b and expects it to lead into another a, failing when it doesn't find one. Is there a variant of sepBy1 which would backtrack over b and make this parse succeed? Is there any reason why I shouldn't use that instead?

Evaginate answered 12/8, 2017 at 19:5 Comment(1)

I've opened github.com/stephan-tolksdorf/fparsec/issues/15 to discuss this feature. If you would like to leave a comment on that Github issue explaining your use case in more detail (e.g., the real grammar that you're struggling with, as opposed to the simplified grammar for the Stack Overflow question) then it would help Stephan (and me, since I might be involved in implementing the feature) figure out if the use case is worth the extra work of implementing & testing that feature. But as he pointed out, if you can rearrange your grammar to avoid backtracking, that's even better. – Samale 30/8, 2017 at 2:52

This is one way to implement such a variant of sepBy1:

let backtrackingSepBy1 p sep = pipe2 p (many (sep >>? p)) (fun hd tl -> hd::tl)

Avoiding backtracking in your parser grammar usually makes the parser faster, more portable and easier to debug. Hence, if you have the chance to avoid backtracking by restructuring the grammar (without complicating it too much), I'd recommend doing it.

Globular answered 22/8, 2017 at 8:8 Comment(0)

I've been looking through the FParsec source for the sepBy family of functions, and it looks like there's no way (currently) to do precisely what you're asking for with any of the sepBy functions. However, you might be able to approximate it well enough if your separator parser is simple.

To approximate what you're looking for, you could try using sepEndBy, or more likely sepEndBy1 in your case. Its intended use is to consume things like Python syntax for lists, where the final item is allowed to have a trailing comma: [ 1, 2, 3, ]. Using sepEndBy1 will consume the final separator, because that's what you want in its intended use. In your use case, though, you would have to work around the fact that sepEndBy1 consumes the final separator. E.g.,

// This parser will be either a separator *or* a prefix of the next item
let sharedParser = pstring "b"

let ap = sepEndBy1 (pstring "a") sharedParser
let restOfBp = pstring "c"
let bp = restOfBp <|> (pipe2 sharedParser restOfBp (fun b c -> b + c))

Here, the combining function in pipe2 is simple string concatenation, but in your actual usage scenario it might be different. The key idea is that if you are able to write a simple function to combine your b parser with your c parser to produce the bc result, and if the c parser is not too complicated, then this will work for you. If c is extremely complicated but b is simple, then just reverse the order of the two sides of <|>, so that b is tested first and c is only tested after b has either succeeded or failed; that way you won't have to apply the c parser twice, as you would in the sample code I just posted. I wrote it the way I did because that's slightly easier to understand.

If sepEndBy1 does not work for you, because both the b and c parsers in your real-life case are too expensive to parse twice, then the only solution I can think of would be to ask for this new sepBy variant to be added to FParsec as a new feature. My look through the FParsec code suggests that it would be possible to rewrite the implementation to optionally backtrack to before the final separator, but that it would need some significant testing and consideration of edge cases so it wouldn't be a five-minute fix. But if the sepEndBy1 workaround doesn't work for you, then go ahead and submit that feature request.

Samale answered 13/8, 2017 at 2:21 Comment(1)

On second thought, you might be able to use some combination of the attempt and/or lookAhead parser combinators to get what you need. I'll think it over and update my answer with a suggestion based on those -- though I might not have time to do so until tomorrow. – Samale 13/8, 2017 at 5:26

I'm not sure if the returned value matters, but if not, then the easiest way to go is to transcribe the grammar more closely:

let ap = pstring "a" >>. many (pstring "ba")
let bp = ap .>> pstring "bc"

Note that the pstring "ba" will not cause the same problem you had, because pstring cannot consume only part of its argument; either it consumes a full "ba", or it fails without consuming anything and therefore bp can successfully see a b if there is one.

Another possibility , if this is indeed the full grammar (ie ap is not used in another production elsewhere), is to change it to the following EBNF, which is equivalent:

ap = "ab", { "ab" }
cp = ap, "c"

This makes it easier to implement with FParsec:

let ap = many1 (pstring "ab")
let cp = ap .>> pstring "c"

Naoise answered 12/8, 2017 at 19:50 Comment(2)

-1, sorry. Anyone who has developed a real-world parser app would realize that pstring "a" is only an example, and in the real app there can be a complex parser instead. So suggesting pstring "ba" would be of no use here, and the reduplication is not always possible. – Badman 12/8, 2017 at 23:44

Well, in a real world case where b is a separator and a is what you're interested in having a list of, then you can do a .>>. many (try (b >>. a)), which is basically a more general version of my first answer. – Naoise 13/8, 2017 at 13:28

Recommended topics

Hot tags