Parsing identifiers and free-format text. Can this be done with FParsec?

Asked 14/5, 2013 at 3:33 Answered 14/5, 2013 at 10:4

As a follow-on to: How do I test for exactly 2 characters with fparsec?

I need to parse a string that consists of pairs: an identifier followed by freeform text. I can easily construct a parser that finds the identifiers, which have the form of a newline, followed by exactly two uppercase characters, followed by a space. The freeform text, which is associated with the preceding identifier, is everything following the identifier up to but not including the next identifier.

So for example:

AB Now is the
time for all good
men.
CD Four score and seven years ago EF our.

contains two identifiers, AB and CD, and two pieces of freeform text:

Now is the\ntime for all good\nmen.
Four score and seven years ago EF our.

My problem is I don't know how to construct a parser that would match the freeform text but not match the identifiers. Is this a case where I need to do backtracking?

Can this be done, and if so, how?

Maomaoism answered 14/5, 2013 at 3:33 Comment(0)

I think notFollowedBy is what you're looking for. This should do the trick:

// adapted from the other question
let identifier = skipNewline >>. manyMinMaxSatisfy 2 2 CharParsers.isUpper

let freeform = manyChars (notFollowedBy identifier >>. anyChar)
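
A minimal sketch of how these two parsers could be combined and run against the sample from the question. The `pairs` parser, the `.>> pchar ' '` suffix on `identifier` (added so the separating space is not included in the text), the explicit type annotations, and the leading newline in the input are assumptions made for illustration, not part of the answer itself.

```fsharp
open FParsec

// Identifier: a newline, exactly two uppercase letters, then a space.
// The trailing `.>> pchar ' '` is an addition for this sketch.
let identifier : Parser<string, unit> =
    skipNewline >>. manyMinMaxSatisfy 2 2 isUpper .>> pchar ' '

// Free text: take one character at a time, stopping as soon as an
// identifier would match at the current position.
let freeform : Parser<string, unit> =
    manyChars (notFollowedBy identifier >>. anyChar)

// Hypothetical top-level parser: a sequence of (identifier, text) pairs.
let pairs = many (identifier .>>. freeform)

// `identifier` requires a leading newline, so the input starts with one.
let input =
    "\nAB Now is the\ntime for all good\nmen.\nCD Four score and seven years ago EF our."

match run pairs input with
| Success (result, _, _) ->
    for (name, body) in result do
        printfn "%s -> %A" name body
| Failure (message, _, _) ->
    printfn "Parse failed: %s" message
```

With this input the result should be two pairs, AB and CD. "EF our." stays inside the second text because "EF" is not preceded by a newline. Note that `notFollowedBy` re-runs `identifier` at every character position, which is simple but does more work than checking only at line starts.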

Oxyacid answered 14/5, 2013 at 9:28 Comment(0)

Tarmil posted the straightforward solution.

Here's another variant, which doesn't need a newline at the beginning and which checks for a following identifier only at the end of lines:

let id = manyMinMaxSatisfyL 2 2 isUpper "ID" .>> pchar ' '

let text = 
    stringsSepBy (restOfLine true) 
                 ((notFollowedBy ((id >>% ()) <|> skipNewline <|> eof)) >>% "\n")

let parser = many (id .>>. text)
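
As a rough usage sketch, the parser above could be run on the sample from the question like this. The type annotations, the `input` value, and the printing code are additions for illustration; the parser definitions are repeated unchanged so the snippet stands on its own.

```fsharp
open FParsec

// Two uppercase letters followed by a space (shadows F#'s built-in `id`).
let id : Parser<string, unit> =
    manyMinMaxSatisfyL 2 2 isUpper "ID" .>> pchar ' '

// Lines of text joined by "\n"; the separator only succeeds when the next
// line does not start with an identifier, is not empty, and is not EOF.
let text : Parser<string, unit> =
    stringsSepBy (restOfLine true)
                 ((notFollowedBy ((id >>% ()) <|> skipNewline <|> eof)) >>% "\n")

let parser = many (id .>>. text)

// No leading newline is needed with this variant.
let input =
    "AB Now is the\ntime for all good\nmen.\nCD Four score and seven years ago EF our."

match run parser input with
| Success (result, _, _) ->
    for (name, body) in result do
        printfn "%s -> %A" name body
| Failure (message, _, _) ->
    printfn "Parse failed: %s" message
```

This should again produce one pair for AB (three lines of text joined by "\n") and one for CD. Because the identifier check happens only between lines, "EF our." in the middle of a line is left as ordinary text.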

If you wanted to optimize the second parser used with the stringsSepBy combinator, you could replace it with the following version:

let notFollowedByIdOrEmptyLineOrEof : Parser<string,_> =
    fun stream ->
        let cs = stream.Peek2()
        let c0, c1 = cs.Char0, cs.Char1
        if c0 = '\r' || c0 = '\n' || c0 = EOS
           || (isUpper c0 && isUpper c1 && stream.Peek(2) = ' ')
        then Reply(Error, NoErrorMessages)
        else Reply("\n")

let text2 = stringsSepBy (restOfLine true) 
                         notFollowedByIdOrEmptyLineOrEof

Amylaceous answered 14/5, 2013 at 10:4 Comment(1)

Hi, could you please comment on this question: bitbucket.org/fparsec/main/issue/29/portable-version-of-fparsec – Furious 4/11, 2013 at 11:23
