Use FParsec to parse a self-describing input
Asked Answered
V

3

7

I'm using FParsec to parse an input that describes its own format. For example, consider this input:

int,str,int:4,'hello',3

The first part of the input (before the colon) describes the format of the second part of the input. In this case, the format is int, str, int, which means that the actual data consists of three comma-separated values of the given types, so the result should be 4, "hello", 3.

What is the best way to parse something like this with FParsec?

I've pasted my best effort below, but I'm not happy with it. Is there a better way to do this that is cleaner, less stateful, and less reliant on the parse monad? I think this depends on smarter management of UserState, but I don't know how to do it. Thanks.

open FParsec

type State = { Formats : string[]; Index : int32 }
    with static member Default = { Formats = [||]; Index = 0 }

type Value =
    | Integer of int
    | String of string

let parseFormat : Parser<_, State> =
    parse {
        let! formats =
            sepBy
                (pstring "int" <|> pstring "str")
                (skipString ",")
                |>> Array.ofList
        do! updateUserState (fun state -> { state with Formats = formats })
    }

let parseValue format =
    match format with
        | "int" -> pint32 |>> Integer
        | "str" ->
            between
                (skipString "'")
                (skipString "'")
                (manySatisfy (fun c -> c <> '\''))
                    |>> String
        | _ -> failwith "Unexpected"

let parseValueByState =
    parse {
        let! state = getUserState
        let format = state.Formats.[state.Index]
        do! setUserState { state with Index = state.Index + 1}
        return! parseValue format
    }

let parseData =
    sepBy
        parseValueByState
        (skipString ",")

let parse =
    parseFormat
        >>. skipString ":"
        >>. parseData

[<EntryPoint>]
let main argv =
    let result = runParserOnString parse State.Default "" "int,str,int:4,'hello',3"
    printfn "%A" result
    0
Virginavirginal answered 19/4, 2015 at 7:36 Comment(1)
+1000 if I could. This parse computation expression was quite a revelation for an F#-FParsec greenhorn like myself.Enumeration
S
3

@bytebuster beat me to it but I still post my solution. The technique is similar to @bytebuster.

Thanks for an interesting question.

In compilers I believe the preferred technique is to parse the text into an AST and on that run a type-checker. For this example a potentially simpler technique would be that parsing the type definitions returns a set of parsers for the values. These parsers are then applied on the rest of the string.

open FParsec

type Value = 
  | Integer of int
  | String  of string

type ValueParser = Parser<Value, unit>

let parseIntValue : Parser<Value, unit> =
  pint32 |>> Integer

let parseStringValue : Parser<Value, unit> =
  between
    (skipChar '\'')
    (skipChar '\'')
    (manySatisfy (fun c -> c <> '\''))
    <?> "string"
    |>> String

let parseValueParser : Parser<ValueParser, unit> =
  choice 
    [
      skipString "int"  >>% parseIntValue
      skipString "str"  >>% parseStringValue
    ]

let parseValueParsers : Parser<ValueParser list, unit> =
    sepBy1
      parseValueParser
      (skipChar ',')

// Runs a list of parsers 'ps' separated by 'sep' parser
let sepByList (ps : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> =
  let rec loop adjust ps =
    match ps with
    | []    -> preturn []
    | h::t  ->
      adjust h >>= fun v -> loop (fun pp -> sep >>. pp) t >>= fun vs -> preturn (v::vs)
  loop id ps

let parseLine : Parser<Value list, unit> =
  parseValueParsers .>> skipChar ':' >>= (fun vps -> sepByList vps (skipChar ',')) .>> eof

[<EntryPoint>]
let main argv = 
    let s = "int,str,int:4,'hello',3"

    let r = run parseLine s

    printfn "%A" r

    0

Parsing int,str,int:4,'hello',3 yields Success: [Integer 4; String "hello";Integer 3].

Parsing int,str,str:4,'hello',3 (incorrect) yields:

Failure:
Error in Ln: 1 Col: 23
int,str,str:4,'hello',3
                      ^
Expecting: string
Snivel answered 19/4, 2015 at 17:17 Comment(4)
A parser that emits parsers is pretty cool. This seems very elegant, but is a bit over my head. Can you help me understand what sepByList is doing? If you're going to use >>= explicitly, wouldn't it be simpler to just switch to the monadic syntax instead?Virginavirginal
I tried to rewrite your sepByList below to clarify what it's doing. Please let me know what you think.Virginavirginal
The purpose of sepByList is to apply the parsers in the list ps sequentially, they are separated by sep parser. I avoided the parse monad as the question stated it wanted to rely less on it. Will check the rewrite later.Snivel
Parsers that produces functions instead of values (such as other parsers) are a bit of a brain twister but that is also the power of functional languages. Btw, I rewrote the sepByList to use the same naming as you did in your rewritten version.Snivel
P
5

There seem to be several problems with the original code, so I took my liberty to rewrite it from scratch.

First, several library functions that may appear useful in other FParsec-related projects:

/// Simple Map
/// usage: let z = Map ["hello" => 1; "bye" => 2]
let (=>) x y = x,y
let makeMap x = new Map<_,_>(x)

/// A handy construct allowing NOT to write lengthy type definitions
/// and also avoid Value Restriction error
type Parser<'t> = Parser<'t, UserState>

/// A list combinator, inspired by FParsec's (>>=) combinator
let (<<+) (p1: Parser<'T list>) (p2: Parser<'T>) =
    p1 >>= fun x -> p2 >>= fun y -> preturn (y::x)

/// Runs all parsers listed in the source list;
/// All but the trailing one are also combined with a separator
let allOfSepBy separator parsers : Parser<'T list> =
    let rec fold state =
        function
        | [] -> pzero
        | hd::[] -> state <<+ hd 
        | hd::tl -> fold (state <<+ (hd .>> separator)) tl
    fold (preturn []) parsers
    |>> List.rev    // reverse the list since we appended to the top

Now, the main code. The basic idea is to run parsing in three steps:

  1. Parse out the keys (which are plain ASCII strings)
  2. Map these keys to actual Value parsers
  3. Run these parsers in order

The rest seems to be commented within the code. :)

/// The resulting type
type Output =
    | Integer of int
    | String of string

/// tag to parser mappings
let mappings =
    [
        "int" => (pint32 |>> Integer)
        "str" => (
                    manySatisfy (fun c -> c <> '\'')
                    |> between (skipChar ''') (skipChar ''')
                    |>> String
                 )
    ]
    |> makeMap

let myProcess : Parser<Output list> =
    let pKeys =                     // First, we parse out the keys
        many1Satisfy isAsciiLower   // Parse one key; keys are always ASCII strings
        |> sepBy <| (skipChar ',')  // many keys separated by comma
        .>> (skipChar ':')          // all this with trailing semicolon
    let pValues = fun keys ->
        keys                        // take the keys list
        |> List.map                 // find the required Value parser
                                    // (NO ERROR CHECK for bad keys)
            (fun p -> Map.find p mappings)
        |> allOfSepBy (skipChar ',') // they must run in order, comma-separated
    pKeys >>= pValues

Run on string: int,int,str,int,str:4,42,'hello',3,'foobar'
Returned: [Integer 4; Integer 42; String "hello"; Integer 3; String "foobar"]

Peppi answered 19/4, 2015 at 17:0 Comment(1)
I see. This is very helpful, thanks. I wasn't comfortable enough with FParsec to manage Reply objects explicitly, but this is definitely simpler.Virginavirginal
S
3

@bytebuster beat me to it but I still post my solution. The technique is similar to @bytebuster.

Thanks for an interesting question.

In compilers I believe the preferred technique is to parse the text into an AST and on that run a type-checker. For this example a potentially simpler technique would be that parsing the type definitions returns a set of parsers for the values. These parsers are then applied on the rest of the string.

open FParsec

type Value = 
  | Integer of int
  | String  of string

type ValueParser = Parser<Value, unit>

let parseIntValue : Parser<Value, unit> =
  pint32 |>> Integer

let parseStringValue : Parser<Value, unit> =
  between
    (skipChar '\'')
    (skipChar '\'')
    (manySatisfy (fun c -> c <> '\''))
    <?> "string"
    |>> String

let parseValueParser : Parser<ValueParser, unit> =
  choice 
    [
      skipString "int"  >>% parseIntValue
      skipString "str"  >>% parseStringValue
    ]

let parseValueParsers : Parser<ValueParser list, unit> =
    sepBy1
      parseValueParser
      (skipChar ',')

// Runs a list of parsers 'ps' separated by 'sep' parser
let sepByList (ps : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> =
  let rec loop adjust ps =
    match ps with
    | []    -> preturn []
    | h::t  ->
      adjust h >>= fun v -> loop (fun pp -> sep >>. pp) t >>= fun vs -> preturn (v::vs)
  loop id ps

let parseLine : Parser<Value list, unit> =
  parseValueParsers .>> skipChar ':' >>= (fun vps -> sepByList vps (skipChar ',')) .>> eof

[<EntryPoint>]
let main argv = 
    let s = "int,str,int:4,'hello',3"

    let r = run parseLine s

    printfn "%A" r

    0

Parsing int,str,int:4,'hello',3 yields Success: [Integer 4; String "hello";Integer 3].

Parsing int,str,str:4,'hello',3 (incorrect) yields:

Failure:
Error in Ln: 1 Col: 23
int,str,str:4,'hello',3
                      ^
Expecting: string
Snivel answered 19/4, 2015 at 17:17 Comment(4)
A parser that emits parsers is pretty cool. This seems very elegant, but is a bit over my head. Can you help me understand what sepByList is doing? If you're going to use >>= explicitly, wouldn't it be simpler to just switch to the monadic syntax instead?Virginavirginal
I tried to rewrite your sepByList below to clarify what it's doing. Please let me know what you think.Virginavirginal
The purpose of sepByList is to apply the parsers in the list ps sequentially, they are separated by sep parser. I avoided the parse monad as the question stated it wanted to rely less on it. Will check the rewrite later.Snivel
Parsers that produces functions instead of values (such as other parsers) are a bit of a brain twister but that is also the power of functional languages. Btw, I rewrote the sepByList to use the same naming as you did in your rewritten version.Snivel
V
1

I rewrote @FuleSnabel's sepByList as follows to help me understand it better. Does this look right?

let sepByList (parsers : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> = 
    let rec loop adjust parsers =
        parse {
            match parsers with
                | [] -> return []
                | parser :: tail ->
                    let! value = adjust parser
                    let! values = loop (fun parser -> sep >>. parser) tail
                    return value :: values
        }
    loop id parsers
Virginavirginal answered 20/4, 2015 at 4:10 Comment(2)
Looks right to me but perhaps add this to your original post instead of posting it as answer?Snivel
Sorry, yes. I'm not too familiar with Stack Overflow conventions.Virginavirginal

© 2022 - 2024 — McMap. All rights reserved.