There are many open sourced parser implementations available to us in Haskell. Parsec seems to be the standard for text parsing and attoparsec seems to be a popular choice for binary parsing but I don't know much beyond that. Is there a particular decision tree that you follow for choosing a parser implementation? Have you learned anything interesting about the strengths or weaknesses of the libraries?
You have several good options.
For lightweight parsing of String types:
For packed bytestring parsing, e.g. of HTTP headers.
For actual binary data most people use either:
The main question to ask yourself is what is the underlying string type?
- String?
- bytestring (strict)?
- bytestring (lazy)?
- unicode text
That decision largely determines which parser toolset you'll use.
The second question to ask is: do I already have a grammar for the data type? If so, I can just use happy
And obviously for custom data types there are a variety of good existing parsers:
Just to add to Don's post: Personally, I quite like Text.ParserCombinators.ReadP (part of base) for no-nonsense quick and easy stuff. Particularly when Parsec seems like overkill.
There is a bytestringreadp library for the bytestring version, but it doesn't cover Char8 bytestrings, and I suspect attoparsec would be a better choice at this point.
I recently converted some code from Parsec to Attoparsec. Both are quite capable.
Attoparsec wins on performance and memory footprint, but Parsec provides better error reporting and has more complete documentation.
Bryan O’Sullivan’s blog post What’s in a parser? Attoparsec rewired (2/2) includes a nice performance benchmark comparing several implementations along with some comments comparing memory usage.
© 2022 - 2024 — McMap. All rights reserved.