I'm trying to augment Haskell's Attoparsec parser library with a function
takeRegex :: Regex -> Parser ByteString
using one of the regexp implementations.
(Motivation: Good regex libraries can provide performance that is linear to the length of the input, while attoparsec needs to backtrack. A portion of my input is particularly amenable to parsing using regexps, and even the backtracking Text.Regex.PCRE library gets me 4x speedup over attoparsec code for that piece.)
Attoparsec used to have a getInput :: Parser ByteString
function to get (witout consuming) remaining input; that would probably have been quite perfect for my purposes, as my input is non-incremental, strict and reasonably small – I run the parser for a line of a log file at a time. With it, it seems I could have done something like
takeRegex re = do
input <- getInput
m <- matchM re input
take $ length m
Unfortunately recent versions of attoparsec lack this function. Is there some way to achieve the same? Why has the function been removed?
Now there is the takeByteString :: Parser ByteString
function, which takes and consumes the rest of the input. If there was a function to attempt a parse and get the result without actually consuming anything, this could be used in conjunction with that, but I cannot seem to find (or figure out how to implement) such a function either.
Is there some way to achieve this with the current version of attoparsec?
Data.Attoparsec.Combinator.lookAhead
allows you to apply a parser (and get the result, if any) without consuming input. It's not restricted to one character of input. Wouldn't it be useful, here? – Remote