It's actually surprisingly easy to build Parsec-from-scratch. The actual library code itself is heavily generalized and optimized which contorts the core abstraction, but if you're just building things from scratch to understand more about what's going on you can write it in just a few lines of code. I'll build a slightly weaker Applicative
parser below.
Essentially, we want to produce a Applicative
, Parser
along with a primitive parser value
satisfy :: (Char -> Bool) -> Parser Char
and a few combinators such as try
, which "undoes" a parser if it fails
try :: Parser a -> Parser a
and orElse
which allows us to continue with a second parser if the first parser fails. Usually this is actually written with the infix combinator (<|>)
orElse, (<|>) :: Parser a -> Parser a -> Parser a
Since our Applicative
needs to keep track of the current stream state and be able to fail, we'll build it by combining the state Applicative
and the Either
applicative.
type Error = String
newtype Parser a = P { unP :: String -> (String, Either Error a) }
instance Functor Parser where
fmap f (P st) = P $ \stream -> case st stream of
(res, Left err) -> (res, Left err)
(res, Right a ) -> (res, Right (f a))
instance Applicative Parser where
pure a = P (\stream -> (stream, Right a))
P ff <*> P xx = P $ \stream0 -> case ff stream0 of -- produce an f
(stream1, Left err) -> (stream1, Left err)
(stream1, Right f ) -> case xx stream1 of -- produce an x
(stream2, Left err) -> (stream2, Left err)
(stream2, Right x ) -> (stream2, Right (f x)) -- return (f x)
If we follow the (<*>)
method in the Applicative
instance carefully we see that it just passes the stream into the f
-producing Parser
, takes the result stream and, if successful, passes it to the x
-producing Parser
and if they both succeed it returns their application (f x)
. This means that if we have a function-producing parser and an argument-producing parser we can sequence them with (<*>)
-- given
parseChar :: Char -> Parser Char
parseHi :: Parser (Char, Char) -- parses 'h' then 'i'
parseHi = pure (,) <$> parseChar 'h' <*> parseChar 'i'
We can use the mechanics of this Applicative
to build the needed combinators as well. Here's satisfy
-- | Peek at the next character and return successfully if it satisfies a predicate
satisfy :: (Char -> Bool) -> Parser Char
satisfy f = P $ \stream -> case stream of
[] -> ([], Left "end of stream")
(c:cs) | f c -> (cs, Right c)
| otherwise -> (cs, Left "did not satisfy")
And here's try
-- | Run a parser but if it fails revert the stream to it's original state
try :: Parser a -> Parser a
try (P f) = P $ \stream0 -> case f stream0 of
(_ , Left err) -> (stream0, Left err)
(stream1, Right a ) -> (stream1, Right a )
And here's orElse
orElse :: Parser a -> Parser a -> Parser a
orElse (P f1) (P f2) = P $ \stream0 -> case f1 stream0 of
(stream1, Left err) -> f2 stream1
(stream1, Right a ) -> (stream1, Right a)
Typically at this point we also note that Parser
forms an Alternative
instance with orElse
if we also provide an immediately failing parser, empty
instance Alternative Parser where
empty = P $ \stream -> (stream, Left "empty")
(<|>) = orElse
many = manyParser
some = someParser
And we can write manyParser
and someParser
as combinators which run a parser repeatedly.
-- | 0 or more
manyParser :: Parser a -> Parser [a]
manyParser (P f) = P go where
go stream = case f stream of
(_ , Left err) -> (stream, Right []) -- throws away the error
(stream', Right a ) -> case go stream' of
(streamFin, Left err) -> (streamFin, Left err)
(streamFin, Right as) -> (streamFin, Right (a : as))
-- | 1 or more
someParser :: Parser a -> Parser [a]
someParser (P f) = P $ \stream -> case f stream of
(stream', Left err) -> (stream', Left err)
(stream', Right a ) ->
let (P fmany) = manyParser (P f)
in case fmany stream' of
(stream'', Left err) -> (stream'', Left err)
(stream'', Right as) -> (stream'', Right (a:as))
And from here we can begin to work at much higher levels of abstraction.
char :: Char -> Parser Char
char c = satisfy (== c)
string :: String -> Parser String
string [] = pure []
string (c:cs) = (:) <$> char c <*> string cs
oneOf :: [Char] -> Parser Char
oneOf cs = satisfy (`elem` cs)
parens :: Parser a -> Parser a
parens parseA = dropFirstAndLast <$> char '(' <*> parseA <*> char ')'
where
dropFirstAndLast _ a _ = a