Writing a parser from scratch in Haskell
Asked Answered
D

2

32

Is there any good tutorial for writing a parser for a given grammar in Haskell from scratch?

I found:

  1. parsing expressions and statements (HaskellWiki)

  2. Parsing a simple imperative language (HaskellWiki)

  3. Using parsec (Real World Haskell)

but all of these use the parsec library, and while that may be interesting for industrial applications I'm specifically looking for examples that are 'bottom up' without the use of sophisticated libraries.

The only one I found which uses 'basic' Haskell is this: Parsing with Haskell which uses some very foreign syntax (it's very hard to distinguish whats part of the program or whats only 'pseudocode') and there's no explicit grammar definition.

Any suggestions are highly appreciated!

Dunlin answered 18/12, 2013 at 14:26 Comment(4)
I found the parsec user manual very helpful. legacy.cs.uu.nl/daan/parsec.htmlDisgraceful
Indeed it's helpful and I'm working with it right now, it's good to get a feeling for the topic, but as I mentioned a complete tutorial/example for the implementation of a parser from ground up would be a great to take a look 'under the hood'.Dunlin
Jeroen Fokker's Functional Parsers is worth a read.Arruda
As well as the Parsec manual, the original distribution of Parsec from Daan Leijen's legacy page has complete example parsers for two small languages - Henk and Mondrian.Selway
F
46

It's actually surprisingly easy to build Parsec-from-scratch. The actual library code itself is heavily generalized and optimized which contorts the core abstraction, but if you're just building things from scratch to understand more about what's going on you can write it in just a few lines of code. I'll build a slightly weaker Applicative parser below.

Essentially, we want to produce a Applicative, Parser along with a primitive parser value

satisfy :: (Char -> Bool) -> Parser Char

and a few combinators such as try, which "undoes" a parser if it fails

try :: Parser a -> Parser a

and orElse which allows us to continue with a second parser if the first parser fails. Usually this is actually written with the infix combinator (<|>)

orElse, (<|>) :: Parser a -> Parser a -> Parser a

Since our Applicative needs to keep track of the current stream state and be able to fail, we'll build it by combining the state Applicative and the Either applicative.

type Error = String
newtype Parser a = P { unP :: String -> (String, Either Error a) }

instance Functor Parser where
  fmap f (P st) = P $ \stream -> case st stream of
    (res, Left err) -> (res, Left err)
    (res, Right a ) -> (res, Right (f a))

instance Applicative Parser where
  pure a = P (\stream -> (stream, Right a))
  P ff <*> P xx = P $ \stream0 -> case ff stream0 of   -- produce an f
    (stream1, Left err) -> (stream1, Left err)
    (stream1, Right f ) -> case xx stream1 of          -- produce an x
      (stream2, Left err) -> (stream2, Left err)
      (stream2, Right x ) -> (stream2, Right (f x))    -- return (f x)

If we follow the (<*>) method in the Applicative instance carefully we see that it just passes the stream into the f-producing Parser, takes the result stream and, if successful, passes it to the x-producing Parser and if they both succeed it returns their application (f x). This means that if we have a function-producing parser and an argument-producing parser we can sequence them with (<*>)

-- given
parseChar :: Char -> Parser Char

parseHi   :: Parser (Char, Char) -- parses 'h' then 'i'
parseHi = pure (,) <$> parseChar 'h' <*> parseChar 'i'

We can use the mechanics of this Applicative to build the needed combinators as well. Here's satisfy

-- | Peek at the next character and return successfully if it satisfies a predicate
satisfy :: (Char -> Bool) -> Parser Char
satisfy f = P $ \stream -> case stream of
  []                 -> ([], Left "end of stream")
  (c:cs) | f c       -> (cs, Right c)
         | otherwise -> (cs, Left "did not satisfy")

And here's try

-- | Run a parser but if it fails revert the stream to it's original state
try :: Parser a -> Parser a
try (P f) = P $ \stream0 -> case f stream0 of
  (_      , Left err) -> (stream0, Left err)
  (stream1, Right a ) -> (stream1, Right a )

And here's orElse

orElse :: Parser a -> Parser a -> Parser a
orElse (P f1) (P f2) = P $ \stream0 -> case f1 stream0 of
  (stream1, Left err) -> f2 stream1
  (stream1, Right a ) -> (stream1, Right a)

Typically at this point we also note that Parser forms an Alternative instance with orElse if we also provide an immediately failing parser, empty

instance Alternative Parser where
  empty = P $ \stream -> (stream, Left "empty")
  (<|>) = orElse

  many = manyParser
  some = someParser

And we can write manyParser and someParser as combinators which run a parser repeatedly.

-- | 0 or more
manyParser :: Parser a -> Parser [a]
manyParser (P f) = P go where
  go stream = case f stream of
    (_      , Left err) -> (stream, Right [])  -- throws away the error
    (stream', Right a ) -> case go stream' of 
      (streamFin, Left err) -> (streamFin, Left err)
      (streamFin, Right as) -> (streamFin, Right (a : as))

-- | 1 or more
someParser :: Parser a -> Parser [a]
someParser (P f) = P $ \stream -> case f stream of
  (stream', Left err) -> (stream', Left err)
  (stream', Right a ) -> 
    let (P fmany) = manyParser (P f)
    in case fmany stream' of
      (stream'', Left err) -> (stream'', Left err)
      (stream'', Right as) -> (stream'', Right (a:as))

And from here we can begin to work at much higher levels of abstraction.

char :: Char -> Parser Char
char c = satisfy (== c)

string :: String -> Parser String
string []     = pure []
string (c:cs) = (:) <$> char c <*> string cs

oneOf :: [Char] -> Parser Char
oneOf cs = satisfy (`elem` cs)

parens :: Parser a -> Parser a
parens parseA = dropFirstAndLast <$> char '(' <*> parseA <*> char ')'
  where
    dropFirstAndLast _ a _ = a
Frowzy answered 18/12, 2013 at 16:2 Comment(2)
There is probably a typo in orElse function - there should be a f2 stream0 call. Am I right? BTW, excellent answer - thanks!!!Maramarabel
Depends on how you want to do backtracking. You can build what you want using try to constrain where backtracking occurs. Different designs make different choices about backtracking semantics.Frowzy
I
7

I really liked Graham Hutton's "Programming in Haskell". It gives a gentle introduction to monads and parser combinators. The eighth chapter builds a basic parser library.

Here is the link to Programming in Haskell book site. You will also find link to the parser library and basic expression parser.

Further, if you are interested, you can also look at uu-parsinglib applicative style parser combinators developed at Utrecht University.

Impulse answered 18/12, 2013 at 15:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.