Table-based parsers require separating lexical analysis from parsing because of their limited lookahead: looking far enough ahead to fold lexical analysis into the parser itself would explode the state space.
Combinator-based approaches do not usually suffer this problem, since they typically do recursive-descent parsing. Unless the library author notes otherwise, there is no harm in combining the phases and not much to gain from separating them.
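To make that concrete, here is a minimal hand-rolled combinator sketch showing lexing and parsing fused into one pass. The `P` type, `number`, `spaces`, and `pair` are all invented for illustration; this is not uu-parsinglib's API.

```haskell
import Data.Char (isDigit, isSpace)

-- A toy recursive-descent combinator type: a parser is just a
-- function from the remaining input to a result plus leftover input.
newtype P a = P { runP :: String -> Maybe (a, String) }

instance Functor P where
  fmap f (P p) = P $ \s -> fmap (\(a, r) -> (f a, r)) (p s)

instance Applicative P where
  pure a = P $ \s -> Just (a, s)
  P pf <*> P pa = P $ \s -> do
    (f, s')  <- pf s
    (a, s'') <- pa s'
    pure (f a, s'')

-- "Lexical" concerns (digits, whitespace) sit right next to
-- "syntactic" ones; no separate token stream is needed.
number :: P Int
number = P $ \s -> case span isDigit s of
  ("", _)    -> Nothing
  (ds, rest) -> Just (read ds, rest)

spaces :: P ()
spaces = P $ \s -> Just ((), dropWhile isSpace s)

-- Parse two whitespace-separated numbers and add them.
pair :: P Int
pair = (+) <$> number <*> (spaces *> number)
```

Because each combinator consumes characters directly, there is no point at which a token boundary must be decided ahead of time; the "lexer" is just whichever character-level combinators you happen to call.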
Although uu-parsinglib provides the `Str` class to abstract over different string-like inputs, its definition shows that it still assumes you are ultimately reading a sequence of `Char`, whether from a `String`, `ByteString`, `Text`, etc. So getting it to parse a `MyToken` stream could be difficult. Parsec might be a better choice if you feel you need to do that.
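If you do want to parse a custom token stream, one option that needs no library support at all is a small hand-rolled combinator over `[MyToken]`. Everything below (`MyToken`, `TokP`, `satisfyTok`) is an illustrative sketch, not uu-parsinglib or Parsec code:

```haskell
-- A toy token type; in practice this would be your language's tokens.
data MyToken = TNewline | TIndent Int | TWord String
  deriving (Eq, Show)

-- A parser over a list of tokens instead of a list of characters.
newtype TokP a = TokP { runTokP :: [MyToken] -> Maybe (a, [MyToken]) }

-- Match a single token satisfying a predicate; this is the
-- token-stream analogue of a character-level 'satisfy'.
satisfyTok :: (MyToken -> Bool) -> TokP MyToken
satisfyTok p = TokP $ \ts -> case ts of
  (t:rest) | p t -> Just (t, rest)
  _              -> Nothing
```

The point is that nothing about the combinator style requires `Char` input; only this particular library's input class does.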
As to your question about your string implementation: a combinator takes string-like input containing syntactic structure and, if it matches, returns the corresponding semantic value. Inside the combinator you build that value both from what you consume directly from the input stream and from the semantic values returned by the sub-combinators you call.
So the 'String matching' combinator in your example will have a list of tokens in scope thanks to the parsing it did. You can use the full power of Haskell to combine those tokens into a single MyString value in whatever way makes sense for your language: maybe a 'SplicedString' type that represents which values are to be spliced into it.
The string combinator was probably called by an 'expression' combinator, which will be able to combine the MyString value with other parsed values into a MyExpression value. It's combinators returning semantic values all the way back up!
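For instance, the 'SplicedString' idea might look like the following sketch; the type and helper names are invented for illustration, not taken from your code:

```haskell
import Data.Char (isAlpha)

-- A string literal is a sequence of literal chunks and spliced
-- variables, e.g. "test: $name" has a literal part and a splice.
data StringPart = Lit String | Splice String  -- Splice holds a variable name
  deriving (Eq, Show)

newtype MyString = MyString [StringPart]
  deriving (Eq, Show)

-- Naive splitter standing in for what the string combinator would
-- do: '$' followed by letters is a splice (a toy rule).
parseParts :: String -> [StringPart]
parseParts "" = []
parseParts ('$':rest) =
  let (name, rest') = span isAlpha rest
  in Splice name : parseParts rest'
parseParts s =
  let (lit, rest) = break (== '$') s
  in Lit lit : parseParts rest

-- Combine the raw pieces into one semantic value, merging
-- adjacent literal chunks along the way.
mkMyString :: [StringPart] -> MyString
mkMyString = MyString . merge
  where
    merge (Lit a : Lit b : rest) = merge (Lit (a ++ b) : rest)
    merge (p : rest)             = p : merge rest
    merge []                     = []
```

An 'expression' combinator calling this would then treat the resulting `MyString` as just another semantic value to fold into its `MyExpression`.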
"test: $name"
. So generally I've written some "end" combinators, which results in lists of tokens and then I'm summing them together. Is this a bad approach? – OsteotomyNewline
andIndent Int
(indentation of new line) tokens). The tokenstream may work, but maybe there is a better approach? – OsteotomytokenizeSegment :: P String [MyToken]
andparseSegment :: P [MyToken] SegmentAST
then do something likeparse parseSegment <$> tokenizeSegment :: P String (Either ParseError SegmentAST)
. You can also handle that error by deconstructing theEither
and passing theParseError
up to the main parser. – RusticatetokenizeSegment
. For example the result could be[Newline, Indentation 2]
(Indentation means the intentation level) and you cannot convert that to any AST structure unless you know the next segment? – OsteotomypToken
combinator could always return one token if I can check a state inside of it (for example after newline I'll switch to anewline state
etc) - but then I have to somehow convertuu-parsinglib
monads to support state. – Osteotomy
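The two-phase tokenize-then-parse idea discussed in the comments can be sketched with plain functions. All of the types below, including `ParseError` as a bare `String`, are illustrative stand-ins for the real uu-parsinglib ones:

```haskell
data MyToken = Newline | Indentation Int | Word String
  deriving (Eq, Show)

data SegmentAST = Segment Int [String]   -- indent level, words
  deriving (Eq, Show)

type ParseError = String

-- Phase 1: characters -> tokens (a toy line-based tokenizer that
-- counts leading spaces and splits the rest into words).
tokenizeSegment :: String -> Either ParseError [MyToken]
tokenizeSegment s =
  let (indent, rest) = span (== ' ') s
  in Right (Indentation (length indent) : map Word (words rest))

-- Phase 2: tokens -> AST.
parseSegment :: [MyToken] -> Either ParseError SegmentAST
parseSegment (Indentation n : ts) = Segment n <$> traverse wordOf ts
  where wordOf (Word w) = Right w
        wordOf t        = Left ("unexpected token: " ++ show t)
parseSegment _ = Left "segment must start with indentation"

-- Compose the phases; a failure in either phase surfaces as Left,
-- which a caller can deconstruct and propagate upward.
parseString :: String -> Either ParseError SegmentAST
parseString s = tokenizeSegment s >>= parseSegment
```

This mirrors the shape of the suggestion in the comments: the outer parser only ever sees `Either ParseError SegmentAST`, so it can rethrow or recover from errors of either phase uniformly.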