Question
I know Parsec
and uu-parsinglib
and I've written parsers in both of them. Recently I discovered, that there is a problem in uu-parsinglib
, which could significantly affect its performance and I do not see a way to solve it.
Lets consider following Parsec parsers:
pa = char 'a'
pb = char 'b'
pTest = many $ try(pa <* pb)
What would be the equivalent in uu-parsinglib
? It would not be the following:
pa = pSym 'a'
pb = pSym 'b'
pTest = pList_ng (pa <* pb)
The difference is, that in Parsec
, many
would eat (pa <* pb)
(pairs of "ab"
) greedy until it is not matched anymore, while in uu-parsinglib
, pList_ng
is not greedy, so it will keep in memory possible backtrack ways after parsing each (pa <* pb)
.
Is there any way to write something like pList(try(pa <* pb))
in uu-parsinglib
?
Example
A good example would be
pExample = pTest <* (pa <* pb)
and a sample input of "ababab"
.
With Parsec
, we would get error (because pTest
is greedy parsing pairs of "ab"
), but in uu-parsinglib
, the string would be parsed with no problems.
Edit
We cannot switch from pList_ng
to pList
, because it would be not equivalent to Parsec
version. For example:
pExample2 = pTest <* pa
and a sample input of "ababa"
would success in Parsec
, but fail in uu-parsinglib
, when using greedy pList
.
Of course uu-parsinglib
will succeed if we use pList_ng
here, but it could be a lot slower for some inputs and rules. For example, considering the input "ababab"
, Parsec
would simply fail, because pTest
would consume whole string and pa
would not be matched. uu-parsinglib
will fail also, but checking a more steps - it will match whole string and fail, then throw away last "ab"
pair and fail again, etc. If we have some nested rules and funny input text, it will make a huge difference.
A little benchmark
To prove, that the problem exists in real, consider following grammar (in a pseudocode - but I think it is very intuitive):
pattern = many("ab") + "a"
input = many(pattern)
So as input to our program we get a string containing patterns, for example "abababaaba" contains 2 patterns "abababa" and "aba".
Lets make parsers in both libraries!
uu-parsinglib
:
import Data.Char
import qualified Text.ParserCombinators.UU as UU
import Text.ParserCombinators.UU hiding(parse)
import Text.ParserCombinators.UU.BasicInstances hiding (Parser)
import System.TimeIt (timeIt)
pa = pSym 'a'
pb = pSym 'b'
pTest = pList $ pList_ng ((\x y -> [x,y]) <$> pa <*> pb) <* pa
main :: IO ()
main = do
timeIt maininner
return ()
maininner = do
let (x,e) = UU.parse ((,) <$> pTest <*> pEnd) (createStr (LineColPos 0 0 0) (concat $ replicate 1000 (concat (replicate 1000 "ab") ++ "a")))
print $ length x
Parsec
:
import Control.Applicative
import Text.Parsec hiding (many, optional, parse, (<|>))
import qualified Text.Parsec as Parsec
import System.TimeIt (timeIt)
pa = char 'a'
pb = char 'b'
pTest = many $ many (try ((\x y -> [x,y]) <$> pa <*> pb)) <* pa
main :: IO ()
main = do
timeIt maininner2
return ()
maininner2 = do
let Right x = Parsec.runParser pTest (0::Int) "test" $ (concat $ replicate 1000 (concat (replicate 1000 "ab") ++ "a"))
print $ length x
Result? uu-parsinglib
is > 300% slower:
uu-parsinglib - 3.19s
Parsec - 1.04s
(compiled with ghc -O3
flag)
_ng
variant? Could you clarify? – Strohlamb
combinator to specify that you're parsing an ambiguous grammar. If I understand danilo2's example correctly, switching topList
should make uu-parsinglib behave similar to Parsec (that is, produce an error). – StrohlpList
will not make simmilar behaviour toParsec
. If we use the greedypList
instead ofpList_ng
, Parsec would success onpExample2
, butuu-parsinglib
would fail. – MeatheadParsec
version,try(pa <* pb)
is treated as "single element" - ifpa
fails, the whole element fails. Combinatormany
is greedy. I want to get inuu-parsinglib
exactly the same behaviour than inpExample2
inParsec
. (Please take a look at the updated question) – Meatheaduu-parsinglib
is a big and stable library and should allow for tweaking the performance crutial elements, likeParsec
does. – Meatheaduu-parsinglib
internals as well, but I do not think we have to make a benchmarks to see one clear thing - If we focus onExample1
then we see, thatuu-parsinglib
is making a backtracking (when providing input of"ababab"
), so it does also inExample2
. I do not like the idea of writing 2 compilers in both libraries and benchmark them to prove one is slower than another (and I put a great emphasis on performance in case of this compiler). (and no, I will not useattoparsec
:) ) – Meathead