Scala Parser Combinator, Ambiguous Grammar & Parse Forest

You cannot do this with Scala's built-in parser combinators. Packrat combinators are restricted to only unambiguous grammars. If you try to deal with ambiguity, you lose the ability to memoize and the parse complexity becomes O(k^n) even for unambiguous trails. So, don't do that.

What you want is a parser combinator framework that correctly handles ambiguity. As far as I know, the only such framework for Scala is my GLL combinators. The syntax is almost identical to Scala's parser combinators (the main difference being that higher-arity functions work correctly), but the output is a Stream[A], where A is the result type. Thus, you get a parse forest. Note that the result is not actually a SPPF (shared packed parse forest), so if you produce an exponential number of results, the stream will have proportional size. This was done to maintain ultimate flexibility in the result type.

GLL combinators is O(n^3) in the worst case, even for pathologically ambiguous grammars. In the average case, where the parse trail is unambiguous, GLL combinators is O(n). The constant time overhead is currently a bit excessive, but optimization is an ongoing project. We use GLL combinators in production at Precog, so you can expect the library to be well maintained going forward.

Recommended topics

Hot tags