Scala Parser Combinators tricks for recursive bnf?

Asked 27/7, 2010 at 12:42 Answered 27/7, 2010 at 14:28

Im trying to match this syntax:

pgm ::= exprs
exprs ::= expr [; exprs]
expr ::= ID | expr . [0-9]+

My scala packrat parser combinator looks like this:

import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    def pgm = repsep(expr,";")
    def expr :Parser[Any]= ident | expr~"."~num
    def num = numericLit

       def parse(input: String) =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def main(args: Array[String]) {
      val prg = "x.1.2.3;" +
            "y.4.1.1;" +
            "z;" +
            "n.1.10.30"


            parse(prg);
    }
}

But this doesnt work. Either it "matches greedy" and tells me:

[1.2] failure: end of input expected 
x.1.2.3;y.4.1.1;z;n.1.10.30

or if I change the | to a ||| I get a stackoverflow:

Exception in thread "main" java.lang.StackOverflowError
at java.lang.Character.isLetter(Unknown Source)
at java.lang.Character.isLetter(Unknown Source)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
...

I kindoff understand why I get the errors; what can I do to parse a syntax like the above? It doesnt seem that esoteric to me

EDIT: Based on the paper referenced in http://scala-programming-language.1934581.n4.nabble.com/Packrat-parser-guidance-td1956908.html I found out that my program didnt actually use the new packrat parser.

Ie. change Parser[Any] to PackratParser[Any] and use lazy val instead of def

I rewrote the above to this:

import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    lazy val pgm : PackratParser[Any] = repsep(expr,";")
    lazy val expr :PackratParser[Any]= expr~"."~num | ident
    lazy val num = numericLit

    def parse(input: String) =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def main(args: Array[String]) {
      val prg = "x.1.2.3 ;" +
            "y.4.1.1;" +
            "z;" +
            "n.1.10.30"


            parse(prg);
    }
}

Heidt answered 27/7, 2010 at 12:42 Comment(0)

The problem is (at least partially) that you're not actually using Packrat parsers. See the documentation for Scala's PackratParsers trait, which says

Using PackratParsers is very similar to using Parsers:

any class/trait that extends Parsers (directly or through a subclass) can mix in PackratParsers. Example: object MyGrammar extends StandardTokenParsers with PackratParsers

each grammar production previously declared as a def without formal parameters becomes a lazy val, and its type is changed from Parser[Elem] to PackratParser[Elem]. So, for example, def production: Parser[Int] = {...} becomes lazy val production: PackratParser[Int] = {...}

Important: using PackratParsers is not an all or nothing decision. They can be free mixed with regular Parsers in a single grammar.

I don't know enough about Scala 2.8's parser combinators to fix this entirely, but with the following modifications, I was able to get it to parse as far as the semicolon, which is an improvement over what you've accomplished.

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    lazy val pgm:PackratParser[Any] = repsep(expr,";")
    lazy val expr:PackratParser[Any]= ident ||| (expr~"."~numericLit)

    def parse(input: String) = phrase(expr)(lex(input)) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def lex(input:String) = new PackratReader(new lexical.Scanner(input))
}

Salvo answered 27/7, 2010 at 14:28 Comment(1)

Exactly! I was rereading documentation and figured this out. The last thing needed is a typo in the parse method: phrase(expr) should be phrase(pgm). Cheers! – Heidt 27/7, 2010 at 14:33

The production

expr ::= ID | expr . [0-9]+

is left recursive. It expands to

expr ::= ID
expr ::= expr . [0-9]+

where the left recursion occurs on the 2nd line. This is what causes the parser to overflow the stack.

You should rewrite your grammar avoiding left recursive productions.

expr ::= ID {. [0-9]+}

Angiology answered 27/7, 2010 at 13:21 Comment(1)

Wasnt the packrat parser supposed to allow me to do left-recursion? – Heidt 27/7, 2010 at 13:53

Recommended topics

Hot tags