Parser combinator not terminating - how to log what is going on?

I am experimenting with parser combinators and I often run into what seems like infinite recursions. Here is the first one I ran into:

import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader

class CombinatorParserTest extends Parsers {

  type Elem = Char

  def notComma = elem("not comma", _ != ',')

  def notEndLine = elem("not end line", x => x != '\r' && x != '\n')

  def text = rep(notComma | notEndLine)

}

object CombinatorParserTest {

  def main(args:Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    // does not get here
    println(r)
  }

}

How can I print what is going on? And why does this not finish?

Samuelson answered 5/3, 2010 at 15:37 Comment(0)
A
4

Logging the attempts to parse notComma and notEndLine shows that it is the end-of-file marker (displayed as CTRL-Z in the log(...)("mesg") output) that is being parsed over and over. Here's how I modified your parser for this purpose:

def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))

I'm not entirely sure what's going on (I tried many variations on your grammar), but I think it's something like this: The EOF is not really a character artificially introduced into the input stream, but rather a sort of perpetual condition at the end of the input. Thus this never-consumed EOF pseudo-character is repeatedly parsed as "either not a comma or not an end-of-line."
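
To make the "perpetual condition" concrete, here is a small sketch that steps a CharSequenceReader past the end of its input and records what it reports at each step. EofProbe and probe are names made up for illustration; first, rest, atEnd, offset and EofCh come from scala.util.parsing.input. The "//> using dep" line assumes scala-cli and a recent scala-parser-combinators release, so adjust it to your own build:

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
// (assumption: scala-cli with the separate parser-combinators module; on Scala 2.10
//  and earlier the package ships with the standard library and this line is not needed)
import scala.util.parsing.input.{CharSequenceReader, Reader}

// Hypothetical helper object, not part of the library.
object EofProbe {
  // Follow `rest` for `steps` readers, recording (first, atEnd, offset) for each one.
  def probe(input: String, steps: Int): List[(Char, Boolean, Int)] =
    Iterator
      .iterate[Reader[Char]](new CharSequenceReader(input))(_.rest)
      .take(steps)
      .map(r => (r.first, r.atEnd, r.offset))
      .toList
}
```

For the input "," and four steps, this should report the comma at offset 0 and then the same EofCh at offset 1 three times over, with atEnd set to true. At the end of the input, rest hands back a reader that is still at the end, so a parser that accepts EofCh succeeds without ever making progress.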

Aquila answered 5/3, 2010 at 16:53 Comment(1)
I do think that EOF is artificially introduced, but you're right that it is parsed repeatedly: it seems to be handed out again every time another character is requested once the input is already at the end of the sequence. Samuelson
S
2

Ok, I think I've figured this out. CharSequenceReader returns '\032' as a marker for the end of the input. So if I modify my parser to reject that marker, like this, it works:

import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader

class CombinatorParserTest extends Parsers {

  type Elem = Char

  import CharSequenceReader.EofCh

  def notComma = elem("not comma", x => x != ',' && x != EofCh)

  def notEndLine = elem("not end line", x => x != '\r' && x != '\n' && x != EofCh)

  //def text = rep(notComma | notEndLine)
  def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))

}

object CombinatorParserTest {

  def main(args:Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    println(r)
  }

}

See the source code for CharSequenceReader. If the scaladoc had mentioned EofCh, it would have saved me a lot of time.
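
An alternative to adding x != EofCh to every predicate, sketched under the assumption that one guard for the whole alternation is enough: wrap the parser so that it fails as soon as the reader reports atEnd. notAtEnd, EofSafeParser and parse are hypothetical names, not part of the library; Parser { in => ... }, Failure, atEnd and rep are the library's own. The "//> using dep" line again assumes scala-cli and a recent scala-parser-combinators release:

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
// (assumption: scala-cli with the separate parser-combinators module)
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader

object EofSafeParser extends Parsers {
  type Elem = Char

  // Hypothetical helper: fail without consuming anything once the input is exhausted,
  // otherwise delegate to the wrapped parser.
  def notAtEnd[T](p: => Parser[T]): Parser[T] = Parser { in =>
    if (in.atEnd) Failure("end of input", in) else p(in)
  }

  // Same predicates as in the question, without any mention of EofCh.
  def notComma: Parser[Char] = elem("not comma", _ != ',')
  def notEndLine: Parser[Char] = elem("not end line", x => x != '\r' && x != '\n')

  // rep stops as soon as the guarded alternation fails, i.e. at the end of the input.
  def text: Parser[List[Char]] = rep(notAtEnd(notComma | notEndLine))

  def parse(s: String): ParseResult[List[Char]] = text(new CharSequenceReader(s))
}
```

With this, EofSafeParser.parse(",") should stop after the single comma instead of looping. The guard sits in one place, so new element parsers added to the alternation do not each need their own EofCh check.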

Samuelson answered 6/3, 2010 at 2:10 Comment(2)
Figure out where it should be mentioned, and open a doc ticket. If you can provide a patch with the modified scaladoc, so much the better. Bumblebee
Submitted lampsvn.epfl.ch/trac/scala/ticket/3147. There are multiple files using EofCh, so I'm not sure where the best place is. Samuelson
B
0

I find the logging function extremely awkward to type. Why do I have to write log(parser)("string")? Why not something as simple as parser.log("string")? To get around that, I made this instead:

trait Logging { self: Parsers =>

    // Used to turn logging on or off
    val debug: Boolean

    // Much easier than having to wrap a parser with a log function and type a message
    // i.e. log(someParser)("Message") vs someParser.log("Message")
    implicit class Logged[+A](parser: Parser[A]) {
        def log(msg: String): Parser[A] =
            if (debug) self.log(parser)(msg) else parser
    }
}

Now, in your parser, you can mix in this trait like so:

import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader


object CombinatorParserTest extends App with Parsers with Logging {

    type Elem = Char

    override val debug: Boolean = true

    def notComma: Parser[Char] = elem("not comma", _ != ',')
    def notEndLine: Parser[Char] = elem("not end line", x => x != '\r' && x != '\n')
    def text: Parser[List[Char]] = rep(notComma.log("notComma") | notEndLine.log("notEndLine"))

    val r = text(new CharSequenceReader(","))

    println(r)
}

To turn the logging off, override debug with false instead.

Running this also shows the second parser correctly parsed the comma:

trying notComma at scala.util.parsing.input.CharSequenceReader@506e6d5e
notComma --> [1.1] failure: not comma expected

,
^
trying notEndLine at scala.util.parsing.input.CharSequenceReader@506e6d5e
notEndLine --> [1.2] parsed: ,
trying notComma at scala.util.parsing.input.CharSequenceReader@15975490
notComma --> [1.2] failure: end of input

,
 ^
trying notEndLine at scala.util.parsing.input.CharSequenceReader@15975490
notEndLine --> [1.2] failure: end of input

,
 ^
The result is List(,)

Process finished with exit code 0
Basilica answered 28/4, 2017 at 17:31 Comment(0)
