How can I create a parser combinator in which line endings are significant?

I am creating a DSL, and using Scala's parser combinator library to parse the DSL. The DSL follows a simple, Ruby-like syntax. A source file can contain a series of blocks that look like this:

create_model do
  at 0,0,0
end

Line endings are significant in the DSL, as they are effectively used as statement terminators.

I wrote a Scala parser that looks like this:

class ML3D extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = ident
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = ident
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  def constant: Parser[Any] = wholeNumber | floatingPointNumber
}

Since line endings matter, I overrode whiteSpace so that it'll only treat spaces and tabs as whitespace (instead of treating new lines as whitespace, and thus ignoring them).
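
As a minimal sketch of what that override changes (the object names and the one-line statement rule here are made up for illustration, not part of my grammar): with the default whiteSpace of \s+, the newline is skipped before eol is tried, so eol can never match.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Default whitespace (\s+): newlines are skipped before every token.
object DefaultWhitespace extends JavaTokenParsers {
  def eol: Parser[String] = """(\r?\n)+""".r
  def statement: Parser[Any] = ident ~ eol
}

// Overridden whitespace: only spaces and tabs are skipped.
object SignificantNewlines extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r
  def eol: Parser[String] = """(\r?\n)+""".r
  def statement: Parser[Any] = ident ~ eol
}

object WhitespaceDemo {
  def main(args: Array[String]): Unit = {
    // Fails: the newline is consumed as whitespace, leaving nothing for eol.
    println(DefaultWhitespace.parseAll(DefaultWhitespace.statement, "at\n"))
    // Succeeds: the newline survives and is matched by eol.
    println(SignificantNewlines.parseAll(SignificantNewlines.statement, "at\n"))
  }
}
```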

This works, except for the "end" statement for commandBlock. Since my source file contains a trailing new line, the parser complains that it was expecting just an end but got a new line after the end keyword.

So I changed commandBlock's definition to this:

def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)

(That is, I added an optional new line after "end").

But now, when parsing the source file, I get the following error:

[4.1] failure: `end' expected but `' found

I think this is because, after it sucks in the trailing new line, the parser encounters an empty string that it considers invalid, but I'm not sure why it's doing this.

Any tips on how to fix this? I might be extending the wrong parser from Scala's parser combinator library, so any suggestions on how to create a language definition with significant new line characters are also welcome.

Adjunct answered 4/3, 2010 at 20:54 Comment(0)

I get the same error either way, but I think you are misinterpreting it. What it's saying is that the parser was expecting an end, but it had already reached the end of the input.

And the reason that is happening is that end is being read as a statement: ident accepts end as a function name, so statementList consumes it and nothing is left to match the "end" that closes commandBlock. Now, I'm sure there's a nice way to solve this, but I'm not experienced enough with Scala parsers. It seems the way to go would be to use token parsers with a scanning part, but I couldn't figure out a way to make the standard token parser not treat newlines as whitespace.
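
A small sketch of that behaviour, using a cut-down version of the question's statement rule (the object name KeywordDemo is only for illustration): a bare "end" followed by a newline is accepted as a statement with an empty argument list.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object KeywordDemo extends JavaTokenParsers {
  // Same whitespace rule as the question: newlines are not skipped.
  override val whiteSpace = """[ \t]+""".r

  def eol: Parser[Any] = """(\r?\n)+""".r
  def argument: Parser[Any] = stringLiteral | floatingPointNumber
  // The function name is a plain ident, so the word "end" qualifies too.
  def statement: Parser[Any] = ident ~ repsep(argument, ",") ~ eol

  def main(args: Array[String]): Unit = {
    // "end" parses as a function name, the argument list is empty,
    // and the newline satisfies eol.
    println(parseAll(statement, "end\n").successful)
  }
}
```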

So, here's an alternative:

import scala.util.parsing.combinator.JavaTokenParsers

class ML3D extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r
  def keywords: Parser[Any] = "do" | "end"
  def identifier: Parser[Any] = not(keywords)~ident

  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = identifier
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = identifier
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  // floatingPointNumber goes first: wholeNumber would match only the "1" of "1.5"
  def constant: Parser[Any] = floatingPointNumber | wholeNumber
}
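
To see the guard in isolation, here is a reduced sketch (GuardDemo, wholeKeywords and strictIdentifier are hypothetical names, not part of the grammar above). One thing to be aware of: because "do" and "end" are literal parsers, not(keywords) also rejects identifiers that merely start with a keyword, such as endpoint. If that matters for the DSL, a regex with a word boundary is one possible variant.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object GuardDemo extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  // Literal keywords, as in the grammar above.
  def keywords: Parser[Any] = "do" | "end"
  // ~> drops the Unit result of not(...) and keeps the identifier text.
  def identifier: Parser[String] = not(keywords) ~> ident

  // Hypothetical variant: \b only matches a keyword that ends at a word boundary.
  def wholeKeywords: Parser[Any] = """(do|end)\b""".r
  def strictIdentifier: Parser[String] = not(wholeKeywords) ~> ident

  def main(args: Array[String]): Unit = {
    println(parseAll(identifier, "at").successful)             // ordinary name
    println(parseAll(identifier, "end").successful)            // keyword is rejected
    println(parseAll(identifier, "endpoint").successful)       // also rejected: starts with "end"
    println(parseAll(strictIdentifier, "endpoint").successful) // accepted by the \b variant
  }
}
```
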
Daveen answered 5/3, 2010 at 2:28 Comment(4)
I like your interpretation of the error message. I wonder if there is a way to have the parser print what it's trying to match as it goes along. That would make troubleshooting easier. - Blacktop
You can wrap any reference to a production appearing in another production's right-hand side in log(...) and you'll get trace output whenever the parser attempts to match that non-terminal. E.g., to log a particular attempt to match model, replace that non-terminal reference in a rule with log(model)("model"); the second argument is the name printed in the trace. - Physiognomy
Ah, yes, I see the issue now: end was being read under functionName, since it was a valid function name. I implemented your changes and it works fine now, thanks a lot. - Adjunct
@Randall I have created a separate question for logging parse attempts, #2388392. The tip is really helpful. - Blacktop
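
The log(...) tip from the comments can be sketched roughly as follows, on a cut-down statement rule (LogDemo and the labels passed to log are illustrative choices). log takes the parser and a name, and prints a trace line when the parser is tried and another with its result.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object LogDemo extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  def eol: Parser[Any] = """(\r?\n)+""".r
  def functionName: Parser[Any] = ident
  def argumentList: Parser[Any] = repsep(floatingPointNumber, ",")

  // Each wrapped reference is traced under the label given as second argument.
  def statement: Parser[Any] =
    log(functionName)("functionName") ~ log(argumentList)("argumentList") ~ eol

  def main(args: Array[String]): Unit = {
    // Trace lines go to standard output before the final result is printed.
    println(parseAll(statement, "at 0,0,0\n"))
  }
}
```
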

You can either override the protected val whiteSpace (a Regex), whose default is """\s+""".r, or override the protected def handleWhiteSpace(...) method if you need more control than is readily achieved with a regular expression. Both of these members originate in RegexParsers, the trait that JavaTokenParsers extends.

Physiognomy answered 4/3, 2010 at 21:0 Comment(2)
I am overriding whiteSpace (see the code above), but that still results in an error. - Adjunct
Yes, I see. Try changing that opt(eol) to eol* (or, equivalently, rep(eol)). - Physiognomy
