I am writing a Scala parser combinator grammar that reads newline-delimited word lists, where lists are separated by one or more blank lines. Given the following string:
cat
mouse
horse

apple
orange
pear
I would like to have it return List(List(cat, mouse, horse), List(apple, orange, pear)).
I wrote this basic grammar, which treats word lists as newline-delimited words. Note that I had to override the default definition of whitespace.
import util.parsing.combinator.RegexParsers

object WordList extends RegexParsers {
  private val eol = sys.props("line.separator")

  // Skip only spaces and tabs, so that newlines stay significant.
  override val whiteSpace = """[ \t]+""".r

  val list: Parser[List[String]] = repsep( """\w+""".r, eol)
  val lists: Parser[List[List[String]]] = repsep(list, eol)

  def main(args: Array[String]): Unit = {
    val s =
      """cat
        |mouse
        |horse
        |
        |apple
        |orange
        |pear""".stripMargin
    println(parseAll(lists, s))
  }
}
This incorrectly treats blank lines as empty word lists, i.e. it returns
[8.1] parsed: List(List(cat, mouse, horse), List(), List(apple, orange, pear))
(Note the empty list in the middle.)
I can put an optional end of line at the end of each list.
val list: Parser[List[String]] = repsep( """\w+""".r, eol) <~ opt(eol)
This handles the case where there is a single blank line between lists, but has the same problem with multiple blank lines.
I tried changing the lists definition to allow multiple end-of-line delimiters:
val lists:Parser[List[List[String]]] = repsep(list, rep(eol))
but this hangs on the above input.
What is the correct grammar that will handle multiple blank lines as delimiters?
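One possible direction, offered as a sketch rather than a definitive answer: the hang most likely comes from both list (built with repsep) and rep(eol) being able to succeed without consuming any input, so the outer repetition never makes progress. Requiring at least one word per list with rep1sep, and at least one line break per separator with rep1, avoids that. The object name WordListSketch, the helper parseLists, and the \r?\n regex used in place of line.separator are assumptions made for illustration.

```scala
import scala.util.parsing.combinator.RegexParsers

object WordListSketch extends RegexParsers {
  // Assumption: match "\n" or "\r\n" directly instead of the platform line separator.
  private val eol: Parser[String] = """\r?\n""".r

  // Skip only spaces and tabs, so that newlines stay significant.
  override val whiteSpace = """[ \t]+""".r

  // rep1sep needs at least one word, so a list can never be empty.
  val list: Parser[List[String]] = rep1sep("""\w+""".r, eol)

  // rep1(eol) consumes one or more line breaks between two lists.
  val lists: Parser[List[List[String]]] = repsep(list, rep1(eol))

  // Hypothetical convenience wrapper around parseAll.
  def parseLists(s: String): ParseResult[List[List[String]]] = parseAll(lists, s)
}
```

With the sample input, WordListSketch.parseLists(s).get should be List(List(cat, mouse, horse), List(apple, orange, pear)), with or without extra blank lines between the groups. As written, the sketch does not accept leading or trailing line breaks; those would need to be handled separately.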