Scala combinator parser, what does >> mean?

I am a little confused about ">>" in Scala. Daniel said in "Scala parser combinators parsing xml?" that it can be used to parameterize a parser based on the result of a previous parser. Could someone give me an example or a hint? I have already read the Scaladoc but still do not understand it.

thanks

Streit answered 2/11, 2011 at 10:55 Comment(0)

As I said, it serves to parameterize a parser, but let's walk through an example to make it clear.

Let's start with a simple parser that parses a number followed by a word:

def numberAndWord = number ~ word
def number        = "\\d+".r
def word          = "\\w+".r

Under RegexParsers, this will parse stuff like "3 fruits".
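For anyone who wants to run this, here is a minimal sketch. It assumes the scala-parser-combinators library is on the classpath; the object name NumberAndWord, the explicit type annotations, and the main method are additions for illustration and are not part of the original snippet.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical wrapper object; the three parsers are the ones defined above.
object NumberAndWord extends RegexParsers {
  def number: Parser[String] = "\\d+".r
  def word: Parser[String]   = "\\w+".r
  // `~` sequences two parsers and keeps both results in a `~` value.
  def numberAndWord: Parser[String ~ String] = number ~ word

  def main(args: Array[String]): Unit = {
    // RegexParsers skips whitespace between tokens by default.
    println(parseAll(numberAndWord, "3 fruits"))
    // Fails: the input does not start with digits.
    println(parseAll(numberAndWord, "fruits 3"))
  }
}
```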

Now, let's say you also want a list of what these "n things" are. For example, "3 fruits: banana, apple, orange". Let's try to parse that to see how it goes.

First, how do I parse "N" things? As it happens, there's a repN method:

def threeThings = repN(3, word)

That will parse "banana apple orange", but not "banana, apple, orange". I need a separator. There's repsep that provides that, but that won't let me specify how many repetitions I want. So, let's provide the separator ourselves:

def threeThings = word ~ repN(2, "," ~> word)

OK, that works. We can write the whole example now, for three things, like this:

def listOfThings = "3" ~ word ~ ":" ~ threeThings
def word         = "\\w+".r
def threeThings  = word ~ repN(2, "," ~> word)

That kind of works, except that I'm fixing "N" at 3. I want to let the user specify how many. And that's where >>, also known as into (and, yes, it is flatMap for Parser), comes in. First, let's change threeThings:

def things(n: Int) = n match {
  case 1          => word ^^ (List(_))
  case x if x > 1 => word ~ repN(x - 1, "," ~> word) ^^ { case w ~ l => w :: l }
  case x          => err("Invalid repetitions: "+x)
}

This is slightly more complicated than you might have expected, because I'm forcing it to return Parser[List[String]]. But how do I pass a parameter to things? I mean, this won't work:

def listOfThings = number ~ word ~ ":" ~ things(/* what do I put here?*/)

But we can rewrite that like this:

def listOfThings = (number ~ word <~ ":") >> {
  case n ~ what => things(n.toInt)
}

That is almost good enough, except that I have now lost n and what: it only returns "List(banana, apple, orange)", not how many there ought to be, and what they are. I can do that like this:

def listOfThings   = (number ~ word <~ ":") >> {
  case n ~ what => things(n.toInt) ^^ { list => new ~(n.toInt, new ~(what, list)) }
}
def number         = "\\d+".r
def word           = "\\w+".r
def things(n: Int) = n match {
  case 1          => word ^^ (List(_))
  case x if x > 1 => word ~ repN(x - 1, "," ~> word) ^^ { case w ~ l => w :: l }
  case x          => err("Invalid repetitions: "+x)
}
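To try the grammar above end to end, something like the following sketch should work, assuming the scala-parser-combinators library is on the classpath. The object name ListOfThings and the type annotations are illustrative additions. As a deliberate deviation from the code above, this version returns a plain tuple instead of nested ~ instances, which only changes the shape of the result.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical wrapper object around the grammar from the answer.
object ListOfThings extends RegexParsers {
  def number: Parser[String] = "\\d+".r
  def word: Parser[String]   = "\\w+".r

  // Builds a parser for exactly n comma-separated words.
  def things(n: Int): Parser[List[String]] = n match {
    case 1          => word ^^ (List(_))
    case x if x > 1 => word ~ repN(x - 1, "," ~> word) ^^ { case w ~ l => w :: l }
    case x          => err("Invalid repetitions: " + x)
  }

  // `>>` feeds the parsed count into things(...) to pick the next parser.
  // Assumption: a tuple is returned here instead of `new ~(...)`.
  def listOfThings: Parser[(Int, String, List[String])] =
    (number ~ word <~ ":") >> {
      case n ~ what => things(n.toInt) ^^ { list => (n.toInt, what, list) }
    }

  def main(args: Array[String]): Unit = {
    println(parseAll(listOfThings, "3 fruits: banana, apple, orange"))
    // The count and the list disagree, so this one is rejected.
    println(parseAll(listOfThings, "2 fruits: banana, apple, orange"))
  }
}
```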

Just a final comment. You might have asked yourself "what do you mean flatMap? Isn't that a monad/for-comprehension thingy?" Why, yes, and yes! :-) Here's another way of writing listOfThings:

def listOfThings   = for {
  nOfWhat  <- number ~ word <~ ":"
  n ~ what = nOfWhat
  list     <- things(n.toInt)
}  yield new ~(n.toInt, new ~(what, list))

I'm not doing n ~ what <- number ~ word <~ ":" because that uses filter or withFilter in Scala, which is not implemented by Parsers. But here's yet another way of writing it, which doesn't have exactly the same semantics but produces the same results:

def listOfThings   = for {
  n    <- number
  what <- word
  _    <- ":" : Parser[String]
  list <- things(n.toInt)
}  yield new ~(n.toInt, new ~(what, list))

This might even lead one to think that the claim that "monads are everywhere" might have something to it. :-)
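As a rough check that the >> version and the for-comprehension version agree, here is a small sketch, again assuming scala-parser-combinators is on the classpath. The names ForListOfThings, Result, viaInto and viaFor are made up for this example, the result is a tuple rather than nested ~ values, and literal(":") is used in place of the type ascription ":" : Parser[String] (both turn the string into a parser).

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical object comparing the two spellings of the same grammar.
object ForListOfThings extends RegexParsers {
  type Result = (Int, String, List[String])

  def number: Parser[String] = "\\d+".r
  def word: Parser[String]   = "\\w+".r

  def things(n: Int): Parser[List[String]] = n match {
    case 1          => word ^^ (List(_))
    case x if x > 1 => word ~ repN(x - 1, "," ~> word) ^^ { case w ~ l => w :: l }
    case x          => err("Invalid repetitions: " + x)
  }

  // Written with >> (an alias for into, which delegates to flatMap).
  def viaInto: Parser[Result] =
    (number ~ word <~ ":") >> { case n ~ what =>
      things(n.toInt) ^^ { list => (n.toInt, what, list) }
    }

  // Written as a for-comprehension; each generator becomes a flatMap call
  // and the final yield becomes a map.
  def viaFor: Parser[Result] = for {
    n    <- number
    what <- word
    _    <- literal(":")
    list <- things(n.toInt)
  } yield (n.toInt, what, list)

  def main(args: Array[String]): Unit = {
    val input = "3 fruits: banana, apple, orange"
    println(parseAll(viaInto, input))
    println(parseAll(viaFor, input))
  }
}
```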

Engorge answered 2/11, 2011 at 17:54 Comment(4)
Thanks, but just one more thing. I still don't understand the new in { list => new ~(n.toInt, new ~(what, list)) }. I'm still relatively new to Scala, so I only knew about using new to create an instance of a class. Streit
@Streit That is exactly what is happening here. When you parse something with the ~ operator, the result will be an instance of the ~ class. I could have (arguably, I should have) returned a tuple instead, since it is easier. However, I wanted to preserve the original return value in the example code. Engorge
@Streit Not true. Scala has functions which are objects. Here, however, the ~ class and Parser's ~ operator have the same name by design, and ~, the operator, returns a Parser[~[T,U]], where T and U are the types of the parsers this operator applies to. However, class and operator could have entirely different names. By the way, this operator/class duality can also be found with ::. Engorge
Thanks for the clear explanation. It seems I still have much to learn. But yes, what I really meant in my previous comment is that I had mistaken ~ for a normal operator, so it confused me at first. Streit

The method >> takes a function that is given the result of the parser and uses it to construct a new parser. As stated, this can be used to parameterize a parser on the result of a previous parser.

Example

The following parser parses a line with n + 1 integer values. The first value n states the number of values to follow. This first integer is parsed and then the result of this parse is used to construct a parser that parses n further integers.

Parser definition

The following line assumes that you can parse an integer with parseInt: Parser[Int]. It first parses an integer value n and then uses >> to parse n additional integers, which form the result of the parser. So the initial n is not returned by the parser (though it's the size of the returned list).

def intLine: Parser[Seq[Int]] = parseInt >> (n => repN(n,parseInt))

Valid inputs

1 42
3 1 2 3
0

Invalid inputs

0 1
1
3 42 42
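
The inputs above can be checked with a minimal sketch like this one, assuming the scala-parser-combinators library is on the classpath. The answer only assumes that some parseInt: Parser[Int] exists; the regex-based definition below, the object name IntLine, and the helper accepts are illustrative assumptions.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical wrapper object for the intLine parser.
object IntLine extends RegexParsers {
  // Assumed definition of parseInt: one or more digits converted to Int.
  def parseInt: Parser[Int] = "\\d+".r ^^ (_.toInt)

  // Parse n, then use it to build a parser for exactly n more integers.
  def intLine: Parser[Seq[Int]] = parseInt >> (n => repN(n, parseInt))

  // True only if the whole line is consumed by intLine.
  def accepts(line: String): Boolean = parseAll(intLine, line).successful

  def main(args: Array[String]): Unit = {
    Seq("1 42", "3 1 2 3", "0", "0 1", "1", "3 42 42").foreach { line =>
      println(s"'$line' accepted: ${accepts(line)}")
    }
  }
}
```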
Myall answered 2/11, 2011 at 11:50 Comment(2)
Isn't that an alias for flatMap? Prestissimo
@Prestissimo According to the scaladocs, it's an alias for into. According to the sources, >> calls into, which calls flatMap. Myall
