Grammars, Scala Parsing Combinators and Orderless Sets
Asked Answered
D

5

7

I'm writing an application that will take in various "command" strings. I've been looking at the Scala combinator library to tokenize the commands. I find in a lot of cases I want to say: "These tokens are an orderless set, and so they can appear in any order, and some might not appear".

With my current knowledge of grammars I would have to define all combinations of sequences as such (pseudo grammar):

command = action~content
action = alphanum
content  = (tokenA~tokenB~tokenC | tokenB~tokenC~tokenA | tokenC~tokenB~tokenA ....... )

So my question is, considering tokenA-C are unique, is there a shorter way to define a set of any order using a grammar?

Donell answered 23/11, 2009 at 8:3 Comment(0)
L
3

There are ways around it. Take a look at the parser here, for example. It accepts 4 pre-defined numbers, which may appear in any other, but must appear once, and only once.

OTOH, you could write a combinator, if this pattern happens often:

def comb3[A](a: Parser[A], b: Parser[A], c: Parser[A]) =
  a ~ b ~ c | a ~ c ~ b | b ~ a ~ c | b ~ c ~ a | c ~ a ~ b | c ~ b ~ a
Luhey answered 23/11, 2009 at 11:9 Comment(0)
S
4

You can use the "Parser.^?" operator to check a group of parse elements for duplicates.

  def tokens = tokenA | tokenB | tokenC
  def uniqueTokens = (tokens*) ^? (
    { case t if (t == t.removeDuplicates) => t },
    { "duplicate tokens found: " + _ })

Here is an example that allows you to enter any of the four stooges in any order, but fails to parse if a duplicate is encountered:

package blevins.example

import scala.util.parsing.combinator._  

case class Stooge(name: String)

object StoogesParser extends RegexParsers {
  def moe = "Moe".r
  def larry = "Larry".r
  def curly = "Curly".r
  def shemp = "Shemp".r
  def stooge = ( moe | larry | curly | shemp ) ^^ { case s => Stooge(s) }
  def certifiedStooge = stooge | """\w+""".r ^? (
    { case s: Stooge => s },
    { "not a stooge: " + _ })

  def stooges = (certifiedStooge*) ^? (
    { case x if (x == x.removeDuplicates) => x.toSet },
    { "duplicate stooge in: " + _ })

  def parse(s: String): String = {
    parseAll(stooges, new scala.util.parsing.input.CharSequenceReader(s)) match {
      case Success(r,_) => r.mkString(" ")
      case Failure(r,_) => "failure: " + r
      case Error(r,_) => "error: " + r
    }
  }

}

And some example usage:

package blevins.example

object App extends Application {

  def printParse(s: String): Unit = println(StoogesParser.parse(s))

  printParse("Moe Shemp Larry")
  printParse("Moe Shemp Shemp")
  printParse("Curly Beyonce")

  /* Output:
     Stooge(Moe) Stooge(Shemp) Stooge(Larry)
     failure: duplicate stooge in: List(Stooge(Moe), Stooge(Shemp), Stooge(Shemp))
     failure: not a stooge: Beyonce
  */
}
Sain answered 23/11, 2009 at 23:11 Comment(0)
L
3

There are ways around it. Take a look at the parser here, for example. It accepts 4 pre-defined numbers, which may appear in any other, but must appear once, and only once.

OTOH, you could write a combinator, if this pattern happens often:

def comb3[A](a: Parser[A], b: Parser[A], c: Parser[A]) =
  a ~ b ~ c | a ~ c ~ b | b ~ a ~ c | b ~ c ~ a | c ~ a ~ b | c ~ b ~ a
Luhey answered 23/11, 2009 at 11:9 Comment(0)
C
1

I would not try to enforce this requirement syntactically. I'd write a production that admits multiple tokens from the set allowed and then use a non-parsing approach to ascertaining the acceptability of the keywords actually given. In addition to allowing a simpler grammar, it will allow you to more easily continue parsing after emitting a diagnostic about the erroneous usage.

Randall Schulz

Comic answered 23/11, 2009 at 14:40 Comment(0)
M
0

You could of course write a combination rule that does this for you if you encounter this situation frequently.

On the other hand, maybe the option exists to make "tokenA..C" just "token" and then differentiate inside the handler of "token"

Matildematin answered 23/11, 2009 at 8:11 Comment(4)
In this case each token is a json style object property. So a command might look like "todo message:link Todo class to database" due: next tuesday". So the generic rule defined in scala style is something like "token = alphanum~':'~repsep(alphanum, ' '). But I do need to handle specific properties differently.Donell
And you have to make sure, that the same one doesn't occur more than once?Matildematin
Yea that's the plan, some properties are optional, and they should only occur once.Donell
I think that's not doable on a grammar level without specifying it by hand.Matildematin
L
0

I don't know what kind of constructs you want to support, but I gather you should be specifying a more specific grammar. From your comment to another answer:

todo message:link Todo class to database

I guess you don't want to accept something like

todo message:database Todo to link class

So you probably want to define some message-level keywords like "link" and "to"...

def token = alphanum~':'~ "link" ~ alphanum ~ "class" ~ "to" ~ alphanum 
  ^^ { (a:String,b:String,c:String) => /* a == "message", b="Todo", c="database" */ }

I guess you would have to define your grammar at that level.

Litigate answered 23/11, 2009 at 10:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.