How should I specify the type of JSON-like unstructured data in Scala?

I'm considering porting a very simple text-templating library to Scala, mostly as an exercise in learning the language. The library is currently implemented in both Python and JavaScript, and its basic operation more or less boils down to this (in Python):

template = CompiledTemplate('Text {spam} blah {eggs[1]}')
data = { 'spam': 1, 'eggs': [ 'first', 'second', { 'key': 'value' }, True ] }
output = template.render(data)

None of this is terribly difficult to do in Scala, but the thing I'm unclear about is how to best express the static type of the data parameter.

Basically this parameter should be able to contain the sorts of things you'd find in JSON: a few primitives (strings, ints, booleans, null), or lists of zero or more items, or maps of zero or more items. (For the purposes of this question the maps can be constrained to having string keys, which seems to be how Scala likes things anyway.)

My initial thought was just to use a Map[String, Any] as a top-level object, but that doesn't seem entirely correct to me. In fact I don't want to put arbitrary objects of any class in there; I want only the elements I outlined above. At the same time, I think in Java the closest I'd really be able to get would be Map<String, ?>, and I know one of the Scala authors designed Java's generics.
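For comparison, here is a minimal sketch of the Map[String, Any] approach using the sample data from the Python snippet. It illustrates the concern above: the type accepts anything, and code that reads the data has to fall back on runtime type tests. The describeAny helper and the Thread value are hypothetical, chosen only for illustration.

```scala
// The question's sample data typed as Map[String, Any].
// Python's True becomes Scala's `true`.
val data: Map[String, Any] = Map(
  "spam" -> 1,
  "eggs" -> List("first", "second", Map("key" -> "value"), true)
)

// Drawback: the compiler accepts any value at all, including objects with
// no JSON meaning, such as a Thread.
val oops: Map[String, Any] = data + ("surprise" -> new Thread())

// Readers must use runtime type tests; forgetting a case is not a
// compile-time error, the value just lands in the catch-all branch.
def describeAny(v: Any): String = v match {
  case null         => "null"
  case s: String    => s"string $s"
  case i: Int       => s"int $i"
  case b: Boolean   => s"boolean $b"
  case xs: List[_]  => s"list of ${xs.size}"
  case m: Map[_, _] => s"map of ${m.size}"
  case other        => s"unexpected ${other.getClass.getName}"
}

println(describeAny(data("eggs")))     // list of 4
println(describeAny(oops("surprise"))) // unexpected java.lang.Thread
```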

One thing I'm particularly curious about is how other functional languages with similar type systems handle this sort of problem. I have a feeling that what I really want to do here is come up with a set of case classes that I can pattern-match on, but I'm not quite able to envision how that would look.

I have Programming in Scala, but to be honest my eyes started glazing over a bit at the covariance / contravariance stuff and I'm hoping somebody can explain this to me a bit more clearly and succinctly.

Stewartstewed asked 8/4, 2009 at 7:22

You're spot on that you want some sort of case classes to model your data types. In functional languages these are called "algebraic data types" (ADTs), and you can read all about how Haskell uses them by Googling around a bit. Scala's equivalent of Haskell's ADTs uses sealed traits and case classes.

Let's look at a rewrite of the JSON combinator parser from the Scala standard library or the Programming in Scala book. Instead of using Map[String, Any] to represent JSON objects, and Any to represent arbitrary JSON values, it uses an algebraic data type, JsValue, to represent JSON values. JsValue has several subtypes, representing the possible kinds of JSON values: JsString, JsNumber, JsObject, JsArray, JsBoolean (JsTrue, JsFalse), and JsNull.
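A minimal sketch of what such a hierarchy might look like. The exact field types are assumptions: BigDecimal for numbers, List for arrays, and JsObject keyed by JsString so that the toJson(key) call in the method further down type-checks. JsTrue and JsFalse are modelled as case objects, with a hand-written JsBoolean extractor so that `case JsBoolean(b)` still matches them.

```scala
// One possible shape for the JsValue hierarchy; field types are assumptions.
sealed trait JsValue
case object JsNull extends JsValue
case class JsString(s: String) extends JsValue
case class JsNumber(n: BigDecimal) extends JsValue
case class JsArray(xs: List[JsValue]) extends JsValue
// Keys are JsStrings so that toJson(key) in the answer's toJson compiles.
case class JsObject(m: Map[JsString, JsValue]) extends JsValue

// Booleans as two singletons, plus an extractor so `case JsBoolean(b)` works.
sealed abstract class JsBoolean(val value: Boolean) extends JsValue
case object JsTrue extends JsBoolean(true)
case object JsFalse extends JsBoolean(false)
object JsBoolean {
  def apply(b: Boolean): JsBoolean = if (b) JsTrue else JsFalse
  def unapply(b: JsBoolean): Option[Boolean] = Some(b.value)
}

// The question's sample data. Only JSON-like values fit; putting a Thread
// in here would be a compile-time type error.
val data: JsValue = JsObject(Map(
  JsString("spam") -> JsNumber(BigDecimal(1)),
  JsString("eggs") -> JsArray(List(
    JsString("first"),
    JsString("second"),
    JsObject(Map(JsString("key") -> JsString("value"))),
    JsTrue
  ))
))
```

Making the boolean case two singletons keeps the value set closed (there is exactly one JsTrue), while the companion's apply and unapply let callers still write JsBoolean(b) in both construction and matching.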

Manipulating JSON data of this form involves pattern matching. Since JsValue is sealed, the compiler will warn you if you haven't dealt with all the cases. For example, the code for toJson, a method that takes a JsValue and returns a String representation of that value, looks like this:

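  def toJson(x: JsValue): String = x match {
    case JsNull => "null"
    case JsBoolean(b) => b.toString
    // Escape backslashes and double quotes so the output stays valid JSON
    // (control characters are not handled here).
    case JsString(s) => "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
    case JsNumber(n) => n.toString
    case JsArray(xs) => xs.map(toJson).mkString("[", ", ", "]")
    // Keys go back through toJson, so they must themselves be JsValues (e.g. JsString).
    case JsObject(m) => m.map { case (key, value) => toJson(key) + " : " + toJson(value) }.mkString("{", ", ", "}")
  }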

Pattern matching does two jobs here: it "unwraps" the underlying value from its JsValue wrapper, and, because JsValue is sealed, it gives us a type-safe way of knowing that we've handled every case.
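To connect this back to the question's {eggs[1]} placeholder, here is a rough sketch of how a template engine might look up a path in a JsValue, unwrapping one layer per match. The Step, Field, Index, lookup and show names are hypothetical, and the hierarchy is simplified: a single JsBoolean case class instead of JsTrue/JsFalse, with JsObject keyed by JsString to match the toJson method above.

```scala
// Simplified JsValue hierarchy (assumed shapes).
sealed trait JsValue
case object JsNull extends JsValue
case class JsBoolean(b: Boolean) extends JsValue
case class JsString(s: String) extends JsValue
case class JsNumber(n: BigDecimal) extends JsValue
case class JsArray(xs: List[JsValue]) extends JsValue
case class JsObject(m: Map[JsString, JsValue]) extends JsValue

// One step of a placeholder path such as {eggs[1]}: a field name or a list index.
sealed trait Step
case class Field(name: String) extends Step
case class Index(i: Int) extends Step

// Follow a path; each case unwraps one layer of JsValue. A wrong kind of
// value, a missing key or an out-of-range index all yield None.
def lookup(v: JsValue, path: List[Step]): Option[JsValue] = (v, path) match {
  case (_, Nil)                           => Some(v)
  case (JsObject(m), Field(name) :: rest) => m.get(JsString(name)).flatMap(lookup(_, rest))
  case (JsArray(xs), Index(i) :: rest)    => xs.lift(i).flatMap(lookup(_, rest))
  case _                                  => None
}

// Render a value as template text. Because JsValue is sealed, deleting any
// case here makes the compiler warn that the match is not exhaustive.
def show(v: JsValue): String = v match {
  case JsNull       => "null"
  case JsBoolean(b) => b.toString
  case JsString(s)  => s
  case JsNumber(n)  => n.toString
  case JsArray(xs)  => xs.map(show).mkString(", ")
  case JsObject(m)  => m.map { case (k, x) => k.s + ": " + show(x) }.mkString(", ")
}

val data: JsValue = JsObject(Map(
  JsString("spam") -> JsNumber(BigDecimal(1)),
  JsString("eggs") -> JsArray(List(
    JsString("first"),
    JsString("second"),
    JsObject(Map(JsString("key") -> JsString("value"))),
    JsBoolean(true)
  ))
))

println(lookup(data, List(Field("eggs"), Index(1))).map(show)) // Some(second)
```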

Furthermore, if you know at compile-time the structure of the JSON data you're dealing with, you can do something really cool like n8han's extractors. Very powerful stuff, check it out.

Earthstar answered 10/4, 2009 at 7:18
Thanks, that was just what I was looking for. What is the source of the rewritten JSON parser you linked to on pastebin? (I noticed that the built-in parser in Scala's libs just uses Map[String, Any].) - Stewartstewed
I wrote the parser linked to on pastebin. I've been meaning to make a full-fledged project out of it, but haven't found the time. - Earthstar

Well, there are a couple ways to approach this. I would probably just use Map[String, Any], which should work just fine for your purposes (as long as the map is from collection.immutable rather than collection.mutable). However, if you really want to go through some pain, it is possible to give a type for this:

sealed trait InnerData[+A] {
  val value: A
}

case class InnerString(value: String) extends InnerData[String]
case class InnerMap[A, +B](value: Map[A, B]) extends InnerData[Map[A, B]]
case class InnerBoolean(value: Boolean) extends InnerData[Boolean]

Now, assuming that you were reading the JSON data field into a Scala field named jsData, you would give that field the following type:

val jsData: Map[String, Either[Int, InnerData[_]]]

Every time you pull a field out of jsData, you would need to pattern match, checking whether the value was of type Left[Int] or Right[InnerData[_]] (the two sub-types of Either[Int, InnerData[_]]). Once you have the inner data, you would then pattern match on that to determine whether it represents an InnerString, InnerMap or InnerBoolean.
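A small sketch of that two-level match, repeating the InnerData definitions from this answer so the block compiles on its own. describeField and the sample keys are hypothetical.

```scala
// The answer's wrapper types, repeated so this sketch is self-contained.
sealed trait InnerData[+A] {
  val value: A
}
case class InnerString(value: String) extends InnerData[String]
case class InnerMap[A, +B](value: Map[A, B]) extends InnerData[Map[A, B]]
case class InnerBoolean(value: Boolean) extends InnerData[Boolean]

// Hypothetical helper: first decide Left vs Right, then which InnerData it is.
def describeField(field: Either[Int, InnerData[_]]): String = field match {
  case Left(i)                => s"int $i"
  case Right(InnerString(s))  => s"string $s"
  case Right(InnerBoolean(b)) => s"boolean $b"
  case Right(InnerMap(m))     => s"map with ${m.size} entries"
}

val jsData: Map[String, Either[Int, InnerData[_]]] = Map(
  "spam"  -> Left(1),
  "name"  -> Right(InnerString("eggs")),
  "flags" -> Right(InnerMap(Map("verbose" -> true)))
)

jsData.foreach { case (key, value) => println(s"$key: ${describeField(value)}") }
```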

Technically, you have to do this sort of pattern matching anyway in order to use the data once you pull it out of JSON. The advantage to the well-typed approach is the compiler will check you to ensure that you haven't missed any possibilities. The disadvantage is that you can't just skip impossibilities (like 'eggs' mapping to an Int). Also, there is some overhead imposed by all of these wrapper objects, so watch out for that.

Note that Scala does allow you to define a type alias, which should cut down on the amount of code required for this:

type DataType[A] = Map[String, Either[Int, InnerData[A]]]

val jsData: DataType[_]

Add a few implicit conversions to make the API pretty, and you should be all nice and dandy.
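A rough sketch of what "a few implicit conversions" could look like. The conversion names and the field helper are hypothetical, and the alias is used as DataType[Any] rather than DataType[_]: because InnerData is covariant in A, every InnerData[X] is also an InnerData[Any], so DataType[Any] can hold any of the wrappers.

```scala
import scala.language.implicitConversions

// The answer's wrapper types and alias, repeated so this sketch is self-contained.
sealed trait InnerData[+A] { val value: A }
case class InnerString(value: String) extends InnerData[String]
case class InnerMap[A, +B](value: Map[A, B]) extends InnerData[Map[A, B]]
case class InnerBoolean(value: Boolean) extends InnerData[Boolean]

type DataType[A] = Map[String, Either[Int, InnerData[A]]]

// Hypothetical conversions from plain Scala values to the wrapped field type.
implicit def fromInt(i: Int): Either[Int, InnerData[Nothing]] = Left(i)
implicit def fromString(s: String): Either[Int, InnerData[String]] = Right(InnerString(s))
implicit def fromBoolean(b: Boolean): Either[Int, InnerData[Boolean]] = Right(InnerBoolean(b))

// Hypothetical builder: the value's expected type is known here, so the
// compiler applies one of the conversions above at each call site.
def field(key: String, value: Either[Int, InnerData[Any]]): (String, Either[Int, InnerData[Any]]) =
  key -> value

val jsData: DataType[Any] = Map(
  field("spam", 1),
  field("name", "eggs"),
  field("done", true)
)
```

The field helper exists because implicit conversions only fire where an expected type is known; a bare `"spam" -> 1` pair is typed as (String, Int) and would not be converted.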

Serdab answered 8/4, 2009 at 17:22
Can you elaborate a bit on this Either[Int, InnerData[A]] type? I don't understand why it's Int, for one thing, since the primitives I want are from the set (Int, String, Boolean, null). Thanks! - Stewartstewed
Either in Java :-) ibm.com/developerworks/java/library/j-ft13/index.html - Acreage

JSON is used as an example in "Programming in Scala", in the chapter on combinator parsing.

Backstairs answered 8/4, 2009 at 17:26
I did see that section, but the resulting datatypes are a sort of parse tree that comes from parsing a JSON string, and I'm not necessarily going to have a literal JSON string to parse; code which calls render() might have arbitrary data that it's assembled from some other source. - Stewartstewed
