Performing complicated XPath queries in Scala

T

5

13

What's the simplest API to use in scala to perform the following XPath queries on a document?

//s:Annotation[@type='attitude']/s:Content/s:Parameter[@role='type' and not(text())]

//s:Annotation[s:Content/s:Parameter[@role='id' and not(text())]]/@type

(s is defined as a nickname for a particular namespace)

The only documentation I can find on Scala's XML libraries has no information on performing complicated real XPath queries.

I used to like JDOM for this purpose (in Java), but since JDOM doesn't support generics, it will be painful to work with in Scala. (Other XML libraries for Java have tended to be even more painful in Java, but I admit I don't know the landscape real well.)

Tristantristas answered 16/6, 2010 at 19:1 Comment(2)

What does s:... mean? I assume it is related to namespaces, but I couldn't find that in the XPath specification. – Shaver 16/6, 2010 at 21:58

Yeah, it's a namespace prefix. See the second-to-last paragraph of the introduction where it says "In the following grammar, the non-terminals QName and NCName are defined in [XML Names], and S is defined in [XML]." The expression s:Annotation is a QName. – Tristantristas 16/6, 2010 at 23:48

T

3

I think I'm going to go with lightly pimping XOM. It's a bit of a shame the XOM authors decided against exposing collections of child nodes and the like, but they had more work and less advantage to doing so in Java than in Scala. (And it is an otherwise well-designed library.)

EDIT: I wound up pimping JDOM after all, because XOM doesn't compile XPath queries ahead of time. Since most of my effort was directed towards XPath this time, I was able to come up with a good model that sidesteps most of the generics issues. It shouldn't be too hard to come up with reasonable genericized versions of the methods getChildren and getAttributes and getAdditionalNamespaces in org.jdom.Element (by pimping the library with new methods that have slightly changed names.) I don't think there's a fix for getContent, and I'm not sure about getDescendants.
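
To illustrate the "new methods with slightly changed names" idea, here is a minimal sketch of the enrichment pattern. It is not the JDOM wrapper itself: `LegacyElement` is a hypothetical stand-in for a pre-generics class like org.jdom.Element, and `children` is an invented name for the typed counterpart of `getChildren`. It also uses `implicit class`, which assumes Scala 2.10 or later (older versions would need an `implicit def` conversion instead).

```scala
object EnrichSketch {
  // Hypothetical stand-in for a pre-generics Java class such as org.jdom.Element,
  // whose getChildren returns an untyped java.util.List.
  class LegacyElement(val name: String, kids: LegacyElement*) {
    def getChildren: java.util.List[_] = java.util.Arrays.asList(kids: _*)
  }

  // Enrichment wrapper: adds a typed method under a slightly different name,
  // leaving the original class untouched.
  implicit class RichElement(underlying: LegacyElement) {
    def children: Seq[LegacyElement] = {
      val it = underlying.getChildren.iterator()
      val buf = Vector.newBuilder[LegacyElement]
      // The cast is confined to this one place instead of every call site.
      while (it.hasNext) buf += it.next().asInstanceOf[LegacyElement]
      buf.result()
    }
  }
}
```

With `import EnrichSketch._` in scope, `element.children` can be used like any other Scala collection (`map`, `filter`, and so on) without casts in client code.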

Tristantristas answered 17/6, 2010 at 14:23 Comment(1)

I posted the full code of my JDOM wrapper at #4228649 – Tristantristas 19/11, 2010 at 19:59

S

12

//s:Annotation[@type='attitude']/s:Content/s:Parameter[@role='type' and not(text())]

Well, I don't understand the s: notation, and couldn't find it in the XPath spec either. Ignoring that, however, it would look like this:

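(
  (xml 
    \\ "Annotation" 
    // "x" stands in for the wanted value ("attitude" in the query above)
    filter (_ \ "@type" contains Text("x"))
  ) 
  \ "Content" 
  \ "Parameter" 
  // match on @role, and keep only elements without text children, i.e. not(text())
  filter (el => (el \ "@role" contains Text("type")) && !el.child.exists(_.isInstanceOf[Text]))
)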

Note the necessity of parentheses because of the higher precedence of \ over filter. I have changed the formatting to a multi-line expression as the Scala equivalent is just way too verbose for a single line.

I can't answer about namespaces, though. I have no clue how to work with them in searches, if it's even possible. The docs mention @{uri}attribute for prefixed attributes, but do not mention anything about prefixed elements. Also, note that you need to pass a URI which resolves to the namespace you want, as literal namespace prefixes in searches are not supported.
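
For comparison, here is a minimal sketch of one namespace-aware route that avoids scala.xml entirely: the JDK's own javax.xml.xpath API, called from Scala. The namespace URI `urn:example:s`, the sample document, and the helper names `parse` and `select` are assumptions made up for illustration. Only the two XPath expressions come from the question.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.namespace.NamespaceContext
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.{Document, Node, NodeList}
import org.xml.sax.InputSource

object NamespacedXPath {
  // Hypothetical namespace URI and sample document, for illustration only.
  val SUri = "urn:example:s"
  val sample: String =
    s"""<s:Doc xmlns:s="$SUri">
       |  <s:Annotation type="attitude">
       |    <s:Content><s:Parameter role="type"/><s:Parameter role="id">a1</s:Parameter></s:Content>
       |  </s:Annotation>
       |  <s:Annotation type="other">
       |    <s:Content><s:Parameter role="id"/></s:Content>
       |  </s:Annotation>
       |</s:Doc>""".stripMargin

  // Binds the prefix "s" used in the queries to the namespace URI.
  private val nsContext = new NamespaceContext {
    def getNamespaceURI(prefix: String): String =
      if (prefix == "s") SUri else XMLConstants.NULL_NS_URI
    def getPrefix(uri: String): String = if (uri == SUri) "s" else null
    def getPrefixes(uri: String): java.util.Iterator[String] = {
      val prefixes: java.util.List[String] =
        if (uri == SUri) java.util.Collections.singletonList("s")
        else java.util.Collections.emptyList[String]()
      prefixes.iterator()
    }
  }

  def parse(xml: String): Document = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true) // without this, prefixed names will not match
    factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)))
  }

  // Evaluates an XPath expression and copies the resulting NodeList into a Seq.
  def select(doc: Document, expr: String): Seq[Node] = {
    val xpath = XPathFactory.newInstance().newXPath()
    xpath.setNamespaceContext(nsContext)
    val nodes = xpath.evaluate(expr, doc, XPathConstants.NODESET).asInstanceOf[NodeList]
    (0 until nodes.getLength).map(i => nodes.item(i))
  }

  def main(args: Array[String]): Unit = {
    val doc = parse(sample)
    val q1 = "//s:Annotation[@type='attitude']/s:Content/s:Parameter[@role='type' and not(text())]"
    val q2 = "//s:Annotation[s:Content/s:Parameter[@role='id' and not(text())]]/@type"
    select(doc, q1).foreach(n => println(n.getNodeName))
    select(doc, q2).foreach(n => println(n.getNodeValue))
  }
}
```

The results are plain org.w3c.dom nodes rather than scala.xml ones, so this trades the native Scala API for full XPath 1.0 support with prefixes.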

Shaver answered 16/6, 2010 at 22:14 Comment(5)

well, that's ugly but at least it's doable. – Tristantristas 16/6, 2010 at 23:50

@Ken All of Java's libraries are available... I do think it's a shame not to have better XPath support. – Shaver 17/6, 2010 at 4:5

instead of: (xml \\ "Annotation" filter (_ \ "@type" contains Text("x"))) I would use: (xml \\ "Annotation" filter (x => (x \ "@type").text == "x")) – Cephalochordate 29/6, 2011 at 16:20

@David You can have more than one value for an attribute, so your alternative is not strictly equivalent. Either yours or mine will be correct -- it depends on what exactly the semantics for the attribute is. – Shaver 29/6, 2011 at 16:22

The s:-notation means: element in the namespace bound to the s prefix... This is definitely in the XPath spec ;) – Parrotfish 11/9, 2015 at 14:14

A

3

Scales Xml adds both string-based full XPath evaluation and an internal DSL providing fairly complete coverage for querying.

Amphipod answered 4/11, 2011 at 22:31 Comment(2)

Couldn't find the "string based full XPath" example, am I missing something? – Balladry 13/8, 2013 at 19:55

Please see scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.2/… for example code – Amphipod 30/9, 2013 at 19:36

T

1

I guess when scalaxmljaxen is mature, we'll be able to do this reliably on scala's built-in XML classes.

Tristantristas answered 21/6, 2010 at 2:40 Comment(0)

D

0

I would suggest using kantan.xpath:

 import kantan.xpath._
 import kantan.xpath.implicits._

 input.evalXPath[List[String]](xp"/annotation[@type='attitude']/content/parameter[@role='type' and not(text())]/@value")

This yields:

res1: kantan.xpath.XPathResult[List[String]] = Success(List(foobar))

Dive answered 26/10, 2017 at 14:36 Comment(0)
