How to get a concise bounded description of a resource with Sesame?

I've been testing Sesame 2.7.2 and was surprised to find that DESCRIBE queries do not include the blank node closure [EDIT: the right term for this is CBD, for concise bounded description].

If I understand correctly, the SPARQL spec is quite loose on this and says that what is returned is up to the provider, but I'm still surprised at the choice, since bnodes (in the results of the DESCRIBE query) cannot be used in subsequent SPARQL queries.

So the question is: how can I get a closed description of a resource <uri1> without doing:

  1. query DESCRIBE <uri1>
  2. iterate over the result to determine which objects are blank nodes
  3. then DESCRIBE ?b WHERE { <uri1> pred_relating_to_bnode_ ?b }
  4. repeat recursively, chaining patterns for as long as bnodes are found

If I'm not mistaken, depth-2 bnodes would have to be described with

DESCRIBE ?b2 WHERE { <uri1> <p1> ?b . ?b <p2> ?b2 }

unless there is a simpler way to do this?
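The depth-2 query above generalizes mechanically to any depth. Here is a minimal Python sketch of that chaining, assuming the client keeps track of the predicate path from the resource to each blank node. The helper name `bnode_describe_query` is hypothetical and not part of Sesame.

```python
def bnode_describe_query(resource, predicate_path):
    """Build a DESCRIBE query for the node reached from `resource`
    by following `predicate_path` (a non-empty list of predicate IRIs)."""
    if not predicate_path:
        raise ValueError("predicate_path must contain at least one predicate")
    patterns = []
    subject = f"<{resource}>"
    for depth, predicate in enumerate(predicate_path, start=1):
        obj = f"?b{depth}"  # one fresh variable per hop
        patterns.append(f"{subject} <{predicate}> {obj}")
        subject = obj
    # After the loop, `subject` holds the last variable: the node to describe.
    return f"DESCRIBE {subject} WHERE {{ {' . '.join(patterns)} }}"


# Rebuilds the depth-2 query from the question.
print(bnode_describe_query("uri1", ["p1", "p2"]))
```

Each new depth still costs one more round trip and re-evaluates the earlier joins, so this only automates the workaround; it does not make it cheaper.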

Finally, would it not be better and simpler to let DESCRIBE return a closed description of a resource where you can still obtain the currently returned result with something like the following?

CONSTRUCT {<uri1> ?p ?o} WHERE {<uri1> ?p ?o}

EDIT: here is an example of a closed result I want to get back from Sesame

<urn:sites#1> a my:WebSite .
<urn:sites#1> my:domainName _:autos1 .
<urn:sites#1> my:online "true"^^xsd:boolean .
_:autos1 a rdf:Alt .
_:autos1 rdf:_1 _:autos2 .
_:autos2 my:url "192.168.2.111:15001"@fr .
_:autos2 my:url "192.168.2.111:15002"@en .

Currently, DESCRIBE <urn:sites#1> returns the same result as the query CONSTRUCT WHERE {<urn:sites#1> ?p ?o}, so I only get this:

<urn:sites#1> a my:WebSite .
<urn:sites#1> my:domainName _:autos1 .
<urn:sites#1> my:online "true"^^xsd:boolean .
Pronto answered 28/6, 2013 at 11:1 Comment(7)
This doesn't answer your question, but your last query can be shortened; when the construct pattern is the same as the where, you can omit the former, to get construct where { <uri1> ?p ?o }.Unbodied
Can you show an example of the data that you're looking at, the results you're getting, and the results that you're expecting, or would like? If I understand you, then when you have data like :Alice :likes :Bill, [ :named :Carl ] . you're getting :Alice :likes :Bill, [] for results from describe :Alice, but you want the full data. Is this right?Unbodied
Also, can you specify what you mean by closed? In my previous comment, one could argue that it's not closed, since maybe :Bill :likes :Daphne, so we didn't keep following the links from the described resource. Do you have a particular definition in mind?Unbodied
Hi Joshua, thanks for the query syntax tip, it helps ;)Pronto
As for your example, no, it would not be closed if the queried dataset contains :Bill :likes :Daphne . By closure, I mean that the result should contain every triple that is linked to the resource, whether directly or through an arbitrary-length path of blank nodes, so that the exploration stops only at literals or URI reference objects.Pronto
Though it doesn't look like you're going to be able to do this in Sesame, as a question for anyone else finding this question, is the CBD — Concise Bounded Description W3C Member Submission a description of the kind of result you'd wanted?Unbodied
@JoshuaTaylor thanks for the info, I did not know the official term but this is exactly what I meant with my "blank nodes closure" definitionPronto

Partial solutions using SPARQL

Based on your comments, this isn't an exact solution yet, but note that you can describe multiple things in a given describe query. For instance, given the data:

@prefix : <http://example.org/> .

:Alice :named "Alice" ;
       :likes :Bill, [ :named "Carl" ;
                       :likes [ :named "Daphne" ]].
:Bill :likes :Elaine ;
      :named "Bill" .

you can run the query:

PREFIX : <http://example.org/>

describe :Alice ?object where {
  :Alice :likes* ?object .
  FILTER( isBlank( ?object ) )
}

and get the results:

@prefix :        <http://example.org/> .

:Alice
      :likes        :Bill ;
      :likes        [ :likes        [ :named        "Daphne"
                                    ] ;
                      :named        "Carl"
                    ] ;
      :named        "Alice" .

That's not a complete description of course, because it's only following :likes out from :Alice, not arbitrary predicates. But it does get the blank nodes named "Carl" and "Daphne", which is a start.
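If the predicates of interest are known in advance, the path can be widened to an alternation such as (p1|p2)* while keeping the isBlank filter. The following Python sketch only builds such a query string; the function name `describe_with_blank_nodes` and its parameters are hypothetical, and the predicates are assumed to be prefixed names under the `:` prefix used above.

```python
def describe_with_blank_nodes(resource, predicates, prefix_iri="http://example.org/"):
    """Build a DESCRIBE query that also describes blank nodes reachable
    from `resource` through any sequence of the given predicates."""
    if not predicates:
        raise ValueError("at least one predicate is required")
    path = "|".join(predicates)  # alternation, e.g. :likes|:named
    return (
        f"PREFIX : <{prefix_iri}>\n"
        f"DESCRIBE {resource} ?object WHERE {{\n"
        f"  {resource} ({path})* ?object .\n"
        f"  FILTER( isBlank( ?object ) )\n"
        f"}}"
    )


print(describe_with_blank_nodes(":Alice", [":likes", ":named"]))
```

Note that the path is not restricted to blank intermediate nodes: it can also pass through IRIs such as :Bill and pick up blank nodes beyond them, so the result may be larger than a strict CBD.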

The larger issue in Sesame

It looks like you're going to have to do something like what's described above, possibly with multiple queries, or you're going to have to modify Sesame. The alternative to writing some creative SPARQL is to change the way that Sesame implements describe queries. Some endpoints make this relatively easy, but Sesame doesn't seem to be one of them. There's a mailing list thread from 2011, Custom SPARQL DESCRIBE Implementation, that seems to address this same problem.

Roberto García asks:

I'm trying to customise the behaviour of SPARQL DESCRIBE queries. I'm willing to get something similar to CBD (i.e. all properties and values for the described resource plus all properties and values for the blank nodes connected to it).

I have tried to reproduce a similar behaviour using a CONSTRUCT query but the performance is not good and the query gets quite complex if I try to consider long chains of properties pointing to blank nodes starting from the described resource.

Jeen Broekstra replies:

The implementation of DESCRIBE in Sesame is hardcoded in the query parser. It can only be changed by adapting the parser itself, and even then it will be tricky, as the query model has no easy way to express it either: it needs an extension of the algebra.

> If this is not possible, any advice about how to implement it using CONSTRUCT queries?

I'm not sure it's technically possible to do this in a single query. CBDs are recursive in nature, and while SPARQL does have some support for recursivity (property chains), the problem is that you have to do an intermediate check in every step of the property chain to see if the bound value is a blank node or not. This is not something that SPARQL supports out of the box: property chains are defined to have only length of the path as the stop condition.

Perhaps something is possible using a convoluted combination of subqueries, unions and optionals, but I doubt it.

I think the best workaround is instead to use the standard DESCRIBE format that Sesame supports, and for each blank node value in that result do a separate consecutive query. In other words: you solve it by hand.

The only other option is to log a feature request for support of CBDs in Sesame. I can't give any guarantees about if/when that will be followed up on though.
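To make the "solve it by hand" expansion concrete, here is a minimal Python sketch of the closure itself, run as a worklist over triples that are already in memory. It assumes triples are plain (subject, predicate, object) string tuples and that blank nodes are written with a `_:` label; both are illustrative choices, not Sesame's data model. It also leaves out the reification part of the W3C CBD definition. Over a remote endpoint the blank node labels are not reusable between queries, so each step would have to be a path-based query rather than a lookup.

```python
from collections import deque


def is_blank(term):
    """Blank nodes are assumed to be written as '_:label'."""
    return term.startswith("_:")


def concise_bounded_description(triples, start):
    """Collect all triples whose subject is `start` or a blank node
    reachable from `start` through blank nodes only."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    result = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for triple in by_subject.get(node, []):
            result.append(triple)
            obj = triple[2]
            # Only blank objects are expanded; IRIs and literals stop the walk.
            if is_blank(obj) and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return result


# Data from the question, plus one unrelated resource.
data = [
    ("<urn:sites#1>", "a", "my:WebSite"),
    ("<urn:sites#1>", "my:domainName", "_:autos1"),
    ("<urn:sites#1>", "my:online", '"true"^^xsd:boolean'),
    ("_:autos1", "a", "rdf:Alt"),
    ("_:autos1", "rdf:_1", "_:autos2"),
    ("_:autos2", "my:url", '"192.168.2.111:15001"@fr'),
    ("_:autos2", "my:url", '"192.168.2.111:15002"@en'),
    ("<urn:sites#2>", "a", "my:WebSite"),
]

for triple in concise_bounded_description(data, "<urn:sites#1>"):
    print(" ".join(triple), ".")
```

The `seen` set keeps the loop finite even if blank nodes refer to each other in a cycle.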

Unbodied answered 28/6, 2013 at 15:6 Comment(12)
I am aware that you can include several resources in the description result, but you have to know what you're looking for beforehand. To me, the point of a DESCRIBE query is rather to get whatever the dataset holds that relates to the queried resource(s). I am aware of the query cost that could ensue from that, but it would seem much more performant if the store handled it than querying back and forth until every blank node is resolved. All in all, it would seem to me that it relates to a * property path in terms of performance.Pronto
the main problem is to descend the blanked path since you cannot directly query about a blank node in sparql (by the way, I'm working with the sparql endpoint here); so like I said in the question, you would have to expand the previous query every time you encounter a blank node so (on top of network load) you always query the same set adding patterns to get to the next depth and reevaluating the same joins plus one on each iteration. It would be much more efficient to perform closure since you can always get the legacy/equivalent result with the simple CONSTRUCT query.Pronto
@Pronto OK, I did some digging, and it looks like you're not going to do much better than making some iterative SPARQL queries, unfortunately. If you can restrict the properties that you're interested in, you could use a path like (p1|p2|…|pn)* and the isBlank filter above. Depending on your needs, that might be sufficient.Unbodied
Ouchhhh !!! you killed me with Jeen Broekstra's answer ;) It is very disappointing. I was looking to externalize query processing and provide serious persistence to lighten my DotNetRdf store usage (DotNetRdf provides closure with its SPARQL implementation), not to add myself some more work... :PPronto
@Pronto Yeah, Jena has DescribeHandlers to make some of this stuff pretty easy; I was surprised to hear that other implementations don't have similar functionality. It's quite handy to be able to, e.g., customize describe handling based on, e.g., the namespace of the described resource, or to dynamically generate responses for special resources (e.g., describe http://example.org/system-status).Unbodied
Yeah Josh, I'm aware of this principle since Rob Vesse pointed me to how to intercept the queries in DotNetRdf to implement user-filtered views. The problem here is that I'm using Sesame as an HTTP endpoint so I cannot intercept anything without degrading response time due to "wasted" network communication... and I really don't need to take a peek in sesame source code (even if I were able to solve the issue, which I really doubt ;P)Pronto
@JoshuaTaylor As Max points out dotNetRDF does support customizable DESCRIBE (we have 7 implementations in fact - dotnetrdf.org/api/index.asp?Namespace=VDS.RDF.Query.Describe - plus ability to add more). Problem is these only work for our in-memory implementation and not over external stores.Jetty
@Pronto Might I suggest filing a bug against Sesame (or +1'ing an existing bug for this) since this seems like a major omission in SesameJetty
@Jetty I was doing just that when my connection died :P I was going to file it this morning but thought I would check here before... Filed as SES-1876Pronto
FWIW, working on it. No promises on when this will be fixed (unless you pay me a lot of cash ;))Estefanaestel
@JeenBroekstra Sorry but I've got nothing to offer but my meager rusty left-overs of Java skills and, anyway, my gratitude if you pull through this !! ;))Pronto
Since Sesame release 2.7.4, the Sesame SPARQL engine uses Symmetric Concise Bounded Descriptions (S-CBD) as the default result format for DESCRIBE queries. Customizing the precise format is still on the ToDo list though.Estefanaestel
