Alternative to the OPTIONAL Keyword in SPARQL Queries?

I have a SPARQL query that asks for certain properties of URIs of a given type. As I am not sure whether those properties exist, I use the OPTIONAL keyword:

PREFIX mbo: <http://creativeartefact.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?uri a mbo:LiveMusicEvent .
  OPTIONAL {?uri rdfs:label ?label} .
  OPTIONAL {?uri mbo:organisedBy ?organiser} .
  OPTIONAL {?uri mbo:takesPlaceAt ?venue} .
  OPTIONAL {?uri mbo:begin ?begin} .
  OPTIONAL {?uri mbo:end ?end} .
}

When I run this query against my SPARQL endpoint (a Virtuoso server), I get the error:

Virtuoso 42000 Error The estimated execution time -721420288 (sec) exceeds the limit of 400 (sec).

When I remove OPTIONAL clauses, the estimated execution time drops to 4106 seconds after removing the first one; when I remove two, the query executes (and returns the values instantly).

I cannot see why the estimated execution time skyrockets like this with the additional OPTIONAL clauses. Or is my query simply constructed wrongly?

Waterhouse asked 1/9, 2014 at 16:14

For a SPARQL engine, OPTIONAL patterns are generally expensive to evaluate compared to "normal" join patterns. In this case, the error indicates that Virtuoso's query planner estimates the query to be too complex to complete within the configured time limit. Note that this is only an estimate, so the precise value may be wrong; a negative execution time, as in your error message, obviously cannot be accurate.

You have several alternatives, though most of them involve doing more than one query. A common one is the "retrieve-and-iterate" pattern: you first do a query that retrieves all instances of mbo:LiveMusicEvent:

PREFIX mbo: <http://creativeartefact.org/ontology/>
SELECT ?uri WHERE { ?uri a mbo:LiveMusicEvent }

and then you iterate over the result and retrieve each instance's optional properties:

PREFIX mbo: <http://creativeartefact.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE { VALUES ?uri { <http://example.org/instance1> }
        OPTIONAL {?uri rdfs:label ?label}.
        OPTIONAL {?uri mbo:organisedBy ?organiser}.
        OPTIONAL {?uri mbo:takesPlaceAt ?venue}.
        OPTIONAL {?uri mbo:begin ?begin}.
        OPTIONAL {?uri mbo:end ?end}.
}

As you can see, I use a VALUES clause to inject each instance URI from the first query's result into this second query. Here I assume you iterate one by one and therefore do one query per instance, but as a further optimization you could put several instances into the VALUES clause at once (obviously not all of them, as that would make the query as complex as the original one).
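A minimal Python sketch of the client side of the batched variant, assuming the URIs from the first query have already been collected into a list of strings. The helper names batches and values_query, the batch size, and the example.org URIs are hypothetical, and the code only builds the query strings; sending each one to the Virtuoso endpoint is left to whatever SPARQL client library you use. Each URI becomes one row of a single-variable VALUES block:

```python
from typing import Iterable, Iterator, List

# Prefixes and optional patterns copied from the question's query.
PREFIXES = (
    "PREFIX mbo: <http://creativeartefact.org/ontology/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)

OPTIONALS = (
    "  OPTIONAL { ?uri rdfs:label ?label }\n"
    "  OPTIONAL { ?uri mbo:organisedBy ?organiser }\n"
    "  OPTIONAL { ?uri mbo:takesPlaceAt ?venue }\n"
    "  OPTIONAL { ?uri mbo:begin ?begin }\n"
    "  OPTIONAL { ?uri mbo:end ?end }\n"
)


def batches(uris: List[str], size: int) -> Iterator[List[str]]:
    """Split the URIs returned by the first query into chunks of at most `size`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(uris), size):
        yield uris[start:start + size]


def values_query(batch: Iterable[str]) -> str:
    """Build the per-batch query: VALUES ?uri { <a> <b> ... } plus the OPTIONALs."""
    rows = " ".join(f"<{uri}>" for uri in batch)
    return f"{PREFIXES}SELECT * WHERE {{\n  VALUES ?uri {{ {rows} }}\n{OPTIONALS}}}"


if __name__ == "__main__":
    # Hypothetical result of "SELECT ?uri WHERE { ?uri a mbo:LiveMusicEvent }".
    event_uris = [f"http://example.org/instance{i}" for i in range(1, 6)]
    for batch in batches(event_uris, size=2):
        # In a real client, send this string to the endpoint instead of printing it.
        print(values_query(batch))
        print()
```

The batch size trades the number of round trips against the complexity of each individual query; a size of 1 gives the one-query-per-instance version, and a good value for a particular endpoint is something to find by experiment.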

By the way, VALUES is a SPARQL 1.1 feature, and I am not certain that Virtuoso supports it. If it doesn't, you can achieve the same effect either with a FILTER clause or by just "manually" replacing every occurrence of the variable ?uri with the instance URI in each iteration.
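Both fallbacks can be generated from one query template. This is a Python sketch under stated assumptions: the template, the helper names filter_query and substituted_query, and the OR-of-equalities form of the FILTER (chosen because it uses only SPARQL 1.0 features) are illustrative choices, not anything Virtuoso is known to require. The template keeps the ?uri a mbo:LiveMusicEvent pattern so that ?uri is always bound by a non-optional pattern that the FILTER can restrict; whether the planner then estimates it as cheaply as the VALUES version is something to check against your own endpoint.

```python
import re
from typing import List

# The per-instance query written as a template over the variable ?uri.
TEMPLATE = """PREFIX mbo: <http://creativeartefact.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?uri a mbo:LiveMusicEvent .
  OPTIONAL { ?uri rdfs:label ?label }
  OPTIONAL { ?uri mbo:organisedBy ?organiser }
  OPTIONAL { ?uri mbo:takesPlaceAt ?venue }
  OPTIONAL { ?uri mbo:begin ?begin }
  OPTIONAL { ?uri mbo:end ?end }
}"""

# Matches the variable ?uri but not longer names such as ?uri2.
URI_VAR = re.compile(r"\?uri\b")


def filter_query(uris: List[str]) -> str:
    """FILTER variant: restrict ?uri with ORed equality tests (SPARQL 1.0 syntax)."""
    condition = " || ".join(f"?uri = <{u}>" for u in uris)
    # Put the FILTER just before the final closing brace of the WHERE clause.
    body, closing = TEMPLATE.rsplit("}", 1)
    return f"{body}  FILTER ({condition})\n}}{closing}"


def substituted_query(uri: str) -> str:
    """'Manual' variant: replace every ?uri with the concrete IRI."""
    # A function replacement avoids re.sub interpreting backslashes in the IRI.
    return URI_VAR.sub(lambda _match: f"<{uri}>", TEMPLATE)


if __name__ == "__main__":
    print(filter_query(["http://example.org/instance1", "http://example.org/instance2"]))
    print()
    print(substituted_query("http://example.org/instance1"))
```

With the substituted version the result rows no longer contain a ?uri binding, so the client has to remember which instance each query was issued for.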

Another way to handle this is to first do a CONSTRUCT query that retrieves the relevant subset of the data from the larger source, and then run your more complex query with OPTIONALs against that subset. For example:

PREFIX mbo: <http://creativeartefact.org/ontology/>
CONSTRUCT
WHERE {
   ?uri a mbo:LiveMusicEvent ;
        ?p ?o .
}

will retrieve, as an RDF graph, every triple whose subject is a LiveMusicEvent instance. Load that graph into a local RDF model (e.g. a Sesame Model or an in-memory Repository if you're working in Java) and query it further from there.
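The local-model step can be illustrated without any particular RDF library. The Python sketch below assumes the CONSTRUCT result has already been parsed (by Sesame, dotNetRDF, or any other RDF toolkit) into plain (subject, predicate, object) string tuples, ignoring the distinction between IRIs and literals; the function names and the WANTED mapping are hypothetical. It indexes the graph by subject and then reads the five properties with OPTIONAL-like semantics: a missing property yields None instead of dropping the event. Unlike the SPARQL OPTIONALs, which would produce one result row per combination of values, it keeps only the first value of a multi-valued property.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Index = Dict[str, Dict[str, List[str]]]

MBO = "http://creativeartefact.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Variable name -> predicate IRI, mirroring the OPTIONAL patterns in the question.
WANTED = {
    "label": RDFS_LABEL,
    "organiser": MBO + "organisedBy",
    "venue": MBO + "takesPlaceAt",
    "begin": MBO + "begin",
    "end": MBO + "end",
}


def index_by_subject(triples: Iterable[Triple]) -> Index:
    """Build subject -> predicate -> list of objects from the constructed graph."""
    index: Index = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        index[s][p].append(o)
    return index


def events(index: Index) -> List[str]:
    """Subjects typed as mbo:LiveMusicEvent (the mandatory part of the query)."""
    return [s for s, props in index.items()
            if MBO + "LiveMusicEvent" in props.get(RDF_TYPE, [])]


def optional_properties(index: Index, subject: str) -> Dict[str, Optional[str]]:
    """First value of each wanted property, or None when the property is absent."""
    props = index.get(subject, {})
    return {var: (props.get(pred) or [None])[0] for var, pred in WANTED.items()}


if __name__ == "__main__":
    # Hypothetical, already-parsed CONSTRUCT result.
    graph = [
        ("http://example.org/e1", RDF_TYPE, MBO + "LiveMusicEvent"),
        ("http://example.org/e1", RDFS_LABEL, "Concert"),
        ("http://example.org/e2", RDF_TYPE, MBO + "LiveMusicEvent"),
    ]
    idx = index_by_subject(graph)
    for event in events(idx):
        print(event, optional_properties(idx, event))
```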

Carditis answered 1/9, 2014 at 20:06
Comments:
One of the things I'd really like (though I don't have much idea how expensive it would be) is the ability to do something like from [construct { ... } where { ... }] select ... where { ... }. It would make some difficult tasks very easy. - Alcides
@JoshuaTaylor it has always struck me as kind of odd that CONSTRUCT (which, in a way, is the more "natural" query type for RDF) is not easy to combine/chain with other queries. We have subselects, so why not sub-constructs? I reckon it's one of those features where the WG just went "Might be nice, but Not Now". - Carditis
Yes, it would be nice to have in the future. We actually have an implementation of CONSTRUCT sub-queries internally at YarcData, though not in the FROM clause. We use them as a way to invoke more traditional graph analytics (e.g. k-means, shortest path, etc.) by using them as sub-queries with custom modifiers applied, such that they can be sanely nested back inside a regular graph pattern. - Pikeperch
@Pikeperch That's actually pretty exciting to hear. Do you end up having to put any limitations on what can be constructed? Does the implementation allow construct queries within construct queries? (Once you have that, you can implement concise bounded descriptions.) And while subqueries are conceptually evaluated first, is there any optimization to not fully evaluate the construct first, but instead to use it more as "another way to search"? - Alcides
Wow, I did not think it would be THAT difficult... thanks for shedding some light! I am just thinking about the best way to approach this. Maybe turn the CONSTRUCT query into a SELECT query and then iterate through the result set, picking out the needed properties? Are there any drawbacks to such a solution? (For the record: I am using RobV's dotNetRDF.) - Waterhouse
@Aaginor, the point of my recommending CONSTRUCT is that in a CONSTRUCT such as the one above you don't need optional patterns: you simply grab all properties of the thing you're interested in. Since the result is a graph, you can easily query it further. You could do something similar with a SELECT, but it would be far more complex, as you cannot run further (SPARQL) queries on the result and instead have to pick apart a complex combination of variable bindings. - Carditis
@JeenBroekstra: Hm, I did not intend to do further SPARQL queries. With the SELECT query, I get all properties (even unneeded ones), and then I'd use foreach on the SparqlResultSet. Inside it, I'd build a switch ... case clause where I ask for the wanted properties: switch(predicateUri) case "label": ... - Waterhouse
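The SELECT-and-switch approach from the last comment can be sketched in Python as follows. Assumptions: the SELECT is the CONSTRUCT pattern turned into SELECT ?uri ?p ?o WHERE { ?uri a mbo:LiveMusicEvent ; ?p ?o }, its rows are modeled as plain dicts from variable name to string value (a real client such as dotNetRDF returns its own result objects instead), and the LiveMusicEvent record and the FIELD_FOR_PREDICATE table are hypothetical. The dictionary lookup plays the role of switch(predicateUri):

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

MBO = "http://creativeartefact.org/ontology/"


@dataclass
class LiveMusicEvent:
    """Hypothetical client-side record; None means the property was absent."""
    uri: str
    label: Optional[str] = None
    organiser: Optional[str] = None
    venue: Optional[str] = None
    begin: Optional[str] = None
    end: Optional[str] = None


# The "switch (predicateUri)": predicate IRI -> attribute name on the record.
FIELD_FOR_PREDICATE = {
    "http://www.w3.org/2000/01/rdf-schema#label": "label",
    MBO + "organisedBy": "organiser",
    MBO + "takesPlaceAt": "venue",
    MBO + "begin": "begin",
    MBO + "end": "end",
}


def collect(rows: Iterable[Dict[str, str]]) -> Dict[str, LiveMusicEvent]:
    """Fold ?uri/?p/?o rows into one record per event, skipping unwanted predicates."""
    events: Dict[str, LiveMusicEvent] = {}
    for row in rows:
        event = events.setdefault(row["uri"], LiveMusicEvent(uri=row["uri"]))
        attr = FIELD_FOR_PREDICATE.get(row["p"])
        if attr is not None and getattr(event, attr) is None:
            setattr(event, attr, row["o"])  # keep the first value seen
    return events


if __name__ == "__main__":
    sample = [
        {"uri": "http://example.org/e1",
         "p": "http://www.w3.org/2000/01/rdf-schema#label", "o": "Concert"},
        {"uri": "http://example.org/e1",
         "p": MBO + "takesPlaceAt", "o": "http://example.org/venue1"},
    ]
    for record in collect(sample).values():
        print(record)
```

Because every event matches at least its rdf:type triple, each event gets a record even when none of the wanted properties exist, which is the effect the OPTIONALs had in the original query.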
