How to get a path between IRIs or between two nodes of certain rdf:type using a SPARQL query?

I am trying to execute a query from the RDF4J console against a SPARQL endpoint to find the path between two nodes using property wildcards, but without luck. The first query gives this error:

Malformed query: Not a valid (absolute) IRI:

The second query crashes the console. Is the query itself wrong, or could this be an RDF4J issue, in which case I should query the endpoint some other way?

PREFIX xy: <http://mainuri/>

select *
where {
  <http://uriOfInstanceOfData> ((<>|!<>)|^(<>|!<>))* ?x .
  ?x ?p ?o .
  ?o ((<>|!<>)|^(<>|!<>))* <http://uriOfInstanceOfData> .
}

and the second query:

PREFIX xy: <http://mainuri/>

select *
where {
  <http://uriOfInstanceOfData> (xy:|!xy:)* ?x .
  ?x ?p ?o .
  ?o (xy:|!xy:)* <http://uriOfInstanceOfData> .
}
Ferrick answered 14/10, 2019 at 21:30 Comment(8)
When you say it 'crashes the console', what precisely happens? Does it freeze? Does it exit? Do you get any errors? On what kind of dataset are you executing this? - Hamby
It shows errors and then exits, before one can catch a glimpse of what the errors are. The dataset is Wikidata. - Ferrick
FWIW, I think you're barking up the wrong tree with this approach. SPARQL is notoriously not fit for purpose for these kinds of graph traversal queries, and you're unlikely to get something that will work well on a dataset as large as Wikidata. You're probably better off using some iterative approach using, for example, the RDF4J Java API. - Hamby
Out of curiosity though: can you share how you connected your RDF4J console with Wikidata (created a SPARQL endpoint using create sparql, or something else?), and what the exact query is (including the actual IRIs for the nodes you're querying for)? Just so I can reproduce. - Hamby
The RDF4J repo is hooked to another endpoint; I can't really talk about that. For testing, I tried out the query on Wikidata using WDQS: SELECT * WHERE { wd:Q11571 (wdt:|!wdt:)* ?x . ?x ?p ?o . ?o (wdt:|!wdt:)* wd:Q45 . } Could you point to some resources for doing path traversals with the RDF4J API? Thanks. - Ferrick
I tried your query locally (querying the Wikidata endpoint directly) and it doesn't crash the console, but it reports a timeout error at the server end. Evidently, the query is too expensive for the Wikidata endpoint to evaluate within a reasonable time limit. As for documentation on the RDF4J APIs, the official project docs are a good starting point, in particular the getting started tutorials and the sections on the Model API and the Repository API. - Hamby
I agree with @JeenBroekstra, and I can tell from my experience with DBpedia, Wikidata, other large datasets, and many triple stores that these kinds of path retrieval queries with the property path wildcard pattern will never scale on large datasets. Clearly, these kinds of queries are beyond what SPARQL was made for and are closer to what Gremlin, Cypher, etc. are designed for. Those graph databases usually have optimized data structures and indexes. Moreover, the Wikidata public endpoint, as well as others like DBpedia, places restrictions on queries to allow fair use among all users. - Militate
Thanks for the input, guys! I'm pretty new to graph databases and especially the RDF world. On top of that, I'm trying to access and manipulate all of this programmatically, which makes for a pretty steep learning curve. Hopefully, the RDF4J API provides the functionality I am looking for. - Ferrick

The first query is syntactically incorrect: <> is not a valid IRI reference. The SPARQL grammar allows the empty string, but the specification also notes that any IRI reference must be a string that (after escape processing) results in a valid RFC 3987 IRI. Since an IRI requires, at a minimum, a scheme identifier, an empty string can by definition not be a valid IRI.
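
To illustrate the point about the scheme, here is a minimal Python sketch using only the standard library. It is not how RDF4J validates IRIs; `has_scheme` is a hypothetical helper that checks only the single property discussed above, namely whether the text between the angle brackets starts with a scheme.

```python
from urllib.parse import urlsplit


def has_scheme(iri_ref: str) -> bool:
    """Return True if the string starts with a scheme such as 'http:'.

    Hypothetical helper for illustration only; a full RFC 3987 check
    involves much more than this.
    """
    return urlsplit(iri_ref).scheme != ""


# The empty reference written as <> in the first query has no scheme.
print(has_scheme(""))                 # False
# The prefixed name xy: expands to the prefix IRI, which does have one.
print(has_scheme("http://mainuri/"))  # True
```

This is also why the second query gets past the parser: xy: expands to <http://mainuri/>, which is an absolute IRI even if no property with that name exists in the data.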

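The second query works when I try it on a small test dataset. However, it is likely very expensive to process: (xy:|!xy:)* matches a path of any length over any property, so the engine may have to explore everything reachable from the start node.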

EDIT: the query I actually tried:

PREFIX xy: <http://mainuri/>
select
*
where
{
  rdfs:domain (xy:|!xy:)* ?x .
  ?x ?p ?o .
  ?o (xy:|!xy:)* rdf:Property.
}

On a local in-memory database with basic RDFS inferencing enabled, that gives the following result:

Evaluating SPARQL query...
+------------------------+------------------------+------------------------+
| x                      | p                      | o                      |
+------------------------+------------------------+------------------------+
| rdfs:domain            | rdf:type               | rdf:Property           |
| rdfs:domain            | rdfs:domain            | rdf:Property           |
+------------------------+------------------------+------------------------+
2 result(s) (28 ms)
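
Since the property-path query is likely too expensive on a large dataset, the iterative approach suggested in the comments could look roughly like the sketch below. It is plain Python over an in-memory list of triples, not the RDF4J API; `find_path` and the `ex:` names are hypothetical and exist only for illustration. It runs a breadth-first search that follows each triple in both directions, which is what the `^` in the first query was aiming for.

```python
from collections import deque


def find_path(triples, start, goal):
    """Breadth-first search for a shortest path from start to goal.

    Each triple (s, p, o) can be walked in both directions. A step taken
    against the edge direction is reported with '^' in front of the
    property, mirroring the inverse-path operator in SPARQL.
    Returns a list of (from_node, property, to_node) steps, or None.
    """
    # Adjacency index: node -> list of (property label, neighbour).
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append(("^" + p, s))

    # came_from[node] = (previous node, property used to reach node)
    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for prop, nxt in neighbours.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, prop)
                queue.append(nxt)

    if goal not in came_from:
        return None

    # Walk back from goal to start to rebuild the path.
    path = []
    node = goal
    while came_from[node] is not None:
        prev, prop = came_from[node]
        path.append((prev, prop, node))
        node = prev
    path.reverse()
    return path


# Hypothetical toy data, not taken from any real dataset.
triples = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:c", "ex:worksWith", "ex:b"),
    ("ex:c", "ex:livesIn", "ex:d"),
]
for step in find_path(triples, "ex:a", "ex:d"):
    print(step)
```

Against a real endpoint, one would presumably not load all triples up front but fetch the neighbours of one node at a time with small, cheap queries and stop at a depth limit. That adaptation is an assumption on my part and is not shown here.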
Hamby answered 14/10, 2019 at 23:28 Comment(1)
I tried out yours, just with <http://uriOfInstanceOfData> replaced with an IRI that actually exists in my local test database. - Hamby
