How to traverse RDF tree in SPARQL and find connections outside of the tree?
Asked Answered
N

1

5

Considering a tree with the root node "A" and "hasChild" relationships (e.g. product structure) as following: tree diagram

Target is to find out: What nodes have parents outside of the tree?

In this case the answer should be ‘B’ and ‘Q’ as they have parents outside of the tree.

The query should go to every node and check its parents rather than creating a list of children and the check every one of them I guess.

How can I efficiently (should work for millions of nodes) traverse this tree an SPARQL and answer this?

This is what i tried but gave 0 results:

PREFIX xxx:         <http://example.org/xxx#>

select * where {
   xxx:A   xxx:hasChild*  ?child .
   ?child  ^xxx:hasChild  ?foreignParent . 
   ?child  ^xxx:hasChild  ?parent .
   FILTER (?parent =! ?foreignParent) .
}

Attached the respective sample data:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xxx="http://example.org/xxx#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  xml:base="http://example.org/xxx">
  <owl:Ontology rdf:about="">
    <owl:versionInfo>Created with TopBraid Composer</owl:versionInfo>
  </owl:Ontology>
  <owl:Class rdf:ID="Other">
    <rdfs:label>Other</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </owl:Class>
  <owl:Class rdf:ID="Item">
    <rdfs:label>Item</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </owl:Class>
  <rdf:Property rdf:ID="hasChild">
    <rdfs:range rdf:resource="#Item"/>
    <rdfs:range rdf:resource="#Other"/>
    <rdfs:domain rdf:resource="#Item"/>
    <rdfs:label>has child</rdfs:label>
  </rdf:Property>
  <xxx:Other rdf:ID="Fake_1">
    <xxx:hasChild>
      <xxx:Item rdf:ID="B">
        <xxx:hasChild>
          <xxx:Item rdf:ID="D">
            <xxx:hasChild>
              <xxx:Item rdf:ID="F"/>
            </xxx:hasChild>
            <xxx:hasChild>
              <xxx:Item rdf:ID="E"/>
            </xxx:hasChild>
          </xxx:Item>
        </xxx:hasChild>
        <xxx:hasChild>
          <xxx:Item rdf:ID="C"/>
        </xxx:hasChild>
      </xxx:Item>
    </xxx:hasChild>
    <rdfs:label>Fake 1</rdfs:label>
  </xxx:Other>
  <xxx:Other rdf:ID="Fake_2">
    <xxx:hasChild>
      <xxx:Item rdf:ID="Q"/>
    </xxx:hasChild>
    <rdfs:label>Fake 2</rdfs:label>
  </xxx:Other>
  <xxx:Item rdf:ID="A">
    <xxx:hasChild>
      <xxx:Item rdf:ID="G">
        <xxx:hasChild>
          <xxx:Item rdf:ID="X">
            <xxx:hasChild>
              <xxx:Item rdf:ID="Z"/>
            </xxx:hasChild>
            <xxx:hasChild>
              <xxx:Item rdf:ID="Y"/>
            </xxx:hasChild>
          </xxx:Item>
        </xxx:hasChild>
        <xxx:hasChild>
          <xxx:Item rdf:ID="R">
            <xxx:hasChild>
              <xxx:Item rdf:ID="W"/>
            </xxx:hasChild>
            <xxx:hasChild>
              <xxx:Item rdf:ID="S">
                <xxx:hasChild>
                  <xxx:Item rdf:ID="V"/>
                </xxx:hasChild>
                <xxx:hasChild>
                  <xxx:Item rdf:ID="U"/>
                </xxx:hasChild>
                <xxx:hasChild>
                  <xxx:Item rdf:ID="T"/>
                </xxx:hasChild>
              </xxx:Item>
            </xxx:hasChild>
          </xxx:Item>
        </xxx:hasChild>
        <xxx:hasChild>
          <xxx:Item rdf:ID="M">
            <xxx:hasChild rdf:resource="#Q"/>
            <xxx:hasChild>
              <xxx:Item rdf:ID="P"/>
            </xxx:hasChild>
            <xxx:hasChild>
              <xxx:Item rdf:ID="O"/>
            </xxx:hasChild>
            <xxx:hasChild>
              <xxx:Item rdf:ID="N"/>
            </xxx:hasChild>
          </xxx:Item>
        </xxx:hasChild>
        <xxx:hasChild>
          <xxx:Item rdf:ID="H">
            <xxx:hasChild>
              <xxx:Item rdf:ID="L"/>
            </xxx:hasChild>
            <xxx:hasChild>
              <xxx:Item rdf:ID="K"/>
            </xxx:hasChild>
            <xxx:hasChild>
              <xxx:Item rdf:ID="J"/>
            </xxx:hasChild>
            <xxx:hasChild>
              <xxx:Item rdf:ID="I"/>
            </xxx:hasChild>
          </xxx:Item>
        </xxx:hasChild>
      </xxx:Item>
    </xxx:hasChild>
    <xxx:hasChild rdf:resource="#B"/>
  </xxx:Item>
</rdf:RDF>
Nepenthe answered 17/10, 2019 at 21:33 Comment(0)
G
6

The trick is to make sure there is no path from your tree root to your foreign parent node. You can do that by means of a FILTER NOT EXISTS construction, like so:

PREFIX xxx: <http://example.org/xxx#>
SELECT ?child ?foreignParent 
WHERE {
   xxx:A xxx:hasChild+ ?child.
   ?child ^xxx:hasChild ?foreignParent. 
   FILTER NOT EXISTS { xxx:A xxx:hasChild* ?foreignParent }
}

Whether this scales to "millions of nodes" will depend on a) the depth of the tree and b) the triplestore you use. I ran the query using RDF4J on my laptop with the test data you provided, and got this:

Evaluating SPARQL query...
+-------------------------------------+-------------------------------------+
| child                               | foreignParent                       |
+-------------------------------------+-------------------------------------+
| <http://example.org/xxx#B>          | <http://example.org/xxx#Fake_1>     |
| <http://example.org/xxx#Q>          | <http://example.org/xxx#Fake_2>     |
+-------------------------------------+-------------------------------------+
2 result(s) (19 ms)
Gaiser answered 17/10, 2019 at 22:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.