Is there any way to optimize SPARQL queries?

I have unmanaged triples stored inside the individual documents in my content db. Each document represents a person, and the triple in it gives the document URI of that person's manager. I am trying to use SPARQL to determine the length of the path between a manager and each of the people below them in the hierarchy.

The triples in the document look like

<sem:triple xmlns:sem="http://marklogic.com/semantics">
    <sem:subject>http://rdf.abbvienet.com/infrastructure/person/10740024</sem:subject>
    <sem:predicate>http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager</sem:predicate>
    <sem:object>http://rdf.abbvienet.com/infrastructure/person/10206242</sem:object>
</sem:triple>

I have found the following SPARQL query, which returns a manager, a person below them in the hierarchy, and the number of nodes between them.

select  ?manager ?leaf (count(?mid) as ?distance) { 
  BIND(<http://rdf.abbvienet.com/infrastructure/person/10025613> as ?manager)
  ?leaf <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>* ?mid .
  ?mid <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>+ ?manager .
}
group by ?manager ?leaf 
order by ?manager ?leaf

This works, but it is very slow: around 15 seconds, even when the hierarchy tree I am looking at is only one or two levels deep. I have 63,139 manager triples of this type in the db.
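
For reference, here is a minimal Python sketch of what the query is meant to compute, run on a toy hierarchy rather than the real data. The names `MANAGER_OF` and `distances_below` and the people in the map are made up for illustration, and the sketch assumes each person has exactly one manager. Under that assumption, the number of `?mid` bindings per `?leaf` equals the number of manager hops from the leaf up to the chosen manager.

```python
from collections import defaultdict

# Hypothetical toy data: person -> manager, one "manager" triple per person.
MANAGER_OF = {
    "bob": "alice",
    "carol": "alice",
    "dave": "bob",
    "erin": "dave",
    "zed": "yan",  # a separate hierarchy, not under alice
}


def distances_below(root, manager_of):
    """Return {person: manager hops up to root} for everyone below root."""
    # Invert the map so we can walk downwards from the root.
    reports = defaultdict(list)
    for person, manager in manager_of.items():
        reports[manager].append(person)

    distances = {}
    frontier = [(person, 1) for person in reports.get(root, [])]
    while frontier:
        person, depth = frontier.pop()
        if person == root or person in distances:
            continue  # guard against cycles in bad data
        distances[person] = depth
        frontier.extend((r, depth + 1) for r in reports.get(person, []))
    return distances


print(distances_below("alice", MANAGER_OF))
```

In the toy data, `erin` reports to `dave`, who reports to `bob`, who reports to `alice`, so the query's `?mid` would bind to `erin`, `dave` and `bob`, giving a count of 3.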

Franconian answered 20/6, 2016 at 11:47 Comment(1)
Shouldn't that be ORDER BY ?leaf, as you have only one binding for ?manager? - Pyonephritis

I think the biggest problem is going to be the BIND() - MarkLogic 8 doesn't optimize the pattern you're using at all well. Can you try substituting your constant into the places you use the ?manager variable to see if that makes a big difference? i.e.:

select  ?leaf (count(?mid) as ?distance) { 
  ?leaf <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>* ?mid .
  ?mid <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>+
    <http://rdf.abbvienet.com/infrastructure/person/10025613> .
}
group by ?leaf 
order by ?leaf

StackOverflow isn't a great place to answer performance questions like this, as it really needs a conversation where we work together to help you. Maybe you can try contacting support or the MarkLogic developer mailing list for this kind of question?

Baalbeer answered 20/6, 2016 at 13:45 Comment(2)
Without the bind in place, it executes very quickly. Thanks. - Franconian
Just another comment: it is also fast if I set bind parameters in the sem:sparql call. It is only slow when I bind in SPARQL directly. - Franconian
