Ontologies, OWL, SPARQL: Modelling that "something is not there" and performance considerations

We want to model that "something is not there", as opposed to missing information. For example, an explicit statement that "a patient did not get chemotherapy" or that "a patient does not have dyspnea" is different from having no information about whether the patient has dyspnea.

We thought about several approaches, e.g.

1. Using a negation class: "No_Dyspnea". But that seems semantically problematic, since what type would that class be? It cannot be a descendant of the "Dyspnea" class.
2. Using "not there" object properties, e.g. "denies" or "does_not_have" and then an individual of the Dyspnea root class as the object of that triple.
3. Using blank nodes that describe that the individual belongs to the group of things that do not have dyspnea. E.g.:

dat:PatientW2 a [ rdf:type owl:Class;
              owl:complementOf [
                rdf:type owl:Restriction ;
                    owl:onProperty roo:has_finding;
                    owl:someValuesFrom nci:Dyspnea;
              ]
            ] .

We feel like the 3rd option is the most "ontologically correct" way of expressing this. However, when playing around with it we encountered severe performance problems in simple scenarios.

We are using Sesame with an OWLIM-Lite store and imported the NCI thesaurus (280MB, about 80,000 concepts) and another very small ontology into the store and added two individuals, one having that complementOf/restriction class.

The following query took forever to execute and I terminated it after 15 minutes:

select *
where {
  ?s a [ rdf:type owl:Class;
                      owl:complementOf [
                        rdf:type owl:Restriction ;
                        owl:onProperty roo:has_finding;
                        owl:someValuesFrom nci:Dyspnea;
                  ]
                ] .
} Limit 100

Does anybody know why? I would assume that this approach creates a lot of blank nodes and the query engine has to go through the entire NCI thesaurus and compare all blank nodes with this one?

If I put this triple in a separate graph and only query that graph, the query returns the result instantaneously.
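For reference, the graph-restricted variant could be sketched as follows. The prefix IRIs and the graph name `dat:patients` are placeholders invented for this sketch, not the real namespaces or the graph we actually used.

```sparql
# All prefix IRIs below are placeholders (assumptions), not the real namespaces.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX roo: <http://example.org/roo#>
PREFIX nci: <http://example.org/nci#>
PREFIX dat: <http://example.org/data#>

SELECT ?s
WHERE {
  # Restrict matching to the (hypothetical) named graph holding patient data,
  # so only that graph's triples are candidates for the blank-node pattern.
  GRAPH dat:patients {
    ?s a [ rdf:type owl:Class ;
           owl:complementOf [
             rdf:type owl:Restriction ;
             owl:onProperty roo:has_finding ;
             owl:someValuesFrom nci:Dyspnea
           ]
         ] .
  }
}
LIMIT 100
```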

To sum things up, the two basic questions are:

1. Is the third approach really the best for modelling "something is not there"?
2. Is this going to affect query performance?

EDIT 1

We discussed the proposed options. It actually helped us in clarifying what we are really trying to achieve:

We want to be able to state that "Patient has Dyspnea" or "Patient does not have Dyspnea" at a particular point in time.
In the future there may/will be more information about that patient, e.g. that he/she now has dyspnea.
We want to be able to write SPARQL queries that ask for "all patients that have dyspnea" and "all patients that do not have dyspnea".
We want to keep the SPARQL as simple and intuitive as possible, e.g. by using only one property "has_finding" rather than requiring the query writer to know about two properties (a second one such as "has_exclusion") or about some complex blank node construct.

We played around with options:

Negative Property Assertions: This sounded like the best solution to this problem, since we are stating that one individual is not related to another individual via that property. The issues are that we have to create an individual of Dyspnea just to have something to use as owl:targetIndividual, and that we cannot find a way of querying the negative assertion other than going through the whole owl:sourceIndividual and owl:targetIndividual chain. That makes the SPARQL quite lengthy and puts a burden on the person writing the query to know about it.
Blank node with complementOf: This would state something that we do not want to state, namely that "Patient1 can never have a finding of dyspnea", whereas we want to state that "Patient1 does not have a dyspnea finding now (or at date X)". So we should not use this approach.
Using Exclusion/Inclusion types (Options 1 and 2): After a closer look at Jeen's suggestion, we believe that using general :Exclusion and :Inclusion classes along with only one property has_finding, and giving the dyspnea individual the inclusion/exclusion type, is the easiest to understand and query, and provides enough reasoning abilities. Example:

:Patient1 a :Patient .
:Dyspnea1 a :Dyspnea .
:Dyspnea1 a :Exclusion .
:Patient1 ex:has_finding :Dyspnea1 .

That way, the person writing the SPARQL query only has to know that:

There is one property has_finding, which represents the intention properly, since "No dyspnea" is technically a finding as well.
Querying with has_finding alone will not give sufficient information about whether the person actually has it or not. The query also needs to contain a triple about whether the dyspnea individual is an :Exclusion (or an :Inclusion, depending on the goal of the query).
While this puts some additional burden on the query writer, it is less than with negative property assertions and easier to understand.
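A query along these lines might look like the sketch below. It assumes a single placeholder prefix in place of the `:` and `ex:` prefixes used in the example, and that every dyspnea finding is typed as :Dyspnea. The `?excluded` variable is just an illustrative name.

```sparql
# The prefix IRI is a placeholder (assumption), not our real namespace.
PREFIX : <http://example.org/clinical#>

SELECT ?patient ?finding ?excluded
WHERE {
  # One property reaches both positive and negative findings.
  ?patient :has_finding ?finding .
  ?finding a :Dyspnea .
  # ?excluded is true when the finding is also typed as :Exclusion,
  # i.e. when it records "no dyspnea".
  BIND (EXISTS { ?finding a :Exclusion } AS ?excluded)
}
```

To get only the confirmed negatives, the BIND line could be replaced by the plain triple pattern `?finding a :Exclusion .`.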

We would really appreciate some feedback on these conclusions!

Edrisedrock answered 16/6, 2015 at 20:52 Comment(2)

Can you give an example of the SPARQL query you wrote with the negative object property assertions? I agree that it will be more complicated than ?s ?p ?o, but it shouldn't be all that lengthy... Asking whether ~(s,p,o) should just be [ owl:sourceIndividual ?s ; owl:assertionProperty ?p ; owl:targetIndividual ?o ] which can still be a single line. – Whitish 17/6, 2015 at 19:9

@JoshuaTaylor The query would look exactly like you described. 3 extra lines. But that would only give me the patients without dyspnea. What if I want the patients with dyspnea? Or both sets? Querying for "patients without dyspnea" is very different from querying for "patients with dyspnea" or both sets in the result. Ideally I want to use the same property for getting to both. E.g. :Patient1 :has_finding ?finding . ?finding a :Dyspnea . That would return those with a positive dyspnea finding as well as those with a confirmed negative finding. – Edrisedrock 17/6, 2015 at 21:25

With respect to the modeling question, I'd like to offer a fourth alternative, which is, in fact, a mix of your options 1 and 2: introduce a separate class (hierarchy) for these 'excluded/missing' symptoms, diseases or treatments, and have the specific exclusions as instances:

 :Exclusion a owl:Class .
 :ExcludedSymptom rdfs:subClassOf :Exclusion .
 :ExcludedTreatment rdfs:subClassOf :Exclusion .

 :excludedDyspnea a :ExcludedSymptom .
 :excludedChemo a :ExcludedTreatment .

 :Patient a owl:Class ;
          owl:equivalentClass [ a owl:Restriction ;
                                owl:onProperty :excluded ;
                                owl:allValuesFrom :Exclusion ] .

 # john is a patient without Dyspnea
 :john a :Patient ;
       :excluded :excludedDyspnea .

Optionally, you can link the exclusion instances semantically with the treatment/symptom/diseases:

  :excludedDyspnea :ofSymptom :Dyspnea .

In my view, this is just as "ontologically correct" (this kind of thing is quite subjective to be honest) as your other options, and possibly a lot easier to maintain, query, and indeed reason with.
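As a rough sketch of how this model could be queried, assuming the optional :ofSymptom link above is present and using a placeholder prefix IRI:

```sparql
# The prefix IRI is a placeholder (assumption).
PREFIX : <http://example.org/clinical#>

SELECT ?patient
WHERE {
  # Patients with an explicit exclusion ...
  ?patient a :Patient ;
           :excluded ?exclusion .
  # ... that is linked to the Dyspnea symptom.
  ?exclusion :ofSymptom :Dyspnea .
}
```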

As for your second question: while I can't speak for the behavior of the particular reasoner you're using, in general any construction involving complementOf is computationally very heavy, but perhaps more importantly, it probably does not capture what you intend.

OWL has an open world assumption, which (in broad terms) means that we cannot decide a certain fact is untrue simply because that fact is currently unknown. Your complementOf construction will logically be an empty class, because for any individual X, even if we currently do not know that X has been diagnosed with Dyspnea, there is a possibility that in the future that fact will become known, and therefore X will not be in the complement class.
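By contrast, a plain SPARQL query is evaluated against the triples that are actually stored, so "nothing has been recorded" can be asked for directly with FILTER NOT EXISTS. The sketch below assumes the exclusion model above; the prefix IRI is a placeholder and the :hasSymptom property for positive findings is hypothetical, introduced only for illustration.

```sparql
# The prefix IRI is a placeholder; :hasSymptom is a hypothetical property.
PREFIX : <http://example.org/clinical#>

SELECT ?patient
WHERE {
  ?patient a :Patient .
  # No explicit exclusion of dyspnea is recorded ...
  FILTER NOT EXISTS {
    ?patient :excluded ?exclusion .
    ?exclusion :ofSymptom :Dyspnea .
  }
  # ... and no positive dyspnea statement is recorded either.
  FILTER NOT EXISTS { ?patient :hasSymptom :Dyspnea }
}
```

The result is the set of patients for whom the information is simply missing, which is different from the set with an explicit exclusion.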

EDIT

In response to your edit, with the proposal using a single :hasFinding property, I think that generally looks good, though I would perhaps modify it slightly:

   :patient1 a :Patient;
             :hasFinding :dyspneaFinding1 .

   :dyspneaFinding1 a :Finding ;
                    :of :Dyspnea ;
                    :conclusion false .

You have now separated the 'finding' as a concept a bit more cleanly from the symptom/treatment that it is a finding of. Also, whether the finding is positive or negative is explicitly modeled (rather than implied by the presence/absence of an 'excluded' property or an 'Exclusion' type).

(As an aside: since we link an individual with a class here via a non-typing relation (... :of :Dyspnea) we must rely on OWL 2 punning to make this valid in OWL DL)

To query for a patient with a finding (whether positive or negative) about Dyspnea:

 SELECT ?x 
 WHERE {
    ?x a :Patient; 
       :hasFinding [ :of :Dyspnea ] .
 }

And to query for patients with confirmed absence of Dyspnea:

 SELECT ?x 
 WHERE {
    ?x a :Patient; 
       :hasFinding [ :of :Dyspnea ;
                     :conclusion false ] .
 }
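If both sets are wanted in one result, a variant of the same query could return the conclusion value instead of fixing it. This is only a sketch, and the prefix IRI is a placeholder:

```sparql
# The prefix IRI is a placeholder (assumption).
PREFIX : <http://example.org/clinical#>

SELECT ?x ?conclusion
WHERE {
  ?x a :Patient ;
     :hasFinding [ :of :Dyspnea ;
                   :conclusion ?conclusion ] .
}
ORDER BY ?conclusion
```

Each row then pairs a patient with the recorded conclusion, so positive and negative findings can be told apart in the client.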

Fir answered 16/6, 2015 at 22:2 Comment(6)

Hi Jeen, thanks for help! That is an interesting approach and I believe we would definitely need the last statement that :excludedDyspnea :ofSymptom :Dyspnea. Otherwise we could not tell what excluded dyspnea means clinically. I am going to play around with this approach a bit. – Edrisedrock 17/6, 2015 at 0:12

@Wolfgang, no problem. By the way, instead of thanking people in a comment, it's more useful (and more appreciated) if instead you vote on their answers (and if something has definitely answered your question, accept it). See stackoverflow.com/help/someone-answers for more information. – Fir 17/6, 2015 at 3:38

I added some more detail and discussion of your proposed solutions to the question. It would be great if you could comment briefly on the conclusions! – Edrisedrock 17/6, 2015 at 19:2

You are suggesting something that we actually started with in the beginning. We had a :was_observed property. But we decided to move away from that since it is not extensible to more categories, e.g. "most likely not". We do however like your first suggestion of using a :Exclusion class and assigning it as an additional type to the dyspnea individual. I do not understand why the :conclusion property would be more explicit than the type. In both cases we assert an additional triple. And when querying we have to know that we either need to check the conclusion or the type. – Edrisedrock 17/6, 2015 at 23:53

Fair enough. As said: it's all rather subjective and since you know your own use case best, you're in the better position to judge what works for you. Of course, if the objection is simply that true/false is too simplistic, there is nothing to stop you defining a larger range of possible values for the :conclusion property. – Fir 17/6, 2015 at 23:55

Thanks for all your help! I accepted your answer. Your initial suggestion fits our use case best. Even though Joshua's suggestion of using negative property assertions works as well. That is the nice and frustrating thing about ontologies: There often is no simple right or wrong. – Edrisedrock 18/6, 2015 at 0:0

If your diseases are represented as individuals, then you can use negative object property assertions to literally say, e.g.,

¬hasFinding(john,Dyspnea)

NegativeObjectPropertyAssertion(hasFinding john Dyspnea)

Of course, if you have lots of things that aren't the case, then this might get a bit involved. It's probably the most semantically correct, though. It also means that your query could match directly against the data in the ontology, which might make for quicker results. (Of course, you'd still have the issues of trying to infer when the negative object property holds.)
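In the RDF serialization of OWL 2, a negative property assertion is a node of type owl:NegativePropertyAssertion with owl:sourceIndividual, owl:assertionProperty and owl:targetIndividual. A query matching the assertion above could be sketched like this, assuming the assertions are stored as plain triples and using a placeholder prefix IRI:

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
# The default prefix IRI is a placeholder (assumption).
PREFIX : <http://example.org/clinical#>

SELECT ?patient
WHERE {
  # Each negative assertion is an anonymous node carrying
  # source, property and target.
  [] a owl:NegativePropertyAssertion ;
     owl:sourceIndividual  ?patient ;
     owl:assertionProperty :hasFinding ;
     owl:targetIndividual  :Dyspnea .
}
```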

This doesn't work if diseases are represented as classes, though. If diseases are represented by classes, then you can use class expressions, similar to what you propose. E.g.,

(∀ hasFinding.¬Dyspnea)(john)

ClassAssertion(ObjectAllValuesFrom(hasFinding ObjectComplementOf(Dyspnea)) john)

This is similar to your third option, but I wonder if it might perform better. It seems like a slightly more direct way of saying what you're trying to say (i.e., if someone has a disease, it's not one of these diseases).
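For comparison with the blank-node version in the question, the class assertion above could be written as triples roughly as follows. It is shown as a SPARQL update so it can be loaded directly; the prefix IRI is a placeholder, and :john, :hasFinding and :Dyspnea simply mirror the names used above.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
# The default prefix IRI is a placeholder (assumption).
PREFIX : <http://example.org/clinical#>

INSERT DATA {
  # john is a member of (hasFinding only (not Dyspnea))
  :john a [ a owl:Restriction ;
            owl:onProperty :hasFinding ;
            owl:allValuesFrom [ a owl:Class ;
                                owl:complementOf :Dyspnea ] ] .
}
```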

I do agree with Jeen's answer, though; there's a lot of subjectivity here, and a great deal of getting it "right" is actually just a matter of finding something that's reasonable to work with, performs well enough for you, and that seems not entirely unnatural.

Whitish answered 16/6, 2015 at 22:7 Comment(4)

I was not aware of the NegativeObjectPropertyAssertion construct (I stopped reading after OWL 1 :) ). Seems useful in this case. – Fir 16/6, 2015 at 22:15

@JeenBroekstra It's useful, but ~p(a,b) is semantically equivalent to (p only not {b})(a), so it's just syntactic sugar in a sense. It's a bit easier to query with SPARQL, though. – Whitish 16/6, 2015 at 22:18

Hi Joshua, thanks for your comments! The negative property assertion does sound very interesting. I am going to take a look whether I can use it. Regarding your second example, I am not that familiar with the () notation. Did you just remove the owl:Restriction? Also in terms of your "translation into a sentence": We want to express that this patient does not have Dyspnea at that particular point in time. Not that if the patient has a finding, it cannot be Dyspnea. Because in the future, the patient could have a dyspnea finding. Just on a different date. – Edrisedrock 17/6, 2015 at 0:16

@JoshuaTaylor, I added some details to the question, discussing your proposed solutions. It would be great if you could share your thoughts on that. – Edrisedrock 17/6, 2015 at 19:7
