We want to model that "something is not there", as opposed to missing information. For example, an explicit statement that "a patient did not get chemotherapy" or that "a patient does not have dyspnea" is different from having no information about whether the patient has dyspnea.
We considered several approaches:
- Using a negation class such as "No_Dyspnea". That seems semantically problematic: where would that class sit in the hierarchy? It certainly cannot be a descendant of the "Dyspnea" class.
- Using "not there" object properties, e.g. "denies" or "does_not_have", with an individual of the Dyspnea root class as the object of the triple.
- Using blank nodes to state that the individual belongs to the class of things that do not have dyspnea, e.g.:
  dat:PatientW2 a [ rdf:type owl:Class ;
                    owl:complementOf [ rdf:type owl:Restriction ;
                                       owl:onProperty roo:has_finding ;
                                       owl:someValuesFrom nci:Dyspnea ; ] ] .
We feel that the third option is the most "ontologically correct" way of expressing this. However, when experimenting with it, we ran into severe performance problems even in simple scenarios.
We are using Sesame with an OWLIM-Lite store. We imported the NCI thesaurus (280 MB, about 80,000 concepts) and another very small ontology into the store, and added two individuals, one of which has the complementOf/restriction class as its type.
The following query ran so long that I terminated it after 15 minutes:
SELECT *
WHERE {
  ?s a [ rdf:type owl:Class ;
         owl:complementOf [
           rdf:type owl:Restriction ;
           owl:onProperty roo:has_finding ;
           owl:someValuesFrom nci:Dyspnea ;
         ]
       ] .
}
LIMIT 100
Does anybody know why? My guess is that this approach creates a lot of blank nodes and that the query engine has to go through the entire NCI thesaurus, comparing every blank node with this one.
If I put this triple in a separate graph and query only that graph, the result comes back instantly.
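One likely factor, offered as a guess rather than a diagnosis: in SPARQL, blank nodes in a query pattern behave like variables that cannot be returned, so the engine cannot look up "this" blank node directly; it has to join the six triple patterns hidden in the nested brackets. If the planner starts from an unselective pattern such as `?s rdf:type ?c`, it enumerates every typed resource before the selective `owl:someValuesFrom nci:Dyspnea` pattern prunes anything. The sketch below is plain Python with no RDF library: a naive nested-loop join over an in-memory triple list. `build_store`, the `ex:` names and the store size are hypothetical stand-ins for the NCI thesaurus, and we do not know which plan OWLIM-Lite actually chose.

```python
# Hypothetical illustration only: a naive triple-pattern join, not how OWLIM-Lite works internally.

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None if it cannot."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):            # variable; query-side blank nodes act the same way
            if new.setdefault(p, t) != t:
                return None
        elif p != t:                     # constant terms must match exactly
            return None
    return new


def evaluate(patterns, triples):
    """Nested-loop join in the given order; returns (solutions, intermediate rows produced)."""
    rows, produced = [{}], 0
    for pattern in patterns:
        next_rows = []
        for row in rows:
            for triple in triples:
                extended = match(pattern, triple, row)
                if extended is not None:
                    next_rows.append(extended)
        rows = next_rows
        produced += len(rows)
    return rows, produced


def build_store(n_classes):
    """Hypothetical stand-in for a big ontology: n classes, each with one restriction blank node."""
    triples = []
    for i in range(n_classes):
        triples += [
            (f"ex:C{i}", "rdf:type", "owl:Class"),
            (f"_:r{i}", "rdf:type", "owl:Restriction"),
            (f"_:r{i}", "owl:onProperty", "ex:someRole"),
            (f"_:r{i}", "owl:someValuesFrom", f"ex:C{i + 1}"),
        ]
    # The individual from the question, typed with the complement class.
    triples += [
        ("dat:PatientW2", "rdf:type", "_:c"),
        ("_:c", "rdf:type", "owl:Class"),
        ("_:c", "owl:complementOf", "_:r"),
        ("_:r", "rdf:type", "owl:Restriction"),
        ("_:r", "owl:onProperty", "roo:has_finding"),
        ("_:r", "owl:someValuesFrom", "nci:Dyspnea"),
    ]
    return triples


# The query from the question, with its two [ ] blank nodes written as ?c and ?r.
QUERY = [
    ("?s", "rdf:type", "?c"),
    ("?c", "rdf:type", "owl:Class"),
    ("?c", "owl:complementOf", "?r"),
    ("?r", "rdf:type", "owl:Restriction"),
    ("?r", "owl:onProperty", "roo:has_finding"),
    ("?r", "owl:someValuesFrom", "nci:Dyspnea"),
]

if __name__ == "__main__":
    store = build_store(200)
    _, textual = evaluate(QUERY, store)                    # unselective pattern first
    _, selective = evaluate(list(reversed(QUERY)), store)  # nci:Dyspnea pattern first
    print(textual, selective)  # 408 6
```

With the selective pattern first, the intermediate work stays constant however large the ontology is, while the textual order grows with the number of `rdf:type` triples. That would also be consistent with the query being instant on a separate graph that holds only the patient's triples.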
To sum up, the two basic questions are:
- Is the third approach really the best way to model "something is not there"?
- Will it affect query performance?
EDIT 1
We discussed the proposed options. This actually helped us clarify what we are really trying to achieve:
We want to be able to state that "Patient has Dyspnea" or "Patient does not have Dyspnea" at a particular point in time.
In the future there may/will be more information about that patient, e.g. that he/she now has dyspnea.
We want to be able to write SPARQL queries that ask for "all patients that have dyspnea" and "all patients that do not have dyspnea".
We want to keep the SPARQL as simple and intuitive as possible, e.g. use only one property, "has_finding", rather than having to know about two properties (a second one such as "has_exclusion") or about some complex blank node construct.
We experimented with these options:
- Negative Property Assertions: This sounded like the best solution, since it states exactly that one individual is not related to another individual via a given property. The problems are that we have to create a Dyspnea individual just to have something to use as owl:targetIndividual, and that we cannot find an easy way to query the negative assertion other than walking the whole owl:sourceIndividual / owl:targetIndividual chain. That makes the SPARQL quite lengthy and requires the query writer to know about this construct.
- Blank node with complementOf: This would state something we do not want to state, namely that "Patient1 can never have a finding of dyspnea", whereas we want to state that "Patient1 does not have a dyspnea finding now (or at date X)". So we should not use this approach.
- Using Exclusion/Inclusion types (options 1 and 2): After a closer look at Jeen's suggestion, we believe that using general :Exclusion and :Inclusion classes together with a single property, has_finding, and giving the dyspnea individual the inclusion or exclusion type is the easiest to understand and to query, and still provides enough reasoning ability. Example:
  :Patient1 a :Patient .
  :Dyspnea1 a :Dyspnea .
  :Dyspnea1 a :Exclusion .
  :Patient1 ex:has_finding :Dyspnea1 .
That way, the person writing the SPARQL query only has to know that:
- There is one property, has_finding, which represents the intention properly, since "no dyspnea" is technically a finding as well.
- Querying with has_finding alone does not tell whether the patient actually has dyspnea. The query also needs a triple stating whether the dyspnea individual is an :Exclusion (or an :Inclusion, depending on the goal of the query).
- This puts some additional burden on the query writer, but less than negative property assertions do, and it is easier to understand.
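To compare how much a query writer has to know under the two remaining options, here is a minimal sketch in plain Python (no RDF library; an in-memory triple list and a naive backtracking matcher stand in for the store and the SPARQL engine). `:Patient1`, `:Dyspnea1`, `ex:has_finding`, `:Exclusion` and `:Inclusion` come from the example above; `:Patient2`, `:Patient3`, `:Dyspnea2` and `:Dyspnea3` are hypothetical. The negative assertion uses the OWL 2 RDF vocabulary (`owl:NegativePropertyAssertion`, `owl:sourceIndividual`, `owl:assertionProperty`, `owl:targetIndividual`).

```python
# Hypothetical illustration: plain Python, no RDF library.

def solve(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns (naive backtracking join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        # Terms starting with "?" are variables; all other terms must match exactly.
        if all(new.setdefault(p, t) == t if p.startswith("?") else p == t
               for p, t in zip(first, triple)):
            yield from solve(rest, triples, new)


STORE = [
    # Exclusion/Inclusion encoding (Patient1 as in the example, Patient2 hypothetical).
    (":Patient1", "rdf:type", ":Patient"),
    (":Dyspnea1", "rdf:type", ":Dyspnea"),
    (":Dyspnea1", "rdf:type", ":Exclusion"),
    (":Patient1", "ex:has_finding", ":Dyspnea1"),
    (":Patient2", "rdf:type", ":Patient"),
    (":Dyspnea2", "rdf:type", ":Dyspnea"),
    (":Dyspnea2", "rdf:type", ":Inclusion"),
    (":Patient2", "ex:has_finding", ":Dyspnea2"),
    # Negative property assertion encoding for a hypothetical Patient3.
    (":Patient3", "rdf:type", ":Patient"),
    (":Dyspnea3", "rdf:type", ":Dyspnea"),  # individual created only to serve as the target
    ("_:npa", "rdf:type", "owl:NegativePropertyAssertion"),
    ("_:npa", "owl:sourceIndividual", ":Patient3"),
    ("_:npa", "owl:assertionProperty", "ex:has_finding"),
    ("_:npa", "owl:targetIndividual", ":Dyspnea3"),
]

ANY_DYSPNEA_FINDING = [("?p", "ex:has_finding", "?f"), ("?f", "rdf:type", ":Dyspnea")]
HAS_DYSPNEA = ANY_DYSPNEA_FINDING + [("?f", "rdf:type", ":Inclusion")]
NO_DYSPNEA = ANY_DYSPNEA_FINDING + [("?f", "rdf:type", ":Exclusion")]
NO_DYSPNEA_NPA = [
    ("?n", "rdf:type", "owl:NegativePropertyAssertion"),
    ("?n", "owl:sourceIndividual", "?p"),
    ("?n", "owl:assertionProperty", "ex:has_finding"),
    ("?n", "owl:targetIndividual", "?f"),
    ("?f", "rdf:type", ":Dyspnea"),
]


def patients(query):
    """Distinct ?p values for a query, sorted for stable output."""
    return sorted({b["?p"] for b in solve(query, STORE)})


if __name__ == "__main__":
    print(patients(ANY_DYSPNEA_FINDING))  # [':Patient1', ':Patient2']
    print(patients(HAS_DYSPNEA))          # [':Patient2']
    print(patients(NO_DYSPNEA))           # [':Patient1']
    print(patients(NO_DYSPNEA_NPA))       # [':Patient3']
```

The first query shows that has_finding plus the :Dyspnea type returns positive and negative findings alike, so the :Inclusion/:Exclusion triple is what tells them apart. The negative-assertion query needs five patterns and the reified vocabulary, and semantically it only denies the link between :Patient3 and the single individual :Dyspnea3, not between :Patient3 and every Dyspnea individual.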
We would really appreciate some feedback on these conclusions!
[ owl:sourceIndividual ?s ; owl:assertionProperty ?p ; owl:targetIndividual ?o ] which can still be a single line. – Whitish
:Patient1 :has_finding ?finding . ?finding a :Dyspnea . That would return those with a positive dyspnea finding as well as those with a confirmed negative finding. – Edrisedrock