How to parse and load an ontology in Python?

I have an ontology in an OWL file (nif.owl). I am familiar with Java, but my Java program kept crashing, so I tried Python instead. Since I have not used Python before, I am not sure whether I am loading the ontology correctly!

Here is the part that I believe is related to loading the ontology:

g = rdflib.Graph()
g.parse('nif.owl', format='xml')
nif = rdflib.Namespace('http://purl.org/nif/ontology/nif.owl')
g.bind('nif', nif)

I suspect the problem is the format='xml' argument to g.parse. Maybe 'xml' is the wrong format.

I have also attached the header of the ontology file as an image.

[Image: header of the ontology file nif.owl]

I think there is a mistake in the code because of the output I get, shown in the image below:

[Image: output of the query]

Thanks!

PS: Below is the full code in case there is something wrong with it:

import logging
import rdflib
import time

logging.basicConfig()
logger = logging.getLogger('logger')
logger.warning('The system may break down')

start_time = time.time()

g = rdflib.Graph()
g.parse('nif.owl', format='xml')
nif = rdflib.Namespace('http://purl.org/nif/ontology/nif.owl')
g.bind('nif', nif)
query = """
SELECT DISTINCT ?p
WHERE { ?s ?p ?o }
LIMIT 5
"""
result = g.query(query)
print(result.serialize(format='csv'))

print("--- %s seconds ---" % (time.time() - start_time))
Anestassia asked 17/6, 2019 at 12:16
Possible duplicate of How to get Python to make new lines in a print statement with \r\n? - Shillyshally
AFAICT the code does exactly what it should. The output only looks weird because the print statement shows the single-column CSV result on a single line. See the question I linked above for the solution. - Shillyshally
Nothing is "crashing" here; it's just the garbled printout of the query result. - Opsis
That said, are you sure rdflib is the appropriate API here? I might be wrong, but your ontology basically contains just a set of owl:imports statements. This is an OWL feature, and I doubt that rdflib handles those imports as intended, given that rdflib is designed for RDF. I'm pretty sure it will not load the imported ontologies into the graph. Honestly, if you don't need SPARQL, a dedicated OWL library like owlready2 would be the better way to work with OWL ontologies. - Opsis
@AKSW I have to use SPARQL because the program is based on it, and I later have to use these queries as part of an application that is being designed. I have already gone through many problems, such as getting a machine with more memory and switching from Java to Python. Do you know of any way I can use SPARQL with this file? I wanted to use Protege but ran into many problems while loading the OWL file. Is there an endpoint I can use, or something like that? - Anestassia
@RFNO it's not clear what the problem is, exactly. You are "not sure if you're loading it correctly" - why not? What are you trying to do with it after you load it? Is that succeeding or failing? Note that the image you pasted in your question just shows a warning, not an error - it doesn't necessarily mean that something went wrong. Also: claiming that doing this in Java "kept crashing" and therefore you switched to Python is a little like saying that your milk keeps boiling over when you heat it on a gas hob, so you switched to induction instead. - Vivacity
@JeenBroekstra Thank you for your comment. As you can see from the other comments, it seems there was nothing wrong with the code. However, it seems that I am not getting the result I wanted because rdflib is not compatible with OWL 2. If you know how to resolve this issue, please let me know. Regarding switching from Java to Python: I know that, but I have limited time and would rather spend it on the queries (SPARQL) than on getting programming-language libraries running. That is why I switched to Python, where I could get tested code (using rdflib) from my friend. - Anestassia
What was not working with Protege? - Opsis
@AKSW I couldn't load the ontology in Protege; it stalled all the time. Then I got a bigger instance in the cloud, but it was the same. Of course, this was a couple of years ago and with a smaller ontology. Furthermore, I checked the Protege wiki today: Protege 3.x supports SPARQL but not OWL 2, and Protege 4.0+ supports OWL 2 but not SPARQL. In addition, in my project I need to show the code (SPARQL, Python, or some other form), so I cannot just use a query portal for the ontologies I am using. - Anestassia
Not sure where you were looking, but Protege supports OWL and SPARQL. And the latest version is 5.5.0 - whether the ontology fits into memory, I don't know. - Opsis
I looked it up here: protegewiki.stanford.edu/wiki/Protege4Migration. From memory, a smaller ontology (FMA) did not fit into memory, but I'll try this one too. Right now I am trying owlready2 as you suggested, but I have had some problems with that one too! - Anestassia
If you still want to use rdflib, the only workaround is to load all the ontologies into the same graph. Either you provide the list of ontology URLs and use a loop that adds the data, or you make use of the owl:imports statements, i.e. use the RDF/SPARQL capabilities of rdflib to get the import URLs from the ontology document. (This has to be done transitively if the imported ontologies import other ontologies.) - Opsis
@AKSW I couldn't get owlready2 to work! Can you elaborate on the method you mentioned? Do you mean that I should load the ontology and then import the URIs listed in the owl:imports statements? Can they be imported using PREFIX (SPARQL)? - Anestassia
I mean, is it clear what I'm saying or not? You want to load the whole dataset, right? So either you manually make a list of all the ontologies that belong to the dataset (that is not difficult; we're not talking about thousands of ontologies here), or you use your initial ontology, whose owl:imports statements provide the URLs of all the ontologies. How you extract those URLs from the file doesn't matter. It's RDF/XML, so you can use SPARQL or any other XML library for Python. - Opsis
@AKSW Got it. I also asked a friend who has been working with ontologies for a long time, and he said that if no reasoning is needed, rdflib is sufficient. - Anestassia

There is nothing wrong with your code except that the format should be format='application/rdf+xml'.

Hamby answered 1/6, 2020 at 17:16
