TTL file format - I have no idea what this is

I have a file which has a structure, but I don't know what format it is, nor how to parse it. The file extension is ttl, but I have never encountered this before.

Some lines from the file look like this:

<http://data.europa.eu/esco/label/790ff9ed-c43b-435c-b6b3-6a4a6e8e8326>
    a                   skosxl:Label ;
    skosxl:literalForm  "gérer des opérations d’allègement"@fr .

<http://data.europa.eu/esco/label/98570af6-b237-4cdd-b555-98fe3de26ef8>
    a                   skosxl:Label ;
    esco:hasLabelRole   <http://data.europa.eu/esco/label-role/neutral> , <http://data.europa.eu/esco/label-role/male> , <http://data.europa.eu/esco/label-role/female> ;
    skosxl:literalForm  "particleboard machine technician"@en .

<http://data.europa.eu/esco/label/aaac5531-fc8d-40d5-bfb8-fc9ba741ac21>
    a                   skosxl:Label ;
    esco:hasLabelRole   "http://data.europa.eu/esco/label-role/female" , "http://data.europa.eu/esco/label-role/standard-female" ;
    skosxl:literalForm  "pracovnice denní péče o děti"@cs .

It goes on like this for another 400 MB. Some nodes, but not all, have additional attributes.

It reminds me of some form of XML, but I don't have much experience working with different formats. It also looks like something that could be modeled as a graph. Do you have any idea what data format it is, and how I could parse it in Python?

Yellowbird answered 21/4, 2018 at 14:0 Comment(3)
What is the extension? - Ammonate
The file extension is ttl. - Yellowbird
This might help: #15172302. It seems like it could be a Turtle file: w3.org/TeamSubmission/turtle/#sec-tutorial - Ammonate

Yes, @Phil is correct: that is Turtle syntax, a format for storing RDF data.

I would suggest importing it into an RDF store of some sort rather than trying to parse 400 MB+ yourself. You can use GraphDB, Blazegraph, Virtuoso, and others; a search for RDF stores should turn up many more options.

Then you can query the RDF store with SPARQL (which is to RDF roughly what SQL is to relational databases) from Python using RDFLib. Here is an example from RDFLib.

Elisabethelisabethville answered 21/4, 2018 at 18:41 Comment(1)
I second loading it into an RDF store, considering it's over 400 MB! - Fondness

That looks like Turtle, a data description language for the Semantic Web.

The esco: and skosxl: prefixes (as in esco:hasLabelRole and skosxl:Label) refer to two different semantic vocabularies defined for sharing data. You should have little trouble finding them with a search engine, assuming the data is part of the Semantic Web. skosxl:literalForm can be thought of as the value inside an XML tag.

They represent ontologies as a data structure of subject, predicate, object triples:

Subject: 10
Predicate: Name
Object: John

As for Python, you could read the data as a file, use the subjects as the keys of a dictionary, or put the values in a database; it's unclear what you want to do with the data.
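
As a rough sketch of the "subjects as dictionary keys" idea, the snippet below groups a handful of triples by subject using only the standard library. The triples are written out by hand from the second record in the question rather than parsed from the file, and group_by_subject is a hypothetical helper name, not part of any library. Prefixed names are left as plain strings here; a real Turtle parser would expand them to full IRIs.

```python
from collections import defaultdict

# Triples copied by hand from the second record in the question.
# In Turtle, "a" is shorthand for rdf:type.
SUBJECT = "http://data.europa.eu/esco/label/98570af6-b237-4cdd-b555-98fe3de26ef8"
triples = [
    (SUBJECT, "rdf:type", "skosxl:Label"),
    (SUBJECT, "esco:hasLabelRole", "http://data.europa.eu/esco/label-role/neutral"),
    (SUBJECT, "esco:hasLabelRole", "http://data.europa.eu/esco/label-role/male"),
    (SUBJECT, "esco:hasLabelRole", "http://data.europa.eu/esco/label-role/female"),
    (SUBJECT, "skosxl:literalForm", "particleboard machine technician"),
]


def group_by_subject(triples):
    """Return {subject: {predicate: [objects, ...]}} for an iterable of triples."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        # A predicate can repeat for one subject, so objects are kept in a list.
        grouped[subject][predicate].append(obj)
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {s: dict(preds) for s, preds in grouped.items()}


records = group_by_subject(triples)
print(records[SUBJECT]["skosxl:literalForm"])
```

Keeping a list per predicate matters because one subject can carry several values for the same predicate, as esco:hasLabelRole does here.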

Semantic data is open, incomplete, and could have an unusual, complex structure. The example above is very simple; the primer linked above may help.

Violet answered 21/4, 2018 at 14:32 Comment(3)
I really appreciate your answer, but I am still not sure how to parse it. I tried `g = Graph(); g.load(self.datapath)`, but it doesn't quite work because the data format is different. I don't see anything about .ttl in the documentation either. - Yellowbird
nvm, just found how to open it (I hope, it's still loading...). How can I parse through the individual values? I don't quite understand the triples thing. - Yellowbird
I would try splitting on the base URL (data.europa.eu uses it as a unique resource identifier), then split out each attribute and feed the lot into a Python data structure. - Violet
