I need to know the difference between RDB2RDF tools.
Could anybody tell me what the pros and cons of RDB2RDF tools are? Especially for the following ones: Virtuoso, Ultrawrap, Ontop, Morph, XSPARQL, D2RQ, ...
There are two W3C-standardized ways to convert relational data to RDF:
- Direct Mapping: a non-customizable default mapping. It is suitable when the relational data is well normalized and has primary keys, foreign keys, etc.
- R2RML: a customizable mapping.
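To make the difference concrete, here is a minimal Python sketch, not taken from any of the tools below. `direct_mapping_triples` imitates the fixed IRI shapes that Direct Mapping derives from the schema (foreign keys and literal datatypes are left out), while `expand_template` imitates the kind of IRI pattern you get to choose yourself with `rr:template` in R2RML. The function names, the base IRI and the `People` table are assumptions made for illustration.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_mapping_triples(base, table, pk, row):
    """Yield (subject, predicate, object) tuples for one row.

    Simplified Direct Mapping: single-column primary key, no foreign keys,
    and literal values are returned as plain strings without datatypes.
    """
    t = quote(table, safe="")
    subject = f"{base}{t}/{quote(pk, safe='')}={quote(str(row[pk]), safe='')}"
    # Every row is typed with an IRI derived from its table name.
    yield (subject, RDF_TYPE, f"{base}{t}")
    for column, value in row.items():
        if value is None:  # NULL values produce no triple
            continue
        yield (subject, f"{base}{t}#{quote(column, safe='')}", str(value))


def expand_template(template, row):
    """Expand an rr:template-like string such as 'http://ex.org/person/{ID}'."""
    out = template
    for column, value in row.items():
        out = out.replace("{" + column + "}", quote(str(value), safe=""))
    return out


if __name__ == "__main__":
    row = {"ID": 7, "fname": "Bob", "addr": None}
    for triple in direct_mapping_triples("http://foo.example/DB/", "People", "ID", row):
        print(triple)
    print(expand_template("http://ex.org/person/{ID}", row))
```

With Direct Mapping the only knob is the base IRI; with R2RML the template, the predicates and the classes are all up to the mapping author.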
In the survey below, I consider R2RML implementations only.
Many R2RML implementations are listed here. I do not consider tools that are:
- dead,
- paid,
- requiring programming,
- full-stack (i.e. claiming to replace all the software you already use),
- working in the wrapper mode only, not in the ETL mode.
XSPARQL
Syntax example
java -jar cli-0.5-jar-with-dependencies.jar -h
java -server -jar -Dfile.encoding=utf-8 cli-0.5-jar-with-dependencies.jar --mysql --dbName=mydb --dbServer=127.0.0.1 --dbUser=root --r2rml=r2rml.ttl > result.ttl
Remarks
cli-0.5-jar-with-dependencies.jar is the command-line jar.
Version 0.5 is preferable: you will receive "Prefix cannot be null" in the later ones.
Conclusion
XSPARQL uses an intermediate translation into XQuery, which makes it very slow.
ONTOP
Ontop is a popular Protégé plugin, but it is also available as a set of command-line utilities.
Syntax example
ontop materialize --url "jdbc:mysql://localhost:3306/mydb" --mapping "../r2rml.ttl" --username root --password "65536" --driver-class com.mysql.jdbc.Driver --disable-reasoning --format turtle --output result.ttl
Remarks
- In MySQL, you first have to enable ANSI quoting:
SET GLOBAL SQL_MODE=ANSI_QUOTES;
Conclusion
Ontop was designed for working with ontologies and generates a lot of ontological garbage like ... rdf:type owl:NamedIndividual.
Ontop tries to parse and rewrite the SQL query from rr:sqlQuery, does not understand many SQL constructs, and honestly suggests that you create an appropriate SQL view in your relational database.
R2RML support is partial (see the Ontop R2RML manual). It is really fast.
RDB2RDF::R2RML
I haven't been able to install this Perl module: there are many dependencies that are absent on CPAN.
D2RQ
D2RQ is a full-stack solution; however, one can extract a standalone tool from the D2RQ distribution.
R2RML is supported in the preview version only.
D2RQ provides its own mapping language (as does Ontop, by the way).
Conclusion
As far as I remember, D2RQ splits the SQL query from rr:sqlQuery into many "atomic" queries and extracts database records one by one, which is really slow.
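The following Python sketch is not D2RQ code; it only illustrates, on an in-memory SQLite database, why fetching records one by one tends to be slower than letting the database run a single joined query. The `emp` and `dept` tables and both function names are made up for the example.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT,
                          dept_id INTEGER REFERENCES dept(id));
        INSERT INTO dept VALUES (1, 'R&D'), (2, 'Sales');
        INSERT INTO emp VALUES (10, 'Ann', 1), (11, 'Bob', 1), (12, 'Cy', 2);
    """)
    return conn


def joined(conn):
    """Single query: the database performs the join itself."""
    sql = ("SELECT e.name, d.name FROM emp e "
           "JOIN dept d ON d.id = e.dept_id ORDER BY e.id")
    return conn.execute(sql).fetchall(), 1


def record_by_record(conn):
    """One query for the employees, then one more query per employee."""
    queries = 1
    result = []
    employees = conn.execute(
        "SELECT name, dept_id FROM emp ORDER BY id").fetchall()
    for name, dept_id in employees:
        dept = conn.execute(
            "SELECT name FROM dept WHERE id = ?", (dept_id,)).fetchone()
        queries += 1
        result.append((name, dept[0]))
    return result, queries


if __name__ == "__main__":
    conn = make_db()
    print(joined(conn))
    print(record_by_record(conn))
```

Both functions return the same rows, but the second one issues one query per employee on top of the initial one, so the number of round trips grows with the size of the table.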
CONCLUSION
My personal choice is Ontop.
I haven't thought about this as rigorously as @Stanislav Kralin, or defined what I expect in terms of performance, elegance, expressiveness, etc.
More and more triplestores offer their own bridge between relational data and semantic triples. I'm thinking especially of Stardog and GraphDB. I believe that Stardog's (and Virtuoso's?) solutions don't actually dump concrete triples. Rather, they create a virtual semantic view of one or more tables.
D2R was the first instantiator I used. I'm surprised @Stanislav Kralin included it, because it is kinda dead (or un-maintained) and it does kinda require programming (or writing out statements in a declarative language.) I didn't know about the R2RML preview... I'll have to check that out, because I was concerned about using their proprietary language.
I believe some of my academic colleagues use the reference R2RML parser.
I have been pretty happy with Karma from ISI. Instantiating tabular/relational data is a big part of my research, and I have certainly found some edge cases that have been difficult to implement, for example linking multiple singleton instances.
- The documentation is good
- installation is easy
- there's a nice web GUI, plus a command line bulk transformation script
Karma doesn't use just pure R2RML:
- They use R2RML
- with JSON worksheets as the object of at least one triple
- with Python data transformations in the JSON