What are the pros and cons of RDB2RDF tools? [closed]
M

2

1

I need to know the differences between RDB2RDF tools. Could anybody tell me what the pros and cons of RDB2RDF tools are, especially for the following ones: Virtuoso, Ultrawrap, Ontop, Morph, XSPARQL, D2RQ, ...?

Mckeehan answered 3/8, 2017 at 6:37 Comment(2)
Stack Overflow is not a platform for tool comparison! Read survey papers if you need a comparison. - Deuteranopia
This sort of question definitely doesn't belong here. I think it should prompt creation of a page like this. You don't necessarily need all the answers, but it's very helpful if you can start by creating columns for the features you care about, and rows for the tools you care about. Others can add tools and/or features you've left out, and anyone can fill in the blanks. It'll be much easier to work with the info thereafter than with the streaming info now found below! - Coly
M
7

There are two W3C-standardized ways to convert relational data to RDF:

  1. Direct Mapping: a non-customizable default mapping. Direct Mapping is suitable when the relational data is well normalized and has primary keys, foreign keys, etc.
  2. R2RML: a customizable mapping.
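
To make the "default mapping" idea more concrete, here is a minimal Python sketch of what a Direct Mapping roughly produces for a single row. It is an illustration only, not any tool's implementation: the function names, the base IRI and the sample table are assumptions, foreign keys are not handled, and only integers and strings are turned into literals.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def to_literal(value):
    """Render a Python value as an N-Triples literal (integers and strings only)."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def direct_map_row(base, table, pk_columns, row):
    """Return N-Triples lines for one row, loosely following the Direct Mapping scheme.

    Hypothetical helper: the subject IRI is built from the table name and the
    primary key, each column becomes a <table#column> predicate, and NULLs are skipped.
    """
    def enc(part):
        return quote(str(part), safe="")

    key = ";".join(f"{enc(c)}={enc(row[c])}" for c in pk_columns)
    subject = f"<{base}{enc(table)}/{key}>"
    triples = [f"{subject} <{RDF_TYPE}> <{base}{enc(table)}> ."]
    for column, value in row.items():
        if value is None:
            continue  # a NULL column yields no triple
        predicate = f"<{base}{enc(table)}#{enc(column)}>"
        triples.append(f"{subject} {predicate} {to_literal(value)} .")
    return triples


if __name__ == "__main__":
    sample = {"ID": 7, "fname": "Bob", "addr": None}
    for line in direct_map_row("http://example.com/base/", "People", ["ID"], sample):
        print(line)
```

Nothing in this output can be tuned: IRIs and predicates follow directly from table and column names. R2RML exists to let you override exactly those choices.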

In the survey below, I consider R2RML implementations only.

Many R2RML implementations are listed here. I do not consider tools that are:

  • dead,
  • paid,
  • requiring programming,
  • full-stack (i.e., claiming to replace all the software you already use),
  • working in the wrapper mode only, not in the ETL mode.

XSPARQL

Syntax example

java -jar cli-0.5-jar-with-dependencies.jar -h
java -server -jar -Dfile.encoding=utf-8 cli-0.5-jar-with-dependencies.jar --mysql --dbName=mydb --dbServer=127.0.0.1 --dbUser=root --r2rml=r2rml.ttl > result.ttl

Remarks

  • cli-0.5-jar-with-dependencies.jar is the command-line jar.
    Version 0.5 is preferable; you will receive "Prefix cannot be null" in the later ones.

Conclusion

An intermediate translation into XQuery is used; the tool is very slow.

ONTOP

Ontop is a popular Protégé plugin, but it is also available as a set of command-line utilities.

Syntax example

ontop materialize --url "jdbc:mysql://localhost:3306/mydb" --mapping "../r2rml.ttl" --username root --password "65536" --driver-class com.mysql.jdbc.Driver --disable-reasoning --format turtle --output result.ttl

Remarks

  • In MySQL, you have to run SET GLOBAL SQL_MODE=ANSI_QUOTES;

Conclusion

Ontop was designed for working with ontologies and generates a lot of ontological garbage like ... rdf:type owl:NamedIndividual.

Ontop tries to parse and rewrite the SQL query from rr:sqlQuery, does not understand many SQL constructs, and honestly suggests that you create an appropriate SQL view in your relational database.

R2RML support is partial (see the Ontop R2RML manual). Ontop is really fast.

RDB2RDF::R2RML

I haven't been able to install this Perl module: it has many dependencies that are absent from CPAN.

D2RQ

D2RQ is a full-stack solution; however, one can extract a standalone tool from the D2RQ distribution.

R2RML is supported in the preview version only.

D2RQ provides its own mapping language (as does Ontop, by the way).

Conclusion

As far as I remember, D2RQ splits the SQL query from rr:sqlQuery into many "atomic" queries and extracts database records one by one, which is really slow.
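
To illustrate why record-by-record extraction tends to be slow, here is a small Python sketch against an in-memory SQLite database. It is not D2RQ's actual code; the tables, data and function names are made up for the example. It compares one joined query with a series of "atomic" lookups and counts the statements each approach issues.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory database with hypothetical emp/dept tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
        INSERT INTO dept VALUES (1, 'R&D'), (2, 'Sales');
        INSERT INTO emp VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Eve', 1);
        """
    )
    return conn


def extract_with_join(conn):
    """Fetch (employee, department) pairs with a single SQL statement."""
    rows = conn.execute(
        "SELECT e.name, d.name FROM emp e JOIN dept d ON d.id = e.dept_id ORDER BY e.id"
    ).fetchall()
    return rows, 1


def extract_row_by_row(conn):
    """Fetch the same pairs with one extra lookup per employee record."""
    statements = 1  # the initial scan of emp
    result = []
    for name, dept_id in conn.execute("SELECT name, dept_id FROM emp ORDER BY id").fetchall():
        (dept_name,) = conn.execute(
            "SELECT name FROM dept WHERE id = ?", (dept_id,)
        ).fetchone()
        statements += 1
        result.append((name, dept_name))
    return result, statements


if __name__ == "__main__":
    conn = make_demo_db()
    print(extract_with_join(conn))
    print(extract_row_by_row(conn))
```

Both functions return the same pairs, but the second one issues one statement per record plus the initial scan. Against a remote database, each of those statements costs a network round trip, so the overhead grows with the number of rows.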

D2RQ R2RML Manual.

CONCLUSION

My personal choice is Ontop.

See also:

Meantime answered 3/8, 2017 at 9:32 Comment(3)
Thank you for your kind response. Could you please provide more information about the cons of Virtuoso and Ultrawrap? - Mckeehan
@rawanaz, unfortunately, I have not tried these tools (at least not in this capacity, in the case of Virtuoso). However, the developers of Virtuoso and Ultrawrap are on Stack Overflow. - Meantime
Ultrawrap is a commercial product that supports the W3C RDB2RDF standards. You can contact Capsenta directly if you are interested in learning more about Ultrawrap. - Absolutism
D
3

I haven't thought about this as rigorously as @Stanislav Kralin, or defined what I expect in terms of performance, elegance, expressiveness, etc.

More and more triplestores offer their own bridge between relational data and semantic triples. I'm thinking especially of Stardog and GraphDB. I believe that Stardog's (and Virtuoso's?) solutions don't actually dump concrete triples. Rather, they create a virtual semantic view of one or more tables.

D2R was the first instantiator I used. I'm surprised @Stanislav Kralin included it, because it is kinda dead (or unmaintained) and it does kinda require programming (or writing out statements in a declarative language). I didn't know about the R2RML preview... I'll have to check that out, because I was concerned about using their proprietary language.

I believe some of my academic colleagues use the reference R2RML parser.

I have been pretty happy with Karma from ISI. Instantiating tabular/relational data is a big part of my research, and I have certainly found some edge cases that have been difficult to implement, for example linking multiple singleton instances.

  • The documentation is good
  • Installation is easy
  • There's a nice web GUI, plus a command-line bulk transformation script

Karma doesn't use just pure R2RML:

  • They use R2RML
    • with JSON worksheets as the object of at least one triple
      • with Python data transformations in the JSON
Denotation answered 3/8, 2017 at 14:13 Comment(3)
Replication from RDB to RDF spheres is possible with Virtuoso's "Linked Data Views", using R2RML or a proprietary language (initially implemented before R2RML was standardized). The Open Source Edition can only expose/transform local RDB data as RDF; the Commercial Edition can also expose/transform any remote ODBC-accessible RDB data. Major improvements to this functionality are coming in Virtuoso 8, now in beta. (ObDisclaimer: OpenLink Software produces Virtuoso and employs me.) - Coly
@TallTed, will there be an open-source version of Virtuoso 8? If so, do you know roughly when? - Denotation
There will be a VOS 8. Timing for this has not been set. - Coly
