I was wondering what the advantages are of using triple stores over a relational database?
The viewpoint of the CTO of a company that extensively uses RDF Triplestores commercially:
Schema flexibility - it's possible to do the equivalent of a schema change to an RDF store live, and without any downtime, or redesign - it's not a free lunch, you need to be careful with how your software works, but it's a pretty easy thing to do.
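To make the schema-flexibility point concrete, here is a minimal sketch using a toy in-memory set of triples rather than a real triplestore. The `ex:` names and the `objects` helper are made up for illustration; the point is only that a new kind of property is added by inserting data, with no migration step.

```python
# Toy in-memory "triple store": a set of (subject, predicate, object) tuples.
# All identifiers (ex:alice, ex:name, ...) are hypothetical examples.
triples = {
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
}


def objects(store, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)


# The "schema change": start recording a new property for one resource.
# There is no ALTER TABLE equivalent; existing data and queries are untouched.
triples.add(("ex:alice", "ex:email", "alice@example.org"))

bob_names = objects(triples, "ex:bob", "ex:name")
alice_emails = objects(triples, "ex:alice", "ex:email")
bob_emails = objects(triples, "ex:bob", "ex:email")  # no value: an empty list
```

The "be careful with how your software works" caveat shows up in `bob_emails`: code reading the data has to cope with properties that are absent for some resources.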
More modern - RDF stores are typically queried over HTTP, so it's very easy to fit them into service architectures without hacky bridging solutions or performance penalties. They also handle internationalised content better than typical SQL databases, e.g. you can have multiple values for the same property in different languages.
Standardisation - the level of standardisation of implementations using RDF and SPARQL is much higher than SQL. It's possible to swap out one triplestore for another, though you have to be careful you're not stepping outside the standards. Moving data between stores is easy, as they all speak the same language.
Expressivity - it's much easier to model complex data in RDF than in SQL, and the query language makes it easier to do things like LEFT JOINs (called OPTIONAL in SPARQL). Conversely though, if your data is very tabular, then SQL is much easier.
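As a rough illustration of how OPTIONAL behaves like a LEFT JOIN, the sketch below evaluates a required pattern and an optional pattern over a toy list of triples in plain Python. The data and the `first_object_by_subject` helper are hypothetical, and it keeps only one value per subject, which is a simplification of real SPARQL semantics.

```python
# Toy data; identifiers are made up for illustration.
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:bob", "ex:name", "Bob"),
]


def first_object_by_subject(store, predicate):
    """Map each subject to its first object for the given predicate."""
    result = {}
    for s, p, o in store:
        if p == predicate:
            result.setdefault(s, o)
    return result


# Roughly the SPARQL query:
#   SELECT ?name ?email WHERE {
#     ?s ex:name ?name .
#     OPTIONAL { ?s ex:email ?email }
#   }
names = first_object_by_subject(triples, "ex:name")    # required pattern
emails = first_object_by_subject(triples, "ex:email")  # optional pattern

# Every subject with a name yields a row; a missing email becomes None,
# just as an unmatched LEFT JOIN column would be NULL in SQL.
rows = sorted((name, emails.get(s)) for s, name in names.items())
```

Bob still appears in `rows` even though he has no email, which is the behaviour an inner join would not give you.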
Provenance - SPARQL lets you track where each piece of information came from, and you can store metadata about it, letting you easily do sophisticated queries that only take into account data from certain sources, or with a certain trust level, or from some date range etc.
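One common way to get this kind of provenance is to store quads, where the fourth element names the graph (source) a statement came from. The sketch below is a plain-Python approximation of that idea; the graph names, trust scores, and dates are invented for illustration and do not come from any real store.

```python
from datetime import date

# Quads: (subject, predicate, object, graph). The graph names the source.
quads = [
    ("ex:acme", "ex:revenue", "10M", "graph:annual-report"),
    ("ex:acme", "ex:revenue", "50M", "graph:forum-rumour"),
    ("ex:acme", "ex:founded", "1999", "graph:annual-report"),
]

# Metadata about each source graph (hypothetical trust scores and load dates).
graph_info = {
    "graph:annual-report": {"trust": 0.9, "loaded": date(2011, 3, 1)},
    "graph:forum-rumour": {"trust": 0.2, "loaded": date(2011, 4, 15)},
}


def values(subject, predicate, min_trust=0.0, loaded_before=None):
    """Objects for (subject, predicate), filtered by source metadata."""
    found = []
    for s, p, o, g in quads:
        info = graph_info[g]
        if s != subject or p != predicate:
            continue
        if info["trust"] < min_trust:
            continue
        if loaded_before is not None and info["loaded"] >= loaded_before:
            continue
        found.append(o)
    return sorted(found)


all_revenue = values("ex:acme", "ex:revenue")
trusted_revenue = values("ex:acme", "ex:revenue", min_trust=0.5)
early_revenue = values("ex:acme", "ex:revenue", loaded_before=date(2011, 4, 1))
```

In SPARQL the same filtering would typically be written with the `GRAPH` keyword over named graphs, with the trust and date metadata stored as ordinary triples about each graph.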
There are downsides though. SQL databases are generally much more mature, and have more features than typical RDF databases. Things like transactions are often much more crude, or non-existent. Also, the cost per unit of information stored in RDF vs. SQL is noticeably higher. It's hard to generalise, but it can be significant if you have a lot of data - though at least in our case it's an overall benefit financially given the flexibility and power.
Both commenters are correct, especially since Semantic Web is not a database, it's a bit more general than that.
But I guess you might mean triple store, rather than Semantic Web in general, as triple store v. relational database is a somewhat more meaningful comparison. I'll preface the rest of my answer by noting that I'm not an expert in relational database systems, but I have a little bit of knowledge about triple stores.
Triple (or quad) stores are basically databases for data on the semantic web, particularly RDF. That's kind of where the similarity between triple stores and relational databases ends. Both store data, both have query languages, both can be used to build applications on top of; so I guess if you squint your eyes, they're pretty similar. But the type of data each stores is quite different, so the two technologies optimize for different use cases and data structures, and they're not really interchangeable.
A lot of people have done work in overlaying a triples view of the world on top of a relational database, and that can work, but it will be slower than a system dedicated to storing and retrieving triples. Part of the problem is that SPARQL, the standard query language used by triple stores, can require a lot of self joins, something relational databases are not optimized for. If you look at benchmarks, such as SP2B, you can see that Oracle, which just overlays SPARQL support on its relational system, runs in the middle or at the back of the pack when compared with systems that more natively support RDF.
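To see where the self joins come from, here is a small sketch using Python's built-in `sqlite3` with a naive single-table layout for triples. This layout is an assumption chosen for illustration, not how Oracle or any particular product actually stores RDF; the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Naive layout: every statement is one row in a single three-column table.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "ex:name", "Alice"),
        ("ex:alice", "ex:worksFor", "ex:acme"),
        ("ex:acme", "ex:name", "Acme Ltd"),
        ("ex:bob", "ex:name", "Bob"),
    ],
)

# SPARQL equivalent, with three triple patterns:
#   SELECT ?person ?company WHERE {
#     ?p ex:name ?person .
#     ?p ex:worksFor ?c .
#     ?c ex:name ?company .
#   }
# On a single triples table, each extra pattern is another self join.
sql = """
SELECT t1.o AS person, t3.o AS company
FROM triples AS t1
JOIN triples AS t2 ON t2.s = t1.s
JOIN triples AS t3 ON t3.s = t2.o
WHERE t1.p = 'ex:name'
  AND t2.p = 'ex:worksFor'
  AND t3.p = 'ex:name'
"""
rows = conn.execute(sql).fetchall()
conn.close()
```

A query with ten triple patterns would join the same table to itself ten times, which is the access pattern dedicated triple stores build their indexes around.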
Of course, the RDF systems would probably get crushed by Oracle if they were doing SQL queries over relational data. But that's kind of the point: you pick the tool that's well suited for the application you want to build.
So if you're thinking about building a semantic web application, or just trying to get some familiarity in the area, I'd recommend ultimately going with a dedicated triple store.
I won't delve into reasoning and how that plays into query answering in triple stores, as that's yet another discussion, but it's another important distinction between relational systems and triple stores that do reasoning.
Some triplestores (Virtuoso, Jena SDB) are based on relational databases and simply provide an RDF / SPARQL interface. So to rephrase the question slightly, are triplestores built from the ground up as a triplestore more performant than those that aren't? @steve-harris definitely knows the answer to that ;) but I wager a yes.
Secondly, what features do triplestores have that RDBMSs don't? The simple answer is support for SPARQL, RDF, OWL etc. (i.e. the Semantic Web technology stack), and to make it a fair fight, it's better to define the value of SPARQL based on SPARQL 1.1 (it has considerably more features than 1.0). This provides support for federation (so, so cool), property path expressions and entailment regimes, along with a standard set of update protocols and graph management protocols (which SPARQL 1.0 didn't have and sorely lacked). Also, @steve-harris points out that transactions are not part of the standard (can of worms), although many vendors provide non-standardised mechanisms for transactions (Virtuoso supports JDBC and Hibernate compliant connection pooling and management along with all the transactional features of Hibernate).
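For a feel of what property path expressions buy you, the sketch below emulates a one-or-more path (written `ex:subClassOf+` in SPARQL 1.1) over a toy set of triples in plain Python. The class hierarchy and the `reachable` helper are hypothetical; a real store would evaluate the path inside the query engine.

```python
# Toy class hierarchy; identifiers are made up for illustration.
triples = {
    ("ex:Poodle", "ex:subClassOf", "ex:Dog"),
    ("ex:Dog", "ex:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "ex:subClassOf", "ex:Animal"),
    ("ex:Cat", "ex:subClassOf", "ex:Mammal"),
}


def reachable(store, start, predicate):
    """Nodes reachable from `start` via one or more `predicate` edges."""
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in store:
            if s == node and p == predicate and o not in seen:
                seen.add(o)  # the `seen` check also guards against cycles
                frontier.append(o)
    return seen


# Roughly: SELECT ?c WHERE { ex:Poodle ex:subClassOf+ ?c }
ancestors = sorted(reachable(triples, "ex:Poodle", "ex:subClassOf"))
```

Without property paths, SPARQL 1.0 queries could only follow a fixed number of hops, so arbitrary-depth traversals like this had to be done in application code.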
The big drawback in my mind is that not many triplestores support all of SPARQL 1.1 (since it is still not a recommendation), and this is where the real benefits lie.
Having said that, I am and always have been an advocate of substituting RDBMSs with triplestores, and the platforms I deliver run entirely off triplestores (Volkswagen in my last role was an example of this), removing the need for an RDBMS. An additional advantage is that object-to-RDF mapping provides more options and flexibility than traditional ORM (also known as putting a square peg in a round hole).
Also, you can still use a relational database and use RDF as a data exchange format, which is very flexible.
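As a sketch of the "keep the database, exchange RDF" approach, the code below reads rows from an in-memory SQLite table and emits them as N-Triples lines. The `BASE` namespace, the table, and the URI scheme (`table/id` for subjects, `table#column` for predicates) are assumptions made up for this example, not a standard mapping.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for minted URIs


def literal(value):
    """Format a value as an N-Triples plain literal with basic escaping."""
    text = str(value)
    # Backslash must be escaped first so later escapes are not doubled.
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, escaped)
    return f'"{text}"'


def rows_to_ntriples(table, columns, rows):
    """One triple per non-NULL column; the first column is the row key."""
    lines = []
    for row in rows:
        subject = f"<{BASE}{table}/{row[0]}>"
        for column, value in zip(columns[1:], row[1:]):
            if value is None:
                continue  # NULLs are simply not stated in RDF
            lines.append(f"{subject} <{BASE}{table}#{column}> {literal(value)} .")
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Alice", "alice@example.org"), (2, "Bob", None)],
)
cursor = conn.execute("SELECT id, name, email FROM person ORDER BY id")
columns = [d[0] for d in cursor.description]
ntriples = rows_to_ntriples("person", columns, cursor.fetchall())
conn.close()
```

Because the output is line-oriented and every consumer parses it the same way, the receiving side can be a triplestore, another relational system, or a plain script.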