Comparison of Relational Databases and Graph Databases
Asked Answered
G

8

128

Can someone explain to me the advantages and disadvantages for a relation database such as MySQL compared to a graph database such as Neo4j?

In SQL you have multiple tables with various ids linking them. Then you have to join to connect the tables. From the perspective of a newbie why would you design the database to require a join rather than having the connections explicit as edges from the start as with a graph database. Conceptually it would make no sense to a newbie. Presumably there is a very technical but non-conceptual reason for this?

Greer answered 24/10, 2012 at 9:31 Comment(1)
The access methods are different. In a Relational Database, you use Relational Algebra, best augmented with recursion, an awkward but popular representation of which is (recursive, with procedural extras) SQL. In a Graph Database, you use graph traversal languages like Gremlin. The underlying DB implementations down to the on-disk layout would be chosen to provide best performance for the respective access method, and arbitary tuning/variation may be found in implementations.Fashion
R
156

There actually is conceptual reasoning behind both styles. Wikipedia on the relational model and graph databases gives good overviews of this.

The primary difference is that in a graph database, the relationships are stored at the individual record level, while in a relational database, the structure is defined at a higher level (the table definitions).

This has important ramifications:

  • A relational database is much faster when operating on huge numbers of records. In a graph database, each record has to be examined individually during a query in order to determine the structure of the data, while this is known ahead of time in a relational database.
  • Relational databases use less storage space, because they don't have to store all of those relationships.

Storing all of the relationships at the individual-record level only makes sense if there is going to be a lot of variation in the relationships; otherwise you are just duplicating the same things over and over. This means that graph databases are well-suited to irregular, complex structures. But in the real world, most databases require regular, relatively simple structures. This is why relational databases predominate.

Reword answered 24/10, 2012 at 9:51 Comment(8)
Storing relationships at the record level makes sense in other cases as well, as it provides index-free adjacency. That is, graph traversals can be performed with no index lookups leading to much better performance. And it's not duplication, as you store the actual relationships, which differ.Lezley
You say: "In a graph database, each record has to be examined individually during a query in order to determine the structure of the data". Is this a universal property of graph databases or more or less true in general? How about OrientDb which supports full schema for vertices and edges?Periphrasis
@LodewijkBogaards some graph databases, like Neo4j, allow basic indexing. If the query hits the indexes, I believe there is no need to determine the structure of the data behind the index. But it depends on the query.Yasui
I strongly disagree to both the points. Graph database is always faster when there are foreign keys. Because we don't need join operations. Relational databases have to store the foreign key in many tables. An edge and a foreign key should take a same storage space.Joshia
@Joshia Do you also have a documentation from which we can also conclude the same ?Gan
pointing to spin from a vendor doesn't create a definitive conclusion.Distressed
Your characterizations of the data structures/abstractions don't make much sense. A relational table represents an n-ary relationship/association & each row is an instance of values/entities so related/associated. It's not clear how that is not "at the individual record level". The operators are table-level & programming is largely declarative & automatically optimized, which is maybe what you are trying to get at. The actual difference is the graph DB's specialized data structure & querying, allowing specialized implementation at the cost of specialized programming.Hurleigh
Your comment, "In a graph database, each record has to be examined individually during a query in order to determine the structure of the data" is absolutely not true for Objectivity/DB. It is a massively scalable object/graph database that uses a schema based approach for defining its data model. In addition to having schema, it also has an object placement management capability that allows me to create highly optimized placement strategies. I can place objects of two different types on the same disk page and dramatically improve performance over exabyte scale, distributed graph databases.Palladous
C
108

The key difference between a graph and relational database is that relational databases work with sets while graph databases work with paths.

This manifests itself in unexpected and unhelpful ways for a RDBMS user. For example when trying to emulate path operations (e.g. friends of friends) by recursively joining in a relational database, query latency grows unpredictably and massively as does memory usage, not to mention that it tortures SQL to express those kinds of operations. More data means slower in a set-based database, even if you can delay the pain through judicious indexing.

As Dan1111 hinted at, most graph databases don't suffer this kind of join pain because they express relationships at a fundamental level. That is, relationships physically exist on disk and they are named, directed, and can be themselves decorated with properties (this is called the property graph model, see: https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model). This means if you chose to, you could look at the relationships on disk and see how they "join" entities. Relationships are therefore first-class entities in a graph database and are semantically far stronger than those implied relationships reified at runtime in a relational store.

So why should you care? For two reasons:

  1. Graph databases are much faster than relational databases for connected data - a strength of the underlying model. A consequence of this is that query latency in a graph database is proportional to how much of the graph you choose to explore in a query, and is not proportional to the amount of data stored, thus defusing the join bomb.
  2. Graph databases make modelling and querying much more pleasant meaning faster development and fewer WTF moments. For example expressing friend-of-friend for a typical social network in Neo4j's Cypher query language is just MATCH (me)-[:FRIEND]->()-[:FRIEND]->(foaf) RETURN foaf.
Cyanotype answered 30/7, 2013 at 9:17 Comment(8)
"Relationships are therefore first-class entities in a graph database". The same is typically true in a relational database: entities are mapped to tuples in relations, as are many-many relationships. Is the distinction you describe for the one-many relationships, which are often merged into entity relationships?Semipro
This comparison seems a little biased. What about drawbacks?Karaganda
A little? Too biased in my honest opinion. Looks like a "This is a good product! Buy this" ad to me at best!Promise
This needs a massive caveat: this guy is the "chief scientist" at Neo Technology, who make the Neo4J graph database.Urtication
How about an arbitrary search... give me all users that are 35 to 55 and shop at walmart in the last 90 days.Distressed
Huge conflict of interest in this answer. -1Yakut
There is no disclaimer from the answer coming from Jim Webber, member of Neo4JRao
Are product plugs now allowed on stackexchange? I am working on a few I'd like to peddle here.Harlequin
R
23

Dan1111 has already given an answer flagged as correct. A couple of additional points are worth noting in passing.

First, in almost every implementation of graph databases, the records are "pinned" because there are an unknown number of pointers pointing at the record in its current location. This means that a record cannot be shuffled to a new location without either leaving a forwarding address at the old location or breaking an unknown number of pointers.

Theoretically, one could shuffle all the records at once and figure out a way to locate and repair all the pointers. In practice this is an operation that could take weeks on a large graph database, during which time the database would have to be off the air. It's just not feasible.

By contrast, in a relational database, records can be reshuffled on a fairly large scale, and the only thing that has to be done is to rebuild any indexes that have been affected. This is a fairly large operation, but nowhere near as large as the equivalent for a graph database.

The second point worth noting in passing is that the world wide web can be seen as a gigantic graph database. Web pages contain hyperlinks, and hyperlinks reference, among other things, other web pages. The reference is via URLs, which function like pointers.

When a web page is moved to a different URL without leaving a forwarding address at the old URL, an unknown number of hyperlinks will become broken. These broken links then give rise to the dreaded, "Error 404: page not found" message that interrupts the pleasure of so many surfers.

Rossen answered 26/10, 2012 at 5:12 Comment(6)
Only that most graph databases have integrity rules that don't allow for broken links.Listel
If the DBMS pins the target, this will obviously prevent link breakage due to moving the target of the link. I don't know of any graph databases that don't pin records that might be targets of links.Rossen
Are graph databases usually schema-less because a schema change would be a very heavy operation because of the need to rewrite all pointers? Can the reshuffling problem not be circumvented by simply storing virtual pointers, which go through a lookup table? This would still perform at O(1) right?Periphrasis
I've been operating under a definition of graph databases that would include pre-relational databases such as hierarchical or network ones. Some of these databases had schemas, albeit not relational schemas. I'm not sure whether or not my operational definition agrees with the standard definition.Rossen
A data structure that provides a mapping between virtual pointers and physical pointers is essentially the same thing as an index, with about the same costs. You might as well go ahead and use a relational database.Rossen
Is it effeicent to use graph and relational databases in web project?(project is similar to facebook)Seafarer
A
11

With a relational database we can model and query a graph by using foreign keys and self-joins. Just because RDBMS’ contain the word relational does not mean that they are good at handling relationships. The word relational in RDBMS stems from relational algebra and not from relationship. In an RDBMS, the relationship itself does not exist as an object in its own right. It either needs to be represented explicitly as a foreign key or implicitly as a value in a link table (when using a generic/universal modelling approach). Links between data sets are stored in the data itself.

The more we increase the search depth in a relational database the more self-joins we need to perform and the more our query performance suffers. The deeper we go in our hierarchy the more tables we need to join and the slower our query gets. Mathematically the cost grows exponentially in a relational database. In other words the more complex our queries and relationships get the more we benefit from a graph versus a relational database. We don’t have performance problems in a graph database when navigating the graph. This is because a graph database stores the relationships as separate objects. However, the superior read performance comes at the cost of slower writes.

In certain situations it is easier to change the data model in a graph database than it is in an RDBMS, e.g. in an RDBMS if I change a table relationship from 1:n to m:n I need to apply DDL with potential downtime.

RDBMS has on the other hand advantages in other areas, e.g. aggregating data or doing timestamped version control on data.

I discuss some of the other pros and cons in my blog post on graph databases for data warehousing

Avidin answered 16/6, 2017 at 18:48 Comment(1)
"The word relational in RDBMS stems from relational algebra"--Sort of. "and not from relationship."--Not relationship in the sense of FK, but yes relationship in that the relational in relational algebra & RDBMS comes from relation in the sense of table representing a relationship/association. FKs are wrongly called relationships by methods that misunderstand the relational model. FKs need not be known or exist to record or query. They are for integrity. What is necessary & sufficient to query is to know the relationship/association that a (base or query result) table represents.Hurleigh
D
5

While the relational model can easily represent the data that is contained in a graph model, we face two significant problems in practice:

  1. SQL lacks the syntax to easily perform graph traversal, especially traversals where the depth is unknown or unbounded. For instance, using SQL to determine friends of your friends is easy enough, but it is hard to solve the “degrees of separation” problem.
  2. Performance degrades quickly as we traverse the graph. Each level of traversal adds significantly to query response time.

Reference: Next Generation Databases

Drama answered 2/10, 2018 at 8:18 Comment(1)
Recursive SQL solves both these problems. Postgres, SQL Server, Oracle, and now even MySQL support recursive queries.Harlequin
T
4

Graph databases are worth investigating for the use cases that they excel in, but I have had some reason to question some assertions in the responses above. In particular:

A relational database is much faster when operating on huge numbers of records (dan1111's first bullet point)

Graph databases are much faster than relational databases for connected data - a strength of the underlying model. A consequence of this is that query latency in a graph database is proportional to how much of the graph you choose to explore in a query, and is not proportional to the amount of data stored, thus defusing the join bomb. (Jim Webber's first bullet point)

In other words the more complex our queries and relationships get the more we benefit from a graph versus a relational database. (Uli Bethke's 2nd paragraph)

While these assertions may well have merit, I have yet to find a way to get my specific use case to align with them. Reference: Graph Database or Relational Database Common Table Extensions: Comparing acyclic graph query performance

Transparent answered 10/8, 2020 at 0:1 Comment(0)
H
1

Relational Databases are much more efficient in storing tabular data. Despite the word “relational” in their name, relational databases are much less effective at storing or expressing relationships between stored data elements. The term 'relational' in relational databases relates more to relating columns within a table, not relating information in different tables. Relationships between columns exist to support set operations. So as Database grows in millions or billions records it becomes extremely slow to retrieve data from relational databases.

Unlike a relational database, a graph database is structured entirely around data relationships. Graph databases treat relationships not as a schema structure but as data, like other values. It is very fast to retrieve data from graph databases. From a relational database standpoint, you could think of this as pre-materializing JOINs once at insertion time instead of computing them for every query. Because the data is structured entirely around data relationships, real-time query performance can be achieved no matter how large or connected the dataset gets. The graph databases take more storage space compared to relational database.

Hydrous answered 23/10, 2020 at 16:16 Comment(0)
O
0

Relational databases like MySQL use joins and normalization to ensure data consistency and optimize storage. Graph databases like Neo4j represent relationships as edges, providing a more natural and efficient way to handle interconnected data, making them suitable for scenarios with complex relationships.

Osrock answered 23/7, 2023 at 16:11 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.