Combining a Graph Database and an RDBMS

Asked 28/7, 2016 at 21:16 Answered 27/2, 2021 at 14:10

Solved mysql database neo4j rdbms graph-databases

Is it bad design / difficult to implement two or more databases for an application?

For instance, let's say I have User objects which I would like to store into a relational database. These 'User' objects have relationships with one another and have user feeds (think of Twitter / Facebook) and I want to store these relationships to be able to find friends of a friend, to see how "deep" I am into a chain of feeds, etc. These relationships would be stored in the graph database.

Is there any better way to go about this or would using a graph database for relationships and a relational database for data storage be the best solution?

Lebkuchen answered 28/7, 2016 at 21:16 Comment(1)

I wanna to implement the same approach in my project - store all data in RDBMS and all relationships in neo4j. If u already implemented this - is it a good idea or not ??? If select (without any relationship) some huge data from RDBMS, it's faster than with graph DB??? Any other pros and cons? – Mientao 7/10, 2019 at 20:53

Disclaimer: I haven't used the enterprise version of neo4j yet, so it may have capabilities to help you here that I am not aware of.

If you can keep everything in neo4j, that's best as it keeps complexity low on several fronts, including data modeling, keeping data in sync, and keeping queries easy and atomic instead of splitting them between separate databases.

It would help to know what your requirements are for using RDBMS, and if those requirements justify the complexity introduced from above.

If you are determined to do this, then you've got a choice between high data redundancy, or going with a more skeletal neo4j db which only keeps IDs, relationships, and minimal data.

With high data redundancy, mirroring most if not all data in neo4j, then you've got the added complexity of keeping everything in sync and consistent between your dbs (not trivial at all). This buys you richer queries through neo4j since most data is all in the same db, and greater ability to cut off from your rdbms and go with just neo4j in the future. But any query that alters both dbs will not be automatically atomic...you'll have to do some kind of enforcement in your server side code for this, and that's likely to be tricky.

With a skeletal approach, most of your neo4j queries will have ID inputs and ID outputs. Rich data may not be available, so then you'll take those IDs and do your select on your rdbms on those ids for the data you need. Any queries involving relationships and expecting data back will require usage of both dbs, which can be troublesome for developers, though probably fine on your server-side code. You'll be avoiding issues of synchronization and probably atomicity as well, since the common data between the two dbs will be minimal.

It's worth noting that there are some solutions for integrating with other databases in a similar pattern to what you've proposed.

GraphAware has a concept of graph-aided search which offers integration between ElasticSearch and neo4j, though this is primarily to address requirements for rich searchability. This can be used to either feed ElasticSearch for use as a pure search engine, or it can let you store all your data in ES and boost or affect results based upon relational data in neo4j.

Medallist answered 28/7, 2016 at 21:54 Comment(8)

My plan was to have little data redundancy between the two systems (looking into Neo4J and MySQL). I'd store all of the data necessary for the application using MySQL and keep only the relationships in the Neo4J database. Let's say for instance 10 users join and create their profiles. This data is stored into MySQL. Let's say User1 adds User2. This relationship would be stored into Neo4J. If I do run into synchronicity problems, then only the friends feature / following features would be down. Creating a profile, logging in / managing a profile will all still be functional. Good approach? – Lebkuchen 28/7, 2016 at 22:12

It works...you'll be dodging the data synchronization bullet, but your devs will need two tools to do more complex queries involving and/or returning both relationships and data. Not the end of the world, but it's not a streamlined workflow. That said, the big unanswered question is why do you need the rdbms? If it's something forced on you from those higher up on the business food chain, there's no way around it. But if it's your choice, what are your reasons for including it instead of standardizing on one db? – Medallist 28/7, 2016 at 23:18

We were planning on developing a tool to be able to do queries to both databases simultaneously in order to load both data from the MySQL database and the relationships from Neo4J. Also, we have a lot of data that's best stored in an RDBMS (have a decent amount of FK's) which Neo4J can't handle as well. That's why I figured we'd use MySQL for the bulk of the data as it handles that tremendously well and use Neo4J for the relationships. Sort of utilizing the strengths of the two databases and avoiding the weaknesses. – Lebkuchen 29/7, 2016 at 0:10

Fair enough...though if you do have neo4j available, it can't hurt to attempt to try modeling and adding this data in neo4j, just to check if there is the possibility of handling the data well, in case the choice was based on an assumption. Maintaining 2 dbs has enough overhead that you'll want to be certain of your assumptions. As far as foreign keys go, those are typically used for joining. In neo4j that would transform into relationships between nodes, the foreign keys may not be needed at all. – Medallist 29/7, 2016 at 0:21

Hmm, that is a good idea. I'll run it by the other devs and see what they think. This is largely based on research so not necessarily baseless assumptions. We're going to need to pull anywhere from 1,000 to 10,000,000 rows at a time for various tasks. The table we'd take upward of 10,000,000 rows from contains a foreign key, so we'd need to join the tables in order to get specific demographics from the associated table. From what I've read, Neo4J is awful at handling a job like this. Please correct me if I'm wrong though, and I'll still attempt using only Neo4J. – Lebkuchen 29/7, 2016 at 12:4

If you can provide a sample sql query, it might be clearer how to model the data in neo4j for a fast query. Though in general, it sounds like you'd create the graph by transforming those foreign keys into relationships to other nodes. Provided that you're looking for a node with an id, you'd match that first, then find and traverse any relationships back to nodes of the type you're looking for. – Medallist 29/7, 2016 at 15:53

There's Video objects and User objects. Let's say 1,000,000 users watch a video. I could store those relationships in Neo4J with edges to a video node. If I wanted to look up gender, age, location, and other demographics for view analytics, would Neo4J be able to handle retrieving 1,000,000 + user demographics? My original idea was to keep the views as a composite key between user_id and video_id, then join that table with the User table to retrieve demographics. From what I've read, RDBMS is a lot faster for jobs like that, but graph DB's are a LOT better at storing relationships – Lebkuchen 29/7, 2016 at 16:16

I am looking also for a solution of using a graph db to store infrastructure and weather forecast information with geospatial attributes. Forecast are provided by grids which can be stored as points or polygons. Infrastructure nodes are also stored as points. I would like to keep the forecast values in a rdbms since there is a huge amount of entries (granularity = hourly, layer, parameter, run). The idea is to find which grid correspond to a node and then retrieve, i..e., the wind speed forecast at 100m for the next 12 hours. Any experience to share? – Coss 3/3, 2020 at 12:23

I am going to disagree with all of those who say that you shouldn't use two different kinds of databases in one project. I've been building systems for 30+ years and in all of the enterprise systems that I've had an opportunity to work on, we always used the best tools for each part of the project. It's common to combine object/graph databases with relational databases and document databases into the same project. We will support high-speed link analysis in the object/graph database and then hop out to the relational database for metadata and then into the document database for the original source information. Or you can reverse the entire process and start with a document search, then do the high-speed link analysis.

In many cases, it's a matter of being forced to use exabytes of legacy relational data, but needing to add new capabilities such has high-speed link analysis. Instead of attempting to port all of the relational data over to the object/graph database, we just move over the information we need.

Don't force a square peg into a round hole. Use the best tool for each part of the project.

Passerine answered 27/2, 2021 at 14:10 Comment(0)

A little bit late for the party. Generally it is not recommended to use two databases in one project at the same time. Just like @InverseFalcon pointed out in the answer above, there will be plenty of problems with such an architecture.

To my understanding, you can use a distributed graph database, which can not only handle the relationships perfectly well, but also solve the storage problem of large amounts of data.

Denticulate answered 12/6, 2020 at 3:12 Comment(1)

Where would you store your document data from which the graphed entities were extracted, and how would you support a document search capability? – Passerine 4/3, 2021 at 14:30

If you are looking to combine the relational and graph data model. You can use SQL Server 2017+ versions, where graph capabilities are added. The graph relationships are integrated into Transact-SQL and receive the benefits of using SQL Server as the foundational database management system.

As mentioned in the documentation:

Graph extensions are fully integrated in SQL Server engine. Use the same storage engine, metadata, query processor, etc. to store and query graph data. Query across graph and relational data in a single query. Combining graph capabilities with other SQL Server technologies like columnstore, HA, R services, etc. SQL graph database also supports all the security and compliance features available with SQL Server.

References

Territorialism answered 11/2, 2021 at 9:55 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

References

Recommended topics

Hot tags