AWS Neptune vs. Neo4j Internal Data Storage

Asked 23/1, 2022 at 17:53 Answered 4/1, 2024 at 17:44

amazon-web-services neo4j gremlin graph-databases amazon-neptune

What is the difference between how AWS Neptune stores data internally and how Neo4j stores it? According to this post, Neo4j stores each node with a direct link to its connected nodes: "relationships are organized as doubly linked lists". From what I've read, AWS Neptune is basically a relational database with a few indexes that allow for graph queries. Is this accurate? Are there any major advantages to either representation?
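To make the "doubly linked lists" description concrete, here is a minimal in-memory sketch of that idea, not Neo4j's actual record format. It assumes each node keeps only the id of the first relationship in its chain, and each relationship carries previous/next pointers for both of its endpoints. All class and field names (`LinkedGraph`, `first_rel`, `start_next`, and so on) are made up for illustration, and self-loops are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Node:
    # Id of the first relationship in this node's chain (None if isolated).
    first_rel: Optional[int] = None


@dataclass
class Rel:
    start: int
    end: int
    # A relationship sits in two doubly linked chains, one per endpoint.
    start_prev: Optional[int] = None
    start_next: Optional[int] = None
    end_prev: Optional[int] = None
    end_next: Optional[int] = None


class LinkedGraph:
    """Hypothetical sketch of pointer-chasing adjacency (not Neo4j's API)."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.rels: Dict[int, Rel] = {}

    def add_node(self, node_id: int) -> None:
        self.nodes[node_id] = Node()

    def _set_next(self, rel: Rel, node_id: int, value: Optional[int]) -> None:
        if rel.start == node_id:
            rel.start_next = value
        else:
            rel.end_next = value

    def _set_prev(self, rel: Rel, node_id: int, value: Optional[int]) -> None:
        if rel.start == node_id:
            rel.start_prev = value
        else:
            rel.end_prev = value

    def _next(self, rel: Rel, node_id: int) -> Optional[int]:
        return rel.start_next if rel.start == node_id else rel.end_next

    def add_rel(self, rel_id: int, start: int, end: int) -> None:
        if start == end:
            raise ValueError("self-loops are left out of this sketch")
        rel = Rel(start, end)
        self.rels[rel_id] = rel
        # Push the new relationship onto the front of both endpoint chains.
        for node_id in (start, end):
            head = self.nodes[node_id].first_rel
            self._set_next(rel, node_id, head)
            if head is not None:
                self._set_prev(self.rels[head], node_id, rel_id)
            self.nodes[node_id].first_rel = rel_id

    def neighbors(self, node_id: int) -> Iterator[int]:
        # Traversal follows pointers only; no index lookup is involved.
        rel_id = self.nodes[node_id].first_rel
        while rel_id is not None:
            rel = self.rels[rel_id]
            yield rel.end if rel.start == node_id else rel.start
            rel_id = self._next(rel, node_id)
```

With a layout like this, expanding a node costs time proportional to the number of relationships on that node rather than to the size of the whole graph, which is the usual argument made for direct links.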

Additional question:

I am guessing AWS Neptune is built on top of RDS (Relational Database Service). Were there actual reasons why AWS chose to build Neptune on top of RDS rather than creating a brand new database? I would think that building on top of RDS would save a lot of time and effort for things like data replication, etc. I don't mean to be skeptical or to start conspiracies; I am just trying to evaluate graph databases, and this got me curious.

Jiujitsu answered 23/1, 2022 at 17:53 Comment(0)

Amazon Neptune actually uses a custom-built graph query engine and optimizer. The basic unit of Amazon Neptune graph data is a four-position (quad) element, which is similar to a Resource Description Framework (RDF) quad. You will find a detailed overview of the storage format and its benefits here: https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-data-model.html
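As a rough illustration of the quad idea, and not of Neptune's actual storage engine, the sketch below keeps every statement as a (subject, predicate, object, graph) tuple in three sorted orderings and answers pattern lookups with a prefix scan. The index names SPOG, POGS and GPSO follow the linked data-model page as I understand it. The `QuadStore` class, its methods, and the way the sample edges are mapped onto quads (source vertex, edge label, target vertex, edge id) are assumptions made for this example.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)

# Each index stores the same quads with the positions permuted.
ORDERS: Dict[str, Tuple[int, int, int, int]] = {
    "SPOG": (0, 1, 2, 3),
    "POGS": (1, 2, 3, 0),
    "GPSO": (3, 1, 0, 2),
}


class QuadStore:
    """Hypothetical in-memory sketch: sorted lists stand in for real indexes."""

    def __init__(self) -> None:
        self.indexes: Dict[str, List[Quad]] = {name: [] for name in ORDERS}

    def add(self, quad: Quad) -> None:
        # Write the quad once per index, permuted into that index's order.
        for name, order in ORDERS.items():
            key = tuple(quad[i] for i in order)
            insort(self.indexes[name], key)

    def scan(self, name: str, prefix: Tuple[str, ...]) -> List[Quad]:
        # Return every quad whose leading positions (in index order) match.
        keys = self.indexes[name]
        order = ORDERS[name]
        i = bisect_left(keys, prefix)
        found: List[Quad] = []
        while i < len(keys) and keys[i][: len(prefix)] == prefix:
            quad = [""] * 4
            for pos, original_position in enumerate(order):
                quad[original_position] = keys[i][pos]
            found.append(tuple(quad))
            i += 1
        return found
```

For example, after adding `("v1", "knows", "v2", "e1")`, a call such as `scan("SPOG", ("v1", "knows"))` would find the outgoing "knows" edges of `v1`, while `scan("POGS", ("knows", "v2"))` would find the incoming ones. Each traversal step is an index range scan here, in contrast to following stored pointers.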

There is also a nine-part video series, available here, that takes a deep dive into all things graph and Neptune; you may find it useful.

There are many components that make up the overall Neptune compute and storage architecture.

Kostroma answered 24/1, 2022 at 0:29 Comment(2)

Hi Kelvin, yes that’s what I referenced. Isn’t that just one table with 4 columns? With 3 additional indices? How does this compare with Neo4j which appears to structure a graph with direct links? – Jiujitsu 24/1, 2022 at 1:45

There is a lot more to the architecture than I can do justice to in an answer here. I provided a link in an update to the answer for a deep dive video series. – Kostroma 25/1, 2022 at 14:29

From point 6 on this site, I can see substantial differences, such as:

Secondary index (supported only in Neo4j)
Triggers (supported only in Neo4j)
Server-side script (supported only in Neo4j)
Speed (Neo4j Aura is in theory a bit slower, although users on Stack Overflow found that Neptune's engine performs slower with Gremlin queries, maybe due to a learning curve?). If I find better evidence, I will share it in the comments.
The image below shows a few more differences.

Giulia answered 4/1, 2024 at 17:44 Comment(3)

That image above must be quite old. It's definitely out of date. For example Amazon Neptune added support for openCypher in 2021 in addition to the existing Gremlin and SPARQL support. The Gremlin support in Neptune is also at the 3.6.x level now. The Gremlin 3.3 listed above was launched in 2018. Further, both Neo4j and Neptune support access from lots more programming languages than the ones listed. Overall, that diagram is a mix of incorrect and incomplete both in terms of Neo4j and Neptune. – Kostroma 4/1, 2024 at 18:59

Thank you @Kelvin Lawrence! I'm particularly interested in the performance of both when querying millions of nodes/relationships. The post I found for the "speed" point is from 11th October 2023. I assume the trick is in how well you use Gremlin or openCypher and which engine you use in AWS Neptune. I don't have experience with it. Feel free to share your experience, please! – Giulia 4/1, 2024 at 19:4

Regardless of which graph database you use, performance will almost always be a factor of many things, including, but not limited to, how the data is modeled, and how the queries are written. For the specific post linked, and thanks for that by the way, I had not seen that one before, it needs more information to fully understand. – Kostroma 4/1, 2024 at 20:36
