Is ArangoDB a document database and also a graph database? How is that possible?

Disclaimer: I am Max from ArangoDB, one of the core developers.

First of all, a longer discussion of this and related questions can be found in my article "Graphs in data modeling - is the emperor naked?", but I will try to answer both questions concisely here.

(1) Storing a graph in a document store is relatively easy (as it is in a relational database). One can, for example, simply store a document for each vertex in a "vertex collection" and a document for each edge in an "edge collection". One only has to make sure that each edge records the vertex it comes from and the vertex it goes to. In ArangoDB, we use the _from and _to attributes of the edge document for this.
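To make the storage scheme concrete, here is a minimal in-memory sketch in Python. It is not ArangoDB code: plain dicts stand in for the two collections, the ids loosely follow ArangoDB's "collection/key" convention for the values stored in _from and _to, and the helper names (build_edge_index, outbound_neighbors) are hypothetical. It shows why query (a) below only needs an index on the edge collection.

```python
from collections import defaultdict

# Hypothetical stand-ins for a vertex collection and an edge collection.
# Each edge document records where it comes from (_from) and goes to (_to).
vertices = {
    "persons/alice": {"_key": "alice", "name": "Alice"},
    "persons/bob": {"_key": "bob", "name": "Bob"},
    "persons/carol": {"_key": "carol", "name": "Carol"},
}
edges = [
    {"_from": "persons/alice", "_to": "persons/bob", "label": "knows"},
    {"_from": "persons/bob", "_to": "persons/carol", "label": "knows"},
    {"_from": "persons/alice", "_to": "persons/carol", "label": "knows"},
]

def build_edge_index(edge_docs):
    """Index edges by their _from attribute so outbound lookups avoid a full scan."""
    index = defaultdict(list)
    for edge in edge_docs:
        index[edge["_from"]].append(edge)
    return index

def outbound_neighbors(index, vertex_id):
    """Return the ids of the direct (outbound) neighbors of a vertex."""
    return [edge["_to"] for edge in index.get(vertex_id, [])]

edge_index = build_edge_index(edges)
print(outbound_neighbors(edge_index, "persons/alice"))  # ['persons/bob', 'persons/carol']
```

A second index keyed on _to would answer inbound lookups the same way.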

However, the crucial capability of a graph database is answering queries about graphs efficiently. Typical queries for graphs are (a) "what are the neighbors of a vertex in the graph?", (b) "what is the shortest path from vertex A to vertex B in the graph?" or (c) "give me all vertices I can reach from vertex A by following edges". Whereas (a) simply needs a good index on the edge collection, (b) and (c) involve a number of steps in the graph that is not known in advance. Consequently, (b) and (c) cannot be done efficiently with traditional database query languages like SQL, because they would involve either a large amount of communication between client and server or, at the very least, a very complicated expression with a variable number of joins. I therefore call queries like (b) and (c) "graphy", without defining this rigorously.
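The following sketch illustrates what makes (b) and (c) "graphy". It assumes unweighted edges and uses plain breadth-first search, which is a standard technique and not necessarily what any particular database uses internally; the edge data and function names are made up for illustration. The point is that the number of loop iterations depends on the shape of the graph, not on the text of the query, so no fixed number of joins can express it.

```python
from collections import defaultdict, deque

# Hypothetical edge collection in the _from/_to style described above.
edges = [
    {"_from": "v/A", "_to": "v/B"},
    {"_from": "v/B", "_to": "v/C"},
    {"_from": "v/C", "_to": "v/D"},
    {"_from": "v/A", "_to": "v/E"},
    {"_from": "v/E", "_to": "v/D"},
    {"_from": "v/D", "_to": "v/F"},
]

adjacency = defaultdict(list)
for e in edges:
    adjacency[e["_from"]].append(e["_to"])

def reachable(start):
    """Query (c): all vertices reachable from start, excluding start itself."""
    seen = {start}
    queue = deque([start])
    while queue:  # runs as many steps as the graph requires
        v = queue.popleft()
        for w in adjacency.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    seen.discard(start)
    return seen

def shortest_path(start, goal):
    """Query (b): an unweighted shortest path via BFS, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            path = []
            while v is not None:  # walk the parent links back to start
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adjacency.get(v, []):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None

print(shortest_path("v/A", "v/F"))  # ['v/A', 'v/E', 'v/D', 'v/F']
print(sorted(reachable("v/A")))     # ['v/B', 'v/C', 'v/D', 'v/E', 'v/F']
```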

Therefore, my short answer to "how can a document store be a graph database?" is: Store the graph as above and implement graphy queries in the database server, accessible from the query language of the data store. In principle, the same could be done with a relational database and some considerable extensions to SQL.
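The communication argument can be made tangible with a toy model. The sketch below assumes a hypothetical ToyServer class that simply counts requests; nothing here is an ArangoDB API. A client-driven traversal needs one round trip per level of the graph, even when it batches each frontier into a single request, whereas a traversal implemented inside the server needs exactly one request regardless of depth.

```python
from collections import defaultdict

class ToyServer:
    """Hypothetical server holding an edge collection; counts client requests."""

    def __init__(self, edges):
        self.adjacency = defaultdict(list)
        for e in edges:
            self.adjacency[e["_from"]].append(e["_to"])
        self.requests = 0

    def neighbors_of(self, frontier):
        """One request: all outbound neighbors of a set of vertices."""
        self.requests += 1
        return {w for v in frontier for w in self.adjacency.get(v, [])}

    def reachable(self, start):
        """One request: the whole traversal runs inside the server."""
        self.requests += 1
        seen, frontier = set(), {start}
        while frontier:
            frontier = {w for v in frontier for w in self.adjacency.get(v, [])} - seen
            seen |= frontier
        return seen

def client_side_reachable(server, start):
    """Traversal driven by the client: one round trip per BFS level."""
    seen, frontier = set(), {start}
    while frontier:
        frontier = server.neighbors_of(frontier) - seen
        seen |= frontier
    return seen

# A simple chain v/0 -> v/1 -> ... -> v/5.
chain = [{"_from": f"v/{i}", "_to": f"v/{i + 1}"} for i in range(5)]

client_server = ToyServer(chain)
client_result = client_side_reachable(client_server, "v/0")
server_server = ToyServer(chain)
server_result = server_server.reachable("v/0")
print(client_server.requests, server_server.requests)  # 6 1
```

On the chain, the client needs six requests (five levels plus a final empty one), and that count grows with the depth of the graph, while the server-side version stays at one.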

With ArangoDB we have managed to combine the document, the graph and the key/value features into a single, coherent query language. We therefore call ArangoDB a "multi-model database", because it combines these three data models seamlessly. You can even mix the data models in a single query!

This leads to my answer to question (2), which is obviously a bit biased:

In comparison to ArangoDB, which is a distributed multi-model database in the above sense, Neo4j is a classical graph database. It stores graphs, allows one to query them with "graphy queries" and has a storage and query engine that is optimised for that. Neo4j is particularly good at matching paths using its built-in query language Cypher. It does allow attaching properties to vertices and edges, but it is not a full-featured document store. It is not optimised to handle document queries using multiple secondary indexes, nor does it do joins. Furthermore, Neo4j is not distributed.

Neo4j is written in Java; ArangoDB is written in C++ and embeds Google's V8 engine to execute JavaScript extensions.

For a performance comparison see this post.
