Hold entire Neo4j graph database in RAM?

I'm researching graph databases for a work project. Since our data is highly connected, it appears that a graph database would be a good option for us.

One of the first graph DB options I've run into is Neo4j, and for the most part, I like it. However, I have one question about Neo4j that I cannot find the answer to: Can I get Neo4j to store the entire graph in memory? If so, how does one configure this?

The application I'm designing needs to be lightning-fast. I can't afford to wait for the DB to go to disk to retrieve the data I'm searching for. I need the entire DB to be held in memory to reduce query time.

Is there a way to hold the entire Neo4j DB in memory?

Thanks!

Thickwitted answered 12/12, 2017 at 1:31 Comment(0)
L
9

Neo4j isn't designed to hold the entire graph in main memory. This leaves you with a couple of options. You can either tune the config parameters (as Jasper Blues already explained in more detail) or configure Neo4j to use a RAM disk.

The first option probably won't give you the best performance as only the cache is held in memory.

The challenge with the second approach is that everything is in-memory which means that the system isn't durable and the writes are inefficient.

You can take a look at Memgraph (DISCLAIMER: I'm the co-founder and CTO). Memgraph is a high-performance, in-memory transactional graph database and it's openCypher and Bolt compatible. The data is first stored in main memory before being written to disk. In other words, you can choose to make a tradeoff between write speed and safety.

Lardon answered 14/12, 2017 at 17:35 Comment(6)
This sounds very close to what I want; I definitely want to explore it. Can you point me to an example Memgraph config file or project that walks me through how to configure Memgraph so that the data is first stored in memory and then written to disk, as you've described? - Thickwitted
Memgraph stores the data in main memory by default. No additional config is required. If you want to try out Memgraph, please sign up for the early access. - Lardon
@jtcotton63 Currently, v0.8 is available. v0.9, with support for write-ahead logging and UNION, is going to be released next week. - Lardon
Actually, this is wrong. Neo4j holds the whole database in RAM if the memory is available. Jasper is correct. Also, Neo4j's data handling is always safe, as it is transactional to disk. - Violative
@Lardon Did you read Michael's answer? - Chianti
@ArmenSanoyan, sorry for the delay, I hadn't noticed the question before. I did see the answer. Disclaimer again: I don't know all the internal details of Neo4j, but it seems like a disk-first system. Memgraph, on the other hand, is a RAM-first system (at the moment). Both approaches have pros and cons. Memgraph is more optimized for speed and highly concurrent workloads, while Neo4j is better when it comes to pure disk storage. That's at a very high level, of course; the actual system characteristics have to be measured on a specific workload. There is one rule: measure and measure again :D - Lardon
H
12

Further to Bruno Peres' answer, if you want to run a regular server instance, Neo4j will load the entire graph into memory when resources are sufficient. This does indeed improve performance.

The Manual has a chapter on configuring memory.

The page cache portion holds graph data and indexes - this is configured via the dbms.memory.pagecache.size property in neo4j.conf. If it is large enough, the whole graph will be stored in memory.
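
To judge whether the page cache is "large enough", one rough approach is to total the on-disk size of the store files and compare it with the configured cache size. The sketch below assumes the store files match the pattern `neostore*.db` inside the database directory; that pattern, the function names, and the 20% growth headroom are illustrative assumptions rather than anything stated in this answer.

```python
from pathlib import Path


def store_size_bytes(db_dir, pattern="neostore*.db"):
    """Sum the sizes of files under db_dir whose names match pattern.

    The default pattern is an assumption about how the store files are
    named; adjust it to match the files in your own database directory.
    """
    return sum(
        p.stat().st_size for p in Path(db_dir).rglob(pattern) if p.is_file()
    )


def fits_in_page_cache(store_bytes, page_cache_bytes, headroom=0.2):
    """Return True if the store plus growth headroom fits in the page cache.

    headroom=0.2 reserves 20% for growth; the figure is illustrative only.
    """
    return store_bytes * (1 + headroom) <= page_cache_bytes
```

If the check fails, either the page cache needs to grow or part of the graph will be read from disk on demand.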

The heap space portion is for query execution, state management, etc. This is set via the dbms.memory.heap.initial_size and dbms.memory.heap.max_size properties. Generally these two properties should be set to the same value, so that the whole heap is allocated on startup.

If the sole purpose of the server is to run Neo4j, you can allocate most of the memory to the heap and page cache, leaving enough left over for operating system tasks.
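
As a rough illustration of that split, the sketch below divides a machine's RAM between the heap, the page cache, and an operating system reserve, then prints the three properties named above in `key=value` form. The function names, the 2 GB OS reserve, and the example figures are hypothetical; appropriate values depend on your workload.

```python
def neo4j_memory_settings(total_gb, heap_gb, os_reserve_gb=2):
    """Split RAM between heap, page cache, and an OS reserve (whole GB).

    os_reserve_gb is a hypothetical default; size it for your own host.
    Returns a dict keyed by the neo4j.conf property names.
    """
    page_cache_gb = total_gb - heap_gb - os_reserve_gb
    if page_cache_gb <= 0:
        raise ValueError("heap and OS reserve leave no room for the page cache")
    return {
        # Same initial and max value, so the whole heap is allocated on startup.
        "dbms.memory.heap.initial_size": f"{heap_gb}g",
        "dbms.memory.heap.max_size": f"{heap_gb}g",
        # Whatever remains goes to the page cache (graph data and indexes).
        "dbms.memory.pagecache.size": f"{page_cache_gb}g",
    }


def render_conf(settings):
    """Format settings as key=value lines suitable for neo4j.conf."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Example: a 64 GB machine with a 16 GB heap leaves 46 GB for the page cache.
print(render_conf(neo4j_memory_settings(total_gb=64, heap_gb=16)))
```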

Memory Configuration

Holding Very Large Graphs In Memory

At GraphConnect in San Francisco, 2016, Neo4j's CTO, Jim Webber, in his typically entertaining fashion, gave details on servers that have a very large amount of high-performance memory, capable of holding an entire large graph in memory. He seemed suitably impressed by them. I forget the name of the machines, but if you're interested, the video archive should have details.

Harmonics answered 12/12, 2017 at 1:42 Comment(3)
@jtcotton63 Another great place to ask this kind of question is the neo4j-users public Slack channel. I recommend joining, if you have not already. - Harmonics
@jtcotton63 More notes: Besides memory allocation, you can tune the garbage collector settings for throughput. I believe the idea is to a) prevent objects from being promoted to old-generation space too soon and b) encourage more frequent, shorter collections rather than fewer, longer ones. The impact of this is much smaller than being able to fit the whole graph in memory, but when performance is critical, it can nonetheless be done. Finally, if disk access must happen, use the fastest SSDs available. - Harmonics
For the initial load after startup, you can run CALL apoc.warmup.run(), which loads the database into memory. - Violative
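
A minimal sketch of calling that warmup procedure from application code, assuming the APOC plugin is installed on the server and that `session` is an open session from the official `neo4j` Python driver (or any object with a compatible `run()` method). The helper name `warm_up`, the URI, and the credentials are placeholders.

```python
def warm_up(session):
    """Run the APOC warmup procedure and return the result rows as a list.

    `session` is assumed to be a neo4j driver session; APOC is assumed to
    be installed on the server. Neither is verified here.
    """
    return list(session.run("CALL apoc.warmup.run()"))


# Usage (requires `pip install neo4j` and a running server):
# from neo4j import GraphDatabase
# with GraphDatabase.driver("bolt://localhost:7687",
#                           auth=("neo4j", "password")) as driver:
#     with driver.session() as session:
#         rows = warm_up(session)
```

Running this once after each restart, before serving traffic, avoids paying the disk-read cost on the first queries.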
