Storing multiple graphs in Neo4J
A

3

17

I have an application that stores relationship information in a MySQL table (contact_id, other_contact_id, strength, recorded_at). This is fine if all I need to do is show who a contact's relationships are or even to generate a list of mutual contacts for two contacts.

But now I need to generate stats like: 'what was the total number of 2-way connections of strength 3 or better in January 2011' or (assuming that each contact is part of a group) 'which group has the most number of connections to other groups' etc.

I quickly found that the SQL for generating these stats became unwieldy real fast.

So I wrote a script that generates a graph in memory for any given date. I could then run whatever stat I wanted against that graph. That was much easier to understand and, in general, much faster too, except for the step that generates the graph.

My next thought was to cache those graphs so I could call on them whenever I needed to run a new stat (or generate a later graph: eg for today's graph I take yesterday's graph and apply any changes that happened since yesterday). I tried memcached which worked great until the graphs grew > 1 MB.

So now I'm thinking about using a graph database like Neo4J.

Only problem is, I don't have just one graph. Or I do, but it is one that changes over time and I need to be able to query it with different reference times.

So, can I:

  • store multiple graphs in Neo4J and retrieve/interact with them separately? I would then create and store a separate social graph for each date.

or

  • add valid-to and valid-from timestamps to each edge and filter the graph appropriately: so if I wanted a graph for "May 1st" I would only follow the newest edge between two nodes that was created before "May 1st" (and if all the edges were created after May 1st then those nodes wouldn't be connected).

I'm pretty new to graph databases so any help/pointers/hints will be appreciated.

Amigo answered 11/5, 2011 at 0:26 Comment(2)
After doing some reading I'm wondering if reference nodes are the key? I could create a reference node for each day and then build that day's graph off its reference node... - Amigo
Hi there, I think using entry nodes for the graphs, and maybe indexing them on some property so you can find them by index lookup as well as off a reference node, can help here. Would indexing certain "meta data" properties of your subgraph entry nodes give you the right starting points? - Transcalent
A
15

Right now you can store just one graph database in a single Neo4j instance, but that one graph database can contain as many separate sub-graphs as you like. You only have to keep that in mind for global operations (like index queries), which see every sub-graph at once; there you can use compound queries that also include timestamped properties to limit the results.

One way of doing that is, as you said, adding temporal information to the edges to represent the structure of the graph at a given date. You can then traverse the graph as it was at that time.
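To make the timestamped-edge idea concrete, here is a minimal in-memory sketch in plain Python (not Neo4j code). It assumes edge records shaped like the MySQL rows in the question; the sample data and the function names `snapshot` and `two_way_connections` are made up for illustration.

```python
from datetime import datetime

# Hypothetical edge records: (contact_id, other_contact_id, strength, recorded_at)
EDGES = [
    (1, 2, 2, datetime(2011, 1, 10)),
    (1, 2, 4, datetime(2011, 4, 20)),
    (2, 1, 3, datetime(2011, 3, 5)),
    (2, 3, 5, datetime(2011, 6, 1)),
]

def snapshot(edges, as_of):
    """Return {(source, target): strength}, keeping the newest edge recorded before as_of."""
    latest = {}
    for source, target, strength, recorded_at in edges:
        if recorded_at >= as_of:
            continue  # this edge did not exist yet at the reference time
        key = (source, target)
        if key not in latest or recorded_at > latest[key][1]:
            latest[key] = (strength, recorded_at)
    return {key: strength for key, (strength, _) in latest.items()}

def two_way_connections(graph, min_strength):
    """Count unordered pairs linked in both directions at min_strength or better."""
    return sum(
        1
        for (a, b), strength in graph.items()
        if a < b
        and strength >= min_strength
        and graph.get((b, a), 0) >= min_strength
    )

may_first = snapshot(EDGES, datetime(2011, 5, 1))
```

In a graph database the same filter would become a condition on the relationship's timestamp property during traversal, so only one stored graph is needed for all reference dates.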

Reference node has a different meaning in Neo4j.

Using category nodes per day (and linking them, and also aggregating them into higher-level timespans) is a more graphy way of categorizing nodes than indexed properties. Effectively these are in-graph indices that you can easily include in your traversals and graph queries.
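As a rough illustration of day category nodes aggregated into a higher-level timespan, here is a plain Python sketch (again not Neo4j code). The class `TimeIndex` and its method names are assumptions made for this example; in a real graph the two dictionaries would be day and month nodes connected by relationships.

```python
from collections import defaultdict
from datetime import date

class TimeIndex:
    """Toy in-graph index: month nodes point at day nodes, day nodes at items."""

    def __init__(self):
        self.days = defaultdict(list)   # day node -> items attached to that day
        self.months = defaultdict(set)  # (year, month) node -> its day nodes

    def attach(self, day, item):
        """Hang an item off its day node and register the day under its month."""
        self.days[day].append(item)
        self.months[(day.year, day.month)].add(day)

    def items_in_month(self, year, month):
        """Traverse month -> days -> items, visiting days in date order."""
        result = []
        for day in sorted(self.months.get((year, month), ())):
            result.extend(self.days[day])
        return result

index = TimeIndex()
index.attach(date(2011, 1, 5), "edge-a")
index.attach(date(2011, 1, 2), "edge-b")
index.attach(date(2011, 2, 1), "edge-c")
```

Asking for January then only touches the January day nodes instead of scanning every item.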

You don't have to duplicate the nodes as long as you are only interested in different temporal structures. If your nodes also differ over time (e.g. changing properties), you could either duplicate them, effectively creating different subgraphs, or create a connected list of history nodes on each node that contain just the changes (or the full snapshot, depending on your requirements).
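The "connected list of history nodes" variant could look roughly like this in plain Python. It is only a sketch: `HistoryEntry`, `record_change` and `properties_as_of` are hypothetical names, each entry stores just the changed properties, and entries are assumed to be added in chronological order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class HistoryEntry:
    valid_from: datetime
    changes: dict                               # only the properties that changed
    previous: Optional["HistoryEntry"] = None   # link to the next older entry

def record_change(head, valid_from, changes):
    """Prepend a new history entry; returns the new head of the chain."""
    return HistoryEntry(valid_from, changes, previous=head)

def properties_as_of(head, as_of):
    """Rebuild a node's properties at as_of by replaying the applicable changes."""
    applicable = []
    entry = head
    while entry is not None:  # walk from newest to oldest
        if entry.valid_from <= as_of:
            applicable.append(entry)
        entry = entry.previous
    props = {}
    for entry in reversed(applicable):  # apply oldest first so newer values win
        props.update(entry.changes)
    return props

head = None
head = record_change(head, datetime(2011, 1, 1), {"name": "Ann", "group": "A"})
head = record_change(head, datetime(2011, 3, 1), {"group": "B"})
```

Storing full snapshots instead of changes would make the lookup a single step at the cost of more stored data.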

Your domain sounds like a very good fit for a graph database. If you have more detailed questions, feel free to join the Neo4j mailing list.

Arnelle answered 11/5, 2011 at 11:43 Comment(1)
The mailing list link is dead. - Dogmatist
L
5

Not the easiest solution (I'm assuming you only work with one machine), but if you really want to separate your graphs, you only need to remember that a graph is a directory.

You can then create a dynamic loader class which takes the path of the database you want, loads it into memory for the query, and closes it after you get your answer. You could also configure a proxy server and send two parameters to your loader: your query (which I presume is a Cypher query in this case) and the path of the database you want to query.

This is not adequate if you have tons of real-time queries to answer. But if it is simply for storing data sets and running some analytics over them, it can definitely meet your needs.
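The loader described above might be shaped like the following Python sketch. It deliberately does not call any real Neo4j API: `open_database` is a placeholder for whatever function opens the store at a given directory, and the `execute` and `close` methods on the object it returns are assumptions of this example.

```python
from contextlib import contextmanager

@contextmanager
def opened(path, open_database):
    """Open the database stored at path and always close it afterwards."""
    db = open_database(path)  # open_database is supplied by the caller (hypothetical)
    try:
        yield db
    finally:
        db.close()            # assumed method on the returned handle

def run_on(path, query, open_database):
    """Load one graph, run a single query against it, then release it."""
    with opened(path, open_database) as db:
        return db.execute(query)  # assumed method on the returned handle
```

A proxy would then simply forward its two parameters, the path and the query, to `run_on`. Opening and closing a store on every call is the reason this does not suit heavy real-time traffic.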

Lippi answered 13/3, 2013 at 13:42 Comment(1)
Can you shed some light on how to refer to the graph database path dynamically in a Cypher query? Thanks in advance... - Clarisclarisa
M
0

This is an old question, but starting with Neo4j 4.x, multi-tenancy is supported and you can have different databases within the same Neo4j server (with distinct RBAC permissions).
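With the official Python driver, picking the database per query could look like the sketch below. It assumes Neo4j 4.x or later and a driver created elsewhere; the function name `count_nodes`, the connection details in the comments, and the database name are made up for illustration.

```python
# Creating the driver needs a running server, so it is shown only as a comment:
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def count_nodes(driver, database):
    """Count the nodes in one named database on the server."""
    # session(database=...) selects which database this session talks to
    with driver.session(database=database) as session:
        record = session.run("MATCH (n) RETURN count(n) AS total").single()
        return record["total"]

# Hypothetical usage, one database per snapshot date:
#   count_nodes(driver, "snapshot20110501")
```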

Matriculation answered 18/1, 2021 at 10:21 Comment(0)
