Creating nodes and relationships at the same time in neo4j

Asked 28/4, 2014 at 9:2 Answered 15/7, 2022 at 23:15

I am trying to build a database in Neo4j with a structure that contains seven different types of nodes, around 4,000-5,000 nodes in total, with around 40,000 relationships between them. With the Cypher code I am currently using, I first create the nodes like this:

Create (node1:type {name:'example1', type:'example2'})

There are around 4,000 statements like that, each creating a unique node.

Then I've got relationships stated as such:

Create
(node1)-[:r]->(node51),
(node2)-[:r]->(node5),
(node3)-[:r]->(node2);

There are around 40,000 such unique relationships.

With smaller graphs this has not been a problem at all. But with this one, the "Executing query..." message never goes away.

Any suggestions on how I can make this type of query work? Or what I should do instead?

Edit: What I'm trying to build is a big graph of a product, with its releases, release versions, features, etc., in the same way the Movie graph example is built.

The product has about 6 releases in total, and each release has around 20 release versions. In total there are 371 features, and for these 371 features there are also 438 feature versions. Every release version (120 in total) then has around 200-300 feature versions. These feature versions are mapped to their feature, which has dependencies on a little bit of everything in the database. I have also included hardware dependencies, such as the possible hardware to run these features on, the releases they run on, etc. So basically I'm using Cypher code such as:

Create (Product1:Product {name:'ABC', type:'Product'})
Create (Release1:Release {name:'12A', type:'Release'})
Create (Release2:Release {name:'13A', type:'Release'})
Create (ReleaseVersion1:ReleaseVersion {name:'12.0.1', type:'ReleaseVersion'})
Create (ReleaseVersion2:ReleaseVersion {name:'12.0.2', type:'ReleaseVersion'})

and below those I've structured them using

Create (Product1)<-[:Is_Version_Of]-(Release1),
(Product1)<-[:Is_Version_Of]-(Release2),
(Release2)<-[:Is_Version_Of]-(ReleaseVersion21),

All the way down to features, and then I've also added dependencies between them such as:

(Feature1)-[:Requires]->(Feature239),
(Feature239)-[:Requires]->(Feature51);

Since I had to gather all this information from many different Excel sheets, I wrote the code this way thinking I could just put it together in one massive Cypher query and run it in the browser on localhost. It worked really well as long as I did not use more than 4,000-5,000 queries at a time; it then created the entire database in about 5-10 seconds at most. But now that I'm trying to run around 45,000 queries at the same time, it has been running for almost 24 hours and is still loading and saying "Executing query...". Is there any way I can reduce the time it takes? Will the database eventually be created? Or can I use smarter indexes or something else to improve the performance? The way my Cypher is written now, I cannot divide it into pieces, since everything in the database has some sort of connection to the product. Do I need to rewrite the code, or is there a smooth way around this?

Recognizee answered 28/4, 2014 at 9:2 Comment(1)

Also take a look at the MERGE clause: neo4j.com/docs/developer-manual/current/cypher/clauses/merge – Collincolline 16/5, 2018 at 1:12
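
For anyone wondering what MERGE looks like here, a minimal sketch, assuming the Product and Release nodes from the question are identified by their name property. MERGE finds a node or creates it if it is missing, so the query can be run on its own without relying on identifiers such as Product1 from an earlier statement:

```cypher
// Find-or-create both nodes by name, then find-or-create the relationship.
// The type property is only set when the node is newly created.
MERGE (p:Product {name: 'ABC'})
  ON CREATE SET p.type = 'Product'
MERGE (r:Release {name: '12A'})
  ON CREATE SET r.type = 'Release'
MERGE (p)<-[:Is_Version_Of]-(r);
```

Because each such query looks its nodes up by property rather than by variable, a large import can be split into many smaller queries. Running it twice should not create duplicates.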

You can create multiple nodes and relationships interlinked with a single create statement, like this:

create (a { name: "foo" })-[:HELLO]->(b {name : "bar"}),
       (c {name: "Baz"})-[:GOODBYE]->(d {name:"Quux"});

So that's one approach, rather than creating each node individually with a single statement, then each relationship with a single statement.

You can also create multiple relationships from objects by matching first, then creating:

match (a {name: "foo"}), (d {name:"Quux"}) create (a)-[:BLAH]->(d);

Of course you could have multiple match clauses, and multiple create clauses there.

You might try to match a given type of node, and then create all necessary relationships from that type of node. You have enough relationships that this is going to take many queries. Make sure you've indexed the property you're using to match the nodes. As your DB gets big, that's going to be important to permit fast lookup of things you're trying to create new relationships off of.
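
A rough sketch of what that could look like, assuming the features carry a Feature label and a unique name property (neither is confirmed in the question) and a Neo4j 4.x or later server. The index name feature_name is made up for illustration:

```cypher
// Index the property used for lookups. Run this as its own statement.
// On Neo4j 2.x/3.x the equivalent syntax was: CREATE INDEX ON :Feature(name)
CREATE INDEX feature_name FOR (f:Feature) ON (f.name);

// Then, in a separate query, look up both existing nodes and connect them.
MATCH (a:Feature {name: 'Feature1'}), (b:Feature {name: 'Feature239'})
CREATE (a)-[:Requires]->(b);
```

Including the label in the MATCH matters: without it the lookup cannot use the index and has to scan every node.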

You haven't specified which query you're running that isn't "stopping loading". Update your question with specifics, and let us know what you've tried, and maybe it's possible to help.

Stocky answered 28/4, 2014 at 15:24 Comment(2)

Don't we need parentheses here? (a)-[:BLAH]->(b) – Oatis 22/12, 2017 at 21:26

Yes. Answer was written in 2014 when you didn't need them. – Stocky 23/12, 2017 at 21:49

If you have one of the nodes already created then a simple approach would be:

MATCH (n: user {uid: "1"}) CREATE (n) -[r: posted]-> (p: post {pid: "42", title: "Good Night", msg: "Have a nice and peaceful sleep.", author: n.uid});

Here the user node already exists, and the query creates a new relationship and a new post node.

Retake answered 27/2, 2019 at 2:36 Comment(1)

Hi @Ashwin, how do we provide a label in the above query? And if I want to use a field value of the post node as the default label, how do we do that? – Inwrought 26/11, 2019 at 19:47

Another interesting approach might be to generate your statements directly in Excel, see http://blog.bruggen.com/2013/05/reloading-my-beergraph-using-in-graph.html?view=sidebar for an example. You can run a lot of CREATE statements in one transaction, so this should not be overly complicated.

Mathre answered 29/4, 2014 at 8:53 Comment(0)

If you're able to use the Neo4j 2.1 prerelease milestones, then you should try using the new LOAD CSV and PERIODIC COMMIT features. They are designed for just this kind of use case.

LOAD CSV allows you to describe the structure of your data with one or more Cypher patterns, while providing the values in CSV to avoid duplication.

PERIODIC COMMIT can help make large imports more reliable and also improve performance by reducing the amount of memory that is needed.
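
As an illustration only, a node import might look like the sketch below. The file features.csv and its name column are hypothetical, and how the file URL is resolved depends on the Neo4j version and configuration:

```cypher
// features.csv is a hypothetical file with a "name" header column.
// One Feature node is created per CSV row, committing every 1000 rows.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///features.csv' AS row
CREATE (:Feature {name: row.name, type: 'Feature'});
```

Note that USING PERIODIC COMMIT was removed in Neo4j 5, where CALL { ... } IN TRANSACTIONS takes its place, so this form applies to the 2.1 through 4.x versions.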

Hanghangar answered 29/4, 2014 at 9:32 Comment(2)

That looks very interesting. My only question is whether I can have relationships between, for example, features and other features, with different kinds of relationships among them, because in this database nodes of one type can have perhaps ten different relationship types to other nodes of that same type. Hence I cannot, for example, use: CREATE (p)-[:PLAYED { role: csvLine.role }]->(m). Or will I be able to load my database a bit at a time? That is probably my current issue: I cannot divide it into smaller pieces. – Recognizee 30/4, 2014 at 6:41

Relationship types, labels and property names have to be specified literally in the queries; the relationship types can't come from a CSV file. Instead, you would typically use one CSV file and one query per type. – Procambium 30/4, 2014 at 12:31
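
To illustrate the one-file-per-type idea, here is a sketch with two relationship types between Feature nodes. The file names, the source and target columns, and the Conflicts_With type are all hypothetical; only Requires appears in the question. It assumes the Feature nodes already exist and are identified by name:

```cypher
// One CSV file and one query per relationship type.
// Each file has "source" and "target" header columns holding Feature names.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///feature_requires.csv' AS row
MATCH (a:Feature {name: row.source}), (b:Feature {name: row.target})
CREATE (a)-[:Requires]->(b);

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///feature_conflicts.csv' AS row
MATCH (a:Feature {name: row.source}), (b:Feature {name: row.target})
CREATE (a)-[:Conflicts_With]->(b);
```

Each query is independent of the others, so the import can be run a piece at a time.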

It is possible to use a single Cypher query to create a new node and relate it to an existing one.

As an example, assume you're starting with:

an existing "One" node which has an "id" property "1"

And your goal is to:

create a second node, let's call that "Two", and it should have a property id:"2"
relate the two nodes together

You could achieve that goal using a single Cypher query like this:

MATCH (one:One {id:'1'})
CREATE (one) -[:RELATED_TO]-> (two:Two {id:'2'})

Erine answered 15/7, 2022 at 23:15 Comment(3)

This doesn't seem to work unless I change it to MATCH (one:One) WHERE id(one)=1 ..., but thanks for the answer. – Extinction 17/10, 2022 at 18:14

What version of Neo4j are you using? Perhaps they've changed this behavior over time. I believe I used 4.4.5 in my posted answer. – Erine 17/10, 2022 at 18:59

Neo4j Desktop 1.5.0, DBMS 4.4.11. Oh, perhaps I left out the quotes for '1'? – Extinction 17/10, 2022 at 19:37
