Speeding up creation of edges between nodes in Neo4j using py2neo

Asked 17/7, 2015 at 12:25 Answered 10/1, 2019 at 19:5

I am trying to create a large database in Neo4j with around 2 million nodes and around 4 million edges. I was able to speed up node creation by creating the nodes in batches of 1000. However, when I try to create edges between these nodes, the process slows down and then times out. Initially I thought it might be slow because I was merging on the node name, but it is slow even when I use ids, which I created manually. Below are snippets of the data and code for a better understanding of the problem.

Node.csv - this file contains details about the node

NodeName  NodeType      NodeId
Sachin    Person        1
UNO       Organisation  2
Obama     Person        3
Cricket   Sports        4
Tennis    Sports        5
USA       Place         6
India     Place         7

Edges.csv - this file just contains the node ids and their relationship

Node1Id  Relationship  Node2Id
1        Plays         4
3        PresidentOf   6
1        CitizenOf     7

The code to create the nodes is given below:

from py2neo import Graph

graph = Graph()
tx = graph.cypher.begin()
# Node is the list of rows read from Node.csv
for i in range(len(Node)):
    statement = "CREATE (n {name:{A}, label:{C}, id:{B}})"
    tx.append(statement, {"A": Node[i][0], "C": str(Node[i][1]), "B": str(Node[i][2])})
    if i % 1000 == 0:
        print(str(i) + " Node Created")
        tx.commit()
        tx = graph.cypher.begin()
tx.commit()  # commit the nodes left over after the last full batch

The code above works like a charm and finished creating 2 million nodes in 5 minutes. The code to create the edges is given below:

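tx = graph.cypher.begin()
# Edges is the list of rows read from Edges.csv
for i in range(len(Edges)):
    # The relationship type cannot be a parameter, so it is spliced into the statement
    statement = "MATCH (a {id:{A}}), (b {id:{B}}) CREATE (a)-[:" + Edges[i][1] + "]->(b)"
    tx.append(statement, {"A": Edges[i][0], "B": Edges[i][2]})
    if i % 1000 == 0:
        print(str(i) + " Relationship Created")
        tx.commit()
        tx = graph.cypher.begin()
tx.commit()  # commit the relationships left over after the last full batch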

The code above works well for the first 1000 relationships, but after that it takes a long time and the connection times out.

I need to fix this urgently; any help that speeds up relationship creation would be much appreciated.

Please note: I am not using Neo4j's CSV import or the Neo4j shell import because these assume the relationship between nodes to be fixed. In my case the relationships vary, and it is not feasible to import one relationship type at a time because that would mean running the import manually almost 2000 times.

Marianomaribel answered 17/7, 2015 at 12:25 Comment(2)

Do you have an index (preferably a uniqueness constraint) on the id property? – Garget 17/7, 2015 at 19:40

Michael suggested the same in the answer below and it works like magic. Thanks! – Marianomaribel 18/7, 2015 at 11:51

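You forgot to use a label for your nodes and then create a constraint on label + id. Without a label and an index, every MATCH on id has to scan all nodes; a uniqueness constraint creates an index that turns it into a lookup.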

create constraint on (o:Organization) assert o.id is unique;
create constraint on (p:Person) assert p.id is unique;

Create(n:Person {name:{A} ,id:{B}})
Create(n:Organization {name:{A} ,id:{B}})

match (p:Person {id:{p_id}}), (o:Organization {id:{o_id}})
create (p)-[:WORKS_FOR]->(o);
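
Cypher does not allow a label to be passed as a parameter, so with the question's data the label from the NodeType column has to be spliced into the statement text. A minimal sketch of how that could look; the helper names `safe_label`, `constraint_statement`, and `create_node_statement` are made up for illustration, and the statements use the same legacy `{param}` syntax as above:

```python
import re

def safe_label(raw):
    """Return raw if it is a plain identifier, otherwise raise ValueError."""
    if not re.match(r"^[A-Za-z_][A-Za-z0-9_]*$", raw):
        raise ValueError("unsafe label: %r" % (raw,))
    return raw

def constraint_statement(label):
    # One uniqueness constraint per label, as in the statements above
    return "CREATE CONSTRAINT ON (n:%s) ASSERT n.id IS UNIQUE" % safe_label(label)

def create_node_statement(label):
    # Only the label is spliced in; name and id stay parameters
    return "CREATE (n:%s {name:{A}, id:{B}})" % safe_label(label)

# Sample rows from Node.csv: (NodeName, NodeType, NodeId)
nodes = [("Sachin", "Person", 1), ("UNO", "Organisation", 2), ("Obama", "Person", 3)]
labels = sorted(set(row[1] for row in nodes))
constraints = [constraint_statement(label) for label in labels]
```

Checking the label against a strict pattern before splicing is a precaution against malformed CSV values ending up in the query text.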

Claytonclaytonia answered 17/7, 2015 at 22:40 Comment(1)

Thanks a lot! It worked wonderfully. The entire database creation took just 10 minutes – Marianomaribel 18/7, 2015 at 11:50

Here is an updated version of the code, since much of the py2neo transaction API was deprecated in v3+. The code also includes Michael's solution.

Nodes:

import csv
from py2neo import Graph

def insert_nodes_to_neodb():
    queries_per_transaction = 1000  # 1000 seems to work faster
    node_path = "path_to_csv"

    graph = Graph("bolt://localhost:7687", user="neo4j", password="pswd")
    trans_action = graph.begin()

    with open(node_path) as csvfile:
        next(csvfile)   # labels of the columns
        node_csv = csv.reader(csvfile, delimiter=',')
        for idx, row in enumerate(node_csv):
            statement = "CREATE (n:Entity {id:{A} , label:{B}, name:{B}})"  # name is for visual reasons (neo4j)
            trans_action.run(statement, {"A": row[0], "B": row[1]})

            if idx % queries_per_transaction == 0:
                trans_action.commit()
                trans_action = graph.begin()

        # Commit the left over queries
        trans_action.commit()

        # We need to create indexes on a separate transaction
        # neo4j.exceptions.Forbidden: Cannot perform schema updates in a transaction that has performed data updates.
        trans_action = graph.begin()
        trans_action.run("CREATE CONSTRAINT ON (o:Entity) ASSERT o.id IS UNIQUE;")
        trans_action.commit()
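
Note that `idx % queries_per_transaction == 0` is also true for `idx == 0`, so the first transaction in the function above holds a single statement. One possible way to get evenly sized batches is to group the rows up front. A sketch, where `chunked` is a hypothetical helper and not part of py2neo:

```python
from itertools import islice

def chunked(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# Each batch would then map to one transaction: begin, run one
# statement per row in the batch, commit.
rows = [[str(i), "name%d" % i] for i in range(2500)]
batch_sizes = [len(batch) for batch in chunked(rows, 1000)]
```

Because the last, shorter batch is yielded like any other, no separate "commit the left over queries" step is needed.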

Edges:

def insert_edges_to_neodb(neo_graph):
    queries_per_transaction = 1000  # 1000 seems to work faster
    edge_path = "path_to_csv"
    graph = Graph("bolt://localhost:7687", user="neo4j", password="pswd")

    trans_action = graph.begin()

    with open(edge_path) as csvfile:
        next(csvfile)   # labels of the columns
        edge_csv = csv.reader(csvfile, delimiter=',')
        for idx, row in enumerate(edge_csv):
            statement = """MATCH (a:Entity),(b:Entity)
                        WHERE a.id = {A} AND b.id = {B}
                        CREATE (a)-[r:CO_APPEARS { name: a.name + '<->' + b.name, weight: {C} }]->(b)
                        RETURN type(r), r.name"""
            trans_action.run(statement, {"A": row[0], "B": row[1], "C": row[2]})

            if idx % queries_per_transaction == 0:
                trans_action.commit()
                trans_action = graph.begin()

        trans_action.commit()
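
The edge function above hard-codes a single relationship type (`CO_APPEARS`), while the question has almost 2000 different ones. Relationship types cannot be passed as parameters, so one option is to group the edges by type and send one `UNWIND` statement per type. A sketch under assumptions: `build_edge_batches` is a hypothetical helper, the `$rows` syntax assumes a Neo4j version that accepts `$`-style parameters, and the `Entity` label matches the node function above:

```python
import re
from collections import defaultdict

def build_edge_batches(edges):
    """Group (source_id, rel_type, target_id) rows by relationship type.

    Returns a list of (query, params) pairs, one per relationship type.
    """
    by_type = defaultdict(list)
    for source_id, rel_type, target_id in edges:
        # The type is spliced into the query text, so only accept plain identifiers
        if not re.match(r"^[A-Za-z_][A-Za-z0-9_]*$", rel_type):
            raise ValueError("unsafe relationship type: %r" % (rel_type,))
        by_type[rel_type].append({"a": source_id, "b": target_id})

    batches = []
    for rel_type in sorted(by_type):
        query = (
            "UNWIND $rows AS row "
            "MATCH (a:Entity {id: row.a}), (b:Entity {id: row.b}) "
            "CREATE (a)-[:%s]->(b)" % rel_type
        )
        batches.append((query, {"rows": by_type[rel_type]}))
    return batches

# Sample rows from Edges.csv: (Node1Id, Relationship, Node2Id)
edges = [("1", "Plays", "4"), ("3", "PresidentOf", "6"), ("1", "CitizenOf", "7")]
batches = build_edge_batches(edges)
```

Each pair could then be sent with something like `trans_action.run(query, params)`. For relationship types with very many edges, the row list would probably still need to be split into smaller batches.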

Tandratandy answered 10/1, 2019 at 19:5 Comment(0)
