I am just getting started with Neo4j and py2neo, and I am experimenting with py2neo's batch feature for bulk loading data into a Neo4j database.
At a basic level, I want to create two nodes (or get them if they already exist) and create a relationship between them with a default weight (or increment the weight if the relationship already exists) using WriteBatch in py2neo.
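To make the intended semantics concrete, here is a plain-Python sketch of the behaviour I am after. It does not touch Neo4j or py2neo at all: `nodes`, `relationships`, `get_or_create_node` and `link` are hypothetical names used only for illustration, and the default weight of 1 is an assumption.

```python
# In-memory stand-in for the graph. None of these names are py2neo API.
nodes = {}          # name -> node properties
relationships = {}  # (start, type, end) -> relationship properties


def get_or_create_node(name):
    """Return the existing node for `name`, creating it if it is missing."""
    return nodes.setdefault(name, {"name": name})


def link(start, rel_type, end, default_weight=1):
    """Create the relationship with a default weight, or increment its weight."""
    get_or_create_node(start)
    get_or_create_node(end)
    key = (start, rel_type, end)
    if key in relationships:
        relationships[key]["weight"] += 1
    else:
        relationships[key] = {"weight": default_weight}
    return relationships[key]


# First call creates both nodes and the relationship; second call only increments.
link("Alice", "KNOWS", "Bob")
link("Alice", "KNOWS", "Bob")
```

What I want is the same effect, but expressed as operations in a single WriteBatch.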
The documentation only explains how to create two new nodes and form a relationship between them. I am looking for something along the lines of:
from py2neo import neo4j, cypher
graphdb = neo4j.GraphDatabaseService()
topic_index = graphdb.get_or_create_index(neo4j.Node, "node_index")
batch = neo4j.WriteBatch(graphdb)
batch.get_or_create_indexed_node('node_index', 'name', 'Alice', {'name': 'Alice'})
batch.get_or_create_indexed_node('node_index', 'name', 'Bob', {'name': 'Bob'})
batch.get_or_create_indexed_relationship('rel_index', 'type', 'KNOWS', 0, 'KNOWS', 1, {})
results = batch.submit()
However, this fails with the error:
SystemError: {u'stacktrace': [u'org.neo4j.server.rest.batch.NonStreamingBatchOperations.invoke(NonStreamingBatchOperations.java:63)', u'org.neo4j.server.rest.batch.BatchOperations.performRequest(BatchOperations.java:178)', u'org.neo4j.server.rest.batch.BatchOperations.parseAndPerform(BatchOperations.java:149)', u'org.neo4j.server.rest.batch.NonStreamingBatchOperations.performBatchJobs(NonStreamingBatchOperations.java:48)', u'org.neo4j.server.rest.web.BatchOperationService.batchProcess(BatchOperationService.java:117)', u'org.neo4j.server.rest.web.BatchOperationService.performBatchOperations(BatchOperationService.java:71)', u'java.lang.reflect.Method.invoke(Method.java:616)'], u'message': u'{\n "message" : "For input string: \"{0}\"",\n "exception" : "BadInputException",\n "stacktrace" : [ "org.neo4j.server.rest.web.RestfulGraphDatabase.extractNodeId(RestfulGraphDatabase.java:162)", "org.neo4j.server.rest.web.RestfulGraphDatabase.extractNodeIdOrNull(RestfulGraphDatabase.java:151)", "org.neo4j.server.rest.web.RestfulGraphDatabase.addToRelationshipIndex(RestfulGraphDatabase.java:813)", "java.lang.reflect.Method.invoke(Method.java:616)", "org.neo4j.server.web.Jetty6WebServer.invokeDirectly(Jetty6WebServer.java:273)", "org.neo4j.server.rest.batch.NonStreamingBatchOperations.invoke(NonStreamingBatchOperations.java:55)", "org.neo4j.server.rest.batch.BatchOperations.performRequest(BatchOperations.java:178)", "org.neo4j.server.rest.batch.BatchOperations.parseAndPerform(BatchOperations.java:149)", "org.neo4j.server.rest.batch.NonStreamingBatchOperations.performBatchJobs(NonStreamingBatchOperations.java:48)", "org.neo4j.server.rest.web.BatchOperationService.batchProcess(BatchOperationService.java:117)", "org.neo4j.server.rest.web.BatchOperationService.performBatchOperations(BatchOperationService.java:71)", "java.lang.reflect.Method.invoke(Method.java:616)" ]\n}', u'exception': u'BatchOperationFailedException'}
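Based on the BadInputException, I am fairly sure the problem is with the start_node and end_node arguments (0 and 1), which I intended as references to the results of the two earlier get_or_create_indexed_node calls. Basically, I want to refer to the nodes and relationships returned by previous get_or_create operations. How do you refer to these within the same batch?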
UPDATE: After much experimentation, I have narrowed this down to a way to reproduce the error: if either of the nodes in the get_or_create calls already exists in the graph, the batch operation fails with a bad input error on the index (0 or 1) corresponding to the existing node. I have also updated the code above to show exactly what I ran. Running it for the first time succeeds (with both nodes NOT present in the graph). Running it again fails.
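My working guess, which I have not confirmed against the Neo4j source, is that the "{0}" placeholder only gets replaced when the referenced job reports where its result lives, and is passed through literally otherwise. The toy model below illustrates that guess in plain Python. `resolve_refs` and `extract_node_id` are hypothetical helpers, not Neo4j or py2neo functions, and the node URI is only an illustrative value.

```python
import re


def resolve_refs(text, locations):
    """Replace each "{n}" with the location recorded for job n, if there is one.

    `locations` maps a job id to the URI of that job's result. A job with no
    recorded location leaves its placeholder untouched (this is the assumption).
    """
    def swap(match):
        job_id = int(match.group(1))
        return locations.get(job_id, match.group(0))

    return re.sub(r"\{(\d+)\}", swap, text)


def extract_node_id(uri):
    """Parse the trailing numeric id from a node URI; raises ValueError otherwise."""
    return int(uri.rsplit("/", 1)[-1])


# Case 1: job 0 recorded a location, so the reference resolves to a node id.
created = {0: "http://localhost:7474/db/data/node/1"}
node_id = extract_node_id(resolve_refs("{0}", created))

# Case 2: job 0 recorded no location, so "{0}" reaches the id parser unchanged.
try:
    extract_node_id(resolve_refs("{0}", {}))
    failed = False
except ValueError:
    failed = True
```

In this model the second case fails while parsing the literal string "{0}", which looks similar to the `For input string: "{0}"` message raised from `extractNodeId` in the trace above.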
py2neo version: 1.5
neo4j version: 1.8.2