What is the best way to clean up the graph from all nodes and relationships via Cypher?
I've outlined four options below that are current as of July 2022:
- Option 1: MATCH (x) DETACH DELETE x
- Option 2: CALL {} IN TRANSACTIONS
- Option 3: delete data directories
- Option 4: delete in code
Option 1: MATCH (x) DETACH DELETE x - works only with small data sets
As you posted in your question, the following works fine, but only if there aren't too many nodes and relationships:
MATCH (x) DETACH DELETE x
If the number of nodes and/or relationships is high enough, this won't work. Here's what "not working" looks like against http://localhost:7474/browser/:
There is not enough memory to perform the current task. Please try increasing 'dbms.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.
And here's what shows up in neo4j console output (or in logs, if you have that enabled):
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.Scheduler-1"
Option 2: CALL {} IN TRANSACTIONS - does not work as of July 2022
An alternative, available since 4.4 according to neo4j docs, is to use a new CALL {} IN TRANSACTIONS feature:
With 4.4 and newer versions you can utilize the CALL {} IN TRANSACTIONS syntax [...] to delete subsets of the matched records in batches until the full delete is complete
Unfortunately, this doesn't work in my tests. Here's an example attempting to delete relationships only:
MATCH ()-[r]-()
CALL { WITH r DELETE r }
IN TRANSACTIONS OF 1000 ROWS
Running that in browser results in this error:
A query with 'CALL { ... } IN TRANSACTIONS' can only be executed in an implicit transaction, but tried to execute in an explicit transaction.
In code, it produces the same result. Here's an attempt connecting via bolt in Java:
session.executeWrite(tx -> tx.run("MATCH (x) " +
"CALL { WITH x DETACH DELETE x } " +
"IN TRANSACTIONS OF 10000 ROWS"));
which results in this error, identical to what the browser showed:
org.neo4j.driver.exceptions.DatabaseException: A query with 'CALL { ... } IN TRANSACTIONS' can only be executed in an implicit transaction, but tried to execute in an explicit transaction.
at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:111)
at org.neo4j.driver.internal.InternalTransaction.run(InternalTransaction.java:58)
at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:34)
Looking at the documentation for Transactions, it states: "Transactions can be either explicit or implicit." What's the difference? From that same doc:
Explicit transactions:
- Are opened by the user.
- Can execute multiple Cypher queries in sequence.
- Are committed, or rolled back, by the user.
Implicit transactions, sometimes called auto-commit transactions or :auto transactions:
- Are opened automatically.
- Can execute a single Cypher query.
- Are committed automatically when the query finishes successfully.
I can't determine from docs or experimentation how to open an implicit transaction (and thus how to use the 'CALL { ... } IN TRANSACTIONS' structure), so this is apparently a dead end.
In a recent Neo4j AuraDB Office Hours posted May 31, 2022, they tried using this same feature in AuraDB. It didn't work for them either, though the behavior was different from what I've observed in Neo4j Community. I'm guessing they'll address this at some point (it feels like a bug), but at least for now it's another confirmation that 'CALL { ... } IN TRANSACTIONS' is not the way forward.
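One avenue the tests above did not cover, offered as an untested assumption rather than a confirmed fix: the error message asks for an implicit (auto-commit) transaction, and the Java driver documents Session.run(...) as running a query in an auto-commit transaction, whereas executeWrite opens a managed explicit one. The sketch below builds the batched query and hands it to whatever runs it in auto-commit mode. AutoCommitDelete, batchedDeleteQuery and deleteAll are hypothetical names, and the Consumer<String> stands in for session::run so the block compiles without the driver on the classpath.

```java
import java.util.function.Consumer;

final class AutoCommitDelete {

    // Builds the batched delete from Option 2 for a given batch size.
    static String batchedDeleteQuery(int rowsPerTransaction) {
        if (rowsPerTransaction <= 0) {
            throw new IllegalArgumentException("rowsPerTransaction must be positive");
        }
        return "MATCH (x) "
                + "CALL { WITH x DETACH DELETE x } "
                + "IN TRANSACTIONS OF " + rowsPerTransaction + " ROWS";
    }

    // Hands the query to a runner that is ASSUMED to execute it as an
    // auto-commit transaction. With the real driver this would look like
    // deleteAll(session::run, 10000) instead of session.executeWrite(...).
    static void deleteAll(Consumer<String> autoCommitRunner, int rowsPerTransaction) {
        autoCommitRunner.accept(batchedDeleteQuery(rowsPerTransaction));
    }
}
```

In the browser, the corresponding thing to try would be prefixing the query with :auto, which is how the documentation quoted above refers to implicit transactions. I have not verified either path against a server.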
Option 3: delete data directories - works with any size data set
This is the easiest, most straightforward mechanism that actually works:
- stop the server
- manually delete data directories
- restart the server
Here's what that looks like:
% ./bin/neo4j stop
% rm -rf data/databases data/transactions
% ./bin/neo4j start
This is pretty simple. You could write a script to capture this as a single command.
Option 4: delete in code - works with any size data set
The manual-delete option works fine, but I needed a way to delete all nodes and relationships in code.
Below is a minimal Java program that deletes all nodes and relationships in batches of 1000, regardless of how many there are.
This works in Neo4j Community 4.4.3, and since I'm using only basic functionality (no extensions), I assume this would work across a range of other Neo4j versions, and probably AuraDB, too.
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class DeleteAll {
    public static void main(String[] args) {
        String boltUri = "...";
        String user = "...";
        String password = "...";
        // try-with-resources closes the session and the driver when the loop is done
        try (Driver driver = GraphDatabase.driver(boltUri, AuthTokens.basic(user, password));
             Session session = driver.session()) {
            int count = 1;
            while (count > 0) {
                // Delete one batch of up to 1000 nodes, together with their relationships
                session.executeWrite(tx -> tx.run("MATCH (x) WITH x LIMIT 1000 DETACH DELETE x").consume());
                // Count the nodes that remain; the loop ends when none are left
                count = session.executeRead(tx -> tx.run("MATCH (x) RETURN COUNT(x)").single().get(0).asInt());
            }
        }
    }
}
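A possible variation on this loop, sketched under assumptions rather than tested against a server: instead of issuing a separate count query after each batch, the delete itself can report how many nodes it removed, and the loop stops when a batch removes nothing. BatchedDelete and deleteUntilEmpty are hypothetical names, and the IntSupplier stands in for the driver call so the block runs without the Neo4j driver.

```java
import java.util.function.IntSupplier;

final class BatchedDelete {

    // Calls deleteBatch until it reports that nothing was deleted,
    // and returns the total number of deleted nodes.
    static long deleteUntilEmpty(IntSupplier deleteBatch) {
        long total = 0;
        int deleted;
        do {
            deleted = deleteBatch.getAsInt();
            total += deleted;
        } while (deleted > 0);
        return total;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the database: 2500 "nodes", batches of 1000.
        int[] remaining = {2500};
        IntSupplier fakeBatch = () -> {
            int n = Math.min(1000, remaining[0]);
            remaining[0] -= n;
            return n;
        };
        System.out.println(deleteUntilEmpty(fakeBatch)); // prints 2500
    }
}
```

With the real driver, the supplier could wrap something like session.executeWrite(tx -> tx.run("MATCH (x) WITH x LIMIT 1000 DETACH DELETE x RETURN count(*)").single().get(0).asInt()). That query shape is an assumption on my part and was not part of the tests above.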