Best way to delete all nodes and relationships in Cypher
What is the best way to cleanup the graph from all nodes and relationships via Cypher?

At http://neo4j.com/docs/stable/query-delete.html#delete-delete-a-node-and-connected-relationships the example

MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r

has the note:

This query isn’t for deleting large amounts of data

So, is the following better?

MATCH ()-[r]-() DELETE r

and

MATCH (n) DELETE n

Or is there another way that is better for large graphs?

Chopfallen answered 18/4, 2015 at 1:12 Comment(1)
at #14691022 they suggest removing the whole database directory, but I'm interested in the case of a remote GUI that needs to provide the user with an action to clear the graph (reset state to default) – Chopfallen
25

As you've mentioned, the easiest way is to stop Neo4j, delete the data/graph.db folder, and restart it.

Deleting a large graph via Cypher will always be slower, but it is still doable if you use a proper transaction size to prevent memory issues (remember that transactions are built up in memory first, before they get committed). Typically 50-100k atomic operations per transaction is a good idea. You can add a limit to your deletion statement to control transaction sizes and report back how many nodes have been deleted. Rerun this statement until it returns 0:

MATCH (n)
OPTIONAL MATCH (n)-[r]-()
WITH n,r LIMIT 50000
DELETE n,r
RETURN count(n) as deletedNodesCount
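The rerun-until-zero advice can be scripted. Below is a sketch that drives the statement above through cypher-shell; the user `neo4j` and password `secret` are placeholders to adjust for your install, and the script skips itself if cypher-shell isn't on the PATH.

```shell
#!/bin/sh
# Sketch: rerun the 50k-batch delete until the server reports 0 deleted
# nodes. Assumes cypher-shell is on PATH; credentials are placeholders.
BATCH='MATCH (n)
OPTIONAL MATCH (n)-[r]-()
WITH n,r LIMIT 50000
DELETE n,r
RETURN count(n);'

if command -v cypher-shell >/dev/null 2>&1; then
  while :; do
    # --format plain prints a header row, then the bare count on the last line.
    deleted=$(printf '%s\n' "$BATCH" |
      cypher-shell -u neo4j -p secret --format plain | tail -n 1)
    echo "deleted $deleted nodes in this batch"
    [ "$deleted" -gt 0 ] || break
  done
else
  echo "cypher-shell not found; skipping"
fi
```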
Mores answered 18/4, 2015 at 10:12 Comment(9)
thanks, added your comment to zoomicon.wordpress.com/2015/04/18/… – Chopfallen
How do you "drop" the data/graph.db folder and restart it? – Locklin
This issue was raised on Zoomicon's website: "In your last query you create a huge cross product. All nodes times all relationships. Probably cleaner then to split it into two, delete rels first then nodes" – Locklin
dropping the folder: rm -rf data/graph.db – Mores
@jsc123 - that issue was referring to my original thought of doing MATCH (n), ()-[r]-() DELETE n,r, which is not efficient because it does a cross product, and the query optimizer, it seems, isn't clever enough to optimize it. In my opinion it could check that n and r are unrelated in how they're used after the MATCH, and thus split the query internally into two separate queries executed together in one step, but maybe I ask too much – Chopfallen
regarding dropping the data/graph.db subfolder (it's a subfolder with an extension in its folder name, not a file), neo4j obviously recreates it. However, I guess you will need to stop the neo4j server first and restart it afterwards. On Windows use RMDIR /S /Q path to remove a directory tree (/S) without a confirmation prompt (/Q) – Chopfallen
Now that DETACH DELETE is a thing, what are the thoughts on changing this answer? – Transoceanic
Does anyone know how to delete nodes and relationships when a Java driver is used to create them? – Calabro
I ran this to delete relationships: MATCH ()-[r]-() WITH r LIMIT 50000 DELETE r RETURN count(r) as deletedRelationCount. However, while deletedRelationCount is always 50000, the browser always says that fewer than 50000 relationships were deleted. What is the reason? – Calabro
14

According to the official documentation:

MATCH (n)
DETACH DELETE n

but it also says "This query isn't for deleting large amounts of data", so it's better to use it with a limit:

MATCH (n)
WITH n LIMIT 10000
DETACH DELETE n;
Grandam answered 24/7, 2016 at 3:5 Comment(2)
Since you mentioned DETACH: the document you pointed to says, "you can not delete a node without also deleting relationships that start or end on said node. Either explicitly delete the relationships, or use DETACH DELETE" – Chopfallen
btw, if you use LIMIT (for performance reasons), I guess you also need to use RETURN count(n) as deletedNodesCount and run the statement repeatedly until it returns 0, as suggested in the other reply above – Chopfallen
4

I wrote this little script and added it to my NEO/bin folder.

Tested on v3.0.6 community

#!/bin/sh
echo Stopping neo4j
./neo4j stop
echo Erasing ALL data
rm -rf ../data/databases/graph.db
./neo4j start
echo Done

I use it when my LOAD CSV imports are crappy.

Hope it helps

Torn answered 13/10, 2016 at 9:55 Comment(0)
2

What is the best way to clean up the graph from all nodes and relationships via Cypher?

I've outlined four options below that are current as of July 2022:

  • Option 1: MATCH (x) DETACH DELETE x
  • Option 2: CALL {} IN TRANSACTIONS
  • Option 3: delete data directories
  • Option 4: delete in code

Option 1: MATCH (x) DETACH DELETE x - works only with small data sets

As you posted in your question, the following works fine, but only if there aren't too many nodes and relationships:

MATCH (x) DETACH DELETE x

If the number of nodes and/or relationships is high enough, this won't work. Here's what "not working" looks like against http://localhost:7474/browser/:

There is not enough memory to perform the current task. Please try increasing 'dbms.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.

And here's what shows up in neo4j console output (or in logs, if you have that enabled):

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.Scheduler-1"

Option 2: CALL {} IN TRANSACTIONS - does not work as of July 2022

An alternative, available since 4.4 according to neo4j docs, is to use a new CALL {} IN TRANSACTIONS feature:

With 4.4 and newer versions you can utilize the CALL {} IN TRANSACTIONS syntax [...] to delete subsets of the matched records in batches until the full delete is complete

Unfortunately, this doesn't work in my tests. Here's an example attempting to delete relationships only:

MATCH ()-[r]-()
CALL { WITH r DELETE r }
IN TRANSACTIONS OF 1000 ROWS

Running that in browser results in this error:

A query with 'CALL { ... } IN TRANSACTIONS' can only be executed in an implicit transaction, but tried to execute in an explicit transaction.

In code, it produces the same result. Here's an attempt connecting via bolt in Java:

session.executeWrite(tx -> tx.run("MATCH (x) " +
        "CALL { WITH x DETACH DELETE x } " +
        "IN TRANSACTIONS OF 10000 ROWS"));

which results in this error, identical to what the browser showed:

org.neo4j.driver.exceptions.DatabaseException: A query with 'CALL { ... } IN TRANSACTIONS' can only be executed in an implicit transaction, but tried to execute in an explicit transaction.
    at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:111)
    at org.neo4j.driver.internal.InternalTransaction.run(InternalTransaction.java:58)
    at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:34)

Looking at the documentation for Transactions, it states: "Transactions can be either explicit or implicit." What's the difference? From that same doc:

Explicit transactions:

  • Are opened by the user.
  • Can execute multiple Cypher queries in sequence.
  • Are committed, or rolled back, by the user.

Implicit transactions, sometimes called auto-commit transactions or :auto transactions:

  • Are opened automatically.
  • Can execute a single Cypher query.
  • Are committed automatically when the query finishes successfully.

I can't determine from docs or experimentation how to open an implicit transaction (and thus, to be able to use 'CALL { ... } IN TRANSACTIONS' structure), so this is apparently a dead end.

In a recent Neo4j AuraDB Office Hours posted May 31, 2022, they tried using this same feature in AuraDB. It didn't work for them either, though the behavior was different from what I've observed in Neo4j Community. I'm guessing they'll address this at some point, feels like a bug, but at least for now it's another confirmation that 'CALL { ... } IN TRANSACTIONS' is not the way forward.

Option 3: delete data directories - works with any size data set

This is the easiest, most straightforward mechanism that actually works:

  • stop the server
  • manually delete data directories
  • restart the server

Here's what that looks like:

% ./bin/neo4j stop
% rm -rf data/databases data/transactions
% ./bin/neo4j start

This is pretty simple. You could write a script to capture this as a single command.
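That script might look like the following sketch; NEO4J_HOME is an assumed placeholder for wherever your installation lives, and the script skips itself if no neo4j binary is found there.

```shell
#!/bin/sh
# Sketch of a one-command reset: stop the server, remove the data
# directories, start it again. NEO4J_HOME is a placeholder.
NEO4J_HOME="${NEO4J_HOME:-/opt/neo4j}"

if [ -x "$NEO4J_HOME/bin/neo4j" ]; then
  "$NEO4J_HOME/bin/neo4j" stop
  rm -rf "$NEO4J_HOME/data/databases" "$NEO4J_HOME/data/transactions"
  "$NEO4J_HOME/bin/neo4j" start
  echo "reset complete"
else
  echo "no neo4j at $NEO4J_HOME; skipping"
fi
```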

Option 4: delete in code - works with any size data set

Below is a minimal Java program that handles deletion of all nodes and relationships, regardless of how many. The manual-delete option works fine, but I needed a way to delete all nodes and relationships in code. This works in Neo4j Community 4.4.3, and since I'm using only basic functionality (no extensions), I assume this would work across a range of other Neo4j versions, and probably AuraDB, too.

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class DeleteAllNodes {
    public static void main(String[] args) {
        String boltUri = "...";
        String user = "...";
        String password = "...";

        Driver driver = GraphDatabase.driver(boltUri, AuthTokens.basic(user, password));
        try (driver; Session session = driver.session()) {
            int count = 1;
            while (count > 0) {
                // Delete up to 1000 nodes (and their relationships) per transaction.
                session.executeWrite(tx -> tx.run("MATCH (x) WITH x LIMIT 1000 DETACH DELETE x").consume());
                // Check how many nodes remain; stop when none are left.
                count = session.executeWrite(tx ->
                        tx.run("MATCH (x) RETURN COUNT(x)").single().values().get(0).asInt());
            }
        }
    }
}
Etra answered 2/7, 2022 at 0:29 Comment(1)
Confirmed option 2 still not working in January 2023. The suggestion to prepend the query with :auto is also not working: community.neo4j.com/t5/neo4j-graph-platform/… – Protean
0
OPTIONAL MATCH (n)-[p:owner_real_estate_relation]->()
WITH n, p LIMIT 1000
DELETE p

In a test run this deleted 50000 relationships and completed in 589 ms.

Lorica answered 8/9, 2017 at 17:5 Comment(3)
Code-only answers are discouraged on Stack Overflow because they are not particularly helpful. Please update your answer to explain how this solves the question, and why it may be a better option than the accepted and up-voted answer. – Evenfall
actually the completion-time info isn't very helpful unless it's compared with the time the other alternatives took on the same machine (starting from the same initial database every time) – Chopfallen
also, :owner_real_estate_relation doesn't match the "all relationships" scope of the question – Chopfallen
0

I performed several tests and the best combination was

call apoc.periodic.iterate("MATCH p=()-[r]->() RETURN r,p LIMIT 5000000;", "DELETE r;", {batchSize:10000, parallel: true})

(this code deleted 300,000,000 relationships in 3251s)

It is worth noting that using the "parallel" parameter drastically reduces the time.

This was on Neo4j 4.4.1, AWS EC2 m5.xlarge:

neo4j:
  resources:
    memory: 29000Mi
  configs:
    dbms.memory.heap.initial_size: "20G"
    dbms.memory.heap.max_size: "20G"
    dbms.memory.pagecache.size: "5G"
Roundup answered 10/11, 2022 at 5:21 Comment(2)
Did you mean to say "taking the parallel off"? I guess the settings above may also depend on the specific machine's specs – Chopfallen
By "taking the parallel" I meant that removing the "parallel" parameter from the apoc call makes it lose a lot of performance – Roundup
0

Just adding an update as I needed to do this today and didn't want to delete the database:

:auto MATCH (n) CALL { WITH n DETACH DELETE n } IN TRANSACTIONS OF 50000 ROWS
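For what it's worth, the same statement should also work from cypher-shell without the :auto prefix, since cypher-shell runs statements in implicit (auto-commit) transactions by default. A sketch, assuming cypher-shell is installed and using placeholder credentials:

```shell
#!/bin/sh
# Sketch: run the batched delete from the command line. cypher-shell uses
# implicit transactions, which CALL {} IN TRANSACTIONS requires.
# Credentials are placeholders; the script skips itself without cypher-shell.
if command -v cypher-shell >/dev/null 2>&1; then
  cypher-shell -u neo4j -p secret \
    'MATCH (n) CALL { WITH n DETACH DELETE n } IN TRANSACTIONS OF 50000 ROWS;'
else
  echo "cypher-shell not found; skipping"
fi
```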
Depend answered 6/11, 2023 at 14:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.