Neo4j Cypher: Merge duplicate nodes

I have some duplicate nodes, all with the label Tag. By duplicates I mean several nodes with the same name property, for example:

{ name: writing, _id: 57ec2289a90f9a2deece7e6d},
{ name: writing, _id: 57db1da737f2564f1d5fc5a1},
{ name: writing }

The _id field is no longer used, so for all intents and purposes these three nodes are the same, except that each of them has different relationships.

What I would like to do is:

  1. Find all duplicate nodes (check)

    MATCH (n:Tag)
    WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
    WHERE count > 1
    RETURN name, nodelist, count
    
  2. Copy all relationships from the duplicate nodes into the first one

  3. Delete all the duplicate nodes

Can this be achieved with a Cypher query? Or do I have to write a script in some programming language? (That is what I'm trying to avoid.)

Blistery asked 15/3, 2017 at 2:37

APOC Procedures has some graph refactoring procedures that can help. I think apoc.refactor.mergeNodes() ought to do the trick.

Be aware that in addition to transferring all relationships from the other nodes onto the first node of the list, it will also apply any labels and properties from the other nodes onto the first node. If that's not something you want to do, then you may have to collect incoming and outgoing relationships from the other nodes and use apoc.refactor.to() and apoc.refactor.from() instead.

Here's the query for merging nodes:

    MATCH (n:Tag)
    WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
    WHERE count > 1
    CALL apoc.refactor.mergeNodes(nodelist) YIELD node
    RETURN node
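
For the alternative mentioned above, where labels and properties on the first node should be left alone, a rough sketch might look like the following. It assumes `apoc.refactor.to(rel, endNode)` and `apoc.refactor.from(rel, newStartNode)` are available in your APOC version, and it orders nodes by `id(n)` so that "the first node" is the same one in every statement (that ordering is my assumption, not part of the original query). Run the three statements one after another; relationships between two duplicates would end up as self-loops on the first node, so check for those if they can occur in your data.

```cypher
// 1. Redirect incoming relationships of each duplicate to the first node.
MATCH (n:Tag)
WITH n ORDER BY id(n)
WITH n.name AS name, collect(n) AS nodelist
WHERE size(nodelist) > 1
WITH head(nodelist) AS first, tail(nodelist) AS rest
UNWIND rest AS dup
MATCH (dup)<-[r]-()
CALL apoc.refactor.to(r, first) YIELD output
RETURN count(*);

// 2. Redirect outgoing relationships of each duplicate so they start at the first node.
MATCH (n:Tag)
WITH n ORDER BY id(n)
WITH n.name AS name, collect(n) AS nodelist
WHERE size(nodelist) > 1
WITH head(nodelist) AS first, tail(nodelist) AS rest
UNWIND rest AS dup
MATCH (dup)-[r]->()
CALL apoc.refactor.from(r, first) YIELD output
RETURN count(*);

// 3. Delete the duplicates; DETACH DELETE also removes any relationship
//    that was somehow left on them.
MATCH (n:Tag)
WITH n ORDER BY id(n)
WITH n.name AS name, collect(n) AS nodelist
WHERE size(nodelist) > 1
UNWIND tail(nodelist) AS dup
DETACH DELETE dup;
```

Unlike mergeNodes, this leaves the properties and labels of the duplicates behind, so anything stored only on a duplicate is lost when it is deleted.
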
Licentiate answered 15/3, 2017 at 2:55

Comments (3):
Wow, I didn't know about this APOC plugin. Very useful, thanks! - Blistery
Is there a way to avoid the RETURN statement at the end of this query? E.g., if I only want to merge the nodes in the db without returning anything. - Rakes
Currently, queries must end either with a RETURN or with a write clause such as SET, REMOVE, CREATE, DELETE, or MERGE. You can always return count(*) if you just want to keep the returned rows down to 1. - Licentiate
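
As a sketch of that last suggestion, the merge query from the answer can end in an aggregate so that only a single row comes back (the alias `mergedGroups` is just an illustrative name):

```cypher
MATCH (n:Tag)
WITH n.name AS name, collect(n) AS nodelist, count(*) AS count
WHERE count > 1
CALL apoc.refactor.mergeNodes(nodelist) YIELD node
// One row in total: the number of name groups that were merged.
RETURN count(*) AS mergedGroups
```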

The above Cypher query didn't work on my database, version 3.4.16.

What worked for me was:

    MATCH (n:Tag)
    WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
    WHERE count > 1
    CALL apoc.refactor.mergeNodes(nodelist, {
      properties: "combine",
      mergeRels: true
    })
    YIELD node
    RETURN node;

Heterogony answered 5/10, 2021 at 7:26
