Cypher: how to find all the chains of single nodes not repeated?
Asked Answered
C

2

6

I want to find ALL the paths starting and ending from/to a specific node. I would like that each node in a path appears only once.

For example, in a graph like this:

(a)-[:REL]->(b)-[:REL]->(c)-[:REL]->(a)
(a)-[:REL]->(e)-[:REL]->(f)-[:REL]->(a)
(e)-[:REL]->(b)

grafically:

  e → b
 ↙ ↖ ↗ ↘
f → a ← c

Cypher code:

CREATE (a { name:'A' })-[:REL]->(b {name:'B'})-[:REL]->(c { name:'C' })
       -[:REL]->(a)-[:REL]->(e {name:'E'})-[:REL]->(f {name:'F'})-[:REL]->(a),
       (e)-[:REL]->(b)

I would like that the research of chains starting from (a) returns

(a)->(b)->(c)->(a)

(a)->(e)->(f)->(a)

(a)->(e)->(b)->(c)->(a)

while starting from (f) returns only

(f)->(a)->(e)->(f)

and NOT

(f)->(a)->(b)->(c)->(a)->(e)->(f)

because it passes twice through the node (a).

I tryied:

MATCH p=(a {name:'F'})-[:REL*1..]->(a) 
WHERE SINGLE(e1 IN TAIL(NODES(p)) WHERE SINGLE(e2 IN TAIL(NODES(p)) WHERE e1=e2))
RETURN p

but I've got no result. The best I've been able to reach is to have no repetition of just the start node with this query:

MATCH p=(a {name:'F'})-[:REL*1..]->(a) 
WHERE SINGLE(e IN TAIL(NODES(p)) WHERE e=a)
RETURN p

but obviously it's not what I want because it returns also

(f)->(a)->(b)->(c)->(a)->(e)->(f)

that is a path involving the node (a) twice.

Can someone suggest me a solution?

Thank you in advance.

P.S. The version of Neo4j that I use is 2.0

Contrapositive answered 2/10, 2014 at 13:31 Comment(2)
Have you seen this answer from ages ago: #13768248 it may still be the only way and you might need to combine it with @FrobberOfBits answer, do you have to use Cypher?Squeaky
Good advice, man. Below @JohnMark13 helped me giving me directly the code. Thank you.Contrapositive
S
2

On the shoulders of those who came before I have come up with:

MATCH (a { name:'A' })
MATCH (a)-[p:REL]->(s)
WITH a, s
MATCH p = s-[r:REL*]->a
WITH [a] + nodes(p) AS ns
    WHERE ALL (n IN ns 
       WHERE 1=LENGTH(FILTER(m IN TAIL(ns) 
                             WHERE m = n)))
RETURN ns

It utilises the idea from @FrobberOfBits to perform the initial steps and then the query linked from Michael to perform the filter on the paths. I suppose that the TAIL could be left out, if you also left out:

WITH [a] + nodes(p) AS ns    

Which would make the query this maybe:

MATCH (a { name:'A' })
MATCH (a)-[p:REL]->(s)
WITH a, s
MATCH p = s-[r:REL*]->a
WITH a, nodes(p) AS ns
WHERE ALL (n IN ns 
       WHERE 1=LENGTH(FILTER(m IN ns 
                             WHERE m = n)))
RETURN [a] + ns

The complicated or at least slightly hard to read part of the query is the ALL in combination with the FILTER which "basically" for all values in ns (WHERE ALL (n in ns) counts the number of occurences by comparing to all other values in the array (m IN ns WHERE m = n) and filters the result out completely if the resultant array does not contain a single element (1=LENGTH).

I added your example in the console here.

Squeaky answered 2/10, 2014 at 16:51 Comment(3)
Great job, it does exactly what I want. Thank you very much.Contrapositive
Don't forget to add Labels for your nodes to make it faster!Susurrus
Thanks @MichaelHunger surprised I was lazy and left that out, must have been rushing.Squeaky
D
2

So here is some data mocked up to copy your example:

$ create (a:T {label:"A"})-[:REL]->(b:T)-[:REL]->(c:T)-[:REL]->(a); 
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 3
Relationships created: 3
Properties set: 1
Labels added: 3
20 ms
neo4j-sh (?)$ match (a:T {label:"A"})                                          
> CREATE a-[:REL]->(e:T)-[:REL]->(f:T)-[:REL]->(a);
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2
Relationships created: 3
Labels added: 2
27 ms

Now here's the query that will get you there (but it requires some explanation).

match (a:T {label: "A"})-[:REL]->(somethingElse),
      p=shortestPath((somethingElse)-[:REL*]->(a))
return p;

OK so basically it seems to me like you want the shortest path from (a) back to itself. But that's tricky, because the shortest path from a node to itself is always the empty path, of length 0. So if you do something like shortestPath((a)-[:REL*]->(a)) the result will always be empty which isn't what you want.

So instead, I matched from one :REL hop to the next item (somethingElse) and then did the shortest path from that back to A. The resulting returned path p is exactly the path you want, except that it's missing the original hop from a-->somethingElse. As long as you take that extra hop into account, then the combination of a-->somethingElse with p should be the answer you're looking for.

Dispassionate answered 2/10, 2014 at 13:47 Comment(2)
The idea to use ShortestPath is good but omits all those paths longer than the shortest one where there are no repetitions. To give the idea I deepened my example above with an other relationship (e)-[:REL]->(b), in that case, starting from (a), I would like to retrieve also the path (a)->(e)->(b)->(c)->(a).Contrapositive
@Dispassionate For whatever reason, I would never have thought of this. I Am now a bit graph smarter.Squeaky
S
2

On the shoulders of those who came before I have come up with:

MATCH (a { name:'A' })
MATCH (a)-[p:REL]->(s)
WITH a, s
MATCH p = s-[r:REL*]->a
WITH [a] + nodes(p) AS ns
    WHERE ALL (n IN ns 
       WHERE 1=LENGTH(FILTER(m IN TAIL(ns) 
                             WHERE m = n)))
RETURN ns

It utilises the idea from @FrobberOfBits to perform the initial steps and then the query linked from Michael to perform the filter on the paths. I suppose that the TAIL could be left out, if you also left out:

WITH [a] + nodes(p) AS ns    

Which would make the query this maybe:

MATCH (a { name:'A' })
MATCH (a)-[p:REL]->(s)
WITH a, s
MATCH p = s-[r:REL*]->a
WITH a, nodes(p) AS ns
WHERE ALL (n IN ns 
       WHERE 1=LENGTH(FILTER(m IN ns 
                             WHERE m = n)))
RETURN [a] + ns

The complicated or at least slightly hard to read part of the query is the ALL in combination with the FILTER which "basically" for all values in ns (WHERE ALL (n in ns) counts the number of occurences by comparing to all other values in the array (m IN ns WHERE m = n) and filters the result out completely if the resultant array does not contain a single element (1=LENGTH).

I added your example in the console here.

Squeaky answered 2/10, 2014 at 16:51 Comment(3)
Great job, it does exactly what I want. Thank you very much.Contrapositive
Don't forget to add Labels for your nodes to make it faster!Susurrus
Thanks @MichaelHunger surprised I was lazy and left that out, must have been rushing.Squeaky

© 2022 - 2024 — McMap. All rights reserved.