Cypher to return total node count as well as a limited set

Is it possible to extract in a single cypher query a limited set of nodes and the total number of nodes?

match (n:Molecule) with n, count(*) as nb limit 10 return {N: nb, nodes: collect(n)}

The above query properly returns the nodes, but returns 1 as number of nodes. I certainly understand why it returns 1, since there is no grouping, but can't figure out how to correct it.

Supermundane answered 6/1, 2015 at 18:55 Comment(1)
Is it possible to combine this with a WHERE clause? Basically, I don't want a total node count. I want the count of nodes that satisfy "some" conditions. - Nagana

The following query first counts all matching rows (which I guess is what was needed). It then matches again and limits the result, but the original count is still available because it is carried through by the WITH clause.

MATCH 
    (n:Molecule)
WITH 
    count(*) AS cnt
MATCH 
    (n:Molecule)
WITH 
    n, cnt LIMIT 10
RETURN 
    { N: cnt, nodes:collect(n) } AS molecules
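
If the count should cover only nodes that satisfy a condition, as asked in the comment on the question, the same predicate can be repeated in both MATCH clauses. The following is a minimal sketch, not a tested recipe: the property `weight` and the threshold are hypothetical and stand in for whatever condition applies. The count and the collected list are projected separately before the map is built, which avoids mixing a grouping key and an aggregate in one expression.

```cypher
// Count only the molecules that pass the filter.
// `n.weight > 100` is a hypothetical predicate, not from the original question.
MATCH (n:Molecule)
WHERE n.weight > 100
WITH count(n) AS cnt
// Match again with the same filter and keep one page of rows.
MATCH (n:Molecule)
WHERE n.weight > 100
WITH n, cnt LIMIT 10
// Aggregate the page; cnt is the explicit grouping key.
WITH cnt, collect(n) AS nodes
RETURN { N: cnt, nodes: nodes } AS molecules
```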
Flashing answered 7/1, 2015 at 7:37 Comment(2)
What if the MATCH is actually quite involved? Does duplicating it incur a performance penalty, or is Neo4j smart enough to reuse the result of the first match? - Handcuff
Just tested this with a really heavy match. No, Neo4j is not smart enough. Without the count part it took 44-45 seconds; with the count it took 89-90 seconds. It's like running the query twice, so there is really no need to combine the queries. @SzczepanHołyszewski - Karolinekaroly

Here is an alternate solution:

match (n:Molecule) return {nodes: collect(n)[0..5], n: length(collect(n))}

84 ms for 30k nodes, shorter but not as efficient as the above one proposed by wassgren.
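
A variant of the same idea, as a sketch under assumptions: as far as I know, Neo4j 4.0 and later accept only paths in length(), so size() is used for the list here. Collecting once and reusing the list also avoids writing collect(n) twice. The alias `molecules` is introduced only for illustration, and the memory cost of collecting every node still applies.

```cypher
// Collect all matching nodes once, then slice the list and measure it.
// `molecules` is an illustrative alias, not from the original answer.
MATCH (n:Molecule)
WITH collect(n) AS molecules
RETURN { nodes: molecules[0..5], n: size(molecules) }
```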

Supermundane answered 26/3, 2017 at 19:49 Comment(5)
It probably uses much more memory than the previous solution. We're talking here about millions of Molecule nodes, each containing the molecule depiction (SMILES string, InChI key, base64 conversion of the molecule PNG). - Supermundane
Why does your solution need much more memory? I couldn't understand. We just return a small subset of results. Is it because of the collect statement? - Nagana
I meant the solution using collect. I suspect it loads the collected nodes into memory; if so, the collection would contain all nodes labeled Molecule... millions. - Supermundane
Thank you. Maybe Neo4j optimizes indexing expressions applied to collect. Also, I think your answer deserves to be the accepted answer. The previous one just makes the complexity a lot worse: it simply executes 2 queries serially, when they could be executed in parallel instead. - Nagana
I tried this query with a relatively large set of nodes (2.9 million). I was using a query like OPTIONAL MATCH (x:Card) WHERE ( (TRUE) ) RETURN ID(x), x SKIP 0 LIMIT 10 to get 10 cards among those 2.9 million cards. It executes in about 1 or 2 ms. When I tried OPTIONAL MATCH (x:Card) WHERE ( (TRUE) ) RETURN collect(ID(x))[0..10], collect(x)[0..10], length(collect(x)), the first run took 4500 ms and the second run took 2700 ms. That is not too bad, but definitely slower. - Nagana

This solution splits the work up into two parts: first get the complete list of rows, do a WITH that counts them and collects them into a list, and then do a subquery with collect() that does the paging. If you need to bring in additional information about the rows in the result, put that in between "WITH item" and "RETURN collect(item)" so that it is only done for the items on the page you are returning.

MATCH (n:MyLabel)
WITH collect(n) AS collection, count(n) AS cnt
CALL {
    WITH collection
    UNWIND collection as item
    WITH item ORDER BY item.id DESC SKIP 10 LIMIT 5
    RETURN collect(item) as items
}
RETURN cnt, items
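
The note about adding extra information between "WITH item" and "RETURN collect(item)" could look like the sketch below. The relationship type `HAS_TAG`, the label `Tag` and the property `name` are hypothetical and only illustrate where the enrichment goes. Because the extra aggregation step may not preserve the page order, re-sort afterwards if the order matters.

```cypher
MATCH (n:MyLabel)
WITH collect(n) AS collection, count(n) AS cnt
CALL {
    WITH collection
    UNWIND collection AS item
    WITH item ORDER BY item.id DESC SKIP 10 LIMIT 5
    // Enrichment runs only for the items on the requested page.
    // :HAS_TAG, :Tag and t.name are hypothetical names.
    OPTIONAL MATCH (item)-[:HAS_TAG]->(t:Tag)
    WITH item, collect(t.name) AS tags
    RETURN collect({ item: item, tags: tags }) AS items
}
RETURN cnt, items
```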
Dostie answered 8/11, 2023 at 4:20 Comment(0)
