I can't find how to return a node's labels with Cypher.
Does anybody know the syntax for this operation?
There is a function labels(node) that can return all labels for a node.
To get all distinct node labels:
MATCH (n) RETURN distinct labels(n)
To get the node count for each label:
MATCH (n) RETURN distinct labels(n), count(*)
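To make the difference between the two queries above concrete, here is a small pure-Python model (it is not Neo4j code and does not talk to a database). The `nodes` list and its labels are made up for illustration; each entry stands for the list that labels(n) would return for one node.

```python
from collections import Counter

# Hypothetical sample graph: each entry is the label list of one node,
# i.e. what labels(n) would return for that node.
nodes = [
    ["Person"],
    ["Person"],
    ["Person", "Admin"],
    ["Movie"],
]

# Model of: MATCH (n) RETURN DISTINCT labels(n)
# Each distinct label *combination* appears once.
distinct_combinations = sorted({tuple(labels) for labels in nodes})

# Model of: MATCH (n) RETURN DISTINCT labels(n), count(*)
# Nodes are grouped by their label combination and counted.
counts = Counter(tuple(labels) for labels in nodes)

print(distinct_combinations)
print(dict(counts))
```

Note that both results are keyed by label combination, so a node labelled Person and Admin is counted separately from nodes labelled only Person.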
MATCH for newer Neo4j databases because START is for legacy indexes. – Orphanage
Neo.ClientError.Statement.SyntaxError Parentheses are required to identify nodes in patterns, i.e. (n) ... (v. 3.1.1) – Agro
neo4j tag in addition to cypher. The shortest to write (and fastest to execute!) query for getting all distinct node labels is CALL db.labels -- at least since neo4j 3.0, see also CALL page in neo4j manual – Blessing
Neo4j 3.0 introduced the procedure db.labels(), which returns all available labels in the database. Use:
call db.labels();
call is 4ms vs 15ms for MATCH (n) on my DigitalOcean test DB. One note: I have hierarchical labels, and MATCH returns them as ["RawFile", "RawPhoto"], where call only returns the flattened list. – Pul
If you want all the individual labels (not the combinations) you can always expand on the answers:
MATCH (n)
WITH DISTINCT labels(n) AS labels
UNWIND labels AS label
RETURN DISTINCT label
ORDER BY label
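As a rough illustration of what each clause in that query contributes, here is a pure-Python sketch (not Neo4j code). The `nodes` list is hypothetical sample data and `individual_labels` is a name invented for this example; the comments map each step back to the Cypher clause it stands in for.

```python
# Hypothetical label lists, one per node (what labels(n) would return).
nodes = [
    ["RawFile", "RawPhoto"],
    ["RawFile"],
    ["Person"],
    ["RawFile", "RawPhoto"],
]


def individual_labels(node_labels):
    # WITH DISTINCT labels(n) AS labels
    combinations = {tuple(labels) for labels in node_labels}
    # UNWIND labels AS label ... RETURN DISTINCT label
    flattened = {label for combo in combinations for label in combo}
    # ORDER BY label
    return sorted(flattened)


print(individual_labels(nodes))
```

The combination ["RawFile", "RawPhoto"] is split into its two members, so the result lists single labels rather than label lists.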
START n=node(*) RETURN labels(n)
START n=node(*) RETURN DISTINCT(labels(n)) returns the same as MATCH (n) RETURN distinct labels(n), but takes ~10 times as long (on my db) – Pul
If you're using the Java API, you can quickly get an iterator of all the Labels in the database like so:
GraphDatabaseService db = (new GraphDatabaseFactory()).newEmbeddedDatabase(pathToDatabase);
ResourceIterable<Label> labs = GlobalGraphOperations.at(db).getAllLabels();
If you want to get the labels of a specific node, use labels(node). If you only want to get all node labels in Neo4j, use this procedure instead: call db.labels; Never use this query: MATCH n RETURN DISTINCT LABELS(n). It scans every node in the database, which is very slow.
match(n) where n.name="abc" return labels(n)
This returns all the labels of the node whose name is "abc".
Use the labels() function, as in this example, which matches nodes whose name property has the value 'Alice':
MATCH (a) WHERE a.name = 'Alice'
RETURN labels(a)
The return type for labels() is LIST<STRING>, so it can return more than one value (or an empty list for a node with no labels).
More info here: https://neo4j.com/docs/cypher-manual/5/functions/list/#functions-labels
There are multiple upvoted answers on this question, only one of which you should use (listed below as "Solution #1"). I've posted three ways of getting all in-use labels in the graph. The test data set has 109,120 nodes in the graph.
MATCH (x) RETURN count(x)
109120
Solution #1: db.labels()
The usage looks like this:
CALL db.labels();
On my test data set, this query completed in ~1 ms (successive runs shown):
Started streaming 9 records in less than 1 ms and completed in less than 1 ms.
Started streaming 9 records after 1 ms and completed after 1 ms.
Started streaming 9 records in less than 1 ms and completed after 1 ms.
Here's the execution plan output:
EXPLAIN CALL db.labels();
ProcedureCall | label | db.labels() :: (label :: STRING) | 10 estimated rows
ProduceResults | label | label | 10 estimated rows
Result
Note: estimated rows is 10, with no mention of the ~109,000 nodes in the graph.
Solution #2: labels() on each node, get distinct results
The query looks like this:
MATCH (n) RETURN DISTINCT labels(n)
Here are several runs of that query, each more than an order of magnitude slower than solution #1:
Started streaming 9 records after 1 ms and completed after 41 ms.
Started streaming 9 records after 1 ms and completed after 36 ms.
Started streaming 9 records in less than 1 ms and completed after 37 ms.
The execution plan is more complicated, and clearly shows that all nodes in the graph are evaluated. Again, my test data set has 109,120 nodes in it, and we see exactly that number of nodes evaluated in the first step. If we had 1 million nodes in the graph, this approach would scan all 1 million (or 10 million, etc.).
EXPLAIN MATCH (n) RETURN DISTINCT labels(n)
AllNodesScan | n | n | 109,120 estimated rows
Distinct | `labels(n)` | labels(n) as `labels(n)` | 103,664 estimated rows
ProduceResults | `labels(n)` | `labels(n)` | 103,664 estimated rows
Result
While the result is correct, this approach is significantly more expensive to evaluate than solution #1.
The query for solution #3 looks like this:
MATCH (n)
WITH DISTINCT labels(n) AS labels
UNWIND labels AS label
RETURN DISTINCT label
ORDER BY label
Here are several runs of this query, all in the mid-30 ms range like solution #2:
Started streaming 9 records in less than 1 ms and completed after 33 ms.
Started streaming 9 records after 1 ms and completed after 32 ms.
Started streaming 9 records in less than 1 ms and completed after 37 ms.
Started streaming 9 records after 4 ms and completed after 36 ms.
The execution plan is similar to solution #2 at the beginning, but includes additional steps which involve nearly the entire data set:
EXPLAIN MATCH (n)
WITH DISTINCT labels(n) AS labels
UNWIND labels AS label
RETURN DISTINCT label
ORDER BY label
AllNodesScan | n | n | 109,120 estimated rows
Distinct | labels | labels(n) AS labels | 103,664 estimated rows
Unwind | labels, label | labels AS label | 1,036,640 estimated rows
Distinct | label | label | 984,808 estimated rows
Sort | label | label ASC | Ordered by label ASC | 984,808 estimated rows
ProduceResults | label | label | Ordered by label ASC | 984,808 estimated rows
Result
If your goal is to determine which labels exist in a graph, Solution #1 looks like the clear winner: it is not only the fastest and simplest approach, but its performance is not bound by the number of nodes in the graph (so it should remain fast even if you have more nodes).
I do not see any measurable benefit for using Solutions #2 or #3. Compared to Solution #1, both are slower and more complicated to write, and - unlike Solution #1 - their execution plans show that their performance is bound directly by the number of nodes in the graph. They will run more slowly with larger data sets.
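The scaling argument can be sketched with a toy model in Python. This is not a description of Neo4j internals: it simply assumes, for illustration, that the database keeps a separate registry of label names, which would be consistent with the Solution #1 plan showing no node scan. `TinyGraph` and its method names are invented for this example.

```python
class TinyGraph:
    """Toy model, not Neo4j internals: nodes plus a separate registry of label names."""

    def __init__(self):
        self.node_labels = []        # one label list per node
        self.label_registry = set()  # every label name that has been used

    def add_node(self, labels):
        self.node_labels.append(list(labels))
        self.label_registry.update(labels)

    def labels_via_registry(self):
        # Stand-in for CALL db.labels(): touches one entry per label.
        steps = len(self.label_registry)
        return sorted(self.label_registry), steps

    def labels_via_scan(self):
        # Stand-in for the MATCH (n) based queries: touches every node.
        seen, steps = set(), 0
        for labels in self.node_labels:
            steps += 1
            seen.update(labels)
        return sorted(seen), steps


graph = TinyGraph()
for i in range(10_000):
    graph.add_node(["Person"] if i % 2 else ["Movie"])

print(graph.labels_via_registry())
print(graph.labels_via_scan())
```

Both methods return the same two labels, but the step count of the registry lookup depends only on the number of labels, while the step count of the scan grows with the number of nodes.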
MATCH n RETURN DISTINCT LABELS(n) is 8 characters less to type :) – Gluck