Neo4j: how to return all node labels with Cypher?

9

82

I can't find how to return a node's labels with Cypher.

Does anybody know the syntax for this operation?

Cynth answered 23/8, 2013 at 8:46 Comment(0)
Z
63

There is a function labels(node) that can return all labels for a node.

Zoroastrian answered 23/8, 2013 at 12:10 Comment(0)
V
110

To get all distinct label combinations (each row is the list of labels carried by at least one node):

MATCH (n) RETURN distinct labels(n)

To get the node count for each label combination:

MATCH (n) RETURN distinct labels(n), count(*)
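
Because labels(n) returns the whole list of labels on a node, these queries work per label list rather than per individual label. The following is a minimal Python sketch of that difference; it is not Neo4j code, and the label lists are made-up stand-ins for nodes in a graph.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the graph: one label list per node.
nodes = [
    ["Person"],
    ["Person"],
    ["Person", "Employee"],
    ["Company"],
]

# Roughly what `RETURN labels(n), count(*)` groups by: the whole label list.
per_combination = Counter(tuple(labels) for labels in nodes)

# A per-label count instead tallies each label separately (UNWIND-style).
per_label = Counter(label for labels in nodes for label in labels)

print(dict(per_combination))
print(dict(per_label))
```

In this toy data "Person" appears in two different combinations, so a per-combination count and a per-label count give different numbers for it.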
Vennieveno answered 5/6, 2014 at 11:6 Comment(5)
MATCH n RETURN DISTINCT LABELS(n) is 8 characters less to type :) – Gluck
Agree with @FLekschas and moreover, Neo recommends that you use MATCH for newer Neo4j databases because START is for legacy indexes. – Orphanage
Neo.ClientError.Statement.SyntaxError: Parentheses are required to identify nodes in patterns, i.e. (n) ... (v. 3.1.1) – Agro
This question also has the neo4j tag in addition to cypher. The shortest to write (and fastest to execute!) query for getting all distinct node labels is CALL db.labels -- at least since Neo4j 3.0; see also the CALL page in the Neo4j manual. – Blessing
How can I return labels for a list of nodes? – Neese
I
47

Neo4j 3.0 introduced the procedure db.labels(), which returns all available labels in the database. Use:

call db.labels();
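
If you are calling the procedure from application code, a sketch in Python might look like the following. It assumes a session object that behaves like the official neo4j Python driver's session (run() returns records indexable by column name) and that the result column is named label. The FakeSession class is a hypothetical stand-in so the example runs without a database.

```python
def fetch_labels(session):
    """Run CALL db.labels() and return the labels as a sorted list of strings."""
    # `session` is assumed to behave like a neo4j driver session:
    # session.run(query) yields records that can be indexed by column name.
    result = session.run("CALL db.labels()")
    return sorted(record["label"] for record in result)


class FakeSession:
    """Hypothetical stand-in for a driver session; returns canned rows."""

    def run(self, query):
        assert query == "CALL db.labels()"
        return [{"label": "Person"}, {"label": "Company"}]


print(fetch_labels(FakeSession()))  # ['Company', 'Person']
```

With the real driver you would typically pass in a session obtained from GraphDatabase.driver(uri, auth=(user, password)).session(), where the URI and credentials are placeholders for your own setup.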
Incinerator answered 18/12, 2017 at 12:45 Comment(3)
This is the most efficient approach. – Napalm
CALL is 4 ms vs 15 ms for MATCH (n) on my DigitalOcean test DB. One note: I have hierarchical labels, and MATCH returns them as ["RawFile", "RawPhoto"], whereas CALL only returns the flattened list. – Pul
Why was this so hard to find? – Burmaburman
C
25

If you want all the individual labels (not the combinations) you can always expand on the answers:

MATCH (n)
WITH DISTINCT labels(n) AS labels
UNWIND labels AS label
RETURN DISTINCT label
ORDER BY label
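
To make the steps of this query concrete, here is a rough Python sketch of the same pipeline over in-memory data. The label lists are hypothetical and nothing here talks to Neo4j; each step is annotated with the Cypher clause it is meant to mirror.

```python
# Hypothetical label lists, one per node (not read from a real database).
nodes = [
    ["RawFile", "RawPhoto"],
    ["RawFile"],
    ["RawFile", "RawPhoto"],
    ["Album"],
]

# WITH DISTINCT labels(n) AS labels  -> distinct label combinations
combinations = {tuple(labels) for labels in nodes}

# UNWIND labels AS label / RETURN DISTINCT label  -> distinct single labels
individual = {label for labels in combinations for label in labels}

# ORDER BY label
ordered = sorted(individual)
print(ordered)  # ['Album', 'RawFile', 'RawPhoto']
```

The first DISTINCT shrinks the input before unwinding, so the unwind step only has to process one row per label combination rather than one row per node.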
Clapperclaw answered 6/12, 2014 at 17:54 Comment(0)
G
5
 START n=node(*) RETURN labels(n)
Gelsemium answered 2/12, 2013 at 16:34 Comment(1)
Results on this are currently a nightmare. Returns 1 row for every node. START n=node(*) RETURN DISTINCT(labels(n)) returns the same as MATCH (n) RETURN distinct labels(n), but takes ~10 times as long (on my db). – Pul
S
4

If you're using the Java API, you can quickly get an iterator of all the Labels in the database like so:

GraphDatabaseService db = (new GraphDatabaseFactory()).newEmbeddedDatabase(pathToDatabase);
ResourceIterable<Label> labs = GlobalGraphOperations.at(db).getAllLabels();
Saraisaraiya answered 29/6, 2015 at 20:10 Comment(1)
"with Cypher" -- OP – Loth
C
4

If you want the labels of a specific node, use labels(node). If you want all node labels in Neo4j, use the procedure call instead: CALL db.labels(). Avoid the query MATCH (n) RETURN DISTINCT labels(n): it scans every node in the database, which is very slow on large graphs.

Cannell answered 9/12, 2017 at 3:25 Comment(0)
F
0

MATCH (n) WHERE n.name = "abc" RETURN labels(n)

This returns all the labels of the node whose name property is "abc".

Frons answered 24/3, 2023 at 12:37 Comment(0)
M
0

How do you get all labels for a specific node?

Use the labels() function, as in this example, which matches nodes whose name property has the value 'Alice':

MATCH (a) WHERE a.name = 'Alice'
RETURN labels(a)

The return type for labels() is LIST<STRING>, so it can return zero, one, or several labels for a node.

More info here: https://neo4j.com/docs/cypher-manual/5/functions/list/#functions-labels

How do you get all labels in the graph?

There are multiple upvoted answers on this question, only one of which you should use (listed below as "Solution #1"). I've posted three ways of getting all in-use labels in the graph. The test data set has 109,120 nodes in the graph.

MATCH (x) RETURN count(x)
109120

Solution #1: Use built-in procedure db.labels()

The usage looks like this:

CALL db.labels();

On my test data set, this query completed in ~1 ms (successive runs shown):

Started streaming 9 records in less than 1 ms and completed in less than 1 ms.
Started streaming 9 records after 1 ms and completed after 1 ms.
Started streaming 9 records in less than 1 ms and completed after 1 ms.

Here's the execution plan output:

EXPLAIN CALL db.labels();

ProcedureCall
    label
    db.labels() :: (label :: STRING)
    10 estimated rows

ProduceResults
    label
    label
    10 estimated rows

Result

Note: estimated rows is 10, with no mention of the ~109,000 nodes in the graph.

Solution #2: Match all nodes, call labels() on each node, get distinct results

The query looks like this:

MATCH (n) RETURN DISTINCT labels(n)

Here are several runs of that query, each more than an order of magnitude slower than solution #1:

Started streaming 9 records after 1 ms and completed after 41 ms.
Started streaming 9 records after 1 ms and completed after 36 ms.
Started streaming 9 records in less than 1 ms and completed after 37 ms.

The execution plan is more complicated, and clearly shows that all nodes in the graph are evaluated. Again, my test data set has 109,120 nodes in it, and the first step (AllNodesScan) is estimated at exactly that number of rows. If we had 1 million nodes in the graph, this approach would scan all 1 million (or 10 million, etc.).

EXPLAIN MATCH (n) RETURN DISTINCT labels(n)

AllNodesScan
    n
    n
    109,120 estimated rows

Distinct
    `labels(n)`
    labels(n) as `labels(n)`
    103,664 estimated rows

ProduceResults
    `labels(n)`
    `labels(n)`
    103,664 estimated rows

Result

While the result is correct, this approach is significantly more expensive to evaluate than solution #1.

Solution #3: Similar to solution #2, with additional steps of unwinding labels and returning distinct results from that

The query looks like this:

MATCH (n)
WITH DISTINCT labels(n) AS labels
UNWIND labels AS label
RETURN DISTINCT label
ORDER BY label

Here are several runs of this query, mid-30 ms range like solution #2:

Started streaming 9 records in less than 1 ms and completed after 33 ms.
Started streaming 9 records after 1 ms and completed after 32 ms.
Started streaming 9 records in less than 1 ms and completed after 37 ms.
Started streaming 9 records after 4 ms and completed after 36 ms.

The execution plan is similar to solution #2 at the beginning, but includes additional steps which involve nearly the entire data set:

EXPLAIN MATCH (n)
WITH DISTINCT labels(n) AS labels
UNWIND labels AS label
RETURN DISTINCT label
ORDER BY label

AllNodesScan
    n
    n
    109,120 estimated rows

Distinct
    labels
    labels(n) AS labels
    103,664 estimated rows

Unwind
    labels, label
    labels AS label
    1,036,640 estimated rows

Distinct
    label
    label
    984,808 estimated rows

Sort
    label
    label ASC
    Ordered by label ASC
    984,808 estimated rows

ProduceResults
    label
    label
    Ordered by label ASC
    984,808 estimated rows

Result

Conclusion

If your goal is to determine which labels exist in a graph, Solution #1 looks like the clear winner: it is not only the fastest and simplest approach, but its performance is not bound by the number of nodes in the graph (so it should remain fast even if you have more nodes).

I do not see any measurable benefit for using Solutions #2 or #3. Compared to Solution #1, both are slower and more complicated to write, and - unlike Solution #1 - their execution plans show that their performance is bound directly by the number of nodes in the graph. They will run more slowly with larger data sets.

Massa answered 14/4 at 5:9 Comment(0)
