Titan lookups on indexed key are incredibly slow?

Using Titan 0.3.1 with Cassandra, I created a vertex key index via createKeyIndex as described in the Titan docs.

gremlin> g.createKeyIndex("my_key", Vertex.class)
==>null

I now have approximately 50k nodes and 186k edges in the graph, and I'm finding a significant performance difference between lookups using my_key and lookups by vertex ID. This query takes about 5 seconds to run:

gremlin> g.V.has("my_key", "abc")
==>v[12345]

whereas a lookup by vertex ID takes less than 1 second:

gremlin> g.v(12345)
==>v[12345]

my_key does not have a unique constraint (and I don't want one), but I'm wondering what is causing such a discrepancy in performance. How can I increase performance on lookups for a non-unique, indexed vertex key?

Dioptrics asked 17/6, 2013 at 12:39

The issue here is the use of .has, which is a filter function and will not use any indexes. From GremlinDocs:

It is worth noting that the syntax of has is similar to g.V("name", "marko"), which has the difference of being a key index lookup and as such will perform faster. In contrast, this line, g.V.has("name", "marko"), will iterate over all vertices checking the name property of each vertex for a match and will be significantly slower than the key index approach.

For the example above, this will use the index and perform the lookup very quickly (< 1 second):

gremlin> g.V("my_key", "abc")
==>v[12345]
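
The difference between the two forms comes down to a full scan versus a direct index lookup. As a rough illustration only, here is a toy in-memory model in Python. ToyGraph and all of its methods are hypothetical names invented for this sketch; it says nothing about how Titan or Cassandra actually store the index, it only shows why checking every vertex costs so much more than consulting a key/value map.

```python
class ToyGraph:
    """Hypothetical in-memory stand-in for a graph; NOT Titan's API."""

    def __init__(self):
        self.vertices = {}       # vertex id -> dict of properties
        self.key_index = {}      # indexed key -> {value -> set of vertex ids}
        self.scanned = 0         # how many vertices scan_has() has inspected

    def create_key_index(self, key):
        """Start indexing `key`, back-filling any vertices added earlier."""
        index = self.key_index.setdefault(key, {})
        for vid, props in self.vertices.items():
            if key in props:
                index.setdefault(props[key], set()).add(vid)

    def add_vertex(self, vid, **props):
        """Store a vertex and update every index that covers its properties."""
        self.vertices[vid] = props
        for key, value in props.items():
            if key in self.key_index:
                self.key_index[key].setdefault(value, set()).add(vid)

    def scan_has(self, key, value):
        """Analogue of g.V.has(key, value) as described above: visit every vertex."""
        hits = []
        for vid, props in self.vertices.items():
            self.scanned += 1
            if props.get(key) == value:
                hits.append(vid)
        return sorted(hits)

    def index_lookup(self, key, value):
        """Analogue of g.V(key, value): go straight to the index entry."""
        return sorted(self.key_index.get(key, {}).get(value, ()))


graph = ToyGraph()
graph.create_key_index("my_key")
for vid in range(50_000):
    # The index is non-unique: many vertices share the value "other".
    graph.add_vertex(vid, my_key="abc" if vid == 12345 else "other")

print(graph.scan_has("my_key", "abc"), "after inspecting", graph.scanned, "vertices")
print(graph.index_lookup("my_key", "abc"), "via a single index entry")
```

Both calls return the same vertex, but the scan touches every vertex in the graph while the index lookup reads one entry, and that holds whether or not the key is unique.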
Dioptrics answered 17/6, 2013 at 12:56
Comment: This is not accurate as of Titan 0.5.0: g.V.has("my_key", "abc") will now use an available index on the my_key key. See Titan's index docs. (Clue)
