Counting Super Nodes On Titan

My system requires that the number of edges on a node be stored both as an internal property on the vertex and in a vertex-centric index on a specific outgoing edge. This means I have to count the edges on each node after all the data has finished loading. I do so as follows:

long edgeCount = graph.getGraph().traversal().V(vertexId).bothE().count().next();

However, when I scale up my tests to the point where some of my nodes are "super" nodes, I get the following exception on the above line:

Caused by: com.netflix.astyanax.connectionpool.exceptions.TransportException: TransportException: [host=127.0.0.1(127.0.0.1):9160, latency=4792(4792), attempts=1]org.apache.thrift.transport.TTransportException: Frame size (70936735) larger than max length (62914560)!
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:197) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:153) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119) ~[astyanax-core-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352) ~[astyanax-core-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:538) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:112) ~[titan-cassandra-1.0.0.jar!/:na]

What is the best way to fix this? Should I simply increase the frame size, or is there a better way to count the number of edges on a node?

Pixie answered 24/3, 2016 at 8:15 Comment(0)

Yes, you will need to increase the frame size. A supernode means a really big row has to be read out of the storage backend, and that is true even in the OLAP case. I agree that if you plan to calculate this for every vertex in the graph, it is best done as an OLAP operation.

This and several other good tips can be found in this Titan mailing list thread. Keep in mind that the thread is fairly old: the concepts are still valid, but some of the Titan configuration property names may have changed since then.
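
For example, here is a sketch of raising the frame size on the Titan side when opening the graph. I believe the option is called storage.cassandra.frame-size-mb in Titan 1.0, but double check it against the configuration reference for your version:

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;

TitanGraph graph = TitanFactory.build()
        .set("storage.backend", "cassandrathrift")
        .set("storage.hostname", "127.0.0.1")
        // Must comfortably exceed the largest row; the frame in your
        // stack trace is ~70 MB, so something like 128 MB gives headroom.
        .set("storage.cassandra.frame-size-mb", 128)
        .open();

The server has to accept frames of that size too, so thrift_framed_transport_size_in_mb in cassandra.yaml needs to be raised to match.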

Shrine answered 28/3, 2016 at 15:34 Comment(1)
Does this mean that I need to start looking into integrating Gremlin-Hadoop? – Pixie

A task like this is OLAP by nature and should be performed with a distributed system, not with a regular OLTP traversal.

TinkerPop 3 has a concept called GraphComputer, which can be used to perform exactly this kind of task.

It basically allows you to run Gremlin queries that are evaluated across multiple machines.

For example, you can use SparkGraphComputer to run your queries on top of Apache Spark.
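
A rough sketch in Java, assuming a Spark-enabled classpath and a hypothetical read-cassandra.properties file that points HadoopGraph at your Titan keyspace (on older TinkerPop 3.0.x the equivalent of withComputer is graph.traversal(GraphTraversalSource.computer(SparkGraphComputer.class))):

import java.util.Map;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;

import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.bothE;

// Hypothetical properties file wiring HadoopGraph to the Titan/Cassandra
// input format and a Spark master.
Graph graph = GraphFactory.open("read-cassandra.properties");

// Route the traversal through SparkGraphComputer so the counting runs
// distributed, instead of pulling each supernode's row through a single
// Thrift connection.
GraphTraversalSource g = graph.traversal().withComputer(SparkGraphComputer.class);

// Degree (in + out edges) per vertex id, computed cluster-side.
Map<Object, Long> degrees =
        g.V().<Object, Long>group().by(T.id).by(bothE().count()).next();

You can then write the counts back as the vertex property you need.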

Delois answered 28/3, 2016 at 5:20 Comment(0)