What is the optimal way to create a graph with add_edge_list() method?
Asked Answered
E

1

6

I am trying to create large graph via graph-tool library (near 10^6 - 10^7 vertices) and fill vertex property with vertex name or use names instead of vertex indexes. I have:

  1. list of names:

    ['50', '56', '568']
    
  2. set of edges, but instead of vertex indexes it consists of their names:

    edge_list = {frozenset({'568', '56'}), frozenset({'56', '50'}), frozenset({'50', '568'})}
    

Since add_edge_list() allows to create vertices if they are no such vertix in the graph. I'm trying to use it to fill an empty graph. It works ok, but when I was trying to get vertex by its name, I got an error that there are no vertex with such index.

Here is the code of my program:

g = grt.Graph(directed=False)
edge_list = {frozenset({'568', '56'}), frozenset({'56', '50'}), frozenset({'50', '568'})}
ids = ['50', '56', '568']
g.add_edge_list(edge_list, hashed=True, string_vals=True)
print(g.vertex('50'))

The error message of print(g.vertex('50')):

ValueError: Invalid vertex index: 50

I want to create graph:

  1. Using edge_list only;
  2. Having quick access to a vertex by its name;
  3. Optimal by time (and RAM if possible).

Is there any good way to do this?

EDIT: Current code:

g = grt.Graph(directed=False)
g.add_vertex(len(ids))
vprop = g.new_vertex_property("string", vals=ids)
g.vp.user_id = vprop  
for vert1, vert2 in edges_list:
    g.add_edge(g.vertex(ids_dict[vert1]), g.vertex(ids_dict[vert2]))
Emmieemmit answered 8/5, 2019 at 12:17 Comment(0)
D
8

If you have a dense graph with 10^6 - 10^7 vertices (Is it some medical data or social graph? It can change everything), you shouldn't use networkx because it is written on pure Python so it is ~10-100 times slower than graph-tool or igraph. In your case I recommend you to use graph-tool. It is the fastest (~as igraph) Python graph processing library.

graph-tool behaviour differs from networkx. When you create the networkx node, its identifier is what you wrote in node constructor so you can get the node by its ID. In graph-tool every vertex ID is the integer from 1 to GRAPH_SIZE:

Each vertex in a graph has an unique index, which is always between 0 and N−1, where N is the number of vertices. This index can be obtained by using the vertex_index attribute of the graph (which is a property map, see Property maps), or by converting the vertex descriptor to an int.

Every additional information about graph, vertices or edges is stored in property maps. And when you are using .add_edge_list() with hashed=True, the new property map is returned as the result of .add_edge_list(). So in your case you should handle your vertices like this:

# Create graph
g = grt.Graph(directed=False)

# Create edge list
# Why frozensets? You don't really need them. You can use ordinary sets or tuples
edge_list = {
    frozenset({'568', '56'}),
    frozenset({'56', '50'}),
    frozenset({'50', '568'})
}

# Write returned PropertyMap to a variable!
vertex_ids = g.add_edge_list(edge_list, hashed=True, string_vals=True)

g.vertex(1)

Out [...]: <Vertex object with index '1' at 0x7f3b5edde4b0>

vertex_ids[1]

Out [...]: '56'

If you want to get a vertex according to the ID, you should construct mapping dict manually (well, I am not a graph-tool guru, but I can't find simple solution):

very_important_mapping_dict = {vertex_ids[i]: i for i in range(g.num_vertices())}

So you can easily get a vertex index:

very_important_mapping_dict['568']

Out [...]: 0

vertex_ids[0]

Out [...]: '568'
Drat answered 8/5, 2019 at 16:17 Comment(1)
Yes, it is a social graph actually. Will it help some way? I am using igraph for clustering (or searching communities if you prefer this term) and graph-tool for measuring centralities and for drawing graphs. I am using frozenset to collect data from the source. Runtime is the main problem. The question is: what will be faster: using add_edge_lis() or add_edge() in cycle? I added current code in the question field above.Emmieemmit

© 2022 - 2024 — McMap. All rights reserved.