Graph-tool surprisingly slow compared to Networkx
Asked Answered
P

1

15

After looking at the impressive performance comparison, I decided that I would give a try to graph-tool. So for comparison, I wrote codes to generate a random tree using both packages.

The graph-tool code:

import numpy as np
import graph_tool.all as gt

# construct an initial graph with two nodes and one link
n = 5000
G = gt.Graph(directed = False)
G.add_edge(0, 1)

for t in range(2, n):
    # connect the new vertex to one of the old vertices randomly
    G.add_edge(np.random.choice(range(t)), t)

The Networkx code:

import networkx as nx
import numpy as np

n = 5000
# initial graph
G = nx.Graph()
G.add_edge(0, 1)

for t in range(2, n):
    G.add_edge(t, np.random.choice(range(t)))

The graph-tool takes around 14 seconds on my 4-core machine whereas networkx takes less than 2 seconds on the same machine! Am I missing something obvious?

Thanks in advance.

Photokinesis answered 24/3, 2016 at 5:24 Comment(4)
I would guess that adding edges one by one is slower than some other use cases (e.g. doing lots of searches of a pre-existing graph).Daggerboard
Actually I run more complicated code with this and still networkx is giving better performance to me. (This one: link.springer.com/article/10.1140/epjb/e2015-60501-y#page-1)Photokinesis
Could it be that something is slow about initializing the network?Unfit
No. I am printing those times t and in graph-tool, I can see that those are printed more and more slowly as t becomes larger.Photokinesis
R
31

There is nothing surprising here. graph-tool achieves greater performance by off-loading main loops to C++. If all your main loops are in Python, it offers no advantage. The same is true for other libraries like numpy.

The proper way to achieve fast addition of edges is to let graph-tool perform the main loop. The network you are generating is a simple growth model, and can be achieved in graph-tool by calling:

G = price_network(n, gamma=0, directed=False)

which takes about 15 ms in my computer for n=5000.

Note also that your python code is unnecessarily slow, since you create new lists with all vertices at each iteration. A much faster version would be:

from numpy.random import randint
n = 5000
G = Graph(directed=False)
G.add_vertex(n)
G.add_edge(0, 1)
for i in range(2, n):
    G.add_edge(i, randint(i))

For even larger values of n, it will be even faster to add all edges at once instead of one by one, i.e.

from graph_tool.all import *
from numpy.random import randint
n = 5000
G = Graph(directed=False)
edges = [(0, 1)]
for i in range(2, n):
    edges.append((i, randint(i)))
G.add_edge_list(edges)
Robeson answered 24/3, 2016 at 14:31 Comment(4)
Yes. Now I understand the issue. But then how do I construct some generalized graphs? For example, for preferential attachment models, I obviously can't add all the edges at once. Also, I actually want to simulate the network in which number of new nodes is some fraction of the current network size and so I can't add all nodes simultaneously too. Is that right? ThanksPhotokinesis
Can you please guide me about achieving the task of many nodes simultaneously and efficiently? ThanksPhotokinesis
@Peaceful, did you get around this?Honolulu
@user1712447 Tiago's answer helps.Photokinesis

© 2022 - 2024 — McMap. All rights reserved.