The Girvan–Newman algorithm is computationally very expensive. As mentioned in the the docs in NetworkX:
The Girvan–Newman algorithm detects communities by progressively removing edges from the original graph. The algorithm removes the “most valuable” edge, traditionally the edge with the highest betweenness centrality, at each step.
By looking at the source code, the following is called:
while g.number_of_edges() > 0:
yield _without_most_central_edges(g, most_valuable_edge)
Which in turn calls:
while num_new_components <= original_num_components:
edge = most_valuable_edge(G)
G.remove_edge(*edge)
new_components = tuple(nx.connected_components(G))
num_new_components = len(new_components)
return new_components
So at each step, the most valuable edge is removed, defined as the edge with the highest betweenness centrality, and the connected components are found. So roughly the complexity is of the order of the number of edges times the complexity of the connected components algorithm and the highest betweenness centrality.
The docs mention some approaches to slice the returned generator and keep the first k
tuples of communities. Here's one if you want to run the algorithm up to the kth
iteration:
from itertools import islice, takewhile
G = nx.fast_gnp_random_graph(10, 0.2)
k = 2
comp = girvan_newman(G)
for communities in islice(comp, k):
print(tuple(sorted(c) for c in communities))
([0, 3, 4, 8], [1, 5], [2], [6, 7, 9])
([0, 3], [1, 5], [2], [4, 8], [6, 7, 9])
Or one using itertools.takewhile
to take tuples until the number of communities surpasses some threshold, which seems an interesting approach, since it allows you to impose the amount of clusters you want, for example:
G = nx.fast_gnp_random_graph(10, 0.3)
k = 4
comp = girvan_newman(G)
limited = takewhile(lambda c: len(c) <= k, comp)
for communities in limited:
print(tuple(sorted(c) for c in communities))
([0, 1, 2, 3, 4, 5, 6, 7, 8], [9])
([0, 2, 4, 7, 8], [1, 3, 5, 6], [9])
([0, 2, 4, 7, 8], [1], [3, 5, 6], [9])
Answering to your first question, you'll see in the source code that the graph is copied to an undirected graph g = G.copy().to_undirected()
, so yes, it is only intended to work for undirected graphs.