Networkx: extract the connected component containing a given node (directed graph)
Asked Answered
E

5

25

I am trying to extract from a big graph the sub-graph of all connected nodes containing a specific node.

Is there a solution in the Networkx library?

[EDIT]
My graph is a DiGraph

[EDIT]
Rephrased simply:
I want the part of my graph that contain my specific node N_i and and all the nodes that are connected directly or indirectly (passing by other nodes) using any incoming or outcoming edges.
Example:

>>> g = nx.DiGraph()
>>> g.add_path(['A','B','C',])
>>> g.add_path(['X','Y','Z',])
>>> g.edges()
[('A', 'B'), ('B', 'C'), ('Y', 'Z'), ('X', 'Y')]

My desired result would be:

>>> g2 = getSubGraph(g, 'B')
>>> g2.nodes()
['A', 'B', 'C']
>>> g2.edges()
[('A', 'B'), ('B', 'C')]
Electrical answered 17/12, 2012 at 13:15 Comment(2)
It's not clear from your question what subgraph you want. If you want a subgraph that contains node N_i with no isolated nodes then e.g. the neighbors of N_i satisfy that. If you want the largest subgraph containing N_i but with with no isolated nodes then removing all isolated nodes from the graph would work (as long as N_i isn't degree 0). That graph won't necessarily be connected. If you want all of the nodes reachable from N_i consider nx.shortest_path(G,N_i)...Misconception
Not sure if you're checking this, but please check the edit I did of your title. What you had was not actually the question you ended up asking.Fawkes
M
28

You can use shortest_path() to find all of the nodes reachable from a given node. In your case you need to first convert the graph to an undirected representation so both in- and out-edges are followed.

In [1]: import networkx as nx

In [2]: >>> g = nx.DiGraph()

In [3]: >>> g.add_path(['A','B','C',])

In [4]: >>> g.add_path(['X','Y','Z',])

In [5]: u = g.to_undirected()

In [6]: nodes = nx.shortest_path(u,'B').keys()

In [7]: nodes
Out[7]: ['A', 'C', 'B']

In [8]: s = g.subgraph(nodes)

In [9]: s.edges()
Out[9]: [('A', 'B'), ('B', 'C')]

Or in one line

In [10]: s = g.subgraph(nx.shortest_path(g.to_undirected(),'B'))

In [11]: s.edges()
Out[11]: [('A', 'B'), ('B', 'C')]
Misconception answered 18/12, 2012 at 13:29 Comment(0)
S
15

Simply loop through the subgraphs until the target node is contained within the subgraph.

For directed graphs, I assume a subgraph is a graph such that every node is accessible from every other node. This is a strongly connected subgraph and the networkx function for that is strongly_connected_component_subgraphs.

(MWE) Minimal working example:

import networkx as nx
import pylab as plt

G = nx.erdos_renyi_graph(30,.05)
target_node = 13

pos=nx.graphviz_layout(G,prog="neato")

for h in nx.connected_component_subgraphs(G):
    if target_node in h:
        nx.draw(h,pos,node_color='red')
    else:
        nx.draw(h,pos,node_color='white')

plt.show()

enter image description here

For a directed subgraph (digraph) example change the corresponding lines to:

G = nx.erdos_renyi_graph(30,.05, directed=True)
...
for h in nx.strongly_connected_component_subgraphs(G):

enter image description here

Note that one of the nodes is in the connected component but not in the strongly connected component!

Smitt answered 17/12, 2012 at 15:31 Comment(5)
@AlbanSoupper As noted strongly_connected_component_subgraphs does work with directed graphs (digraphs).Smitt
I am novice in graph, but to my point of view in a directed graph the subgraph is not strongly connected... Think about a directed path_graph...Electrical
@AlbanSoupper It is not clear what you are intending when you say subgraph then... without a link to a mathematical definition it will be hard to help you. Also details like a directed or undirected graph are important when you first ask the question!Smitt
Sorry, I will rephrase my question, let me 5 minElectrical
I accept your answer because it would be the perfect answer if I was working with a non-directed graph :) ThanksElectrical
K
3

I found three solution to solve your requirement, just same as mine. The size of my Digraph are between 6000 to 12000 nodes, and max subgraph size will up to 3700. Three function I used are:

def create_subgraph_dfs(G, node):
    """ bidirection, O(1)"""
    edges = nx.dfs_successors(G, node)
    nodes = []
    for k,v in edges.items():
        nodes.extend([k])
        nodes.extend(v)
    return G.subgraph(nodes)

def create_subgraph_shortpath(G, node):
    """ unidirection, O(1)"""
    nodes = nx.single_source_shortest_path(G,node).keys()
    return G.subgraph(nodes)

def create_subgraph_recursive(G, sub_G, start_node):
    """ bidirection, O(nlogn)"""
    for n in G.successors_iter(start_node):
        sub_G.add_path([start_node, n])
        create_subgraph_recursive(G, sub_G, n)

The test result is summary as follow:

timeit ms

Kaffir answered 15/8, 2017 at 5:23 Comment(2)
create_subgraph_shortpath(G, node) worked for me to find the connected component of a directed graph. However, it feels like I might be overlooking an obvious function from the API to get such a result directly for a DiGraph. For Graphs using connected_components() is so easy in comparison.Figuration
@Figuration yes, but that will yield a generator for all of them within G, we want just the connected component containing a given node.Mesothelium
S
2

Use the example at the end of the page connected_component_subgraphs.

Just ensure to refer the last element from the list rather than the first

>>> G=nx.path_graph(4)
>>> G.add_edge(5,6)
>>> H=nx.connected_component_subgraphs(G)[-1]
Susian answered 17/12, 2012 at 13:18 Comment(4)
It seems to me that this will extract the largest subgraph not the subgraph (as OP notes) "containing a specific node".Smitt
@Hooked: I got mislead with the subject extract the smallest connected subgraph, which in this case would as the list returned is sorted in descending order of size,. Instead if OP wants the subgraph with the specific node, your solution makes senseSusian
It appears the most relevant page in the current documentation is this (connected_component_subgraphs itself being gone.)Verdin
connected_component_subgraphs has been deprecated. To create the induced subgraph of each component use: S = [G.subgraph(c).copy() for c in nx.connected_components(G)]Yachting
V
0

I checked the source code and found using g.to_undirected() will traverse the whole graph, which is inefficient. I refer to the source code of weakly_connected_components — NetworkX 3.3 documentation, modifying '_plain_bfs' method as follow, I think it's the optimal solution:

import networkx as nx

def plain_bfs(G, source):
    """A fast BFS node generator
    
    The direction of the edge between nodes is ignored.
    For directed graphs only.
    """
    Gsucc = G.succ
    Gpred = G.pred

    seen = set()
    nextlevel = {source}
    while nextlevel:
        thislevel = nextlevel
        nextlevel = set()
        for v in thislevel:
            if v not in seen:
                seen.add(v)
                nextlevel.update(Gsucc[v])
                nextlevel.update(Gpred[v])
    return seen

def get_weakly_connected_components(g, nid: str):
    '''Get the weakly connected components of the node.'''
    reachable_nodes = plain_bfs(g, nid)
    subg = g.subgraph(list(reachable_nodes))
    return subg.copy()

g = nx.DiGraph()
nx.add_path(g, ['A','B','C',])
nx.add_path(g, ['X','Y','Z',])
print(g.edges())

connected_nodes = plain_bfs(g, 'B')
print(connected_nodes)
print(get_weakly_connected_components(g, 'B').edges())

Result:

[('A', 'B'), ('B', 'C'), ('X', 'Y'), ('Y', 'Z')]

{'C', 'B', 'A'}

[('A', 'B'), ('B', 'C')]

Vannoy answered 29/5 at 16:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.