As of version 1.4.0, SciPy also ships an implementation, scipy.sparse.csgraph.maximum_flow, which may be easier to use as part of your build chain since the package is available through pip and conda.
It works on a sparse matrix (hence scipy.sparse) representing the adjacency matrix of the graph, so the underlying data structure is close to the metal. With the algorithm itself implemented in Cython, performance should be on par with, for example, graph-tool.
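To make the adjacency-matrix input concrete, here is a minimal sketch on a hand-made four-vertex graph. The graph and its capacities are made up for illustration, and the int32 dtype is chosen as a conservative assumption, since the function expects integer capacities.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

# Illustrative capacities (not from the benchmark below); entry (i, j) is the
# capacity of the directed edge i -> j, and 0 means "no edge":
# 0 -> 1 (3), 0 -> 2 (2), 1 -> 2 (1), 1 -> 3 (2), 2 -> 3 (3)
capacities = np.array([
    [0, 3, 2, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
], dtype=np.int32)

graph = csr_matrix(capacities)
result = maximum_flow(graph, 0, 3)  # source is vertex 0, sink is vertex 3
print(result.flow_value)  # 5: both edges leaving the source can be saturated
```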
How the implementations compare in terms of performance will always depend on the structure of the graph whose maximum flow you are interested in. As a simple benchmark, I ran random graphs of different densities through NetworkX, graph-tool, and SciPy. All of them play well with NumPy arrays, so to ensure a level playing field, let us write functions that each take as input a NumPy array of shape (density*1000*1000, 3), whose rows are edges and whose columns are the two vertices incident to a given edge, followed by the capacity of that edge.
import numpy as np
from scipy.sparse import rand

def make_data(density):
    m = (rand(1000, 1000, density=density, format='coo', random_state=42)*100).astype(np.int32)
    return np.vstack([m.row, m.col, m.data]).T

data01 = make_data(0.1)
data03 = make_data(0.3)
data05 = make_data(0.5)
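As a small illustration of this edge-array layout, the sketch below builds a toy array by hand and turns it into a sparse adjacency matrix. The four edges are hypothetical and are not part of the benchmark data.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy edge list in the same layout as make_data's output:
# each row is (tail vertex, head vertex, capacity).
edges = np.array([
    [0, 1, 3],
    [0, 2, 2],
    [1, 3, 2],
    [2, 3, 3],
], dtype=np.int32)

# Number of vertices, inferred from the largest vertex index that occurs.
n = int(edges[:, :2].max()) + 1

# Columns 0 and 1 give the (row, col) positions, column 2 the stored values.
adjacency = csr_matrix((edges[:, 2], (edges[:, 0], edges[:, 1])), shape=(n, n))
print(adjacency.toarray())
```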
With this, the various frameworks can calculate the value of a maximum flow as follows:
import graph_tool.all as gt
from scipy.sparse import coo_matrix, csr_matrix
from scipy.sparse.csgraph import maximum_flow
import networkx as nx

def networkx_max_flow(data, primitive):
    # Build a dense adjacency matrix; its entries become the 'weight' edge attribute.
    m = coo_matrix((data[:, 2], (data[:, 0], data[:, 1])))
    G = nx.from_numpy_array(m.toarray(), create_using=nx.DiGraph())
    return nx.maximum_flow_value(G, 0, 999, capacity='weight', flow_func=primitive)

def graph_tool_max_flow(data, primitive):
    g = gt.Graph()
    cap = g.new_edge_property('int')
    eprops = [cap]
    # The third column of data is read into the capacity property.
    g.add_edge_list(data, eprops=eprops)
    src, tgt = g.vertex(0), g.vertex(999)
    # The primitive returns residual capacities; capacity minus residual is the flow.
    res = primitive(g, src, tgt, cap)
    res.a = cap.a - res.a
    # The value of the flow is the total flow entering the sink.
    return sum(res[e] for e in tgt.in_edges())

def scipy_max_flow(data):
    m = csr_matrix((data[:, 2], (data[:, 0], data[:, 1])))
    return maximum_flow(m, 0, 999).flow_value
With these in place, the benchmarks can be run in IPython, for example:
%timeit networkx_max_flow(data01, nx.algorithms.flow.shortest_augmenting_path)
%timeit graph_tool_max_flow(data03, gt.push_relabel_max_flow)
%timeit scipy_max_flow(data05)
I then see the following results:
+----------------------------------------------+----------------+----------------+---------------+
| Algorithm                                    | Density: 0.1   | Density: 0.3   | Density: 0.5  |
+----------------------------------------------+----------------+----------------+---------------+
| nx.algorithms.flow.edmonds_karp              | 1.07 s         | 3.2 s          | 6.39 s        |
| nx.algorithms.flow.preflow_push              | 1.07 s         | 3.27 s         | 6.18 s        |
| nx.algorithms.flow.shortest_augmenting_path  | 1.08 s         | 3.25 s         | 6.23 s        |
| gt.edmonds_karp_max_flow                     | 274 ms         | 2.84 s         | 10 s          |
| gt.push_relabel_max_flow                     | 71 ms          | 466 ms         | 1.42 s        |
| gt.boykov_kolmogorov_max_flow                | 79 ms          | 463 ms         | 895 ms        |
| scipy.sparse.csgraph.maximum_flow            | 64 ms          | 234 ms         | 580 ms        |
+----------------------------------------------+----------------+----------------+---------------+
Again, results will depend on the graph structure, but this at least suggests that SciPy should offer you performance on par with graph-tool.
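If you want to repeat the SciPy measurement outside IPython, the standard-library timeit module can stand in for %timeit. The following is only a sketch: it uses a smaller 200-vertex graph (my assumption, so that it runs quickly) and passes the matrix shape explicitly, so its timings are not comparable with the table above.

```python
import timeit

import numpy as np
from scipy.sparse import csr_matrix, rand
from scipy.sparse.csgraph import maximum_flow

N = 200  # assumed graph size for a quick run; the benchmark above uses 1000

def make_data(density, n=N):
    # Same construction as above, only with a configurable number of vertices.
    m = (rand(n, n, density=density, format='coo', random_state=42) * 100).astype(np.int32)
    return np.vstack([m.row, m.col, m.data]).T

def scipy_max_flow(data, n=N):
    # The explicit shape keeps the matrix square even if the last vertex has no edges.
    m = csr_matrix((data[:, 2], (data[:, 0], data[:, 1])), shape=(n, n))
    return maximum_flow(m, 0, n - 1).flow_value

data = make_data(0.1)

# Run the function once per repetition and keep the best of five wall-clock times.
timings = timeit.repeat(lambda: scipy_max_flow(data), number=1, repeat=5)
print(f"best of 5: {min(timings) * 1e3:.1f} ms")
```

Taking the minimum over several repetitions reduces the influence of background noise on the measurement.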