Fast max-flow min-cut library for Python
Asked Answered
E

5

25

Is there a reliable and well-documented Python library with a fast implementation of an algorithm that finds maximum flows and minimum cuts in directed graphs?

pygraph.algorithms.minmax.maximum_flow from python-graph solves the problem but it is painfully slow: finding max-flows and min-cuts in a directed graph with something like 4000 nodes and 11000 edges takes > 1 minute. I am looking for something that is at least an order of magnitude faster.

Bounty: I'm offering a bounty on this question to see if the situation has changed since when this question was asked. Bonus points if you have personal experience with the library you recommend!

Explore answered 24/10, 2010 at 16:3 Comment(2)
Have you tried using Psyco(psyco.sourceforge.net) with it? The code for maximum_flow here is all written in pure Python so Psyco could give a huge speed-up.Juback
the link is brokenGatt
D
28

I have used graph-tool for similar tasks.

Graph-tool is an efficient python module for manipulation and statistical analysis of graphs (a.k.a. networks). They even have superb documentation about max-flow algorithms.

Currently graph-tool supports given algorithms:

  • Edmonds-Karp - Calculate maximum flow on the graph with the Edmonds-Karp algorithm.
  • Push relabel - Calculate maximum flow on the graph with the push-relabel algorithm.
  • Boykov Kolmogorov - Calculate maximum flow on the graph with the Boykov-Kolmogorov algorithm.

Example taken from docs: find maxflow using Boykov-Kolmogorov:

>>> g = gt.load_graph("flow-example.xml.gz") #producing example is in doc
>>> cap = g.edge_properties["cap"]
>>> src, tgt = g.vertex(0), g.vertex(1)
>>> res = gt.boykov_kolmogorov_max_flow(g, src, tgt, cap)
>>> res.a = cap.a - res.a  # the actual flow
>>> max_flow = sum(res[e] for e in tgt.in_edges())
>>> print max_flow
6.92759897841
>>> pos = g.vertex_properties["pos"]
>>> gt.graph_draw(g, pos=pos, pin=True, penwidth=res, output="example-kolmogorov.png")

I executed this example with random directed graph(nodes=4000, vertices = 23964), all process took just 11seconds.

alternative libraries:

Devoted answered 29/8, 2011 at 6:37 Comment(2)
Python packages link is broken.Gatt
Graph-tool doesn't support Windows yet.Inexorable
L
7

SciPy, as of 1.4.0, has an implementation as well in scipy.sparse.csgraph.maximum_flow that might be easier to use as part of your build chain (as the package is available through pip/conda).

It works by manipulating sparse matrices (hence scipy.sparse) representing the adjacency matrix of the graph, and as such, the underlying data structure is close to the metal, and with the algorithm itself being implemented in Cython, performance should be on par with e.g. graph-tool.

How the different implementations compare with regards to performance will always depend on the structure of the graph whose maximum flow you're interested in, but as a simple benchmark, I tried running random graphs with different sparsities through NetworkX, graph-tool, and SciPy. All of them play well with NumPy arrays, so to ensure a level playing field, let us create methods so that each of them take as inputs NumPy arrays with shape (density*1000*1000, 3) whose rows are edges, and whose columns are the two vertices incident to a given edge, as well as the capacity of the edge.

import numpy as np
from scipy.sparse import rand


def make_data(density):
    m = (rand(1000, 1000, density=density, format='coo', random_state=42)*100).astype(np.int32)
    return np.vstack([m.row, m.col, m.data]).T

data01 = make_data(0.1)
data03 = make_data(0.3)
data05 = make_data(0.5)

With this, the various frameworks can calculate the value of a maximum flow as follows:

import graph_tool.all as gt
from scipy.sparse import coo_matrix, csr_matrix
from scipy.sparse.csgraph import maximum_flow
import networkx as nx


def networkx_max_flow(data, primitive):
    m = coo_matrix((data[:, 2], (data[:, 0], data[:, 1])))
    G = nx.from_numpy_array(m.toarray(), create_using=nx.DiGraph())
    return nx.maximum_flow_value(G, 0, 999, capacity='weight', flow_func=primitive)


def graph_tool_max_flow(data, primitive):
    g = gt.Graph()
    cap = g.new_edge_property('int')
    eprops = [cap]
    g.add_edge_list(data, eprops=eprops)
    src, tgt = g.vertex(0), g.vertex(999)
    res = primitive(g, src, tgt, cap)
    res.a = cap.a - res.a
    return sum(res[e] for e in tgt.in_edges())


def scipy_max_flow(data):
    m = csr_matrix((data[:, 2], (data[:, 0], data[:, 1])))
    return maximum_flow(m, 0, 999).flow_value

And with this, examples of IPython benchmarks become

%timeit networkx_max_flow(data01, nx.algorithms.flow.shortest_augmenting_path)
%timeit graph_tool_max_flow(data03, gt.push_relabel_max_flow)
%timeit scipy_max_flow(data05)

I then see the following results:

+----------------------------------------------+----------------+----------------+---------------+
|                  Algorithm                   |  Density: 0.1  |  Density: 0.3  |  Density: 0.5 |
+----------------------------------------------+----------------+----------------+---------------+
| nx.algorithms.flow.edmonds_karp              |  1.07s         |  3.2s          |  6.39s        |
| nx.algorithms.flow.preflow_push              |  1.07s         |  3.27s         |  6.18s        |
| nx.algorithms.flow.shortest_augmenting_path  |  1.08s         |  3.25s         |  6.23s        |
| gt.edmonds_karp_max_flow                     |  274ms         |  2.84s         |  10s          |
| gt.push_relabel_max_flow                     |  71ms          |  466ms         |  1.42s        |
| gt.boykov_kolmogorov_max_flow                |  79ms          |  463ms         |  895ms        |
| scipy.sparse.csgraph.maximum_flow            |  64ms          |  234ms         |  580ms        |
+----------------------------------------------+----------------+----------------+---------------+

Again, results will depend on the graph structure, but this at least suggests that SciPy should offer you performance on par with graph-tool.

Lackluster answered 12/11, 2019 at 9:14 Comment(2)
On this data, igraph is 6-10x faster than Scipy and 100-115x faster than Networkx.Inexorable
SciPy 1.8.0 (due around December 2021) will come with further performance improvements to maximum_flow (#14358, #14392), at which point updates are due for the benchmarks above.Lackluster
K
6

I don't know if it is faster, you'll need to check that, but have you tried networkx ? Seems like it offers the functionality you're looking for and from my experience it is a very easy to use library for handling graphs.

Koblenz answered 23/8, 2011 at 14:12 Comment(2)
If networkx is too slow, you could try and get it working in pypy as it seems that it almost does.Flatto
Networkx is implemented in Python, and hence is very much slower than igraph and PyMaxflow, which are implemented in C/C++.Inexorable
F
1

For really good performance, you can try reformulating your problem as an Integer Linear Program, any of the standard ILP tools should give you more than adequate performance.

Wikipedia contains a good list of such both commercial and open source tools, many of which seem to have python bindings. Amongst the most well known are CPLEX and lp_solve.

I've personally used lp_solve reasonably heavily over the last few years and found it sufficient to just write input to lp_solve as plain text files and invoke lp_solve using the shell. Thinking back, I probably should have invested a bit more effort to get the official python bindings to lp_solve working.

Faunia answered 29/8, 2011 at 11:18 Comment(1)
An Integer Linear Program (ILP) is unnecessary, max flow can be formulated as a simple linear program (en.wikipedia.org/wiki/…). Max flow can be solved in polynomial time, as well as a linear program formulation of the same problem. However, ILP is an NP-hard problem.Holley
I
1

Check PyMaxflow and igraph.

PyMaxflow is a Python library for graph construction and maxflow computation (commonly known as graph cuts). The core of this library is the C++ implementation by Vladimir Kolmogorov, which can be downloaded from his homepage.

Besides the wrapper to the C++ library, PyMaxflow offers NumPy integration, methods for fast construction of common graph layouts in computer vision and graphics, and implementation of algorithms for fast energy minimization which use the maxflow method (αβ-swap and α-expansion).

igraph is a C library for creating, manipulating and analysing graphs. It is intended to be as powerful (i.e. fast) as possible to enable working with large graphs. igraph can also be used from: R, Python, and Mathematica.

On my test cases, igraph is 2-5x faster than PyMaxflow, 6-10x faster than Scipy, and 100-115x faster than Networkx.

Inexorable answered 1/9, 2021 at 13:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.