Algorithm for topological sorting if cycles exist

Asked 15/5, 2012 at 19:54 Answered 15/5, 2012 at 20:18

Solved algorithm directed-graph topological-sort

Some programming languages (like haskell) allow cyclic dependencies between modules. Since the compiler needs to know all definitions of all modules imported while compiling one module, it usually has to do some extra work if some modules import each other mutually or any other kind of cycle occurs. In that case, the compiler may not be able to optimize code as much as in modules that have no import cycles, since imported functions may have not yet been analyzed. Usually only one module of a cycle has to be compiled that way, as a binary object has no dependecies. Let's call such a module loop-breaker

Especially if the import cycles are interleaved it is interesting to know, how to minimize the number of loop-breakers when compiling a big project composed of hundreds of modules.

Is there an algorithm that given a set of dependecies outputs a minimal number of modules that need to be compiled as loop-breakers to compile the program successfully?

Example

I try to clarify what I mean in this example.

Consider a project with the four modules A, B, C and D. This is a list of dependencies between these modules, an entry X y means y depends on x:

A C
A D
B A
C B
D B

The same relation visualized as an ASCII-diagram:

D ---> B
^    / ^
|   /  |
|  /   |
| L    |
A ---> C

There are two cycles in this dependency-graph: ADB and ACB. To break these cycles one could for instance compile modules C and D as loop-breakers. Obviously, this is not the best approach. Compiling A as a loop-breaker is completely sufficient to break both loops and you need to compile one less module as a loop-breaker.

Aristaeus answered 15/5, 2012 at 19:54 Comment(0)

This is the NP-hard (and APX-hard) problem known as minimum feedback vertex set. An approximation algorithm due to Demetrescu and Finocchi (pdf, Combinatorial Algorithms for Feedback Problems in Directed Graphs (2003)") works well in practice when there are no long simple cycles, as I would expect for your application.

Shiny answered 15/5, 2012 at 20:18 Comment(0)

Here is how to do it in Python:

from collections import defaultdict

def topological_sort(dependency_pairs):
    'Sort values subject to dependency constraints'
    num_heads = defaultdict(int)   # num arrows pointing in
    tails = defaultdict(list)      # list of arrows going out
    for h, t in dependency_pairs:
        num_heads[t] += 1
        tails[h].append(t)

    ordered = [h for h in tails if h not in num_heads]
    for h in ordered:
        for t in tails[h]:
            num_heads[t] -= 1
            if not num_heads[t]:
                ordered.append(t)
    cyclic = [n for n, heads in num_heads.iteritems() if heads]
    return ordered, cyclic

def is_toposorted(ordered, dependency_pairs):
    '''Return True if all dependencies have been honored.
       Raise KeyError for missing tasks.
    '''
    rank = {t: i for i, t in enumerate(ordered)}
    return all(rank[h] < rank[t] for h, t in dependency_pairs)

print topological_sort('aa'.split())
ordered, cyclic = topological_sort('ah bg cf ch di ed fb fg hd he ib'.split())
print ordered, cyclic
print is_toposorted(ordered, 'ah bg cf ch di ed fb fg hd he ib'.split())
print topological_sort('ah bg cf ch di ed fb fg hd he ib ba xx'.split())

The runtime is linearly proportional to the number of edges (dependency pairs).

The algorithm is organized around a lookup table called num_heads that keeps a count the number of predecessors (incoming arrows). In the ah bg cf ch di ed fb fg hd he ib example, the counts are:

node  number of incoming edges
----  ------------------------
 a       0
 b       2
 c       0
 d       2
 e       1
 f       1  
 g       2
 h       2
 i       1

The algorithm works by "visting" nodes with no predecessors. For example, nodes a and c have no incoming edges, so they are visited first.

Visiting means that the nodes are output and removed from the graph. When a node is visited, we loop over its successors and decrement their incoming count by one.

For example, in visiting node a, we go to its successor h to decrement its incoming count by one (so that h 2 becomes h 1.

Likewise, when visiting node c, we loop over its successors f and h, decrementing their counts by one (so that f 1 becomes f 0 and h 1 becomes h 0).

The nodes f and h no longer have incoming edges, so we repeat the process of outputting them and removing them from the graph until all the nodes have been visited. In the example, the visitation order (the topological sort is):

a c f h e d i b g

If num_heads ever arrives at a state when there are no nodes without incoming edges, then it means there is a cycle that cannot be topologically sorted and the algorithm exits.

Gipson answered 15/5, 2012 at 19:59 Comment(3)

Sorry, I don't speak Python. How does this algorithm work? What is its amortized runtime? – Aristaeus 15/5, 2012 at 21:11

@FUZxxl I amended the answer to explain in detail how the algorithm works and what its running time would be. – Gipson 5/5, 2013 at 1:15

Thank you for your extended effort. Unfortunately your algorithm does actually not solve the problem I have. If you look at the question, it's not about how to toposort a DAG, it's about how to find a minimal set of vertices to remove from the graph in a way that the remainder is toposortable. – Aristaeus 5/5, 2013 at 10:2

Example

Recommended topics

Hot tags