All pairs all paths on a graph

D

3

This is possibly a problem with possibly no optimal solution. Suppose I have a directed graph, have no clue whether it has any cycles or not(cycle detection will be one of the aspects of this problem). Given a set of vertices(possibly millions of vertices) I need to count all the distinct paths(paths with no duplicate vertices) between all the unique pairs of the given graph. How would I go about tackling this situation?

Lets look at a brute-force way to do this:

Compute all possible pairs from the graph.
For each pair of the graph use DFS to get all the paths from Source to Destination.
Assuming the pairs are represented in a hash table, put the path count as the value against this pair.
Repeat for the rest of the pairs.

Can people point out what are some of the things that can go wrong with this? Lets think of the problem in this manner, what are the computational challenges of finding all the distinct paths between all the cities of the planet and if one would even attempt to solve this problem, where would one start?

Edit: Some of the algorithms presented produce results in O(n!) factorial time. This is somewhat of a daunting computation for a single machine with limited resources to handle. Does anyone know of map-reduce techniques to break down the problem of graph traversal into smaller chunks?

Denary answered 9/3, 2011 at 17:38 Comment(10)

How do you want to deal with cycles when you find them? Say, given a graph {(A,B), (B,C), (C,A), (C,D)}. There is a an infinite number of distinct paths between A and D: ABCD, ABCABCD, ABCABCABCD, ABCABCABCABCD... Any cyclic graph has an infinite number of distinct paths between all of its pairs. – Entertain 9/3, 2011 at 17:48

Well the assumption is that you mark your vertices upon the first pass. The only distinct path in this situation is ABCD. You go A->B->C->D. Any traversal that revisits a vertex will be discarded. – Denary 9/3, 2011 at 17:54

@Denary Adding that restriction to your question will get you better answers. "Distinct paths" and "paths with no duplicate vertices" are not the same thing. – Entertain 9/3, 2011 at 17:57

@Martinho Fernandes - Point taken. Updated the post – Denary 9/3, 2011 at 17:59

@spinning_plate: since it needs all paths between all pairs, yes, ABC is a path between A and C. But not one between A and D, of course. – Entertain 9/3, 2011 at 18:9

@spinning_plate - If only the source was A and the destination C. – Denary 9/3, 2011 at 18:10

Duplicate? - #1642639 – Sacrarium 9/3, 2011 at 18:48

@spinning_plate - The question you have referred to provides the number of distinct paths between a given pair. What I am looking for is some sort of way to approximate the 'all pairs all distinct paths' for a massive graph and ideas on how to implement it if possible using open source frameworks such as hadoop that can map-reduce the problem to make the computation feasible. – Denary 9/3, 2011 at 20:4

Ok. The Floyd-Warshall implementation in that post and this one mathoverflow.net/questions/18603/… both show a way to compute all distinct paths of length #verticies (different from above, since ABA is considered a path) Might be a useful starting point. – Sacrarium 9/3, 2011 at 21:49

@spinning_plate - These are useful links. Do you know by any chance if parallelizing the computation will speed up the processing and if so how can a graph traversal problem like this be broken down into smaller pieces computable on numerous machines? – Denary 10/3, 2011 at 0:14

S

3

The Floyd-Warshall generalization that gets a rough approximation of paths is this:

 procedure FloydWarshall ()
    for k := 1 to n
       for i := 1 to n
          for j := 1 to n
             path[i][j] = path[i][j] + path[i][k]*path[k][j]

Here is a very rough idea on how to scale this. Disclaimer - This isn't concrete -it's very hand wavy, but hopefully it helps more than confuses. It helps to understand how the algorithm works. It starts on an adjacency matrix of the graph. In each iteration k, we're saying the number of paths from i to j is equal to the number of paths in prior iterations from i to j plus the number of paths from i to j via k.

Thus Break the graph in to n arbitrary sub-adjacency matricies of size k x k, compute above on each of them. Now you have the number of paths and start combining submatricies by recomputing part of the above on the combined matrix. I.e, we only need to recompute n/2 values for k when recombining. I found this, which looks like a similar direction http://theory.stanford.edu/~oldham/publications/generalized/asgsp.pdf.

Sacrarium answered 10/3, 2011 at 21:23 Comment(2)

Doesn't this depend on the order in which it is executed? For example, wouldn't the path from i to j through k only exist if the correct iteration were run on i k and k j first? – Sisak 1/10, 2020 at 15:17

As k is increment, we build longer and longer paths. It does not matter in which direction we build these long paths. – Sisak 1/10, 2020 at 16:8

E

3

Have you thought how many such paths can exist?

In such a graph with V vertices, you have about V^2 different pairs. Let's imagine a worst case scenario where you have a full graph (one where edges exist between every pair). Paths can have anywhere between 1 edge and V-1 edges, because you do not allow duplicate vertices in a path.

Between each pair of vertices:

There is only one path with 1 edge;
There are V-2 paths with 2 edges: using any intermediate vertex that isn't the origin or destination;
There are (V-2)(V-3) paths with 3 edges: using any two distinct intermediate vertices;
There are (V-2)(V-3)(V-4) paths with 4 edges;
There are prod{k=1 -> n-1}(V-k-1) paths with n edges.

Given that, we know that there are at most V^2*sum{i=1 -> V-1}(prod{k=1 -> i-1}(V-k-1)) total paths for a graph with V vertices.

The total number of paths is therefore P(V) = V^2*sum{i=1 -> V-1}(prod{k=1 -> i-1}(V-k-1)) = V^2*sum{i=1 -> V-1}(O(V^V)) = O(V^3*V^V) = O(V!).

Now imagine an ideal world where you can compute each path in constant time. Your algorithm would take O(1*V!) = O(V!) time to run, which is impratical.

There might be an algorithm that can count the paths without enumerating them, and thus get a more efficient algorithm.

Entertain answered 9/3, 2011 at 18:31 Comment(3)

Can someone please tell me how to "mathify" my sums and products? – Entertain 9/3, 2011 at 18:31

Thanks for the analysis. I assumed it will be some obscene runtime like O(V!). I am looking for practical strategies to tackle problems of this magnitude. – Denary 9/3, 2011 at 20:58

straight html should work: & Sigma; and & Pi; (no blank after the & sign). See also List_of_XML_and_HTML_character_entity_references – Corabelle 11/4, 2011 at 15:46

S

3

The Floyd-Warshall generalization that gets a rough approximation of paths is this:

 procedure FloydWarshall ()
    for k := 1 to n
       for i := 1 to n
          for j := 1 to n
             path[i][j] = path[i][j] + path[i][k]*path[k][j]

Here is a very rough idea on how to scale this. Disclaimer - This isn't concrete -it's very hand wavy, but hopefully it helps more than confuses. It helps to understand how the algorithm works. It starts on an adjacency matrix of the graph. In each iteration k, we're saying the number of paths from i to j is equal to the number of paths in prior iterations from i to j plus the number of paths from i to j via k.

Thus Break the graph in to n arbitrary sub-adjacency matricies of size k x k, compute above on each of them. Now you have the number of paths and start combining submatricies by recomputing part of the above on the combined matrix. I.e, we only need to recompute n/2 values for k when recombining. I found this, which looks like a similar direction http://theory.stanford.edu/~oldham/publications/generalized/asgsp.pdf.

Sacrarium answered 10/3, 2011 at 21:23 Comment(2)

Doesn't this depend on the order in which it is executed? For example, wouldn't the path from i to j through k only exist if the correct iteration were run on i k and k j first? – Sisak 1/10, 2020 at 15:17

As k is increment, we build longer and longer paths. It does not matter in which direction we build these long paths. – Sisak 1/10, 2020 at 16:8

N

0

This SO page describes a DFS method for printing all non-cyclic paths between any two vertices--it also includes java code. You can modify it to find all such paths for all pairs of of vertices, but it's not the most efficient way of counting all paths between all vertices.

Nonlinearity answered 9/3, 2011 at 21:31 Comment(0)

Recommended topics

Hot tags