Finding length of shortest cycle in undirected graph
Asked Answered
Q

5

26

I tried the following :

1) DFS, keeping track of level of each vertex in my DFS tree

2) Each time a back edge (x,y) is seen, I calculate cycle length = level[x] - level[y] + 1, and save it if it is smaller than the shortest

Can someone tell a counter example for which this approach is wrong ?

What could be a better way to find shortest cycle in undirected graphs ?

Thanks.

Quartan answered 30/12, 2013 at 20:59 Comment(3)
This procedure should work correctly. By the way, if the graph has too few nodes, you can find smallest cycle with Floyd-Warshall algorithm too (implementing transitive closure matrix) But Floyd Warshall algorithm would take O(V^3) computation time while DFS is taking only O(V+E)Shrieval
@Shrieval I assumed this would work as well, until I found a question in Dasgupta which asks why the same approach is wrong and asks for a counter-example; so definitely this shouldn't be correct. Here is the link to the question I found ( look at exercise 4.4 at the end ) : cs.berkeley.edu/~vazirani/algorithms/chap4.pdfQuartan
I missed the DFS tree, sorry. IMHO BFS tree should work fine. In DFS fashion longer paths can be visited before the shorter path.Shrieval
H
35

Why DFS won't work

You cannot use DFS to find a shortest circle. We can easily create a counter example, where DFS leads finds only the longest circle. Lets have a look at the following graph:

A graph with a "long line"

As you can see we have nine nodes. If we start at the leftmost node A, the following DFS level could be possible:

Resulting depth first search tree

We have two back edges while iterating:

  • (B , A), therefore we found a circle with length 8
  • (D , A), therefore we found a circle with length 8

However, the shortest circle has length 5. It's shown in blue in the next picture, whereas one of the previously found circles is shown in red:

Long circle vs. short circle

You didn't see the blue circle because your DFS path doesn't contain it. Dagupa et al also mention this behaviour in their book:

But it also means that DFS can end up taking a long and convoluted route to a vertex that is actually very close by.

Why BFS won't work

Well, that's not entirely true, one can use BFS (see next subsection), but you cannot use your formula. Take the following graph:

No fancy picture for this graph yet.

Every "o" is a node.

        o---o
        |   |
+-------o---o-------+
|                   |
o----o----o----o----o

Lets see what levels are possible in BFS. If I start at the node in the middle, I get the following levels:

        5~~~5            ~~~ are back-edges
        |   |
+-------4~~~4-------+
|                   |
3----2----1----2----3

And if I start at the left node, I get the following levels:

        3~~~4
        |   |
+-------2---3-------+
|                   |
1----2----3----4~~~~4

Therefore, you cannot use your level formula.

Solution

Although not efficient, using an all-pair shortest path algorithm and checking the distance (i,i) for every node is a valid solution.

Hungnam answered 30/12, 2013 at 21:39 Comment(14)
+1, good examples. I'd just change the back edge term in the "fix for BFS" section, they shouldn't exist in a BFS traversal.Lys
@Leeor: What are conflicting/non-conforming edges called in a BFS? Do they have a special name?Hungnam
I think they would be cross edges, back edges connect to an ancestor, which isn't possible in BFS. I also think your complexity is too pessimistic.Lys
@Hungnam can you tell me why exactly is the complexity of the bfs solution O((|V|+|E|)|V|)?Ephebe
oh, never mind, understood that you need to check all nodes.Ephebe
@Zeta, what kind of all pair shortest path algorithm were you talking about, by the way, BFS? I've been thinking about this problem for a while, can we use BFS to find the shortest cycle in an undirected weighted graph? Also, why do we need to go back up to the node where they intersect, since we're checking all nodes, can't we just look at the node where we start our BFS from?Ephebe
@paulpaul1076 The start node of the BFS might not even be part of the circle. Think of a graph that consists of a long path and a small circle at the end of the path. If you start at the non-circle end, the first node isn't even part of the circle. For an all-pair shortest-path, you can use BFS from every node (if you're interested in edge-distance, e.g. c = 1 for all edges), or Floyd-Warshall (if your edges have non-uniform cost).Hungnam
@Hungnam but we do bfs from every node so it shouldnt matter, because we eventually start at a node that is on a minimum cycle. Also warshall floyd only finds shortest cycles in a directed graph, not undirected, so I was asking about shortest cycle in an undirected weighted graph. Shouldn't bfs work here ?Ephebe
@Hungnam I do not understand how you algorithm works when you start from the node in the middle.Resistant
But is the fixed BFS even correct? Consider a binary tree of high depth where we name three of the leaf vertices a, b, and c such that a and b share parent p(ab) and c is far away with parent p(c). We start the BFS from the root, and assume that the leaves are examined in the order a - b - c. Now add edges a--c and b--p(c). Then the shortest cycle is a--p(ab)--b--p(c)--c--a. How I interpret your fix of the bfs, you won't find this cycle while examining any of the two cross-edges b--p(c) or a--c. Or am I wrong?Hymnody
@Hymnody Let's take your example. For sake of simplicity, let's say that depth(p(c)) > depth(a). If we introduce a--c and b--p(c), we will have the following examination order if we start at the root: p(ab);a,b;c,p(c); where a,b resp. c,p(c) will be examined in the same iteration. As soon as you start from p(c), you'll encounter the back-edge (cross edge? has been a while) to c. You start from both p(c) and c to the common ancestor on that path, which is p(ab). Therefore, you found p(ab)-a-c-p(c)-b-p(ab).Hungnam
is there any better algo than this ?.Guthrun
@Hungnam But what when depth(p(c)) == depth(p(ab))? Assume p(ab) is examined before p(c) in the same layer, then the cross-edge b-p(c) will give us a big cycle, not the one we're looking for. We continue to examine a, and find the cross-edge a--c. Their least common ancestor is still far away..?Hymnody
@Hymnody it has been almost four years since I wrote that algorithm, so I had to look up what I'm actually doing, but yeah, you're correct. I removed that algorithm. There's an even simpler counter example, by the way. Take a graph that consists of a line with an odd number of nodes. Let the end points be called A and B. Introduce X and Z, and connect A-X-B and A-Z-B. The shortest cycle is A-X-B-Z-A, but the algorithm could have returned A-…-B-X/Z-A instead. Basically the same as the counter example above. All-pair shortest path will still work, though.Hungnam
R
3

I think this is what you are looking for : https://web.archive.org/web/20170829175217/http://webcourse.cs.technion.ac.il/234247/Winter2003-2004/ho/WCFiles/Girth.pdf

You make a BFS from each node, thus you have complexity O(V*E)

Resistant answered 7/11, 2016 at 0:36 Comment(5)
Thanks, the link is fixed! The pdf I linked uses BFS on an undirected graph :)Resistant
Thank you! I was thinking about another problem: find a shortest cycle that includes a particular vertex v.Kensell
I can think of O(E * deg(v)).Resistant
This algorithm pseudocode is right can find the cycle corresponding to the girth when $v\in V(C)$ trivially. D(x) + D(y) + 1 is right because x,y level difference is at most 1 when using BFS which implies the paths corresponding to x,y has no overlap, so root-...-x-y-...-root is one circuit.Prizewinner
But some prerequisite contents seems to be wrong. For the cycle we may get x,y in the left and right child tree of the root in one balanced tree, then "if we reach an already-explored vertex at depth r, then G must contain a cycle of size <= r" seems to be wrong. Because here D(y)=r and possibly D(x)=D(y), then D(x) + D(y) + 1<=r is impossible. Can someone solve with this problem?Prizewinner
S
1

Let's say we've the graph with following edges,

1<->4, 4<->2, 4<->3, 2<->3, 3<->1

Then cycle 1, 4, 2, 3, 1 could be traversed before 1, 4, 3, 1 and as we are considering DFS, no node will be visited twice. So if 1, 4, 2, 3, 1 is traversed first, no chance that 1, 4, 3, 1 or 4, 2, 3, 3 will be traversed at all. So with DFS it can NOT be assured that we will get the shortest cycle always.

Possible Improvement: A BFS tree should work fine as it goes level by level and for BFS tree distance from root to any node is fixed, no matter in which order nodes are picked. Runtime: O(V+E) while modified Floyd-Warshall's algorithm would run in O(V^3) in worst case.

Shrieval answered 30/12, 2013 at 22:4 Comment(1)
How about this graph? pastebin.com/rT7Y1AUN If you start at 1, I don't think an O(V+E) algorithm helps.Faddish
E
0

I'd like to explain an O(V * (V + E)) solution to this problem, which on sparse graphs is significantly more efficient than O(V^3) Floyd-Warshall.

First, let's fix the starting node of the cycle to be node S. Then, let's compute
dist[i] = the shortest distance from node S to node i
par[i] = the parent of node i; any node j where dist[i] = dist[j] + 1
in O(V + E) time using a simple BFS.

Now, the key idea is to find, for each edge, the shortest cycle from S that passes through that edge. Then, because any cycle must pass through some edge of the graph, the shortest of all shortest cycles that pass through a given edge will yield the shortest cycle starting at S.

But how do we find the shortest cycle from S that passes through a given undirected edge (a, b)? The shortest length is NOT just dist[a] + dist[b] + 1 (from the cycle S -> a -> b -> S), because the paths S -> a and b -> S may intersect. Consider the graph below:

enter image description here
Here, if we used dist[a] + dist[b] + 1, then we'd report a cycle of length 2 + 3 + 1 = 6, when in reality no cycle exists at all because the path from S -> a is a prefix of the path from S -> b.

So, we need to ensure that for (a, b), the path S -> a isn't a prefix of S -> b (or vice versa). Observe that this simply equivalent to checking if parent_of[a] != b && parent_of[b] != a. So as long as that condition is true, then we can use dist[a] + dist[b] + 1.

But, you might ask, what if the two paths S -> a and S -> b aren't prefixes of each other, but they still intersect, like below?

enter image description here

Here, because par[b] equals 1 and not a, neither of S->a and S->b are prefixes of the other, so we would report a cycle of length dist[a] + dist[b] + 1 = 2 + 2 + 1 = 5 (representing the cycle S -> 1 -> a -> b -> 1 -> S)! Clearly, this is not a simple cycle, and so shouldn't be counted. However, observe that we do not actually need to change the algorithm to correct this, because (a) we never reports a cycle when one doesn't exist (For instance, although S -> 1 -> a -> b -> 1 -> S isn't a simple cycle, it contains the simple cycle 1 -> a -> b -> 1. Essentially, whenever S->a and S->b share a prefix, then there is an existing cycle that can be found by removing their shared prefix), and (b) this only ever reports cycles longer than the minimum-length cycle, meaning we won't ever get a wrong answer (we won't ever get a cycle shorter than the shortest cycle). And lastly, we will always find the shortest cycle whenever we test a starting node that lies within that cycle, so we will always get the correct answer.

So, to recap:

  1. From each starting node, we will use a BFS to find the shortest cycle which passes through it.
  2. Specifically, use the BFS to calculate dist[i] and par[i].
  3. Afterwards, enumerate every edge (a, b), and find the shortest cycle starting from S that passes through that edge.
  4. As long as parent_of[a] != b && parent_of[b] != a (meaning neither of the paths S->a and S->b are a prefix of the other, so we don't ever report a cycle where none exists), we have a cycle of length dist[a] + dist[b] + 1.
  5. Taking the minimum of all dist[a] + dist[b] + 1 over all edges over all starting nodes, yields the answer.

Time Complexity: O(V(V + E)), since for each of the V starting nodes we run a O(V + E) BFS.
Space Complexity: O(V + E), since we need O(V + E) for the adjacency list and O(V) for the arrays/queues used in the BFS.

Implementation in C++:


int shortest_cycle_length(const vector<vector<int>> &adj) {
    int N = adj.size(), INF = N + 1;

    int shortest_cycle = INF; // cycle length must be <= N

    // Test every possible starting node of the shortest cycle
    for (int starting_node = 0; starting_node < N; ++starting_node) {
        // Find the shortest cycle that starts (and ends) at starting_node

        // BFS, finding for each node its shortest distance from starting_node
        // as well as its parent node in the shortest path to it
        queue<int> Q;
        vector<bool> visited(N, false);
        vector<int> dist(N, INF), parent_of(N, -1);

        Q.push(starting_node);
        visited[starting_node] = true;
        dist[starting_node] = 0;
        
        while (!Q.empty()) {
            int curr = Q.front();
            Q.pop();
            
            for (int next_node : adj[curr]) {
                
                // If we haven't visited next_node, enqueue it (normal BFS)
                if (!visited[next_node]) {
                    visited[next_node] = true;
                    dist[next_node] = dist[curr] + 1;
                    parent_of[next_node] = curr;
                    
                    Q.push(next_node);
                }
                // Otherwise, we have visited next_node previously.
                else if (parent_of[curr] != next_node) {
                    // ^^ Note that we don't need to check if parent_of[next_node] = curr,
                    // because that can never happen

                    // Then there is a cycle from starting_node through curr <=> next_node
                    // Specifically: starting_node -> curr -> next_node -> starting_node
                    // Clearly, the shortest length of this cycle is (dist[curr] + dist[next_node] + 1)
                    shortest_cycle = min(shortest_cycle, dist[curr] + dist[next_node] + 1);
                }
            }
        }
    }
    
    // return -1 if no cycle exists, otherwise return shortest_cycle
    return (shortest_cycle == INF ? -1 : shortest_cycle);
}

Note that in this implementation, I create new dist and par arrays for each starting node. If you just allocate them once at the beginning and reset them for each new starting node, you'll probably save a bit of time.

Also, notice that instead of iterating over all edges after the BFS is complete and finding the shortest cycle through each edge, I do it during the BFS. When we're at curr_node and wanting to move to next_node, then if next_node has already been visited (so we know it's dist and par values), I find right then and there the shortest cycle through the edge (curr_node, next_node). To do this, we just need to check if parent_of[curr_node] != next_node && parent_of[next_node] != curr_node, but then observe that parent_of[next_node] can never equal curr_node, because otherwise that means next_node could not have been visited previously. As a result, we only need to check if parent_of[curr_node] != next_node; if this condition is satisfied, the minimize the answer with dist[curr_node] + dist[next_node] + 1.

Emarie answered 1/4, 2023 at 16:42 Comment(0)
P
0

Here I shows one possible solution by combining DFS and BFS inspired by this baeldung blog.

I offer some notes for the pseudocode:

  1. It inits all value in depth[node] to -1 by

    First, we check if the depth of the current node is not equal to -1, which means this node has been visited before, and we have a cycle.

  2. d-depth[node] is the length of the cycle because depth[node]!=-1 means it is one ancestor. And MIN (answer, ShortestCycle(G, v, node, d + 1, depth) connects the back edge to the path of tree edges by d+1.
  3. Here we don't maintain one visited array, so it is not one mere DFS. And $for v \in G[node].neighbors do$ implies the BFS behavior.

    Note that we used all the neighbors of the current node, except for the parent.

  4. if v != parent is similar to the Labo's reference to avoid unnecessary backtracking.

    This will cause our answer to always be equal 2 from the option of going back and forth any edge. Thus, we don’t return directly to the parent node but move to visit other nodes.

  5. depth is one stack structure to store the path sequence related with the possible cycle sequence.
  6. the call ShortestCycle(G, 1, -1, i, {-1}) is also ok. Here we assume depth starting index is 1 instead of 0 normally and -1 is one invalid node index which we can use to indicate no parent for the root node.

In summary, this algorithm enumerates for all spanning trees generated by DFS with one fixed root. The above BFS behavior implies it can find the shortest cycle.

complexity
  1. The space complexity is right based on the stack structrue where G has one constant space, node, parent, d may be stored in the stack due to the recursive algorithm which has O(d) where d is the maximal depth V. depth also has O(d) due to its stack structure (i.e. will drop the added node when returning to the caller) and has the maximal depth V.

  2. The time complexity shown in the blog IMHO is wrong.

    The reason behind this complexity is that we’re visiting each vertex at most twice, and each edge at most once.

    • branching factor (See QA_1)

      Here based on BFS, each node v can branch out at most deg(v) nodes (most of time this will less due to that we have depth to ensure no duplicate nodes in one specific branch when DFS. Notice it doesn't ensure no duplicate for the whole tree-like graph (See QA_1) which allows redundant paths)

    • depth

      Then the depth of this tree-like graph is at most |V| when there is one Hamiltonian path. Then we have O(max(deg(v))^{|V|+1}), i.e. O(n^{|V|+1}) where n is the number of vertices, nodes in this tree-like graph (See QA_1). Here since max(deg(v)) is not one constant, so I didn't drop the max(deg(v)) factor when calculating max(deg(v))+...+max(deg(v))^{|V|}.

    • time complexity of each node

      For each node we execute ShortestCycle, here we don't use the adjacency list with time complexity O(deg(v)) to check adjacency but use depth[node] which has O(1) time complexity. And $for v \in G[node].neighbors do$ is already contained when branching out nodes and get the node number. So O(1) for each node.

    • In summary we have time complexity O(n^{|V|+1}).

    Here I don't one very exact estimate for time complexity because we are only to get one upper bound for O(). Anyone with one better understanding of the algorithm in the blog can improve this answer.

Thanks in advance

Prizewinner answered 2/2 at 13:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.