Finding all longest unique paths in tree
Asked Answered
C

7

7

Let us assume that we have a tree consisting on N nodes. The task is to find all longest unique paths in the tree. For example, if the tree looks like following:

Sample Tree

Then there are three longest unique paths in the tree: 1 - 2 - 3 - 4 - 5, 6 - 2 - 3 - 4 - 5 and 1 - 2 - 6.

I want to programmatically find and store all such paths for a given tree. One way to do it would be to compute paths between each pair of node in the tree and then reject the paths which are contained in any other path. However, I am looking for an efficient way to do it. My questions are as follows:

  • Is it possible to compute this information in less than O(N^2)? I have not been able to think of a solution which would be faster than O(N^2).
  • If yes, could you be kind enough to guide me towards the solution.

The reason why I want to try it out is because I am trying to solve this problem: KNODES

Chomp answered 19/8, 2015 at 8:49 Comment(26)
Might be missing the point, but won't all the longest paths be basically all paths from the root (assuming directed trees, as picture suggests) to all the leaves? This can be done using DFS with linear time.Rickeyricki
The picture is misleading. We have undirected trees. I will replace the picture.Chomp
how about bfs? Last layer vertices without edges are winnersInteract
It seems like find all longest unique path is overkilled for your problemWoody
@PhamTrung: I just want to think about the problem of finding all unique paths in general. I have encountered it twice this week, and I was wondering if it is possible to efficiently compute all unique longest paths. I have been unable to come up with an efficient solution.Chomp
@Chomp If the path is longest then how can be there more than 1 solution?Houseroom
It is a graph or a binary tree with root node 3?Interact
So, if the graph is undirected, why is 6-2-3-4-5 not counted as well? WHy is it not considered unique?Rickeyricki
@Chomp If the paths are unique, then there can only be one longest path according to me and by the way the two paths you mentioned have the edge 1-2 common in them so how are they unique?Houseroom
@Houseroom uniq != some common edges, uniq == all common edges?Interact
Your ques is too unclearScurvy
@amit: Corrected. Thank you!Chomp
@vish4071: Can you please let me know what part are you finding difficult to understand? I will try to provide a better description.Chomp
You called 1 - 2 - 6 as a longest path, while it is not longest. It is looks like you are trying to find all paths from each final nodes. All path between 1, 6, and 5 nodes in your example. They are not longest.Interact
as fl00r mentioned, It's totally unclear what "longest" means in this context.Mok
@Bhoot, ^this. Why is 1-2-6 longest path? Do you want all paths from all leaf nodes to every other leaf node? Is your tree a binary tree? Other specifics that might help.Scurvy
@vish4071: it is unique. There is no other path which contains nodes 1 - 2 - 6.Chomp
@Bhoot, 126 is unique, ok. But how is it longest? See my above comment and answer all those questions please (if answers below have not solved your problem, that is)Scurvy
`@Chomp Your question is unclear! Eg you are not explaining the nature of your graph or what you mean by "longest" or "unique". We cannot read your mind. Also, examples can only be an aid in confirming understanding of words saying what is going on.Vladimar
`@Chomp One definition of a tree is "an undirected graph in which any two vertices are connected by exactly one path" so maybe you mean something like, with that definition, all paths with distinct leaf end nodes? (Which seems consistent with your proposed algorithm.)Vladimar
What you propose is not a solution to the problem that you linked to, so I think you're asking the wrong questionSileas
@philipxy: The objective is to find that given a set of nodes, does there exist a path in which contains ALL the nodes in the set. For this purpose, I am trying to solve the problem by storing all possible 'superpaths'.Chomp
@NiklasB. Yes, there are alternate ways to solve the problem. I am trying to think in terms of whether we can solve it in this way. The objective of this question is to think what is the best way in which we can store all unique longest paths in the graph. Terminology in question might be a little confusing but I dont see a better way to put it forth. :)Chomp
@Bhoot: "The objective is to find that given a set of nodes, does there exist a path in which contains ALL the nodes in the set" : that is far from your initial question Assuming you mean "subset of nodes", the answer is O(N): if the subset contains more than two leafs, the answer is "no"; if the subset contains two leafs, compute if exists a path between them using only nodes in the subset, O(N); zero or one leaf nodes are trivial cases.Heteronomy
Let's take the tree which looks like the star with n nodes and n-1 edges. Then you have got C(n-1, 2) unique longest paths. So the lower limit of complexity is still O(n^2)Buttonhook
Hint: draw the tree. Let v be a vertex and p its parent. The length of the longest path including v but not p = (height of left subtree of v) + (height of right subtree of v). Maximise over v.Ehrman
B
2

An algorithm with a time complexity below O(N^2) may only exist, if every solution for a tree with N nodes can be encoded in less than O(N^2) space.

Suppose a complete binary tree with n leaves (N=n log n). The solution to the problem will contain a path for every set of 2 leaves. That means, the solution will have O(n^2) elements. So for this case we can encode the solution as the 2-element sets of leaves.

Now consider a nearly complete binary tree with m leaves, which was created by only removing arbitrary leaves from a complete binary tree with n leaves. When comparing the solution of this tree to that of the complete binary tree, both will share a possibly empty set of paths. In fact for every subset of paths of a solution of a complete binary tree, there will exist at least one binary tree with m leaves as mentioned above, that contains every solution of such a subset. (We intentionally ignore the fact that a tree with m leaves may have some more paths in the solution where at least some of the path ends are not leaves of the complete binary tree.)

Only that part of the solution for a binary tree with m leaves will be encoded by a number with (n^2)/2 bits. The index of a bit in this number represents an element in the upper right half of a matrix with n columns and rows.

For n=4 this would be:

x012
xx34
xxx5

The bit at index i will be set if the undirected path row(i),column(i) is contained in the solution.

As we have already statet that a solution for a tree with m leaves may contain any subset of the solution to the complete binary tree with n>=m leaves, every binary number with (n^2)/2 bits will represent a solution for a tree with m leaves.

Now encoding every possible number with (n^2)/2 bits with less than (n^2)/2 is not possible. So we have shown that solutions at least require O(n^2) space to be represented. Using N=n log n from above we yield a space requirement of at least O(N^2).

Therefore there doens't exist an algorithm with time complexity less than O(N^2)

Buddle answered 19/8, 2015 at 19:9 Comment(0)
L
1

As far as I could understand, you have a tree without a selected root. Your admissible paths are the paths that do not allow to visit tree nodes twice (you are not allowed to return back). And you need to find all such admissible paths that are not subpaths of any admissible path.

So if I understood right, then if a node has only one edge, than the admissible path either start or stop at this node. If tree is connected, then you can get from any node to any node by one admissible path.

So you select all nodes with one edge, call it S. Then select one of S and walk the whole tree saving the paths to the ends (path, not the walk order). Then you do this with every other item in S and remove duplicated paths (they can be in reverse order: like starting from 1: 1 - 2 - 6 and starting from 6: 6 - 2 - 1).

So here you have to visit all the nodes in the tree as much times as you have leafs in the tree. So complexity depends on the branching factor (in the worst case it is O(n^2). There are some optimizations that can reduce the amount of operations, like you don't have to walk the tree from the last of S.

Languor answered 19/8, 2015 at 9:34 Comment(4)
So you select all nodes with one edge - that's called a leaf (in undirected graph, a leaf is defined as a vertex with d(v)=1, where d(.) is the degree).Rickeyricki
I mentioned this in the last paragraph. The reason I didn't use it in the description, that the question was asked in terms of trees and usually in the tree the root node doesn't count as leaf. So in this situation it would be unclear should the root node (if it is a node of degree 1) be in S or not.Languor
The definition of root is a vertex where all nodes are reachable from it. It doesn't make much sense to talk about a root in an undirected graph, and especially a tree, because by definition - each node is a root.Rickeyricki
I don't think it worth arguing about the definitions, there can be many of them depending on the authors ideas put in the definition. Like natural numbers can include or exclude 0.Or for example in wikipedia ("a rooted graph is a graph in which one vertex has been distinguished as the root"). Definition provided by you has also much sense. I tried to wrote such way it couldn't be misunderstood. If it came more confusing, next time I'll use "leaf" in this situation.Languor
I
0

enter image description here

In this picture the longest paths are {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}, {1, 2, 3, 7}, {1, 2, 3, 8}, {1, 2, 3, 9}, {1, 2, 3, 10}

For tree like this storing all longest paths will cost you O(N2)

Irrelevant answered 19/8, 2015 at 9:58 Comment(5)
That is not correct. Storing this information can also be done in O(N). You can represent a path by a concatenation of two other paths. So therefore this is no proof that a better complexity class is not achievable.Buddle
Can you give an example pleaseIrrelevant
We have one path {1,2,3} named a. Then we have paths {3,4} named b4, {3,5} named b5 and so on. The solution is the concatenated paths {a,b4}, {a,b5} and so on. This solution takes O(N) space for path a. Another O(N) space is used for paths b<i> and another O(N) space is used for the concatenated paths. Leaving a total of O(N) space.Buddle
you are storing paths so that you that you can efficiently traverse these paths in efficiently (i.e. O(N)) but with such storage you cannot stop the computation process from O(N²) cost. What you are doing basically is storing the longest paths in forest (i.e. list of trees). In your example a is nothing but a path in the tree {1, 2, 3}. So, you are just transforming my problem into another problem. If node 1 had two more children 1' and 1", then you had to store the path {1', 1, 2, 3} & {1", 1, 2, 3} into another list. Now what will be the concatenated paths for this change?Irrelevant
The point is to have a class of trees where the solutions can not be encoded with less than O(N^2) space. I show such a class of trees in my answer. Just saying that you are able to use O(N^2) space for encoding solutions isn't helpful, because solutions that only use O(N) space can always be bloated to use O(N^2) space. Proving the other way is the point here.Buddle
I
0

Suppose you have got Binary Tree like on your picture.

class Node(object):
  def __init__(self, key):
    self.key = key
    self.right = None
    self.left = None
    self.parent = None

  def append_left(self, node):
    self.left = node
    node.parent = self

  def append_right(self, node):
    self.right = node
    node.parent = self

root = Node(3)
root.append_left(Node(2))
root.append_right(Node(4))
root.left.append_left(Node(1))
root.left.append_right(Node(6))
root.right.append_right(Node(5))

And we need to get all paths between all leaves. So in your tree they are:

  • 1, 2, 6
  • 6, 2, 3, 4, 5
  • 1, 2, 3, 4, 5

You can do this in (edit: not linear, quadratic) time.

def leaf_paths(node, paths = [], stacks = {}, visited = set()):
  visited.add(node)

  if node.left is None and node.right is None:
    for leaf, path in stacks.iteritems():
      path.append(node)
      paths.append(path[:])
    stacks[node] = [node]
  else:
    for leaf, path in stacks.iteritems():
      if len(path) > 1 and path[-2] == node:
        path.pop()
      else:
        path.append(node)

  if node.left and node.left not in visited:
    leaf_paths(node.left, paths, stacks, visited)
  elif node.right and node.right not in visited:
    leaf_paths(node.right, paths, stacks, visited)
  elif node.parent:
    leaf_paths(node.parent, paths, stacks, visited)
    
  return paths

for path in leaf_paths(root):
  print [n.key for n in path]

An output for your tree will be:

[1, 2, 6]

[6, 2, 3, 4, 5]

[1, 2, 3, 4, 5]

The idea is to track all visited leaves while traversing a tree. And to keep stack of paths for each leaf. So here is memory/performance tradeoff.

Interact answered 19/8, 2015 at 15:42 Comment(6)
Why should the visited set be needed? If you are starting with the root, then the parent of a visited node was already visited at the time the node is visited.Buddle
@Buddle we visit parent nodes up to three times during traversal. It is special traversal, not like in/pre/post order tree traversal. It is "traveller" traversal. So you need to visit previous node to go right after left. For simple tree B ← A → C your traversal should be "a->b->a->c->a". You could optimize solution by starting at most left leaf and finishing at most right leaf so you will make less steps, but you will still need visited set.Interact
@Buddle I used visited set mostly to simplify code. You can always check where did you come on current recursion step, so you could write little more code and throw out visited set. The idea is if we came from right we should return back to parent otherwise go to right child (if exists)Interact
Could you please elaborate more on why your algorithm is linear? This is far from obvious because of the recursion and iteration over the stacks.Buddle
@Buddle ehm. Looks like you are absolutely right. It is not linear, O(n*l)~O(n^2) :( I didn't get into account that inner stacks loop. Thank youInteract
@Buddle there is a trick with path compression, so you don't need to iterate over all stacks, but only specific ones (paths are shared between leaves). But I can't say if it will improve this algorithm asymptoticallyInteract
B
0

Let's take the tree which looks like the star with n nodes and n-1 edges.

Then you have got C(n-1, 2) unique longest paths.

So the lower limit of complexity can't be less than O(n^2).

Buttonhook answered 19/8, 2015 at 19:16 Comment(5)
But an algorithm could check in O(N) time, if the tree is a star and then write the plain string C(n-1,2) as the solution in constant time. So if you do not require that for a solution all paths are to be exactly written out with the node labels, then your answer doesn't proof that there won't exist a better solution.Buddle
But @Chomp said: "I want to programmatically find and store all such paths for a given tree."Buttonhook
Yes, and the plain string C(n-1,2) does store this information for a tree in the form of a star if interpreted correctly. So as no information is lost on storage why shouldn't this be used?Buddle
I don't understand, how would you like to generate C(n-1, 2) informations in less than O(n^2) complexity? We should store all of them. I think @Chomp had it in mind.Buttonhook
I don't generate C(n-1,2) informations but only the plain string C(n-1,2) with its 8 chars. The OP doesn't define what should happen with the stored information later on except the mentioning of KNODES. So this is perfectly fine. When considering queries if a path is contained in the set, than testing against C(n-1,2) is easy. We just need to parse it and see if the node labels in question are within the range of n.Buddle
E
0

Draw the tree. Let v be a vertex and p its parent. The length of the longest path including v but not p = (height of left subtree of v) + (height of right subtree of v).

The maximum over all v is the longest path in the graph. You can calculate this in O(n):

First calculate all the intermediate heights. Start at the leaves and work up: (height below v) = 1 + max(height below left child, height below right child)

Then calculate the sum (height of left subtree of v) + (height of right subtree of v) for each vertex v, and take the maximum. This is the length of longest path in the graph.

Ehrman answered 20/8, 2015 at 15:32 Comment(0)
A
0

This can be seen as a generalization of the tree diameter problem, because the largest of those lengths equals the diameter of the tree. The height of a tree for every node as root is easily found in O(n2) time by running a known algorithm from that node. You can look up how to find the height of a tree with given root, I'm not going to discuss it here. I'll explain how to do it in O(n) time using 2 DFS.

Since it is a tree, every node has one parent, which is the node 'p' from where the node 'u' was discovered during DFS. Similarly, the nodes discovered from 'u' are considered its children.

We run two DFS. The first one calculates the height of the tree rooted at 'u' considering its children. The second DFS calculates the length of the longest path starting at 'u' through its parent 'p'. This makes sense if you imagine the tree being rooted at 'u', when 'p' becomes one of its children. The maximum of these two values is the height of the tree rooted at node 'u' a.k.a. the maximum length of a path that begins at node 'u'. The sum of these two values is the diameter of the tree rooted at node 'u'.

However, your question includes another part which is to print only unique paths. In other words, exclude the paths that are contained in another. That is a whole different algorithm question, and I leave it to you with a hint. The idea is that every path that is included in a larger path is the prefix of some suffix of the larger path.

I give a Python implementation below for calculating the heights, explanation inline with the code.

def all_longest_paths(n: int, edges: list[list[int]]) -> list[int]:
    g: dict[int, list[int]] = defaultdict(list)
    for u, v in edges:
        g[u].append(v)
        g[v].append(u)

    heights = [[-1, -1] for _ in range(n)]
    _calculate_height(g, heights, 0, -1)
    longest_paths_up = [0] * n
    _calculate_longest_path_up(g, heights, longest_paths_up, 0, -1)
    # Sum for diameter.
    return [max(heights[i][0], longest_paths_up[i]) for i in range(n)]


def _calculate_height(g: dict[int, list[int]], heights: list[list[int]], u: int, p: int) -> None:
    """
    Calculates the height of the tree rooted at node u and sets the value at heights[u].
    Actually, it calculates the top 2 heights such that heights[u][0] >= heights[u][1].
    If u is a leaf, heights = [0, 0]. If u has only one child, then, heights[u][1] = 0.

    :param g: graph as adjacency list
    :param heights: the heights array, it is updated in place
    :param u: the root node
    :param p: the parent node of u
    :return: nothing
    """
    for v in filter(lambda x: x != p, g.get(u, [])):
        _calculate_height(g, heights, v, u)
        ht_v = heights[v][0]
        if ht_v > heights[u][0]:
            heights[u][1], heights[u][0] = heights[u][0], ht_v
        elif ht_v > heights[u][1]:
            heights[u][1] = ht_v
    heights[u][0] += 1
    heights[u][1] += 1


def _calculate_longest_path_up(
    g: dict[int, list[int]], heights: list[list[int]], lp_up: list[int], u: int, p: int
) -> None:
    """
    Calculates the length of the longest path of the tree rooted at node u through its
    parent p and sets the value at lp_up[u].

    :param g: graph as adjacency list
    :param heights: the heights array
    :param lp_up: the longest paths through p array, it is updated in place
    :param u: the root node
    :param p: the parent node of u
    :return: nothing
    """
    for v in filter(lambda x: x != p, g.get(u, [])):
        # If the longest path starting at u goes through v,
        # then we use another path that goes through a
        # sibling of v.
        if heights[u][0] == heights[v][0] + 1:
            ht = heights[u][1]
        else:
            # The longest path starting at u doesn't go
            # through v, so, use that value.
            ht = heights[u][0]
        # The longest path starting at v may go through
        # one of its siblings x (v-u-x-...), or it may
        # not (v-u-p-...).
        lp_up[v] = max(ht, lp_up[u]) + 1
        _calculate_longest_path_up(g, heights, lp_up, v, u)

If you prefer visual aid, there's a YouTube video also, although I found it somewhat long and rambling.

Atul answered 15/7, 2023 at 12:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.