Topological sort with an objective function
I have a DAG with N nodes, i.e., 1, 2, ..., N, and each node has a weight (we can call it time) x_1, x_2, ..., x_N. I want to do a topological sort, but the difficulty is that I have an objective function when sorting. My objective function is to minimize the total time between several given pairs of nodes.

For example, I have a DAG with 7 nodes, and I want a specific topological sorting such that (1,3) + (2,4) is minimized, where (A,B) denotes the total time of the nodes lying between A and B in the sorted order. For instance, given the sort [1, 6, 3, 2, 5, 4, 7], (1,3) = x_6 and (2,4) = x_5. Based on the DAG, I want to find a sorting that minimizes (1,3) + (2,4).

I have been thinking about this problem for a while. Generating all possible topological sorts (reference link) and evaluating the objective function for each one is always a possible solution, but it takes too much time when N is large. It was also suggested that I use branch-and-bound pruning when generating all possible sorts (I am not very familiar with branch-and-bound, but I don't think it would dramatically reduce the complexity).
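For reference, the brute-force baseline described above can be sketched as follows (helper names are my own; it enumerates all topological orders by backtracking and scores each one against the gap objective):

```python
from collections import defaultdict

def all_topo_sorts(nodes, edges):
    """Yield every topological order of the DAG via backtracking."""
    succ = defaultdict(set)
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].add(v)
        indeg[v] += 1

    order, placed = [], set()

    def backtrack():
        if len(order) == len(nodes):
            yield list(order)
            return
        for v in nodes:
            if indeg[v] == 0 and v not in placed:
                placed.add(v)
                order.append(v)
                for w in succ[v]:
                    indeg[w] -= 1          # temporarily remove v
                yield from backtrack()
                for w in succ[v]:
                    indeg[w] += 1          # undo on backtrack
                order.pop()
                placed.discard(v)

    yield from backtrack()

def gap_cost(order, weights, pairs):
    """Sum of the weights of nodes lying strictly between each pair in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    total = 0
    for a, b in pairs:
        lo, hi = sorted((pos[a], pos[b]))
        total += sum(weights[order[i]] for i in range(lo + 1, hi))
    return total
```

On the 4-node example discussed in the comments below the accepted answer (edges A→B, B→C, A→D, D→C, unit weights, pairs (A,C) and (B,C)), minimizing `gap_cost` over `all_topo_sorts` picks [A, D, B, C]. This is exponential in the worst case, which is exactly the problem with it.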

Is there any (optimal or heuristic) algorithm for this kind of problem? It would be perfect if the algorithm could also be applied to other objective functions, such as minimizing the total starting time of certain nodes. Any suggestion is appreciated.

PS: Alternatively, is it possible to formulate this problem as an integer linear program?
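Regarding the PS, here is a sketch of one possible integer-programming formulation (my own framing, not verified against a solver). Introduce linear-ordering binaries y_{uv} = 1 iff u precedes v, and for each pair (a,b) in the pair set P (assuming a must precede b) an auxiliary variable z^{ab}_k indicating that node k lies strictly between a and b:

```latex
\begin{aligned}
\min \quad & \sum_{(a,b)\in P}\ \sum_{k \neq a,\, k \neq b} x_k \, z^{ab}_k \\
\text{s.t.}\quad & y_{uv} + y_{vu} = 1 && \forall\, u \neq v \\
& y_{uv} + y_{vw} - y_{uw} \le 1 && \forall\, u, v, w \text{ distinct (transitivity)} \\
& y_{uv} = 1 && \forall\, \text{edges } (u, v) \text{ of the DAG} \\
& z^{ab}_k \ge y_{ak} + y_{kb} - 1, \quad z^{ab}_k \ge 0 && \forall\, (a,b) \in P,\ k \neq a, b \\
& y_{uv} \in \{0, 1\}
\end{aligned}
\]
```

The transitivity constraints force y to encode a total order extending the DAG, and minimization drives each z^{ab}_k down to max(0, y_{ak} + y_{kb} - 1), which is 1 exactly when k sits between a and b; the objective is then the sum of gap times. This is the usual linear-ordering style of formulation, offered only as a starting point for branch-and-bound comparison.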

Axum answered 13/7, 2016 at 7:56 Comment(7)
I suspect this problem is hard, because 2 small variations of it definitely are: (1) If the sum of distances is replaced with the maximum of distances, then you can reduce the NP-hard Bandwidth Minimisation problem to this problem (starting from the input graph G, delete all edges and add a single root vertex with an out-edge to every other vertex -- this means that any ordering of the original n vertices is a valid topological order -- and for each edge (u, v) in G (that you deleted), add a term (u,v) to the objective function to be minimised); ...Trichinosis
... (2) if you instead keep the sum (rather than maximum) of distances between pairs, but generalise the objective function slightly to allow each term (e.g. (1,3) or (2,4) in your example) to be multiplied by a separate weight, then you can reduce the NP-hard Quadratic Assignment Problem to this problem in the same way. It may be that even the special case of the QAP where every weight is 1 is still NP-hard -- that would imply that your problem is too -- but I wasn't able to confirm this with a few googles.Trichinosis
@j_random_hacker Thanks for your reply. I also believe this problem is NP-hard; I realized my problem is similar to the Traveling Salesman Problem. I was not aware of the Quadratic Assignment Problem and the Bandwidth Minimization Problem, so thanks for pointing out the connection. I have already designed a heuristic method for this problem. However, I am looking for a combinatorial optimization formulation to which branch-and-bound might be applied, so that I can compare the optimal solution against my heuristic method.Axum
Remember that to show NP-hardness, we need to be able to take an arbitrary instance of some known NP-hard problem (like QAP) and mechanically convert it into an instance of your problem. For the reductions I gave, we create an instance of your problem where the only precedence constraint is that the root vertex must come first: this is a valid possible instance of your problem, so an algorithm that solves your problem must be able to solve it. Look up (e.g. with Google) how reductions from NP-hard problems can be used to show NP-hardness of a problem if you're confused.Trichinosis
(For clarity: we need to do more than just show that instances of an NP-hard problem can be converted to instances of your problem; the key point here is that it doesn't matter if your problem allows other constraints that we don't make use of, or make full use of, in the reduction.)Trichinosis
My goal is not to show the NP-hardness. I am looking for an optimal algorithm to compare with my heuristic method. I think it might be formulated as a combinatorial optimization problem. Sorry for the confusion.Axum
The point of showing NP-hardness is to demonstrate that there (very probably) is no "good" way of solving this problem. That is, the point is to save you, and anybody who reads your question, a lot of hard work looking for such an algorithm that would amount to nothing.Trichinosis

One way to solve this is as follows:

First we run the all-pairs shortest path algorithm Floyd-Warshall. This algorithm can be coded in essentially 5 lines and runs in O(V^3) time. It generates the shortest paths between every pair of vertices in the graph, i.e., its output is a V × V matrix of shortest-path distances.
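A minimal sketch of the standard edge-weighted Floyd-Warshall (names are illustrative; the core is the classic triple loop):

```python
import math

def floyd_warshall(n, edges):
    """All-pairs shortest paths on vertices 0..n-1; edges = [(u, v, w), ...]."""
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)   # keep the cheapest parallel edge
    for k in range(n):                    # the O(V^3) triple loop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

For example, with edges (0,1,1), (1,2,2), (0,2,10), the computed distance from 0 to 2 is 3 via vertex 1, not the direct edge of cost 10. Tracking the number of vertices on each path, as suggested below, would mean augmenting the inner update with a parallel count matrix.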

It's trivial to modify this algorithm so that we also get the count of vertices included in each of the O(V^2) paths. So now we can eliminate all paths that have fewer than N vertices. We order the remaining paths by their cost and then test each of them to see whether the topological-sort property is violated. If it is not violated, then we have found our desired result.

The last step above, i.e., testing topological validity, can be performed in O(V+E) for each of the O(V^2) paths. This yields a worst-case runtime of O(V^4). In practice, however, this should be fast, because Floyd-Warshall can be made very cache friendly and only a small fraction of the O(V^2) paths would actually need testing. Also, if your DAG is not dense, you might be able to optimize the topological testing as well with appropriate data structures.
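The validity test in the last step could be sketched like this (a hypothetical helper; it checks that a candidate sequence respects every edge of the DAG):

```python
def is_topological_order(order, edges):
    """Return True iff every edge (u, v) has u before v in `order`.

    Building `pos` is O(V); the all() scan is O(E), so the whole check
    is the O(V + E) mentioned above.
    """
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)
```

For instance, on the DAG with edges A→B, B→C, A→D, D→C, both [A,B,D,C] and [A,D,B,C] pass, while [A,C,B,D] fails because edge (B,C) is reversed.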

Ketone answered 19/7, 2016 at 9:44 Comment(3)
Please correct me if I am wrong, but I think there is a problem with your method: my optimal solution does not necessarily minimize the path from the start node to the end node, i.e., my optimal solution is not necessarily in the set of Floyd-Warshall solutions. For example, if we have a DAG with 3 nodes and links from 1 to 2, 1 to 3 and 2 to 3 (1 precedes 2, 1 precedes 3, etc.), and we want to minimize (1,3), the only possible topological sort is [1,2,3] and (1,3) = x_2. However, it is easy to see that the Floyd-Warshall solutions all have a vertex count of 2.Axum
I'm not very clear about your example. There is only one topological sort possible in your example, so there is nothing to minimize. My understanding was that you have a DAG with multiple possible topological sorts and you want to pick the one that minimizes the total cost along the "primary" path in the sorted vertices. If there is only one path from x to y that you want to optimize, that can be done by setting the cost of the other edges to 0. It would be great if you could give a non-trivial full example.Ketone
In other words, all the possible Floyd-Warshall solutions may have a count less than N, so your method would eliminate all of them and yield no solution in the end. But my problem always has a solution, since the graph is a DAG and there is always a topological sort. Consider a non-trivial DAG with 4 nodes and links from A to B, B to C, A to D and D to C. I want to minimize (A,C) + (B,C). All the shortest paths contain at most 2 vertices, so they are all eliminated by your method. However, topological sorting yields two possible solutions, [A,B,D,C] and [A,D,B,C]. The second is my optimal solution.Axum

Here's an idea:

For simplicity, first suppose that you have a single pair to optimize (I'll comment on the general case later,) and suppose you already have your graph topologically sorted into an array.

Take the array segment starting at the lower (in terms of your topological order) node of the pair, say l, and ending at the higher node, say h. For every single node sorted between l and h, calculate whether it is bounded from below by l and/or bounded from above by h. You can calculate the former property by marking nodes in an "upward" BFS from l, cutting at nodes sorted above h; and similarly, the latter by marking in a "downward" BFS from h, cutting at nodes sorted below l. The complexity of either pass will be O( B*L ), where B is the branching factor, and L is the number of nodes originally sorted between l and h.

Now, all nodes that are not bounded from above by h can be moved above h, and all nodes that are not bounded from below by l can be moved below l (the two sets may overlap,) all without violating the topological sorting of the array, provided that the original sorted order within each group of nodes relocated upward or downward is preserved.
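The two marking passes could be sketched like this (a hypothetical helper; `succ` and `pred` are assumed to be out-edge and in-edge adjacency maps, and restricting the search to the segment between l and h implements the "cutting" described above):

```python
from collections import deque

def bounded_sets(order, succ, pred, l, h):
    """Return (below, above): nodes in the open segment (l, h) of `order`
    that are reachable from l (bounded below) and that reach h (bounded above)."""
    pos = {v: i for i, v in enumerate(order)}
    segment = set(order[pos[l] + 1 : pos[h]])

    def mark(start, nbrs):
        seen, queue = set(), deque([start])
        while queue:
            for w in nbrs.get(queue.popleft(), ()):
                if w in segment and w not in seen:
                    seen.add(w)
                    queue.append(w)
        return seen

    below = mark(l, succ)   # "upward" BFS from l along out-edges
    above = mark(h, pred)   # "downward" BFS from h along in-edges
    return below, above
```

On the 4-node example from the comments above (edges A→B, B→C, A→D, D→C) with order [A,B,D,C] and the pair (B,C): the segment is {D}, D is not bounded below by B, so it can be moved below B, giving [A,D,B,C] and shrinking the (B,C) gap to zero.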

This same procedure can be applied to as many pairs as needed, provided that the segments that they cut from the original sorting order do not overlap.

If any two pairs overlap, say (l1, h1) and (l2, h2), so that e.g. l1 < l2 < h1 < h2 in the original sorted order, you have the following two cases:

1) In the trivial case where h1 and l2 happen to be unrelated in the topological order, then you should be able to optimize the two pairs mostly independently from each other, only applying some care to move either l2 above h1 or h1 below l2 (but not e.g. h1 below l1, if that should turn out to be possible.)

2) If l2 < h1 in the topological order, you can treat both pairs as the single pair (l1, h2), and then possibly apply the procedure once more to (l2, h1).

As it's not clear what the complete process will achieve in the nontrivial overlapping case, especially with more complicated overlapping patterns, it may turn out to be better to treat all pairs uniformly, regardless of overlap. In this case, the order can be processed repeatedly, as long as each run yields an improvement over the previous one (I'm not sure the procedure is monotonic in terms of the objective function, though; probably not.)

Placer answered 13/6, 2017 at 15:4 Comment(0)