Minimizing weighted sum
I came across this problem quite recently. Suppose there are n points on the x-axis, x[0], x[1] .. x[n-1], with associated weights w[0], w[1] .. w[n-1]. Starting from any one of these points, the objective is to cover all the points so that the sum of w[i]*d[i] is minimized, where d[i] is the total distance travelled to reach the ith point from the starting point.

Example:
Suppose the points are: 1 5 10 20 40
Suppose the weights are: 1 2 10 50 13
If I choose to start at point 10 and then move to point 20, then to 5, then to 40 and finally to 1, the weighted sum will be 10*0 + 50*(10) + 2*(10+15) + 13*(10+15+35) + 1*(10+15+35+39) = 1429.

I have tried to solve it with a greedy approach: start from the point with the maximum weight, move to the point with the second-largest weight, and so on. But this algorithm does not produce the optimal solution. Can someone give me pointers on how to approach this problem?

Nimbus answered 19/2, 2014 at 9:13 Comment(5)
This is a simpler variation of TSP. Actually, I am pretty sure we can reduce TSP with |E|<=2|V| to this problem...Scheers
What is the scale of the problem? Brute force will obviously work, but will be very inefficient...Scheers
Hint: you will always need to start from the left-most or right-most point on the x-axis. Think about the x-axis condition more! There is a greedy based on that :)Kasiekask
@PhamTrung What about points 0, 1, 2 with weights 0, 1000, 0?Dunno
@Scheers The set of points covered is always an interval, so the size of the DP table should be manageable.Dunno

There's a very important fact that leads to a polynomial time algorithm:

Since the points lie on a line, they form a path graph, which means that for every 3 vertices v1, v2, v3, if v2 lies between v1 and v3, then the distance between v1 and v3 equals the distance between v1 and v2 plus the distance between v2 and v3. Therefore, if for example we start at v1, the objective value of a path that goes first to v2 and then to v3 is always at most the value of the path that goes first to v3 and then back to v2, because:

value of the first path = w[2]*D(v1,v2) + w[3]*(D(v1,v2)+D(v2,v3))

value of the second path = w[3]*D(v1,v3) + w[2]*(D(v1,v3)+D(v3,v2)) = w[3]*D(v1,v2) + w[3]*D(v2,v3) + w[2]*(D(v1,v2)+2*D(v3,v2))

If we subtract the first path's value from the second's, we are left with 2*w[2]*D(v3,v2), which is greater than or equal to 0 unless negative weights are allowed.
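A quick numeric sanity check of this exchange argument, with made-up distances and weights (my own illustration, not part of the original answer):

```java
public class ExchangeCheck
{
    public static void main(String[] args)
    {
        // Illustrative values: D(v1,v2) = 4, D(v2,v3) = 6, so D(v1,v3) = 10.
        double d12 = 4, d23 = 6;
        double w2 = 2, w3 = 5;

        // v1 -> v2 -> v3: pay w2 at distance d12, w3 at distance d12 + d23.
        double first = w2 * d12 + w3 * (d12 + d23);
        // v1 -> v3 -> v2: pay w3 at distance d12 + d23, then backtrack d23 to v2.
        double second = w3 * (d12 + d23) + w2 * (d12 + 2 * d23);

        // The difference is exactly 2 * w[2] * D(v2,v3), as derived above.
        System.out.println(second - first);   // 24.0
        System.out.println(2 * w2 * d23);     // 24.0
    }
}
```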

All this means that if we are located at a certain point, there are only 2 options worth considering: going to the closest unvisited point on the left, or to the closest unvisited point on the right.

This is very significant, as it leaves us with at most 2^n candidate paths rather than n! (as in the general Travelling Salesman Problem).

Solving the TSP / minimum-weight Hamiltonian path on path graphs can be done in polynomial time using dynamic programming. You should apply the exact same method, but modify the way the objective function is calculated.

Since you don't know the starting vertex, you'll have to run this algorithm n times, each time starting from a different vertex, which multiplies the running time by n.
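As a concrete illustration, here is one way the dynamic program described above might look in Java (my own sketch, using the question's example data). The state is the interval of visited points plus which end of it we stand at; travelling a distance d costs d times the total weight of the points not yet visited. Since a state's value does not depend on the starting vertex, the memo table can be shared across all n starts:

```java
import java.util.Arrays;

public class IntervalDP
{
    private static double[] x, prefixW;
    private static double totalW;
    private static double[][][] memo;

    // Minimum total cost of covering all points, trying every starting point.
    // Assumes xs is sorted in increasing order.
    static double minCost(double[] xs, double[] ws)
    {
        int n = xs.length;
        x = xs;
        prefixW = new double[n + 1];
        for (int i = 0; i < n; i++) prefixW[i + 1] = prefixW[i] + ws[i];
        totalW = prefixW[n];
        memo = new double[n][n][2];
        for (double[][] a : memo) for (double[] b : a) Arrays.fill(b, -1);
        double best = Double.POSITIVE_INFINITY;
        for (int s = 0; s < n; s++) best = Math.min(best, solve(s, s, 0));
        return best;
    }

    // Extra cost still to pay when points l..r are visited and we stand at
    // the left end (side 0) or right end (side 1) of that interval.
    private static double solve(int l, int r, int side)
    {
        if (l == 0 && r == x.length - 1) return 0;   // everything visited
        if (memo[l][r][side] >= 0) return memo[l][r][side];
        double pos = (side == 0) ? x[l] : x[r];
        // Total weight of the points not yet visited:
        double unvisited = totalW - (prefixW[r + 1] - prefixW[l]);
        double best = Double.POSITIVE_INFINITY;
        if (l > 0)              // extend the interval to the left
            best = Math.min(best, (pos - x[l - 1]) * unvisited + solve(l - 1, r, 0));
        if (r < x.length - 1)   // extend the interval to the right
            best = Math.min(best, (x[r + 1] - pos) * unvisited + solve(l, r + 1, 1));
        return memo[l][r][side] = best;
    }

    public static void main(String[] args)
    {
        double best = minCost(new double[] { 1, 5, 10, 20, 40 },
                              new double[] { 1, 2, 10, 50, 13 });
        System.out.println(best);   // 849.0 for the question's example
    }
}
```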

Fred answered 19/2, 2014 at 14:51 Comment(6)
it is a great idea to run the dynamic programming algorithm n times due to the starting vertex dependency! +1Itch
How do you propose to reduce the problem to TSP?Dunno
What if the closest unvisited point on the left and the closest unvisited point on the right have the same distance? It looks like we would need to consider the next point as well.Dreda
@DavidEisenstat The problem is not reduced to the TSP, it can be solved in a similar manner using DP in polynomial time. To be more precise: when solving the TSP with DP we have n stages, and in each stage we fill a table with n columns and (|V| choose n-1) rows. When solving this problem on a path graph, at each stage we only have 2 columns and 1 row, as there is only one possible subset of already-visited vertices for each point we could be at in a given state. When filling the table, we use the objective function mentioned above rather than the TSP's objective function.Fred
@Dreda I don't see how this should affect the algorithm. You'll want to get to one of these points before any other point, whether the distance to each of them is equal or not.Fred
@RonTeller Can you please tell me how can you conclude "All this means that if we are located at a certain point, there are always only 2 options we should consider: going to closest unvisited point on the left or the closest unvisited point on the right.", according to above logic ? Please explain.Kidskin

Maybe you should elaborate on what you mean when you say the algorithm "does not work". The basic idea of the greedy approach that you described seems plausible to me. Do you mean that the greedy approach will not necessarily find the optimal solution? As was pointed out in the comments, this might be an NP-complete problem - although, to be sure, one would have to analyze it further. Some dynamic programming, and maybe some prefix sums for the distance computations, could lead to a polynomial time solution as well.

I quickly implemented the greedy solution in Java (hopefully I understood everything correctly...)

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MinWeightSum
{
    public static void main(String[] args)
    {
        double x[] = { 1, 5, 10, 20, 40 };
        double w[] = { 1, 2, 10, 50, 13 };

        List<Integer> givenIndices = Arrays.asList(2, 3, 1, 4, 0);
        Path path = createPath(x, w, givenIndices);
        System.out.println("Initial result "+path.sum);

        List<Integer> sortedWeightIndices =
            computeSortedWeightIndices(w);
        Path greedyPath = createPath(x, w, sortedWeightIndices);
        System.out.println("Greedy result  "+greedyPath.sum);

        System.out.println("For "+sortedWeightIndices+" sum "+greedyPath.sum);
    }

    private static Path createPath(
        double x[], double w[], List<Integer> indices)
    {
        Path path = new Path(x, w);
        for (Integer i : indices)
        {
            path.append(i);
        }
        return path;
    }


    private static List<Integer> computeSortedWeightIndices(final double w[])
    {
        List<Integer> indices = new ArrayList<Integer>();
        for (int i=0; i<w.length; i++)
        {
            indices.add(i);
        }
        Collections.sort(indices, new Comparator<Integer>()
        {
            @Override
            public int compare(Integer i0, Integer i1)
            {
                return Double.compare(w[i1], w[i0]);
            }
        });
        return indices;
    }

    static class Path
    {
        double x[];
        double w[];

        int prevIndex = -1;
        double distance;
        double sum;

        Path(double x[], double w[])
        {
            this.x = x;
            this.w = w;
        }

        void append(int index)
        {
            if (prevIndex != -1)
            {
                distance += Math.abs(x[prevIndex]-x[index]);
            }
            sum += w[index] * distance;
            prevIndex = index;
        }
    }
}

The sequence of indices that you described in the example yields the solution

For [2, 3, 1, 4, 0] sum 1429.0

The greedy approach that you described gives

For [3, 4, 2, 1, 0] sum 929.0

The best solution is

For [3, 2, 4, 1, 0] sum 849.0 

which I found by checking all permutations of indices (this is not feasible for larger n, of course).
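For reference, the exhaustive permutation check mentioned above could be sketched like this (my own sketch, not the code the answerer actually ran; only workable for small n since it tries all n! orders):

```java
public class BruteForce
{
    // Cost of visiting the points in the given order (first index is the start).
    static double cost(double[] x, double[] w, int[] order)
    {
        double distance = 0, sum = 0;
        for (int i = 0; i < order.length; i++)
        {
            if (i > 0) distance += Math.abs(x[order[i]] - x[order[i - 1]]);
            sum += w[order[i]] * distance;
        }
        return sum;
    }

    static double best = Double.POSITIVE_INFINITY;

    // Recursively enumerate every permutation of the indices.
    static void permute(double[] x, double[] w, int[] order, boolean[] used, int k)
    {
        if (k == order.length)
        {
            best = Math.min(best, cost(x, w, order));
            return;
        }
        for (int i = 0; i < order.length; i++)
        {
            if (used[i]) continue;
            used[i] = true;
            order[k] = i;
            permute(x, w, order, used, k + 1);
            used[i] = false;
        }
    }

    public static void main(String[] args)
    {
        double[] x = { 1, 5, 10, 20, 40 };
        double[] w = { 1, 2, 10, 50, 13 };
        permute(x, w, new int[x.length], new boolean[x.length], 0);
        System.out.println(best);   // 849.0, matching the best order [3, 2, 4, 1, 0]
    }
}
```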

Ferriage answered 19/2, 2014 at 11:43 Comment(3)
dynamic programming does not seem to work because the system has memory. The cost associated with point 5, for instance, depends on whether you started your walk at a given point x or at another point y.Itch
@Itch You're probably right, this was just "brainstorming" about possible approaches for tackling these kinds of minimization problems. I also thought about formulating this as some sort of shortest path problem, but have not analyzed this in detail. IFF it is NP-complete, it's ... "unlikely" ;-) ... that one finds a polynomial time solution anyhow.Ferriage
@Itch I have put in a DP answer because I think that although the absolute cost for future points depends on the distance travelled so far you can split off the distance-dependent component and account for it - traveling an extra unit of distance always costs you the total weight of all the points not yet covered, regardless of the journey you take to visit them, and so it does not change the relative worth of different future itineraries.Vociferous

Suppose you are part way through a solution and have travelled a distance D so far. If you go a further distance x and visit a point with weight w, it costs you (D + x)w. If you then go a further distance y and visit a point with weight v, it costs you (D + x + y)v, and so on. If you sum all of this up, there is a component that depends on the path you take after distance D: xw + xv + yv + ..., and a component that depends on D and the sum of the weights of the points you still have to visit: D(w + v + ...). But the component that depends on D does not depend on anything except the sum of the weights of the unvisited points, so it is fixed, in the sense that it is the same regardless of the path you take after going distance D.

It always makes sense to visit points as we pass them, so the best path starts at a single point (possibly at the edge of the set of points to be visited, possibly in the centre), expands this to an interval of visited points, and keeps expanding that interval until it covers all the points. To pre-calculate the relative costs of visiting the points outside the interval, we only need to know the current position and the extent of the interval, not the distance travelled so far.

So an expensive but polynomial dynamic programming approach has as its state the current position (which must be one of the points), the position of the first unvisited point, if any, to the left of the current position, and the position of the first unvisited point, if any, to the right. There are at most two points worth visiting next: the first unvisited point to the left and the first unvisited point to the right. We can work out the cost of these two alternatives by looking at pre-computed costs for states with fewer points left, and store the best result as the best possible cost from this state. We can compute these costs under the fiction that D = 0 at the moment we reach the current point. The stored costs were computed under the same assumption (but with D = 0 at their current point, not ours); however, we know the sum of the weights of the points still unvisited at that stage, so we can compensate by adding that sum of weights times the distance between our current point and the point whose cost we are looking up.

That gives cost O(n^3), because you are building a table with O(n^3) cells, each cell filled by a relatively simple computation. However, because it never makes sense to pass points without visiting them, the visited points always form an interval and the current point must be one of its two endpoints, so we need consider only O(n^2) states, which cuts the cost down to O(n^2). A zig-zag path such as (0, 1, -1, 2, -2, 3, -3, 4, -4, ...) might be the best solution for suitably bizarre weights, but even then the invariant holds: when going from -2 to 3, for instance, the visited points form the interval [-2, 2] and the current point -2 is one of its endpoints.

I have put an attempted java implementation at http://www.mcdowella.demon.co.uk/Plumber.java. The test harness checks this DP version against a (slow) almost exhaustive version for a number of randomly generated test cases of length up to and including 12. It still may not be completely bug-free, but hopefully it will fill in the details.

Vociferous answered 24/2, 2014 at 21:10 Comment(4)
Why can't this be done in O(n^2)? "It always make sense to visit points we pass as we visit them". Therefore if the starting point is fixed, we have only two paths (1. Go to left end and then reach right end, 2. Go to right end and then reach left end).Aldric
I believe you are correct to say there is an O(n^2) solution and have added text to my answer, but I am not sure whether we agree on how complex the possible best answers might be.Vociferous
The theory looks correct. But we should prove it :) I think it's possible to prove this by contradiction. But I'm not a good proof writer. I'll award 50 bounties to you/anyone who can prove this :DAldric
It's not the same thing as a proof but as edited into my answer I have put an implementation plus test harness at mcdowella.demon.co.uk/Plumber.java. Depending on how clear my Java is you may or may not find enough info here for a proof. If you find this helpful, perhaps you could tell me where this problem comes from - at the moment it looks like an exercise constructed to show DP in action, but I'd love to be proved wrong about that.Vociferous

© 2022 - 2024 — McMap. All rights reserved.