Algorithm to "transfer water from a set of bottles to another one" (metaphorically speaking)

Asked 27/2, 2011 at 13:40 Answered 27/2, 2011 at 14:4

Solved algorithm computer-science greedy

Ok, I have a problem. I have a set "A" of bottles of various sizes, all full of water. Then I have another set "B" of bottles of various sizes, all empty.

I want to transfer the water from A to B, knowing that the total capacity of each set is the same (i.e., set A contains exactly as much water as set B can hold).

This is of course trivial in itself: take the first bottle in A and pour it into the first bottle in B until that one is full. Then, if the bottle from A still has water in it, go on with the second bottle in B, and so on.

However, I want to minimize the total number of pours (a pour is the action of pouring from one bottle into another; each pour counts as 1, regardless of how much water it moves).

I'd like to find a greedy algorithm to do this, or if not possible at least an efficient one. However, efficiency is secondary to correctness of the algorithm (I don't want a suboptimal solution).

Of course this problem is just a metaphor for a real problem in a computer program to manage personal expenses.

Assamese answered 27/2, 2011 at 13:40 Comment(2)

Sounds like the knapsack problem. – Llano 27/2, 2011 at 13:43

Mmm.. actually it's quite different, in that case we maximize the amount, here the amount is irrelevant, only the "move" actions count.. – Assamese 27/2, 2011 at 14:6

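Bad news: this problem is NP-hard by a reduction from subset sum. Given numbers x₁, …, x_n, S, the object of subset sum is to determine whether or not some subset of the x_i sums to S. We make n A-bottles with capacities x₁, …, x_n and exactly two B-bottles, one with capacity S and the other with capacity (x₁ + … + x_n - S), and determine whether n pours are sufficient. They are sufficient if and only if no A-bottle has to be split between the two B-bottles, i.e., if and only if some subset of the x_i sums to S.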
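To make the reduction concrete, here is a minimal sketch, assuming positive integers and 0 < S < x₁ + … + x_n. The function names `subset_sum_to_bottles` and `n_pours_suffice` are made up for illustration, and the check is a brute force over subsets, so it is exponential and only meant for tiny inputs.

```python
from itertools import combinations

def subset_sum_to_bottles(xs, S):
    """Build the bottle instance: n A-bottles and two B-bottles."""
    total = sum(xs)
    assert 0 < S < total, "sketch assumes a non-trivial target"
    return list(xs), [S, total - S]

def n_pours_suffice(A, B):
    """True iff len(A) pours are enough for a two-bottle B.

    With n pours every A-bottle goes whole into one B-bottle, so some
    subset of A must fill B[0] exactly. Brute force over all subsets.
    """
    target = B[0]
    return any(sum(c) == target
               for r in range(1, len(A))
               for c in combinations(A, r))

A, B = subset_sum_to_bottles([3, 5, 7], 8)
print(n_pours_suffice(A, B))  # True, because 3 + 5 == 8
```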

Good news: any greedy strategy (i.e., choose any nonempty A, choose any unfilled B, pour until we have to stop) is a 2-approximation (i.e., uses at most twice as many pours as optimal). The optimal solution uses at least max(|A|, |B|) pours, and greedy uses at most |A| + |B|, since every time greedy does a pour, either an A is drained or a B is filled and does not need to be poured out of or into again.
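As a sanity check of the bound, here is a small sketch of an "any choice" greedy, assuming positive integer capacities. The random choice and the `seed` parameter are my own additions to stand in for "choose any"; they are not part of the argument.

```python
import random

def greedy_pours(A, B, seed=0):
    """Pour from a random non-empty A into a random unfilled B until done."""
    assert sum(A) == sum(B)
    rng = random.Random(seed)
    water = [a for a in A if a > 0]  # water left in each A-bottle
    space = [b for b in B if b > 0]  # room left in each B-bottle
    pours = 0
    while water:
        i = rng.randrange(len(water))
        j = rng.randrange(len(space))
        moved = min(water[i], space[j])  # pour until we have to stop
        water[i] -= moved
        space[j] -= moved
        pours += 1
        # Each pour drains an A-bottle or fills a B-bottle (or both),
        # so at least one bottle leaves the game per pour.
        water = [w for w in water if w > 0]
        space = [s for s in space if s > 0]
    return pours

A, B = [10, 4, 3], [7, 6, 4]
for seed in range(100):
    p = greedy_pours(A, B, seed)
    assert max(len(A), len(B)) <= p <= len(A) + len(B)
```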

~~There might be an approximation scheme (a (1 + ε)-approximation for any ε > 0).~~ I think now it's more likely that there's an inapproximability result – the usual tricks for obtaining approximation schemes don't seem to apply here.

Here are some ideas that might lead to a practical exact algorithm.

Given a solution, draw a bipartite graph with left vertices A and right vertices B and an (undirected) edge from a to b if and only if a is poured into b. If the solution is optimal, I claim that there are no cycles – otherwise we could eliminate the smallest pour in the cycle and replace the lost volume going around the cycle. For example, if I have pours

a1 -> b1: 1
a1 -> b2: 2
a2 -> b1: 3
a2 -> b3: 4
a3 -> b2: 5
a3 -> b3: 6

then I can eliminate the a1 -> b1 pour like so:

a2 -> b1: 4 (+1)
a2 -> b3: 3 (-1)
a3 -> b3: 7 (+1)
a3 -> b2: 4 (-1)
a1 -> b2: 3 (+1)

Now, since the graph has no cycle, we can count the number of edges (pours) as |A| + |B| - #(connected components). The only variable here is the number of connected components, which we want to maximize.
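A quick way to see the edge-count formula in action is to count components with a union-find. This is only a sketch; `count_components` is a hypothetical helper, and the edge list is the five-pour solution from the example above.

```python
def count_components(vertices, edges):
    """Number of connected components, via a small union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in vertices})

A = ["a1", "a2", "a3"]
B = ["b1", "b2", "b3"]
pours = [("a2", "b1"), ("a2", "b3"), ("a3", "b3"), ("a3", "b2"), ("a1", "b2")]

k = count_components(A + B, pours)
assert len(pours) == len(A) + len(B) - k  # 5 == 3 + 3 - 1
```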

I claim that the greedy algorithm forms graphs that have no cycle. If we knew what the connected components of an optimal solution were, we could use a greedy algorithm on each one and get an optimal solution.
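For instance, take A = {10, 4, 3} and B = {7, 6, 4} and suppose someone hands us the components ({10}, {6, 4}) and ({4, 3}, {7}). A rough sketch of "greedy on each component", assuming positive integers (the names `greedy_count` and `components` are mine):

```python
def greedy_count(X, Y):
    """Pours used by a first-to-first greedy on one balanced component."""
    assert sum(X) == sum(Y)
    water, space = list(X), list(Y)
    i = j = pours = 0
    while i < len(water) and j < len(space):
        moved = min(water[i], space[j])
        water[i] -= moved
        space[j] -= moved
        pours += 1
        if water[i] == 0:
            i += 1  # this A-bottle is drained
        if space[j] == 0:
            j += 1  # this B-bottle is full
    return pours

components = [([10], [6, 4]), ([4, 3], [7])]
total = sum(greedy_count(X, Y) for X, Y in components)
print(total)  # 4, which is |A| + |B| - 2 components
```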

One way to tackle this subproblem would be to use dynamic programming to enumerate all subset pairs X of A and Y of B such that sum(X) == sum(Y) and then feed these into an exact cover algorithm. Both steps are of course exponential, but they might work well on real data.
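For small inputs there is also a shortcut worth sketching. Note this is not the enumerate-then-exact-cover pipeline described above but a plain bitmask DP that I am swapping in because it computes the same quantity, the maximum number of balanced components. It assumes positive integer capacities, takes O(2^n · n) time with n = |A| + |B|, and the name `min_pours_exact` is hypothetical.

```python
def min_pours_exact(A, B):
    """Minimum pours = |A| + |B| - (max number of balanced components)."""
    assert sum(A) == sum(B)
    assert all(x > 0 for x in A) and all(x > 0 for x in B)
    items = list(A) + [-b for b in B]  # water counts +, free space counts -
    n = len(items)
    full = 1 << n
    total = [0] * full   # signed sum of the bottles in each subset
    groups = [0] * full  # most zero-sum groups completed within each subset
    for mask in range(1, full):
        low = mask & -mask
        total[mask] = total[mask ^ low] + items[low.bit_length() - 1]
        # Best result over every way of adding one last bottle to reach mask.
        best = 0
        for j in range(n):
            if mask >> j & 1:
                best = max(best, groups[mask ^ (1 << j)])
        # A zero signed sum means the bottles in mask form closed groups.
        groups[mask] = best + (1 if total[mask] == 0 else 0)
    return n - groups[full - 1]

print(min_pours_exact([10, 4, 3], [7, 6, 4]))  # 4
print(min_pours_exact([5, 4], [4, 4, 1]))      # 3
```

Since all capacities are positive, every zero-sum group necessarily contains at least one A-bottle and one B-bottle, so each group is a valid component for the greedy step.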

Choler answered 27/2, 2011 at 14:4 Comment(2)

i think my answer is optimal so please take a look – Manaus 27/2, 2011 at 22:2

Nice reduction. But it took a moment to click that you meant B has 2 bottles, one having capacity S and the other (x_1 + ... + x_n - S) -- maybe you could make that clearer? – Soekarno 27/2, 2011 at 22:29

Here's my take:

1. Identify bottles having the exact same size in both sets. This translates to a one-to-one pour for each such pair of same-size bottles.
2. Sort the remaining bottles in A in descending order by capacity, and sort the remaining bottles in B in ascending order. Compute the number of pours you need when pouring the sorted list A into the sorted list B.

Update: After each pour in step 2, repeat step 1. (Optimization step suggested by Steve Jessop). Rinse and repeat until all water is transferred.
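Here is one possible reading of the steps above as code, assuming positive integer sizes. It is a heuristic sketch, not an exact algorithm, and the name `heuristic_pours` is made up. Partially used bottles are re-inserted with their remaining water or remaining room, and equal pairs are re-checked after every pour.

```python
import bisect
from collections import Counter

def heuristic_pours(A, B):
    """Cancel equal sizes, then pour the largest A into the smallest B."""
    assert sum(A) == sum(B)
    water = sorted(A)  # water left in each A-bottle, ascending
    space = sorted(B)  # room left in each B-bottle, ascending
    count = 0
    while water and space:
        # Step 1: every equal (water, room) pair is settled by one pour.
        common = Counter(water) & Counter(space)
        if common:
            for size, k in common.items():
                for _ in range(k):
                    water.remove(size)
                    space.remove(size)
                count += k
            continue
        # Step 2: largest remaining water into smallest remaining room.
        w = water.pop()
        s = space.pop(0)
        count += 1
        if w > s:
            bisect.insort(water, w - s)  # bottle keeps w - s units of water
        else:
            bisect.insort(space, s - w)  # bottle still has s - w units of room
    return count

print(heuristic_pours([5, 4], [4, 4, 1]))      # 3
print(heuristic_pours([10, 4, 3], [7, 6, 4]))  # 4
```

Both of these small inputs happen to come out optimal, but nothing here guarantees that in general.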

Abrasion answered 27/2, 2011 at 13:45 Comment(3)

At stage 2, after partially emptying or partially filling a bottle, perhaps it should be re-inserted into the list with its new size (remaining water in the case of A, remaining space in the case of B)? Costs presumably an extra O(log(n)) operation per pour, but it seems reasonable to me that if the "sorted list" feature of this solution is good to have before the first pour, then it should be good to have before the second pour too - re-apply as many available tools to the smaller problem as we can, basically. Perhaps re-check for equal pairs too, also O(log n) per pour. – Neoclassic 27/2, 2011 at 14:6

@Steve: interesting optimization. :) – Abrasion 27/2, 2011 at 14:10

Nice one.. I think it's a better approx than the one above.. but still it doesn't work very well (for example on the example given by Goran above it produces 7 steps, vs 5 optimal), already considering the reordering suggested by steve. – Assamese 27/2, 2011 at 14:15

i think this gives the minimum number of pours:

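import bisect

def pours(A, B):
    """Count pours when always pairing the fullest A with the roomiest B."""
    assert sum(A) == sum(B)
    A = sorted(A)  # water left per A-bottle; a sorted copy, caller's list is untouched
    B = sorted(B)  # room left per B-bottle
    count = 0
    while A and B:
        i = A.pop()  # largest remaining amount of water
        j = B.pop()  # largest remaining free space
        count += 1   # every iteration is exactly one pour
        if i > j:
            bisect.insort(A, i - j)  # the A-bottle keeps i - j units of water
        elif i < j:
            bisect.insort(B, j - i)  # the B-bottle still has j - i units of room
    return count

A = [5, 4]
B = [4, 4, 1]
print(pours(A, B))
# gives 3

A = [5, 3, 2, 1]
B = [4, 3, 2, 1, 1]
print(pours(A, B))
# gives 5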

in English it reads:

assert that both lists have the same sum (i think the algorithm will still work if sum(A) > sum(B) or sum(A) < sum(B) is true)
take the two lists A and B, sort both of them

while A isn't empty and B isn't empty:

take i (the largest) from A and j (the largest) from B
if i equals j, pour i in j and count 1 pour
if i is larger than j, pour i in j, place the i-j remainder back in A (using a sorted insert that keeps the list ordered), count 1 pour
if i is smaller than j, pour i in j, place the j-i remainder back in B (using a sorted insert that keeps the list ordered), count 1 pour

Manaus answered 27/2, 2011 at 13:45 Comment(7)

Example, A={5,4}, B={4,4,1}. Your algorithm does it in 4 pours (of sizes 4, 1, 3, 1). It's possible to do it in 3 pours, and that solution could be found either by the heuristic of cancelling same-size units or otherwise. So I don't think your answer does give the smallest number of pours, although certainly it's efficient to calculate as requested :-) – Neoclassic 27/2, 2011 at 14:0

This could lead to a very inefficient solution (consider the case where bottle sizes are offset ie. A {5,3,2,1} B{4,3,2,1,1}) – Cupbearer 27/2, 2011 at 14:3

those problems have been fixed. the python code gives the optimal number. – Manaus 27/2, 2011 at 21:49

No, it doesn't. For example, A = {10, 4, 3}; B = {7, 6, 4} results in 5 pours rather than the optimal 4. Your million-dollar prize will have to wait. – Choler 27/2, 2011 at 22:29

ah, to solve that one you'd need to check if A has any subset that sums to the current j from B, and check if B has any subset that sums to the current i from A, alright, if there was an index structure that contained {7:[4,3]} for A and {10:[6,4]} for B which was consulted this would then be possible for it to find that solution – Manaus 27/2, 2011 at 22:38

@Dan: Yes, but such an index structure would need to be of exponential size, because it requires an entry for each of 2^n possible subsets, and thus would take exponential time to compute. (It might still be a good way to tackle the problem for small input sizes, just saying it doesn't avoid an exponential blowup in work.) – Soekarno 27/2, 2011 at 23:13

yes, although the space needed is far less than that, it does trade space for time but listing the subset sums for A would only take at most sum( T(subset_sum(i, B)) for i in A ) and the subset sum problem has a nice but NP solution which is why building the index before hand makes sense; but one would need an index to the index to remove the possibilities that aren't possible when values in A or B are removed – Manaus 27/2, 2011 at 23:28
