Bad news: this problem is NP-hard by a reduction from subset sum. Given numbers x1, …, xn, S, the object of subset sum is to determine whether or not some subset of the xis sum to S. We make A-bottles with capacities x1, …, xn and B-bottles with capacities S and (x1 + … + xn - S) and determine whether n pours are sufficient.
Good news: any greedy strategy (i.e., choose any nonempty A, choose any unfilled B, pour until we have to stop) is a 2-approximation (i.e., uses at most twice as many pours as optimal). The optimal solution uses at least max(|A|, |B|) pours, and greedy uses at most |A| + |B|, since every time greedy does a pour, either an A is drained or a B is filled and does not need to be poured out of or into again.
There might be an approximation scheme (a (1 + ε)-approximation for any ε > 0). I think now it's more likely that there's an inapproximability result – the usual tricks for obtaining approximation schemes don't seem to apply here.
Here are some ideas that might lead to a practical exact algorithm.
Given a solution, draw a bipartite graph with left vertices A
and right vertices B
and an (undirected) edge from a
to b
if and only if a
is poured into b
. If the solution is optimal, I claim that there are no cycles – otherwise we could eliminate the smallest pour in the cycle and replace the lost volume going around the cycle. For example, if I have pours
a1 -> b1: 1
a1 -> b2: 2
a2 -> b1: 3
a2 -> b3: 4
a3 -> b2: 5
a3 -> b3: 6
then I can eliminate by a1 -> b1
pour like so:
a2 -> b1: 4 (+1)
a2 -> b3: 3 (-1)
a3 -> b3: 7 (+1)
a3 -> b2: 4 (-1)
a1 -> b2: 3 (+1)
Now, since the graph has no cycle, we can count the number of edges (pours) as |A| + |B| - #(connected components)
. The only variable here is the number of connected components, which we want to maximize.
I claim that the greedy algorithm forms graphs that have no cycle. If we knew what the connected components of an optimal solution were, we could use a greedy algorithm on each one and get an optimal solution.
One way to tackle this subproblem would be to use dynamic programming to enumerate all subset pairs X of A and Y of B such that sum(X) == sum(Y) and then feed these into an exact cover algorithm. Both steps are of course exponential, but they might work well on real data.