Given a target sum and a set of integers, find the closest subset of numbers that add to that target

Asked 24/10, 2013 at 16:57 Answered 24/10, 2013 at 17:17

Solved algorithm optimization set mathematical-optimization packing

I have a set of integers M and a target sum k. I want to find the subset of M that when added together is the closest to k without going over.

For example:

M = {1, 3, 5, 5, 14}

k = 12

answer = {1, 5, 5}

because 1 + 5 + 5 = 11 and there is no way to make 12.

I have the additional constraint that the subset can contain at most 4 elements.

In my application, the size of |M| can be large (on the order of thousands of elements). If it is not possible to find the optimal answer in a reasonable time, I am interested in solutions that at least give a "good" answer.

Right now I am solving this problem by generating 10,000 random subsets and selecting the closest one, which works better than one might expect but is slow. I'm not sure how far from optimal this actually is, but any insight on that would be interesting to me as well.

Discophile answered 24/10, 2013 at 16:57 Comment(4)

And just to confirm, you want the actual subset, not just the sum? – Dov 24/10, 2013 at 16:59

How large are the individual integer values? Are there any negatives among them? – Aelber 24/10, 2013 at 16:59

The integers are all positive. They span about 7 orders of magnitude (i.e. from 1 to 1M), but most are [1...10000]. – Discophile 24/10, 2013 at 17:2

Yes I am looking for the closest subset with max order 4. – Discophile 24/10, 2013 at 17:3

Since you have a limit on the number of items that you can pick, you can do it with a reasonably straightforward algorithm.

The algorithm produces the possible sums in "generations". Each element of a generation consists of a number representing the sum, and a N-tuple of indexes in M that were used to build that sum.

Generation zero is empty; generation X+1 is produced by walking the generation X, and adding the elements of M to each value of that generation, and recording their sum for the next generation X+1.

Before computing the sum, check its N-tuple for the presence of the index of the number that you are about to add. If it's there, skip the number. Next, check the sum: if it is already present among the X+1 sums, ignore it; otherwise, record the new sum, along with the new N-tuple of indexes (append the index of the number that you added to the N-tuple from the generation X).

Here is how this would work for your inputs:

Generation 0: empty

Generation 1:

 1 - {0}
 3 - {1}
 5 - {2}
14 - {4}

Generation 2:

 4 - {0, 1}
 6 - {0, 2}
 8 - {1, 2}
10 - {2, 3}
15 - {0, 4}
17 - {1, 4}
19 - {2, 4}

Generation 3:

 9 - {0, 1, 2}
11 - {0, 2, 3}
13 - {1, 2, 3}
18 - {0, 1, 4}
20 - {0, 2, 4}
22 - {1, 2, 4}
24 - {2, 3, 4}

Generation 4:

14 - {0, 1, 2, 3}
23 - {0, 1, 2, 4}
25 - {0, 2, 3, 4}
27 - {1, 2, 3, 4}

You can now search through the four generations for a number that is closest to your target number k.

Aelber answered 24/10, 2013 at 17:9 Comment(6)

This is a clever way to re-use work as one does an exhaustive search. Thanks for the idea. – Discophile 24/10, 2013 at 17:15

@JohnShedletsky This "clever way" is commonly called dynamic programming. – Aelber 24/10, 2013 at 17:22

This is O(n^4) in the worst case, right? If there is no overlapping sums, there will be n^4 elements in the 4th generation. – Calgary 24/10, 2013 at 17:34

@Dukeling Yes, in order to realize any speedup, the algorithm needs overlaps. If there are no overlaps, the algorithm would run in O(n^4). – Aelber 24/10, 2013 at 17:52

Good job mentioning Generation X! – Parvati 6/4, 2016 at 17:18

Great answer, quick note that you forgot to add 22 - {1, 2, 4} in Generation 3 – Acetamide 17/11, 2016 at 21:17

If the target sum k is not too large, look at http://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution - you can use this to create a bitmap which tells you which numbers can be produced using your subset. Then just pick the closest possible number to k in the bitmap.

Anteroom answered 24/10, 2013 at 17:14 Comment(2)

Won't an approximation algorithm make more sense here? – Gambier 27/2, 2015 at 4:48

What I have described has an easily calculable cost of at most Mk and - if you can afford it - gives the correct answer. It is up to the OP whether they can justify this. If you want an approximation, round all numbers to some multiple of G for some chosen G and then divide by G. This reduces the cost by effectively reducing k to k/G. – Anteroom 27/2, 2015 at 6:13

Split the problem into 4 parts:

Sum containing exactly 1 element

Simply loop through and find the highest value not larger than the target.
Sum containing exactly 2 elements

Use a double for-loop to find the largest sum not larger than the target.
Sum containing exactly 3 elements (similar to 3SUM)

Sort the elements

Use a double for-loop and do a binary search for the target minus the two values, looking for smaller values to find the largest sum not larger than the target.
Sum containing exactly 4 elements

Sort the elements (already done)

Use a double for-loop to generate all sums of 2 elements.

Now, for each such sum, do a binary search on the sums for the target, looking for smaller values until we find one that doesn't contain either value which this sum is made up of.

See this for code using this approach for a similar problem (the exact sum).

Average case running time (?) = O(n + n^2 + n^2 log n + n^2 log n) = O(n^2 log n).

Determining the running time of the last problem is somewhat difficult, it may be as bad as O(n^4 log n) in the worst case, as you may end up looking through most of them before you find one that fits, but this should rarely happen, and, within the same run, some should take less time, thus the overall running time may be less.

Calgary answered 24/10, 2013 at 17:17 Comment(0)

Recommended topics

Hot tags