Maximum sum of a subset of size K with sum less than M

Given: an array of integers and two values K and M.

Question: Find the maximum sum obtainable from any K-element subset of the given array such that the sum is less than M.

Is there a non-dynamic-programming solution to this problem? Or, if only a dp[i][j][k] approach can solve this type of problem, can you please explain the algorithm?

Succedaneum answered 5/8, 2013 at 19:56 Comment(5)
I am not able to understand your question. Can you please provide an example? Here is what I understand: given [3,5,2,6,1,8,15,18,4] with K = 3 and M = 15, I can select 8, 2, 4. Is that what the answer should be? - Kristopher
@Kristopher The question is: we can only form subsets of length K, and out of these K-length subsets, which one gives the maximum sum, subject to the condition that the sum is less than M? - Succedaneum
OK, so as per the example I gave, if K is 3 then we want, out of all subsets of length 3, the one with the maximum sum that is still less than 15. Thanks! - Kristopher
This problem is discussed here: cs.dartmouth.edu/~ac/Teach/CS105-Winter05/Notes/… - Daves
@Daves I don't think that is what I asked. It is a variant of the same problem, but not exactly what was asked. - Succedaneum

Many people have commented correctly that the answer below from years ago, which uses dynamic programming, incorrectly encodes solutions allowing an element of the array to appear in a "subset" multiple times. Luckily there is still hope for a DP based approach.

Let dp[i][j][k] = true if there exists a size k subset of the first i elements of the input array summing up to j

Our base case is dp[0][0][0] = true

Now, a size k subset of the first i + 1 elements either uses A[i + 1] or it does not, giving the recurrence

dp[i + 1][j][k] = dp[i][j - A[i + 1]][k - 1] OR dp[i][j][k]

Put everything together:

given A[1...N]
initialize dp[0...N][0...M][0...K] to false
dp[0][0][0] = true
for i = 0 to N - 1:
    for j = 0 to M:
        for k = 0 to K:
            # case 1: A[i + 1] is not used
            if dp[i][j][k]:
                dp[i + 1][j][k] = true
            # case 2: A[i + 1] is used
            if j >= A[i + 1] and k >= 1 and dp[i][j - A[i + 1]][k - 1]:
                dp[i + 1][j][k] = true
max_sum = -1               # stays -1 if no size K subset sums to less than M
for j = 0 to M - 1:        # the sum must be strictly less than M
    if dp[N][j][K]:
        max_sum = j
return max_sum

giving O(NMK) time and space complexity.
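
For anyone who wants to run it, here is a minimal Python sketch of the table above, assuming non-negative integers and a strict "less than M" bound. The function name max_k_subset_sum_below is made up for illustration, the second dimension only goes up to M - 1 because larger sums are never needed, and None stands for "no valid subset".

```python
def max_k_subset_sum_below(a, k, m):
    """Largest sum < m over k-element subsets of a (non-negative ints), or None."""
    n = len(a)
    if m <= 0 or k > n:
        return None
    # dp[i][j][c] is True if some c-element subset of a[:i] sums to exactly j.
    dp = [[[False] * (k + 1) for _ in range(m)] for _ in range(n + 1)]
    dp[0][0][0] = True
    for i in range(n):
        x = a[i]
        for j in range(m):
            for c in range(k + 1):
                if dp[i][j][c]:
                    dp[i + 1][j][c] = True   # a[i] is not used
                elif c >= 1 and j >= x and dp[i][j - x][c - 1]:
                    dp[i + 1][j][c] = True   # a[i] is used exactly once
    # Scan from the largest admissible sum downwards.
    for j in range(m - 1, -1, -1):
        if dp[n][j][k]:
            return j
    return None
```

With the array from the question's comments, max_k_subset_sum_below([3, 5, 2, 6, 1, 8, 15, 18, 4], 3, 15) picks out a sum such as 8 + 2 + 4.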

Stepping back, we've implicitly assumed that A[1...N] are all non-negative. With negative numbers, sizing the second dimension as 0...M is not correct. Consider a size K subset made up of a size K - 1 subset whose sum exceeds M plus one sufficiently negative element of A[], so that the overall sum no longer exceeds M. Similarly, a size K - 1 subset could sum to some extremely negative number and then, with a sufficiently positive element of A[], reach a sum just below M. For the algorithm to still work in both cases, the second dimension would have to cover every sum from the sum of all negative elements of A[] up to the sum of all positive elements, a range whose width is the sum of the absolute values of all elements in A[].
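
One way to sidestep the offset bookkeeping with negative numbers is to swap the boolean table for one set of reachable sums per subset size. To be clear, this is a different data structure from the widened array described above, not the same table; the name max_k_subset_sum_below_signed is hypothetical, and the sets can still grow as large as the range of possible sums.

```python
def max_k_subset_sum_below_signed(a, k, m):
    """Largest sum < m over k-element subsets of a (any ints), or None."""
    # reach[c] holds every sum attainable with exactly c of the elements seen so far.
    reach = [set() for _ in range(k + 1)]
    reach[0].add(0)
    for x in a:
        # Walk c downwards so that x cannot be added twice to the same subset:
        # reach[c] has not yet been touched by x when it feeds reach[c + 1].
        for c in range(k - 1, -1, -1):
            reach[c + 1].update(s + x for s in reach[c])
    below = [s for s in reach[k] if s < m]
    return max(below) if below else None
```

Intermediate sums are allowed to overshoot M here, which is exactly the case that breaks the 0...M table.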

As for whether a non dynamic programming solution exists, certainly there is the naive exponential time brute force solution and variations that optimize the constant factor in the exponent.
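
For reference, the naive brute force fits in a few lines of Python with itertools.combinations. It enumerates all N choose K subsets, so it is only a baseline for small inputs or for checking a faster solution; the function name max_k_subset_sum_brute is made up for illustration.

```python
from itertools import combinations

def max_k_subset_sum_brute(a, k, m):
    """Largest sum < m over all k-element subsets of a, or None if there is none."""
    # combinations() picks positions, so each array element is used at most once.
    candidates = (sum(subset) for subset in combinations(a, k))
    return max((s for s in candidates if s < m), default=None)
```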

Beyond that? Well, your problem is closely related to subset sum, and the literature for the big-name NP-complete problems is rather extensive. And as a general principle, algorithms can come in all shapes and sizes. It's not impossible for me to imagine using, say, randomization, approximation (just choose the error parameter to be sufficiently small!), or plain old reductions to other NP-complete problems (convert your problem into a giant boolean circuit and run a SAT solver). Yes, these are different algorithms. Are they faster than a dynamic programming solution? Some of them, probably. Are they as simple to understand or implement without, say, training beyond standard introduction-to-algorithms material? Probably not.

This is a variant of the knapsack or subset sum problem, where in terms of time (at the cost of space requirements that grow with M, and hence exponentially in the number of bits needed to write M), dynamic programming is the most efficient method that CORRECTLY solves this problem. See "Is this variant of the subset sum problem easier to solve?" for a similar question to yours.

However, since your problem is not exactly the same, I'll provide an explanation anyway. Let dp[i][j] = true if there is a subset of size i that sums to j, and false if there isn't. The idea is that dp[][] will encode the sums of all possible subsets for every possible size. We can then simply find the largest j < M such that dp[K][j] is true. Our base case is dp[0][0] = true, because the empty subset (size 0) always sums to 0.

The recurrence is also fairly straightforward. Suppose we've calculated the values of dp[][] using the first n values of the array. To find all possible subsets of the first n+1 values of the array, we can simply take the (n+1)-th value and add it to all the subsets we've seen before. More concretely, we have the following code:

initialize dp[0..K][0..M] to false
dp[0][0] = true
for i = 0 to N:
    for s = 0 to K - 1:
        for j = M to 0:
            if dp[s][j] && A[i] + j < M:
                dp[s + 1][j + A[i]] = true
for j = M to 0:
    if dp[K][j]:
        print j
        break
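
The element-reuse problem raised in the comments on this answer can be avoided while keeping a two-dimensional table: process the subset size s from K - 1 down to 0, so that row s has not yet been updated with A[i] when it is used to fill row s + 1. Below is a minimal Python sketch of that variant, assuming non-negative integers; the function name max_k_subset_sum_below_2d is hypothetical and None means no valid subset exists.

```python
def max_k_subset_sum_below_2d(a, k, m):
    """Largest sum < m over k-element subsets of a (non-negative ints), or None."""
    if m <= 0:
        return None
    # dp[s][j] is True if some s-element subset of the items seen so far sums to j.
    dp = [[False] * m for _ in range(k + 1)]
    dp[0][0] = True
    for x in a:
        # Descending s: dp[s] still describes subsets that do not contain x.
        for s in range(k - 1, -1, -1):
            for j in range(m - x):        # guarantees j + x < m
                if dp[s][j]:
                    dp[s + 1][j + x] = True
    for j in range(m - 1, -1, -1):
        if dp[k][j]:
            return j
    return None
```

This uses O(KM) space instead of O(NMK), at the price of no longer being able to read off which prefix of the array a given entry came from.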

Thrifty answered 6/8, 2013 at 13:57 Comment(5)
Your solution is correct. Minor error: "if dp[s][j] && A[i] + j >= M" should be "if dp[s][j] && A[i] + j < M". - Bright
What is A in this solution? It is not mentioned anywhere in the writeup. - Crites
This assumes you can repeat elements of A in the same subset, which is not in line with the definition of a subset, i.e. (1, 1) is not a subset of (1, 2, 3). - Cheerly
Indeed, @Cheerly is correct. The proposed solution is incorrect. You need to track which elements you used in the subsets. - Johnstone
You guys are right, the solution allows an item to be used more than once. I'm going to take down this answer. - Thrifty

We're looking for a subset of K elements for which the sum of the elements is a maximum, but less than M.

We can place bounds [X, Y] on the index of the largest element in the subset as follows.

First we sort the N integers, values[0] ... values[N-1], so that values[0] is the smallest.

The lower bound X is the largest integer for which

values[X] + values[X-1] + ... + values[X-(K-1)] < M.

(If X is N-1, then we've found the answer.)

The upper bound Y is the largest integer less than N for which

values[0] + values[1] + ... + values[K-2] + values[Y] < M.

With this observation, we can now bound the second-highest term for each value of the highest term Z, where

X <= Z <= Y.

We can use exactly the same method, since the form of the problem is exactly the same. The reduced problem is finding a subset of K-1 elements, taken from values[0] ... values[Z-1], for which the sum of the elements is a maximum, but less than M - values[Z].

Once we've bounded that value in the same way, we can put bounds on the third-largest value for each pair of the two highest values. And so on.

This gives us a tree structure to search, hopefully with far fewer combinations to search than N choose K.
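
Here is one possible Python sketch of that bounded search, as I read the description; treat it as an illustration, not a tuned implementation. The names max_k_subset_sum_bounded, search, pre and window are all made up, and prefix sums are used to evaluate the two bound tests in constant time.

```python
from itertools import accumulate

def max_k_subset_sum_bounded(values, k, m):
    """Largest sum < m over k-element subsets of values, or None."""
    vals = sorted(values)
    pre = [0] + list(accumulate(vals))   # pre[i] = sum of the i smallest values

    def search(hi, k, limit):
        # Best sum < limit using k elements among vals[0:hi], or None.
        if k == 0:
            return 0 if limit > 0 else None
        if hi < k or pre[k] >= limit:
            return None                  # too few elements, or the k smallest are too big
        best = None
        for z in range(hi - 1, k - 2, -1):   # z = index of the largest chosen element
            if pre[k - 1] + vals[z] >= limit:
                continue                 # z lies above the upper bound Y
            window = pre[z + 1] - pre[z + 1 - k]   # the k consecutive values ending at z
            if window < limit:
                # z is at or below the lower bound X: no smaller z can beat this.
                return window if best is None else max(best, window)
            sub = search(z, k - 1, limit - vals[z])   # reduced problem below z
            if sub is not None and (best is None or sub + vals[z] > best):
                best = sub + vals[z]
        return best

    return search(len(vals), k, m)
```

In the worst case this still explores a large part of the N choose K tree, so it is a pruning heuristic and carries no better guarantee than brute force.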

Dampen answered 5/8, 2013 at 22:36 Comment(2)
Does this satisfy the requirement that it is a subset? I think the solution treats it as a subarray, but a subset can be any subsequence! I will try out your solution though. - Succedaneum
Yes, it does satisfy that it is a subset. I ordered the elements so that I could place bounds on the value of the largest element in that subset that would satisfy your condition. One doesn't need to use as a largest value any values lower than values[X], because the maximum sum of all such subsets with K elements is still less than M. Similarly, one doesn't need to check any values larger than values[Y], because the minimum sum of all of those subsets with K elements is already at least M. See what I mean? - Dampen

Felix is correct that this is a special case of the knapsack problem. His dynamic programming algorithm takes O(K*M) space and O(K*K*M) time. I believe his use of the variable N really should be K.

There are two books devoted to the knapsack problem. The latest one, by Kellerer, Pferschy and Pisinger [2004, Springer-Verlag, ISBN 3-540-40286-1], gives an improved dynamic programming algorithm on page 76, Figure 4.2, that takes O(K+M) space and O(KM) time, which is a huge reduction compared to the dynamic programming algorithm given by Felix. Note that there is a typo on the last line of the algorithm in the book, where it should be c-bar := c-bar - w_(r(c-bar)).

My C# implementation is below. I cannot say that I have extensively tested it, and I welcome feedback on this. I used BitArray to implement the concept of the sets given in the algorithm in the book. In my code, c is the capacity (which in the original post was called M), and I used w instead of A as the array that holds the weights.

An example of its use is:

int[] optimal_indexes_for_ssp = new SubsetSumProblem(12, new List<int> { 1, 3, 5, 6 }).SolveSubsetSumProblem();

where the array optimal_indexes_for_ssp contains the indexes 3, 2, 0 (the backtracking loop recovers them from last to first), corresponding to the elements 6, 5, 1.

using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;

public class SubsetSumProblem
{
    private int[] w;
    private int c;

    public SubsetSumProblem(int c, IEnumerable<int> w)
    {
      // ArgumentOutOfRangeException takes the parameter name first, then the message.
      if (c < 0) throw new ArgumentOutOfRangeException("c", "Capacity for subset sum problem must be at least 0, but input was: " + c.ToString());
      this.c = c;
      // Copy the weights once; ToArray() comes from System.Linq.
      this.w = w.ToArray();
    }

    public int[] SolveSubsetSumProblem()
    {
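      int n = w.Length;
      // r[k] = index of the item that first made sum k reachable (used to rebuild the subset).
      int[] r = new int[c+1]; // int arrays are zero-initialized in C#
      // R[k] is true when some subset of the items seen so far sums to k.
      BitArray R = new BitArray(c+1);
      R[0] = true;
      // Rp collects the sums produced by adding the current item to an old sum.
      BitArray Rp = new BitArray(c+1);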
      for (int j = 0; j < n; j++)
      {
        Rp.SetAll(false);
        for (int k = 0; k <= c; k++)
          if (R[k] && k + w[j] <= c) Rp[k + w[j]] = true;
        for (int k = w[j]; k <= c; k++) // since Rp[k]=false for k<w[j]
          if (Rp[k])
          {
            if (!R[k]) r[k] = j;
            R[k] = true;
          }
      }
      int capacity_used= 0;
      for(int d=c; d>=0; d--)
        if (R[d])
        {
          capacity_used = d;
          break;
        }
      List<int> result = new List<int>();
      while (capacity_used > 0)
      {
        result.Add(r[capacity_used]);
        capacity_used -= w[r[capacity_used]];
      }
      if (capacity_used < 0) throw new Exception("Subset sum program has an internal logic error");
      return result.ToArray();
    }
}
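
For readers who don't use C#, the same idea can be sketched in Python roughly as follows. This is my own loose port, not the book's algorithm verbatim, and like the C# version it solves the plain subset sum problem (sum at most c, no restriction on the number of elements). It assumes non-negative integer weights; the names subset_sum_indexes, reachable and last are made up for illustration.

```python
def subset_sum_indexes(c, w):
    """Indexes of items whose weights give the largest reachable sum <= c."""
    reachable = [False] * (c + 1)   # reachable[s]: some subset of items so far sums to s
    reachable[0] = True
    last = [0] * (c + 1)            # last[s]: index of the item that first reached sum s
    for j, wj in enumerate(w):
        # Collect the new sums before marking them, so item j is used at most once.
        new_sums = [s + wj for s in range(c - wj + 1)
                    if reachable[s] and not reachable[s + wj]]
        for s in new_sums:
            reachable[s] = True
            last[s] = j
    best = max(s for s in range(c + 1) if reachable[s])
    # Walk back through the recorded items, from the last one used to the first.
    result = []
    while best > 0:
        result.append(last[best])
        best -= w[last[best]]
    return result
```

Calling subset_sum_indexes(12, [1, 3, 5, 6]) walks back from sum 12 through items 6, 5 and 1.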
Calendre answered 30/8, 2013 at 14:26 Comment(0)