Fast solution to Subset sum

Consider this way of solving the Subset sum problem:

def subset_summing_to_zero(activities):
    subsets = {0: []}  # maps each reachable sum to one subset that produces it
    for activity, cost in activities.items():
        old_subsets = subsets
        subsets = {}
        for prev_sum, subset in old_subsets.items():
            subsets[prev_sum] = subset
            new_sum = prev_sum + cost
            new_subset = subset + [activity]
            if new_sum == 0:
                new_subset.sort()
                return new_subset
            else:
                subsets[new_sum] = new_subset
    return []

I have it from here:

http://news.ycombinator.com/item?id=2267392

There is also a comment which says that it is possible to make it "more efficient".

How?

Also, are there any other ways to solve the problem which are at least as fast as the one above?

Edit

I'm interested in any kind of idea which would lead to a speed-up. I found:

https://en.wikipedia.org/wiki/Subset_sum_problem#cite_note-Pisinger09-2

which mentions a linear-time algorithm. But I don't have access to the paper; perhaps you, dear people, know how it works? An implementation, perhaps? A completely different approach, perhaps?

Edit 2

There is now a follow-up:
Fast solution to Subset sum algorithm by Pisinger

Rubenrubens answered 21/3, 2012 at 17:6 Comment(10)
You can download the paper from here.Pazit
Is it really the same paper? They have different titles..?Rubenrubens
The abstract claims that "Restricting a dynamic programming algorithm to only consider balanced states, implies that Subset-sum Problem [... is] solvable in linear time, provided that the coefficients are bounded by a constant.", which is what I think you are looking for.Pazit
There is no linear time algorithm unless P=NP, since subset-sum is NP-complete. Your algorithm, and my answer, are linear time when weights are bounded, which means there are only O(activities.size() * 2^bound) possible elements in the subsets array.Horripilation
@Horripilation talking to myself - since I can't edit the comment any more. The size is actually O(activities.size() * bound), not O(activities.size() * 2^bound)Horripilation
Btw., what does this "bounded" mean? That there is an upper bound on subset-sums known in advance?Rubenrubens
Ecir: "Bounded" = "not greater than a fixed number" in this context. I.e. there is a linear time algorithm when all weights are at most 1000000000. There is a linear time algorithm when all weights are at most 100000000000000000000. However, there is no (known) linear time algorithm if you do not fix a bound in advance, and if you take a larger bound, the constant in big-O will increase.Lunneta
The code as written doesn't compute anything? Am I taking crazy pills?Hemispheroid
@Hemispheroid probably there is no subset of the input you provided which can be summed to zero. Try: print(subset_summing_to_zero({'a':1,'f':-2,'e':-2,'d':4}))Rubenrubens
Here is my open-source javascript solution for the linear time, iterative 2004 algorithm by Pisinger which solves subset sum for each x_i positive and bounded by some constant C: github.com/thorpep138/subset-sum-pisinger and npmjs.com/package/subset-sum-pisinger . It has an open test suite and follows precisely the algorithm outlined in link.springer.com/chapter/10.1007%2F978-3-540-24777-7_4Bacardi

I respect the alacrity with which you're trying to solve this problem! Unfortunately, you're trying to solve a problem that's NP-complete, meaning that any further improvement that breaks the polynomial time barrier will prove that P = NP.

The implementation you pulled from Hacker News appears to be consistent with the pseudo-polytime dynamic programming solution; any additional improvement must, by definition, advance the state of current research into this problem and all of its algorithmic isoforms. In other words: while a constant speedup is possible, you're very unlikely to see an algorithmic improvement to this solution to the problem in the context of this thread.

However, you can use an approximate algorithm if you require a polytime solution with a tolerable degree of error. In pseudocode blatantly stolen from Wikipedia, this would be:

initialize a list S to contain one element 0
for each i from 1 to N do
    let T be a list consisting of xi + y, for all y in S
    let U be the union of T and S
    sort U
    make S empty
    let y be the smallest element of U
    add y to S
    for each element z of U in increasing order do
        // trim the list by eliminating numbers close to one another
        // and throw out elements greater than s
        if y + cs/N < z ≤ s, set y = z and add z to S
if S contains a number between (1 − c)s and s, output yes, otherwise no

Python implementation, preserving the original terms as closely as possible:

from bisect import bisect

def ssum(X, c, s):
    """ Simple impl. of the polytime approximate subset sum algorithm.
    Returns True if the subset exists within our given error; False otherwise.
    """
    if not X:
        return False
    S = [0]
    N = len(X)
    for xi in X:
        T = [xi + y for y in S]
        U = sorted(set(T) | set(S))  # merge, deduplicate, and sort the two sum lists
        S = []
        y = U[0]
        S.append(y)
        for z in U:
            # trim sums that lie within c*s/N of the last kept one,
            # and throw out anything greater than the target s
            if y + (c * s) / N < z <= s:
                y = z
                S.append(z)
    if not c:  # for zero error, check exact membership
        return S[bisect(S, s) - 1] == s
    return bisect(S, (1 - c) * s) != bisect(S, s)

... where X is your bag of terms, c is your precision (between 0 and 1), and s is the target sum.
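For example, a quick sanity check (the values here are my own, chosen arbitrarily):

print(ssum([1, 4, 7, 12], 0.05, 11))  # True: 4 + 7 == 11, within 5% of the target
print(ssum([1, 4, 7, 12], 0.0, 14))   # False: no subset sums exactly to 14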

For more details, see the Wikipedia article.

(Additional reference, further reading on CSTheory.SE)

Almanac answered 29/3, 2012 at 18:28 Comment(6)
Thank you! The paper by Pisinger linked above says that there is a dynamic programming algorithm which runs in O(nc) - I suppose that's the approach from Hacker News. But they say that if the weights are "bounded" the complexity may be written as O(n^2*r), and they present an algorithm running in O(nr). They even provide pseudo-code for it (page 4) but unfortunately it's a bit over my head.Rubenrubens
However, I think I would be much happier with an exact solution. Btw, just for the reference, if we talk about approximate approach I once found: en.wikipedia.org/wiki/Polynomial-time_approximation_schemeRubenrubens
@EcirHana Will do! I need to take a rain check for now, but I'll be happy to write up a Python version sometime later today. Thank you for the helpful information on the paper, as well. :)Almanac
thank you very much for the effort but I'm afraid we misunderstood each other. I would like to have an exact solution, rather than an approximate one. I thought your "will do" meant that you were going to look into the paper - I would have stopped you earlier - I'm sorry.Rubenrubens
@EcirHana It's no trouble at all! This needed a Python implementation for posterity. I will look into the paper, though, and see if I can work out any improvements on the DP approach. No promises, but if I write an implementation, it'll be here. :)Almanac
@EcirHana Wow, that paper is hard to read. Note the caveat (from Wikipedia): For the case that each xi is positive and bounded by the same constant, Pisinger found a linear time algorithm. I think I'll write this up and submit it as a separate answer.Almanac

I don't know much Python, but there is an approach called meet in the middle. Pseudocode:

Divide activities into two subarrays, A1 and A2
for both A1 and A2, calculate the subset hashes, H1 and H2, the way you do it in your question.
for each (cost, a1) in H1
     if (H2.contains(-cost))
         return a1 + H2[-cost];

This will allow you to double the number of elements of activities you can handle in reasonable time; a rough Python sketch of the idea follows.
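A minimal sketch of the above, assuming (as in the question) that activities maps names to integer costs; the function name and structure are mine, not from the original answer:

def subset_summing_to_zero_mim(activities):
    items = list(activities.items())
    half = len(items) // 2

    def all_subset_sums(part):
        sums = {0: []}  # reachable sum -> one subset producing it
        hit = None      # a zero-sum subset found entirely inside this half, if any
        for activity, cost in part:
            for s, subset in list(sums.items()):
                new_sum = s + cost
                if new_sum == 0 and hit is None:
                    hit = subset + [activity]
                if new_sum not in sums:
                    sums[new_sum] = subset + [activity]
        return sums, hit

    h1, hit1 = all_subset_sums(items[:half])
    h2, hit2 = all_subset_sums(items[half:])
    if hit1 or hit2:
        return sorted(hit1 or hit2)
    # otherwise a zero-sum subset must take a non-empty part from each half
    for s, subset in h1.items():
        if s and -s in h2:
            return sorted(subset + h2[-s])
    return []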

Horripilation answered 26/3, 2012 at 12:4 Comment(2)
Very clever, thank you! Unfortunately, as you said, it improves the running time "just" by a constant factor (2x).Rubenrubens
Suppose that you have numers 1,2,4,..2^39. You want to check for some number. Without MIM you need 2^40 operations. With it - 2^20 (Assuming that hash works in O(1))Whiffet

While my previous answer describes the polytime approximate algorithm for this problem, a request was specifically made for an implementation of Pisinger's polytime dynamic programming solution for the case where all xi in X are positive:

def balsub(X, c):
    """ Simple impl. of Pisinger's generalization of KP for subset sum problems
    satisfying xi >= 0, for all xi in X. Returns the state table "st"; per the
    paper the optimal value is z = max{u <= c : st[u] > 0}, so a subset of X
    summing exactly to c exists iff st[c] > 0.
    """
    if not X or sum(X) <= c:  # the paper assumes a break item exists (sum > c);
        return False          # sum(X) == c would be the trivial take-all solution
    X = sorted(X)
    n = len(X)
    r = X[-1]                 # r bounds every weight
    # b = the break item: the smallest (1-indexed) b with X[0] + ... + X[b-1] > c
    b, w_sum = 1, 0
    while w_sum + X[b - 1] <= c:
        w_sum += X[b - 1]
        b += 1
    # initial states s_{b-1}: 0 = not reachable (u <= c), 1 = trivially above c (u > c)
    stm1 = {u: 0 for u in range(c - r + 1, c + 1)}
    for u in range(c + 1, c + r + 1):
        stm1[u] = 1
    stm1[w_sum] = b           # the break solution itself
    for t in range(b, n + 1):
        st = dict(stm1)       # start from the previous states
        # insert item t into fillings at or below the capacity
        for u in range(c - r + 1, c + 1):
            u_tick = u + X[t - 1]
            st[u_tick] = max(st[u_tick], stm1[u])
        # remove an earlier item j from fillings above the capacity
        for u in reversed(range(c + 1, c + X[t - 1] + 1)):
            for j in reversed(range(stm1[u], st[u])):
                u_tick = u - X[j - 1]
                st[u_tick] = max(st[u_tick], j)
        stm1 = st             # roll the DP forward to s_t
    return st

Wow, that was headache-inducing. This needs proofreading, because, while it implements balsub, I can't define the right comparator to determine if the optimal solution to this subproblem of SSP exists.
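If the rule z = max{u <= c: sn(u) > 0} quoted in the comments below is the right comparator, a check might look like this (the input values are mine, chosen arbitrarily):

st = balsub([2, 3, 7], 9)
if st:
    z = max(u for u in st if u <= 9 and st[u] > 0)
    print(z, z == 9)  # 9 True, since 2 + 7 == 9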

Almanac answered 31/3, 2012 at 3:24 Comment(7)
As you said, it does not work. I think, if we want to read the paper as closely as possible, two lines should look different: for t in range(b,n+1): and for j in reversed(range(stm1[u],st[u])):. But then it breaks with "list index out of range" so there might be an off-by-one error(s). Thank you anyway for the help so far!Rubenrubens
@EcirHana balsub is now implemented, and it should be off-by-one free. I can't make sense of the given optimal solution for SSP, z = max{u <= c: sn(u) > 0} because of terseness of description, but you might be able to make headway with this. :)Almanac
Sorry - what or where is this z = max...?Rubenrubens
@EcirHana Bottom of page four of the paper, second paragraph before the pseudocode for balsub.Almanac
@EcirHana Oh, I'm just looking for the right transformation, to go from the output of balsub to the verification that an optimal solution exists. It's a bit terse and difficult to decipher from the paper. :)Almanac
Even if this doesn't work right now your answers were the most helpful so the eternal fame is yours: Eternal fame. I will post another question specifically about this algorithm. Thank you!Rubenrubens
@EcirHana Glad to help! And I hope, too, that these help posterity as well. (Computer) science is awesome. :)Almanac

I apologize for "discussing" the problem, but a "Subset Sum" problem where the x values are bounded is not the NP version of the problem. Dynamic programming solutions are known for bounded-x-value problems. That is done by representing the x values as the sum of unit lengths. The dynamic programming solutions have a number of fundamental iterations that is linear with that total length of the x's. However, Subset Sum is in NP when the precision of the numbers equals N - that is, when the number of base-2 place values needed to state the x's is N. For N = 40, the x's have to be in the billions. In the NP problem the unit length of the x's increases exponentially with N, which is why the dynamic programming solutions are not a polynomial-time solution to the NP Subset Sum problem. That being the case, there are still practical instances of the Subset Sum problem where the x's are bounded and the dynamic programming solution is valid.
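To illustrate the bounded case, here is a minimal sketch (mine, not from any of the papers discussed) of the classic pseudo-polynomial DP for non-negative integers; its O(n * target) running time is polynomial only while the values stay bounded:

def subset_sum_dp(values, target):
    # reachable[s] is True iff some subset of the values seen so far sums to s
    reachable = [False] * (target + 1)
    reachable[0] = True
    for v in values:
        for s in range(target, v - 1, -1):  # descend so each value is used at most once
            if reachable[s - v]:
                reachable[s] = True
    return reachable[target]

print(subset_sum_dp([3, 34, 4, 12, 5, 2], 9))  # True: 4 + 5 == 9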

Mosstrooper answered 1/4, 2012 at 18:44 Comment(1)
You are totally welcome to "discuss" the problem, no reason to apologize. You are right about NP and dynamic programming; I was/am willing to sacrifice some generality (e.g. the weights could be bounded) for speed. That said, the paper above claims to run faster than the classic DP solution from the question (at least that's what I understood). Also, I was hoping to collect a few tricks on the problem from people here.Rubenrubens

Here are three ways to make the code more efficient:

  1. The code stores a list of activities for each partial sum. It is more efficient in terms of both memory and time to just store the most recent activity needed to make the sum, and work out the rest by backtracking once a solution is found.

  2. For each activity the dictionary is repopulated with the old contents (subsets[prev_sum] = subset). It is faster to simply grow a single dictionary.

  3. Splitting the values in two and applying a meet in the middle approach.

Applying the first two optimisations results in the following code which is more than 5 times faster:

def subset_summing_to_zero2(activities):
    subsets = {0: -1}  # maps each reachable sum to the most recent activity added
    for activity, cost in activities.items():
        for prev_sum in list(subsets):  # snapshot the keys; the dict grows inside the loop
            new_sum = prev_sum + cost
            if new_sum == 0:
                new_subset = [activity]
                while prev_sum:  # backtrack through the stored activities
                    activity = subsets[prev_sum]
                    new_subset.append(activity)
                    prev_sum -= activities[activity]
                return sorted(new_subset)
            if new_sum in subsets:
                continue
            subsets[new_sum] = activity
    return []
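For instance, reusing the example from the comments under the question:

print(subset_summing_to_zero2({'a': 1, 'f': -2, 'e': -2, 'd': 4}))
# ['d', 'e', 'f'], since -2 + -2 + 4 == 0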

Also applying the third optimisation results in something like:

def subset_summing_to_zero3(activities):
    A = list(activities.items())
    mid = len(A) // 2

    def make_subsets(items):
        subsets = {0: -1}  # maps each reachable sum to the most recent activity added
        for activity, cost in items:
            for prev_sum in list(subsets):
                new_sum = prev_sum + cost
                if new_sum == 0:  # a zero sum found entirely within this half
                    trail = [activity]
                    while prev_sum:
                        activity = subsets[prev_sum]
                        trail.append(activity)
                        prev_sum -= activities[activity]
                    return subsets, trail
                if new_sum not in subsets:
                    subsets[new_sum] = activity
        return subsets, None

    subsets, hit = make_subsets(A[:mid])
    if hit:
        return sorted(hit)
    subsets2, hit = make_subsets(A[mid:])
    if hit:
        return sorted(hit)

    def follow_trail(new_subset, subsets, s):
        while s:
            activity = subsets[s]
            new_subset.append(activity)
            s -= activities[activity]

    # otherwise a solution must combine a non-empty part from each half
    new_subset = []
    for s in subsets:
        if s and -s in subsets2:
            follow_trail(new_subset, subsets, s)
            follow_trail(new_subset, subsets2, -s)
            break
    return sorted(new_subset)

Define bound to be the largest absolute value of the elements. The algorithmic benefit of the meet in the middle approach depends a lot on bound.

For a low bound (e.g. bound=1000 and n=300) the meet in the middle only gets a factor of about 2 improvement over the first improved method. This is because the dictionary called subsets is densely populated.

However, for a high bound (e.g. bound=100,000 and n=30) the meet in the middle takes 0.03 seconds compared to 2.5 seconds for the first improved method (and 18 seconds for the original code).

For high bounds, the meet in the middle will take about the square root of the number of operations of the normal method.

It may seem surprising that meet in the middle is only twice as fast for low bounds. The reason is that the number of operations in each iteration depends on the number of keys in the dictionary. After adding k activities we might expect there to be 2**k keys, but if bound is small then many of these keys will collide, so we will only have O(bound*k) keys instead.
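A quick way to see this saturation (my own snippet, with arbitrary parameters):

import random

random.seed(1)
bound, n = 50, 30
sums = {0}
for k in range(1, n + 1):
    cost = random.randint(-bound, bound)
    sums |= {s + cost for s in sums}
    print(k, len(sums))  # roughly doubles at first, then grows only linearly in k

All subset sums lie in [-bound*k, bound*k], so the dictionary can never hold more than 2*bound*k + 1 keys, no matter how many of the 2**k subsets exist.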

Johnnajohnnie answered 29/3, 2012 at 21:58 Comment(1)
I specifically wrote "does not rely on some implementation detail". The difference in #1 is just the fact that Python runs in O(n) for list + [item]. #2: adding to a hash table is O(1). #3 is exactly the kind of trick I'm looking for, see above.Rubenrubens

Thought I'd share my Scala solution for the pseudo-polytime algorithm described on Wikipedia and discussed above. It's a slightly modified version: it figures out how many unique subsets there are. This is very much related to a HackerRank problem described at https://www.hackerrank.com/challenges/functional-programming-the-sums-of-powers. Coding style might not be excellent, I'm still learning Scala :) Maybe this is still helpful for someone.

import java.io.ByteArrayInputStream

object Solution extends App {
    val input = "1000\n2"

    System.setIn(new ByteArrayInputStream(input.getBytes()))

    println(calculateNumberOfWays(readInt, readInt))

    def calculateNumberOfWays(X: Int, N: Int) = {
        // values whose N-th power can still fit under X
        val maxValue = Math.pow(X, 1.0/N).toInt

        val listOfValues = (1 until maxValue + 1).toList

        val listOfPowers = listOfValues.map(value => Math.pow(value, N).toInt)

        // for each power, append the new partial sums that do not exceed X
        val lists = (0 until maxValue).toList.foldLeft(List(List(0)): List[List[Int]]) ((newList, i) =>
            newList :+ (newList.last union (newList.last.map(y => y + listOfPowers.apply(i)).filter(z => z <= X)))
        )

        lists.last.count(_ == X)
    }
}
Zebulun answered 7/4, 2014 at 17:51 Comment(0)
