Subset sum Problem
Asked Answered
T

6

8

recently I became interested in the subset-sum problem which is finding a zero-sum subset in a superset. I found some solutions on SO, in addition, I came across a particular solution which uses the dynamic programming approach. I translated his solution in python based on his qualitative descriptions. I'm trying to optimize this for larger lists which eats up a lot of my memory. Can someone recommend optimizations or other techniques to solve this particular problem? Here's my attempt in python:

import random
from time import time
from itertools import product

time0 = time()

# create a zero matrix of size a (row), b(col)
def create_zero_matrix(a,b):
    return [[0]*b for x in xrange(a)]

# generate a list of size num with random integers with an upper and lower bound
def random_ints(num, lower=-1000, upper=1000):
    return [random.randrange(lower,upper+1) for i in range(num)]

# split a list up into N and P where N be the sum of the negative values and P the sum of the positive values.
# 0 does not count because of additive identity
def split_sum(A):
    N_list = []
    P_list = []
    for x in A:
        if x < 0:
            N_list.append(x)
        elif x > 0:
            P_list.append(x)
    return [sum(N_list), sum(P_list)]

# since the column indexes are in the range from 0 to P - N
# we would like to retrieve them based on the index in the range N to P
# n := row, m := col
def get_element(table, n, m, N):
    if n < 0:
        return 0
    try:
        return table[n][m - N]
    except:
        return 0

# same definition as above
def set_element(table, n, m, N, value):
    table[n][m - N] = value

# input array
#A = [1, -3, 2, 4]
A = random_ints(200)

[N, P] = split_sum(A)

# create a zero matrix of size m (row) by n (col)
#
# m := the number of elements in A
# n := P - N + 1 (by definition N <= s <= P)
#
# each element in the matrix will be a value of either 0 (false) or 1 (true)
m = len(A)
n = P - N + 1;
table = create_zero_matrix(m, n)

# set first element in index (0, A[0]) to be true
# Definition: Q(1,s) := (x1 == s). Note that index starts at 0 instead of 1.
set_element(table, 0, A[0], N, 1)

# iterate through each table element
#for i in xrange(1, m): #row
#    for s in xrange(N, P + 1): #col
for i, s in product(xrange(1, m), xrange(N, P + 1)):
    if get_element(table, i - 1, s, N) or A[i] == s or get_element(table, i - 1, s - A[i], N):
        #set_element(table, i, s, N, 1)
        table[i][s - N] = 1

# find zero-sum subset solution
s = 0
solution = []
for i in reversed(xrange(0, m)):
    if get_element(table, i - 1, s, N) == 0 and get_element(table, i, s, N) == 1:
        s = s - A[i]
        solution.append(A[i])

print "Solution: ",solution

time1 = time()

print "Time execution: ", time1 - time0
Topic answered 16/5, 2011 at 3:39 Comment(5)
I might be thinking of using pytables to store huge lists.Topic
I'd suggest numpy for lower memory usageAnemograph
I tried using numpy.array(), but this almost doubled the execution speed :)Wolff
I think the problem you are trying to solve is NP-Complete -- so even if you manage to marginally optimize this code (which looks already well optimized) the run-time (and perhaps memory consumption) will explode with larger lists...Criterion
@plaes: i tried to use numpy's array, but just as you said it did increase execution speed.Topic
R
5

I'm not quite sure if your solution is exact or a PTA (poly-time approximation).

But, as someone pointed out, this problem is indeed NP-Complete.

Meaning, every known (exact) algorithm has an exponential time behavior on the size of the input.

Meaning, if you can process 1 operation in .01 nanosecond then, for a list of 59 elements it'll take:

2^59 ops -->     2^59     seconds -->     2^26      years -->      1 year
            --------------           ---------------
            10.000.000.000           3600 x 24 x 365

You can find heuristics, which give you just a CHANCE of finding an exact solution in polynomial time.

On the other side, if you restrict the problem (to another) using bounds for the values of the numbers in the set, then the problem complexity reduces to polynomial time. But even then the memory space consumed will be a polynomial of VERY High Order.
The memory consumed will be much larger than the few gigabytes you have in memory. And even much larger than the few tera-bytes on your hard drive.

( That's for small values of the bound for the value of the elements in the set )

May be this is the case of your Dynamic programing algorithm.

It seemed to me that you were using a bound of 1000 when building your initialization matrix.

You can try a smaller bound. That is... if your input is consistently consist of small values.

Good Luck!

Rainbow answered 17/5, 2011 at 0:33 Comment(0)
E
5

Someone on Hacker News came up with the following solution to the problem, which I quite liked. It just happens to be in python :):

def subset_summing_to_zero (activities):
  subsets = {0: []}
  for (activity, cost) in activities.iteritems():
      old_subsets = subsets
      subsets = {}
      for (prev_sum, subset) in old_subsets.iteritems():
          subsets[prev_sum] = subset
          new_sum = prev_sum + cost
          new_subset = subset + [activity]
          if 0 == new_sum:
              new_subset.sort()
              return new_subset
          else:
              subsets[new_sum] = new_subset
  return []

I spent a few minutes with it and it worked very well.

Expose answered 10/6, 2011 at 15:24 Comment(2)
Hi skorks, I came across this solution on hacker news as well. The person who posted that solution said he could make it more efficient. Do you know how this could be made more efficient?Topic
I haven't played with it enough to actually try optimising, so can't really be much help to you there. However no matter what you do if your input set is large enough and your range of numbers is wide enough, it will eventually bog down.Expose
H
1

An interesting article on optimizing python code is available here. Basically the main result is that you should inline your frequent loops, so in your case this would mean instead of calling get_element twice per loop, put the actual code of that function inside the loop in order to avoid the function call overhead.

Hope that helps! Cheers

Haffner answered 16/5, 2011 at 7:44 Comment(0)
D
1

, 1st eye catch

def split_sum(A):
  N_list = 0
  P_list = 0
  for x in A:
    if x < 0:
        N_list+=x
    elif x > 0:
        P_list+=x
  return [N_list, P_list]

Some advices:

  1. Try to use 1D list and use bitarray to reduce memory footprint at minimum (http://pypi.python.org/pypi/bitarray) so you will just change get / set functon. This should reduce your memory footprint by at lest 64 (integer in list is pointer to integer whit type so it can be factor 3*32)

  2. Avoid using try - catch, but figure out proper ranges at beginning, you might found out that you will gain huge speed.

Derekderelict answered 16/5, 2011 at 20:24 Comment(0)
A
0

The following code works for Python 3.3+ , I have used the itertools module in Python that has some great methods to use.

from itertools import chain, combinations
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

nums = input("Enter the Elements").strip().split() inputSum = int(input("Enter the Sum You want"))

for i, combo in enumerate(powerset(nums), 1): sum = 0 for num in combo: sum += int(num) if sum == inputSum: print(combo)

The Input Output is as Follows:

Enter the Elements 1 2 3 4
Enter the Sum You want 5
('1', '4')
('2', '3')
Anson answered 6/2, 2017 at 3:7 Comment(0)
U
0

Just change the values in your set w and correspondingly make an array x as big as the len of w then pass the last value in the subsetsum function as the sum for which u want subsets and you wl bw done (if u want to check by giving your own values).

def subsetsum(cs,k,r,x,w,d):
    x[k]=1
    if(cs+w[k]==d):
        for i in range(0,k+1):

            if x[i]==1:
                print (w[i],end=" ")
        print()

    elif cs+w[k]+w[k+1]<=d :
        subsetsum(cs+w[k],k+1,r-w[k],x,w,d)

    if((cs +r-w[k]>=d) and (cs+w[k]<=d)) :
        x[k]=0
        subsetsum(cs,k+1,r-w[k],x,w,d)
#driver for the above code
w=[2,3,4,5,0]
x=[0,0,0,0,0]

subsetsum(0,0,sum(w),x,w,7)     
Underpart answered 27/5, 2017 at 12:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.