Here is a solution that deals with duplicates.
First of all the problem of finding any solution is, as noted, NP-complete. So there are cases where this will churn for a long time to realize that there are none. I've applied reasonable heuristics to limit how often this happens. The heuristics can be improved. But be warned that there will be cases that simply nothing works.
The first step in this solution is to take a list of numbers and turn it into [(value1, repeat), (value2, repeat), ...]
. One of those heuristics requires that the values be sorted first by descending absolute value, and then by decreasing value. That is because I try to use the first elements first, and we expect a bunch of small leftover numbers to still give us sums.
Next, I'm going to try to split it into a possible maximal subset with the right target sum, and all remaining elements.
Then I'm going to split the remaining into a possible maximal remaining subset that is no bigger than the first, and the ones that result after that.
Do this recursively and we find a solution. Which we yield back up the chain.
But, and here is where it gets tricky, I'm not going to do the split by looking at combinations. Instead I'm going to use dynamic programming like we would for the usual subset-sum pseudo-polynomial algorithm, except I'll use it to construct a data structure from which we can do the split. This data structure will contain the following fields:
value
is the value of this element.
repeat
is how many times we used it in the subset sum.
skip
is how many copies we had and didn't use it in the subset sum.
tail
is the tail of these solutions.
prev
are some other solutions where we did something else.
Here is a class that constructs this data structure, with a method to split elements into a subset and elements still available for further splitting.
from collections import namedtuple
class RecursiveSums (
namedtuple('BaseRecursiveSums',
['value', 'repeat', 'skip', 'tail', 'prev'])):
def sum_and_rest(self):
if self.tail is None:
if self.skip:
yield ([self.value] * self.repeat, [(self.value, self.skip)])
else:
yield ([self.value] * self.repeat, [])
else:
for partial_sum, rest in self.tail.sum_and_rest():
for _ in range(self.repeat):
partial_sum.append(self.value)
if self.skip:
rest.append((self.value, self.skip))
yield (partial_sum, rest)
if self.prev is not None:
yield from self.prev.sum_and_rest()
You might have to look at this a few times to see how it works.
Next, remember I said that I used a heuristic to try to use large elements before small ones. Here is some code that we'll need to do that comparison.
class AbsComparator(int):
def __lt__ (self, other):
if abs(int(self)) < abs(int(other)):
return True
elif abs(other) < abs(self):
return False
else:
return int(self) < int(other)
def abs_lt (x, y):
return AbsComparator(x) < AbsComparator(y)
We'll need both forms. The function for a direct comparison, the class for Python's key
argument to the sort
function. See Using a comparator function to sort for more on the latter.
And now the heart of the method. This finds all ways to split into a subset (that is no larger than bound
in the comparison metric we are using) and the remaining elements to split more.
The idea is the same as the dynamic programming approach to subset sum https://www.geeksforgeeks.org/count-of-subsets-with-sum-equal-to-x/ except with two major differences. The first is that instead of counting the answers we are building up our data structure. The second is that our keys are (partial_sum, bound_index)
so we know whether our bound
is currently satisfied, and if it is not we know what element to compare next to test it.
def lexically_maximal_subset_rest (elements, target, bound=None):
"""
elements = [(value, count), (value, count), ...]
with largest absolute values first.
target = target sum
bound = a lexical bound on the maximal subset.
"""
# First let's deal with all of the trivial cases.
if 0 == len(elements):
if 0 == target:
yield []
elif bound is None or 0 == len(bound):
# Set the bound to something that trivially works.
yield from lexically_maximal_subset_rest(elements, target, [abs(elements[0][0]) + 1])
elif abs_lt(bound[0], elements[0][0]):
pass # we automatically use more than the bound.
else:
# The trivial checks are done.
bound_satisfied = (bound[0] != elements[0][0])
# recurse_by_sum will have a key of (partial_sum, bound_index).
# If the bound_index is None, the bound is satisfied.
# Otherwise it will be the last used index in the bound.
recurse_by_sum = {}
# Populate it with all of the ways to use the first element at least once.
(init_value, init_count) = elements[0]
for i in range(init_count):
if not bound_satisfied:
if len(bound) <= i or abs_lt(bound[i], init_value):
# Bound exceeded.
break
elif abs_lt(init_value, bound[i]):
bound_satisfied = True
if bound_satisfied:
key = (init_value * (i+1), None)
else:
key = (init_value * (i+1), i)
recurse_by_sum[key] = RecursiveSums(
init_value, i+1, init_count-i-1, None, recurse_by_sum.get(key))
# And now we do the dynamic programming thing.
for j in range(1, len(elements)):
value, repeat = elements[j]
next_recurse_by_sum = {}
for key, tail in recurse_by_sum.items():
partial_sum, bound_index = key
# Record not using this value at all.
next_recurse_by_sum[key] = RecursiveSums(
value, 0, repeat, tail, next_recurse_by_sum.get(key))
# Now record the rest.
for i in range(1, repeat+1):
if bound_index is not None:
# Bounds check.
if len(bound) <= bound_index + i:
break # bound exceeded.
elif abs_lt(bound[bound_index + i], value):
break # bound exceeded.
elif abs_lt(value, bound[bound_index + i]):
bound_index = None # bound satisfied!
if bound_index is None:
next_key = (partial_sum + value * i, None)
else:
next_key = (partial_sum + value * i, bound_index + i)
next_recurse_by_sum[next_key] = RecursiveSums(
value, i, repeat - i, tail, next_recurse_by_sum.get(next_key))
recurse_by_sum = next_recurse_by_sum
# We now have all of the answers in recurse_by_sum, but in several keys.
# Find all that may have answers.
bound_index = len(bound)
while 0 < bound_index:
bound_index -= 1
if (target, bound_index) in recurse_by_sum:
yield from recurse_by_sum[(target, bound_index)].sum_and_rest()
if (target, None) in recurse_by_sum:
yield from recurse_by_sum[(target, None)].sum_and_rest()
And now we implement the rest.
def elements_split (elements, target, k, bound=None):
if 0 == len(elements):
if k == 0:
yield []
elif k == 0:
pass # still have elements left over.
else:
for (subset, rest) in lexically_maximal_subset_rest(elements, target, bound):
for answer in elements_split(rest, target, k-1, subset):
answer.append(subset)
yield answer
def subset_split (raw_elements, k):
total = sum(raw_elements)
if 0 == (total % k):
target = total // k
counts = {}
for e in sorted(raw_elements, key=AbsComparator, reverse=True):
counts[e] = 1 + counts.get(e, 0)
elements = list(counts.items())
yield from elements_split(elements, target, k)
And here is a demonstration using your list, doubled. Which we split into 4 equal parts. On my laptop it finds all 10 solutions in 0.084 seconds.
n = 0
for s in subset_split([4, 3, 5, 6, 4, 3, 1]*2, 4):
n += 1
print(n, s)
So...no performance guarantees. But this should usually be able to find splits pretty quickly per split. Of course there are also usually an exponential number of splits. For example if you take 16 copies of your list and try to split into 32 groups, it takes about 8 minutes on my laptop to find all 224082 solutions.
If I didn't try to deal with negatives, this could be sped up quite a bit. (Use cheaper comparisons, drop all partial sums that have exceeded target
to avoid calculating most of the dynamic programming table.)
And here is the sped up version. For the case with only nonnegative numbers it is about twice as fast. If there are negative numbers it will produce wrong results.
from collections import namedtuple
class RecursiveSums (
namedtuple('BaseRecursiveSums',
['value', 'repeat', 'skip', 'tail', 'prev'])):
def sum_and_rest(self):
if self.tail is None:
if self.skip:
yield ([self.value] * self.repeat, [(self.value, self.skip)])
else:
yield ([self.value] * self.repeat, [])
else:
for partial_sum, rest in self.tail.sum_and_rest():
for _ in range(self.repeat):
partial_sum.append(self.value)
if self.skip:
rest.append((self.value, self.skip))
yield (partial_sum, rest)
if self.prev is not None:
yield from self.prev.sum_and_rest()
def lexically_maximal_subset_rest (elements, target, bound=None):
"""
elements = [(value, count), (value, count), ...]
with largest absolute values first.
target = target sum
bound = a lexical bound on the maximal subset.
"""
# First let's deal with all of the trivial cases.
if 0 == len(elements):
if 0 == target:
yield []
elif bound is None or 0 == len(bound):
# Set the bound to something that trivially works.
yield from lexically_maximal_subset_rest(elements, target, [abs(elements[0][0]) + 1])
elif bound[0] < elements[0][0]:
pass # we automatically use more than the bound.
else:
# The trivial checks are done.
bound_satisfied = (bound[0] != elements[0][0])
# recurse_by_sum will have a key of (partial_sum, bound_index).
# If the bound_index is None, the bound is satisfied.
# Otherwise it will be the last used index in the bound.
recurse_by_sum = {}
# Populate it with all of the ways to use the first element at least once.
(init_value, init_count) = elements[0]
for i in range(init_count):
if not bound_satisfied:
if len(bound) <= i or bound[i] < init_value:
# Bound exceeded.
break
elif init_value < bound[i]:
bound_satisfied = True
if bound_satisfied:
key = (init_value * (i+1), None)
else:
key = (init_value * (i+1), i)
recurse_by_sum[key] = RecursiveSums(
init_value, i+1, init_count-i-1, None, recurse_by_sum.get(key))
# And now we do the dynamic programming thing.
for j in range(1, len(elements)):
value, repeat = elements[j]
next_recurse_by_sum = {}
for key, tail in recurse_by_sum.items():
partial_sum, bound_index = key
# Record not using this value at all.
next_recurse_by_sum[key] = RecursiveSums(
value, 0, repeat, tail, next_recurse_by_sum.get(key))
# Now record the rest.
for i in range(1, repeat+1):
if target < partial_sum + value * i:
break # these are too big.
if bound_index is not None:
# Bounds check.
if len(bound) <= bound_index + i:
break # bound exceeded.
elif bound[bound_index + i] < value:
break # bound exceeded.
elif value < bound[bound_index + i]:
bound_index = None # bound satisfied!
if bound_index is None:
next_key = (partial_sum + value * i, None)
else:
next_key = (partial_sum + value * i, bound_index + i)
next_recurse_by_sum[next_key] = RecursiveSums(
value, i, repeat - i, tail, next_recurse_by_sum.get(next_key))
recurse_by_sum = next_recurse_by_sum
# We now have all of the answers in recurse_by_sum, but in several keys.
# Find all that may have answers.
bound_index = len(bound)
while 0 < bound_index:
bound_index -= 1
if (target, bound_index) in recurse_by_sum:
yield from recurse_by_sum[(target, bound_index)].sum_and_rest()
if (target, None) in recurse_by_sum:
yield from recurse_by_sum[(target, None)].sum_and_rest()
def elements_split (elements, target, k, bound=None):
if 0 == len(elements):
if k == 0:
yield []
elif k == 0:
pass # still have elements left over.
else:
for (subset, rest) in lexically_maximal_subset_rest(elements, target, bound):
for answer in elements_split(rest, target, k-1, subset):
answer.append(subset)
yield answer
def subset_split (raw_elements, k):
total = sum(raw_elements)
if 0 == (total % k):
target = total // k
counts = {}
for e in sorted(raw_elements, key=AbsComparator, reverse=True):
counts[e] = 1 + counts.get(e, 0)
elements = list(counts.items())
yield from elements_split(elements, target, k)
n = 0
for s in subset_split([4, 3, 5, 6, 4, 3, 1]*16, 32):
n += 1
print(n, s)
[]
(i.e., that's what the function should return then). – Vadneeif len(l) < 2 or sum(l) % 2 != 0:
. In your example[1,2]
the sum of the list (3) is not divisible by 2 so my code returns[]
. – Nephrolith1, 3, 2, 2, 2, 2
into 3 groups, do you want to say that[1, 3], [2, 2], [2,2]
as the only answer, or do you wish to count all of the different ways to split up the 42
s? – Photograph[1, 3], [2, 2], [2,2]
but if it makes it harder, then I don't mind having duplicates in the answer. – Nephrolith