Find all possible sums of the combinations of sets of integers, efficiently
Asked Answered
C

1

8

I have an algorithm that finds the set of all unique sums of the combinations of k tuples drawn with replacement from of a list of tuples. Each tuple contains n positive integers, the order of these integers matters, and the sum of the tuples is defined as element-wise addition. e.g. (1, 2, 3) + (4, 5, 6) = (5, 7, 9)

Simple example for k=2 and n=3:

input = [(1,0,0), (2,1,1), (3,3,2)]  
solution = [(1,0,0)+(2,1,1), (1,0,0)+(3,3,2), (2,1,1)+(3,3,2), (1,0,0)+(1,0,0), (2,1,1)+(2,1,1), (3,3,2)+(3,3,2)]  
solution = [(3,1,1), (4,3,2), (5,4,3), (2,0,0), (4,2,2), (6,6,4)]

In practice the integers in the tuples range from 0 to 50 (in some positions it may be a lot more constraint, like [0:2]), k goes up to 4 combinations, and the length of the tuples goes up to 5. The number of tuples to draw from goes up to a thousand.

The algorithm I currently have is an adaptation of an algorithm proposed in a related question, it's more efficient then enumerating all combinations with itertools (if we're drawing 4 tuples out of 1000, there are billions of combinations, but the number of sums will be orders of magnitude less), but I don't see how to apply bitsets for example to this problem.

# example where length of tuples n = 3:
lst = []
for x in range(0,50,2):
    for y in range(0, 20, 1):
        for z in range(0, 3, 1):
            lst.append((x,y,z))

# this function works for any k and n
def unique_combination_sums(lst, k):
    n = len(lst[0])
    sums = {tuple(0 for _ in range(n))}  # initialize with tuple of zeros
    for _ in range(k):
        sums = {tuple(s[i]+x[i] for i in range(n)) for s in sums for x in lst}
    return sums

unique_combination_sums(lst, 4)
Candlepin answered 15/4, 2023 at 7:8 Comment(3)
There is a typo: In solution last sum is (6,6,4) not (6,6,2).Baresark
The second example you give is simply combining the lists generated by range. Is this just an example? Or is the ACTUAL object also generated by range?Anthony
@Anthony the second example I'm generating with the 3 nested ranges, is simply to populate lst with artificial but somewhat realistic data to benchmark the unique_combination_sums function. The actual object is not generated by range, but we can assume random integers between 0 and 50.Candlepin
T
5

You can actually encode your tuples as integers. Since you mention that the integers range [0, 50], and there may be up to 5 such integers, so that creates a range of 51^5 = 345,025,251 values, which is perfectly doable.

To understand how we can do this encoding, think about how decimal numbers work- 123 means 1*100 + 2*10 + 1*1. Each digit is multiplied by the base (10) raised to some power, corresponding to its position. Each number has only one representation, because each digit is less than the base (10) itself. We could do something similar then; we could choose a sufficiently large base, say 100, and multiply each value in the tuple by the base to its corresponding power. Take the following example:

(1, 4, 7)
-> 1*100^2 + 4*100^1 + 7*100^0
-> 1*10000 + 4*100   + 7
-> 10407

This work work perfectly well by itself, however whatever underlying solver you're using for the integer case may very well perform better on smaller numbers, so we really should try to "compact" the representation as much as possible. This means picking the smallest base possible. In fact, it means picking multiple bases, for a mixed-radix number system. Without going into too much detail, it means that if one position of the tuples only spans a small interval of integers, we won't "waste" space for values that won't ever exist at that specific tuple position. What this may look like, for an arbitrary example:

(1, 4, 7, 11)
-> 1*22*7*15 + 4*22*7 + 7*22 + 11*1
-> 2310      + 616    + 154  + 11
-> 3091
// Here we arbitrarily choose the radices [22, 7, 15]
// In practice, we actually choose meaningful (and minimal) radices

Furthermore, we can also subtract off the smallest value at tuple position, to shrink the values even further. We just have to remember to add back the appropriate offset multiplied by the number of elements when we convert the value back to a tuple.

All that said, here's the code to do exactly that:

from functools import wraps


def transform_tuples(func):
    @wraps(func)
    def inner(arr, n):
        if n == 0 or not arr:
            return set()
        
        groups = [(max(g)-min(g), min(g)) for g in zip(*arr)]
        
        def encode(tup):
            val = 0
            for (size, low), elem in zip(groups, tup):
                val *= size * n + 1
                val += elem - low
            return val
            
        def decode(val):
            tup = []
            for size, low in groups[::-1]:
                val, part = divmod(val, size * n + 1)
                tup.append(part + low * n)
            return tuple(tup[::-1])
            
        result = func([encode(tup) for tup in arr], n)
        return [decode(val) for val in result]
    
    return inner

This is a decorator- you apply it to function the solves the original integer-based problem, and it will transform the function into one that operates on tuples.

For example, taking the Kelly1 solution from the related question you linked above, we can decorate it, and it will then work on tuples:

@transform_tuples
def Kelly1(a, n):
    sums = {0}
    for _ in range(n):
        sums = {s + x for s in sums for x in a}
    return sums

Calling it on your example:

tuples = [(1,0,0), (2,1,1), (3,3,2)]
k = 2

print(Kelly1(tuples, k))

Produces:

[(2, 0, 0), (5, 4, 3), (3, 1, 1), (6, 6, 4), (4, 2, 2), (4, 3, 2)]

So you can take whichever implementation is the fastest, tweak it / optimize it as you like, and then decorate it to operate on tuples.

Toting answered 17/4, 2023 at 23:49 Comment(1)
Brilliant, this is 500x faster on my data than the code in my original post.Candlepin

© 2022 - 2024 — McMap. All rights reserved.