Efficient cartesian product excluding items

Asked 15/1, 2020 at 9:45 Answered 15/1, 2020 at 10:51

Solved python python-itertools cartesian-product

I'm am trying to get all possible combinations of 11 values repeated 80 times but filter out cases where the sum is above 1. The code below achieves what I'm trying to do but takes days to run:

import numpy as np
import itertools

unique_values = np.linspace(0.0, 1.0, 11)

lst = []
for p in itertools.product(unique_values , repeat=80):
    if sum(p)<=1:
        lst.append(p)

The solution above would work but needs way too much time. Also, in this case I would have to periodically save the 'lst' into the disk and free the memory in order to avoid any memory errors. The latter part is fine, but the code needs days (or maybe weeks) to complete.

Is there any alternative?

Claudetteclaudia answered 15/1, 2020 at 9:45 Comment(12)

Is unique_values = np.linspace(0.0, 1.0, 11) real or is that an example? – Scribble 15/1, 2020 at 9:56

There are several implementations around stackoverflow of itertools.product in numpy, maybe that would be faster, otherwise, I'll try doing it in C or other fast language. Also, why np.linspace(0.0, 1.0, 11) instead of range(12)? – Hexagon 15/1, 2020 at 9:58

np.linspace(0.0, 1.0, 11) is real. The example above is exactly what I'm trying to get. If I used range(12) I would not be able to check if the sum is below 1 – Claudetteclaudia 15/1, 2020 at 10:8

@Claudetteclaudia then you simply want the 42 partitions of 10, scaled down by a factor of 10: wolframalpha.com/input/?i=partitions+of+10 – Scribble 15/1, 2020 at 10:14

@AlexHall I'm not exactly looking for the partitions of 10. For example, in my case the vector [0.1, 0, 0, 0....] is accepted (as long as the sum is <=1). They do not have to add to 1 necessarily. – Claudetteclaudia 15/1, 2020 at 10:20

My mistake, you also want the partitions of 9, 8, etc. – Scribble 15/1, 2020 at 10:22

Moreover, [0.2, 0.2, 0, 0 ...] and [0.2, 0, 0, 0.2, 0, 0 ...] are two different solutions.I need them both. – Claudetteclaudia 15/1, 2020 at 10:22

@Claudetteclaudia that will increase the number of results substantially. The worst example is that there are 37,957,920 (4! * 80C4) different versions of [0.1, 0.2, 0.3, 0.4, 0, 0, ...]. – Scribble 15/1, 2020 at 10:31

I know. That's why I'm trying to find the most efficient way to do this. – Claudetteclaudia 15/1, 2020 at 10:32

What is this ultimately for? Is storing gigabytes of lists of mostly zeroes really your best option? – Scribble 15/1, 2020 at 10:38

You can think of this as money. For instance, I could have 80 options to invest in and I want to see what proportion of my total money (1 in this case) should go to each investment. Based on these lists, I'm running some simulations on historical data, i.e. what would have happened if I had invested this proportion of money to each of the 80 options. – Claudetteclaudia 15/1, 2020 at 10:48

Try https://mcmap.net/q/129093/-elegant-python-code-for-integer-partitioning-closed – Scribble 20/1, 2020 at 20:11

Okay, this would be a bit more efficient, and you can use generator like this, and take your values as needed:

def get_solution(uniques, length, constraint):
    if length == 1:
        for u in uniques[uniques <= constraint + 1e-8]:
            yield u
    else:
        for u in uniques[uniques <= constraint + 1e-8]:
            for s in get_solution(uniques, length - 1, constraint - u):
                yield np.hstack((u, s))

g = get_solution(unique_values, 4, 1)
for _ in range(5):
    print(next(g))

prints

[0. 0. 0. 0.]
[0.  0.  0.  0.1]
[0.  0.  0.  0.2]
[0.  0.  0.  0.3]
[0.  0.  0.  0.4]

Comparing with your function:

def get_solution_product(uniques, length, constraint):
    return np.array([p for p in product(uniques, repeat=length) if np.sum(p) <= constraint + 1e-8])

%timeit np.vstack(list(get_solution(unique_values, 5, 1)))
346 ms ± 29.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit get_solution_product(unique_values, 5, 1)
2.94 s ± 256 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Criseyde answered 15/1, 2020 at 10:51 Comment(3)

@Viachelsav your solution does not return all the values I need. For instance, it does not return the solutions [0.8 0.2 0.0 0.0 0.0] or [0.8 0.0 0.0 0.2 0.0] which are valid. In fact, it returns exactly half of the possible solutions. – Claudetteclaudia 20/1, 2020 at 15:1

I'm accepting your answer. All I need to add is run the loop twice. Once for the unique_values and once for unique_values[::-1]. – Claudetteclaudia 21/1, 2020 at 8:57

@Claudetteclaudia that's strange, because it actually produces all the correct answers. check list(filter(lambda x: x[0] == 0.8, get_solution(unique_values, 5, 1))) – Criseyde 25/1, 2020 at 13:28

OP simply needs the partitions of 10, but here's some general code I wrote in the meantime.

def find_combinations(values, max_total, repeat):
    if not (repeat and max_total > 0):
        yield ()
        return
    for v in values:
        if v <= max_total:
            for sub_comb in find_combinations(values, max_total - v, repeat - 1):
                yield (v,) + sub_comb


def main():
    all_combinations = find_combinations(range(1, 11), 10, 80)
    unique_combinations = {
        tuple(sorted(t))
        for t in all_combinations
    }
    for comb in sorted(unique_combinations):
        print(comb)

main()

Scribble answered 15/1, 2020 at 10:16 Comment(3)

I'm not exactly looking for the partitions of 10. For example, in my case the vector [0.1, 0, 0, 0....] is accepted (as long as the sum is <=1). They do not have to add to 1 necessarily. Moreover, [0.2, 0.2, 0, 0 ...] and [0.2, 0, 0, 0.2, 0, 0 ...] are two different solutions.I need them both. – Claudetteclaudia 20/1, 2020 at 14:42

@Claudetteclaudia I understand, I haven't come back to this because that problem is a lot harder to make tractable. – Scribble 20/1, 2020 at 14:52

It's Ok. I've just added my previous comment under your solution for others' reference. – Claudetteclaudia 20/1, 2020 at 14:58

Recommended topics

Hot tags