How to make itertools combinations 'increase' evenly?
Asked Answered
B

2

6

Consider the following example:

import itertools
import numpy as np

a = np.arange(0,5)
b = np.arange(0,3)
c = np.arange(0,7)

prods = itertools.product(a,b,c)

for p in prods:
    print(p)

This iterate over the products in the following order:

(0, 0, 0)
(0, 0, 1)
(0, 0, 2)
(0, 0, 3)
(0, 0, 4)
(0, 1, 0)

But I would much rather have the products given in order of their sum, e.g.

(0, 0, 0)
(0, 0, 1)
(0, 1, 0)
(1, 0, 0)
(0, 1, 1)
(1, 0, 1)
(1, 1, 0)
(0, 0, 2)

How can I achieve this without storing all combinations in memory?

Note: a b and c are always ranges, but not necessarily with the same maximum. There is also no 2nd-level ordering when the sums of two products are equal, i.e. (0,1,1) is equivalent to (2,0,0).

Begum answered 28/9, 2021 at 13:50 Comment(11)
You can't do that with product, which deliberately orders them as shown. This would be quite hard to solve in general and require at least some data to be held in memory, unless you can guarantee ordered inputs with matching steps.Meemeece
@Meemeece but perhaps there is a way to map one ordering to the other? that's something I was looking at but without successBegum
Are a,b,c always ranges?Directorate
@DaniMesejo yes!Begum
@ThomasWagenaar to do that you'd need to consume then sort the whole series, which might require impractical amounts of storage. What I'm saying is that, depending on the specific context, there may be ways to keep this in the iterator world.Meemeece
Are a, b, and c always the same range? Edit: Also, are they ever ranges with strides larger than 1?Denticle
@Denticle no, but good one, i'll add it it to questionBegum
For the sake of providing an answer, if memory consumption isn't an issue this can done by iterating over sorted(prods, key=lambda t: (sum(t), t.count(0))).Skite
Are you looking for an ordering to tiebreak equal sums, or is any ordering fine? This can be done, but probably not using itertoolsDenticle
@Denticle no there is no preference for ordering in the tiebreak case, so (0,0,2) and (1,1,0) are 'equal'Begum
Are the strides always 1?Directorate
D
3

The easiest way to do this without storing extra products in memory is with recursion. Instead of range(a,b), pass in a list of (a,b) pairs and do the iteration yourself:

def prod_by_sum(range_bounds: List[Tuple[int, int]]):
    """
    Yield from the Cartesian product of input ranges, produced in order of sum.

    >>> range_bounds = [(2, 4), (3, 6), (0, 2)]
    >>> for prod in prod_by_sum(range_bounds):
    ...    print(prod)
    (2, 3, 0)
    (2, 3, 1)
    (2, 4, 0)
    (3, 3, 0)
    (2, 4, 1)
    (2, 5, 0)
    (3, 3, 1)
    (3, 4, 0)
    (2, 5, 1)
    (3, 4, 1)
    (3, 5, 0)
    (3, 5, 1)

    """
    def prod_by_sum_helper(start: int, goal_sum: int):
        low, high = range_bounds[start]
        if start == len(range_bounds) - 1:
            if low <= goal_sum < high:
                yield (goal_sum,)
            return

        for current in range(low, min(high, goal_sum + 1)):
            yield from ((current,) + extra
                        for extra in prod_by_sum_helper(start + 1, goal_sum - current))

    lowest_sum = sum(lo for lo, hi in range_bounds)
    highest_sum = sum(hi - 1 for lo, hi in range_bounds)

    for goal_sum in range(lowest_sum, highest_sum + 1):
        yield from prod_by_sum_helper(0, goal_sum)

which has output for range_bounds = [(0, 5), (0, 3), (0, 7)] starting with:

(0, 0, 0)
(0, 0, 1)
(0, 1, 0)
(1, 0, 0)
(0, 0, 2)
(0, 1, 1)
(0, 2, 0)
(1, 0, 1)
(1, 1, 0)
(2, 0, 0)

You can do this exact process iteratively by modifying a single list and yielding copies of it, but the code either becomes more complicated or less efficient.

You can also trivially modify this to support steps besides 1, however that does work less efficiently with larger and larger steps since the last range might not contain the element needed to produce the current sum. That seems unavoidable, because at that point you'd need to solve a difficult computational problem to efficiently loop over those products by sum.

Denticle answered 28/9, 2021 at 15:55 Comment(0)
D
3

If the steps are always 1 and avoiding storing all combinations is your top priority, you could do the following (partially using itertools.product):

import itertools
import numpy as np

def weak_compositions(boxes, balls, parent=tuple()):
    """https://mcmap.net/q/131228/-next-composition-of-n-into-k-parts-does-anyone-have-a-working-algorithm-closed"""
    if boxes > 1:
        for i in range(balls + 1):
            for x in weak_compositions(boxes - 1, i, parent + (balls - i,)):
                yield x
    else:
        yield parent + (balls,)


def verify_limits(x, minimum, maximum):
    all_max = all(xi <= li for xi, li in zip(x, maximum))
    all_min = all(xi >= li for xi, li in zip(x, minimum))
    return all_max and all_min


def iterate_in_sum(ranges):
    prods = itertools.product(*ranges)

    # find number of different sums
    unique_total_sums = sorted(set(sum(p) for p in prods))

    # find the minimum limits
    minimum_allowed = [min(r) for r in ranges]

    # find the maximum limits
    maximum_allowed = [max(r) for r in ranges]

    for total_sum in unique_total_sums:
        # decompose each sum into its summands
        for x in weak_compositions(len(ranges), total_sum):

            # if the decomposition meets the limits
            if verify_limits(x, minimum_allowed, maximum_allowed):
                yield x


a = np.arange(0, 5)
b = np.arange(0, 3)
c = np.arange(0, 7)

for s in iterate_in_sum([a, b, c]):
    print(s, sum(s))

Output (partial)

(0, 0, 0) 0
(1, 0, 0) 1
(0, 1, 0) 1
(0, 0, 1) 1
(2, 0, 0) 2
(1, 1, 0) 2
(1, 0, 1) 2
(0, 2, 0) 2
(0, 1, 1) 2
(0, 0, 2) 2
(3, 0, 0) 3
(2, 1, 0) 3
(2, 0, 1) 3
(1, 2, 0) 3
(1, 1, 1) 3
(1, 0, 2) 3
(0, 2, 1) 3
(0, 1, 2) 3

The core of the solution is the weak_compositions function that will decompose a number into it's summands (something like integer partition). More solutions to the above problem of composition of n into k parts can be found here.

Note:

The solution can be extended to non uniform steps with additional complexity cost.

Directorate answered 28/9, 2021 at 15:44 Comment(4)
I'm curious about why for total_sum in unique_total_sums: works to produce the sums in order from lowest to highest, given that unique_total_sums is a set. Is that a coincidence for this input, or do Python sets preserve insertion order now?Denticle
@Denticle this explain it #45582401Directorate
In any case I just added sorted to make it explicitDirectorate
Ah thank you, that's a very useful link. Apparently the fact that small positive ints hash to themselves means sets are usually sorted; from the link, not using sorted would cause this to fail on inputs like a = np.arange(5, 8), b = np.arange(1, 3)Denticle
D
3

The easiest way to do this without storing extra products in memory is with recursion. Instead of range(a,b), pass in a list of (a,b) pairs and do the iteration yourself:

def prod_by_sum(range_bounds: List[Tuple[int, int]]):
    """
    Yield from the Cartesian product of input ranges, produced in order of sum.

    >>> range_bounds = [(2, 4), (3, 6), (0, 2)]
    >>> for prod in prod_by_sum(range_bounds):
    ...    print(prod)
    (2, 3, 0)
    (2, 3, 1)
    (2, 4, 0)
    (3, 3, 0)
    (2, 4, 1)
    (2, 5, 0)
    (3, 3, 1)
    (3, 4, 0)
    (2, 5, 1)
    (3, 4, 1)
    (3, 5, 0)
    (3, 5, 1)

    """
    def prod_by_sum_helper(start: int, goal_sum: int):
        low, high = range_bounds[start]
        if start == len(range_bounds) - 1:
            if low <= goal_sum < high:
                yield (goal_sum,)
            return

        for current in range(low, min(high, goal_sum + 1)):
            yield from ((current,) + extra
                        for extra in prod_by_sum_helper(start + 1, goal_sum - current))

    lowest_sum = sum(lo for lo, hi in range_bounds)
    highest_sum = sum(hi - 1 for lo, hi in range_bounds)

    for goal_sum in range(lowest_sum, highest_sum + 1):
        yield from prod_by_sum_helper(0, goal_sum)

which has output for range_bounds = [(0, 5), (0, 3), (0, 7)] starting with:

(0, 0, 0)
(0, 0, 1)
(0, 1, 0)
(1, 0, 0)
(0, 0, 2)
(0, 1, 1)
(0, 2, 0)
(1, 0, 1)
(1, 1, 0)
(2, 0, 0)

You can do this exact process iteratively by modifying a single list and yielding copies of it, but the code either becomes more complicated or less efficient.

You can also trivially modify this to support steps besides 1, however that does work less efficiently with larger and larger steps since the last range might not contain the element needed to produce the current sum. That seems unavoidable, because at that point you'd need to solve a difficult computational problem to efficiently loop over those products by sum.

Denticle answered 28/9, 2021 at 15:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.