How to generate all possible combinations with a given condition to make it more efficient?
Asked Answered
S

2

7

(Python) I would like to generate all possible combinations with length 9 out of a sorted list list with 150 numbers. However, that's not very efficient, so I want to have a condition where the difference between each of the selected numbers is 150 or less in order to only generate combinations, that I can use later on. How can I achieve this in Python? The input list is sorted and I need the output to be sorted as well.

I already tried the combinations function from itertools, but as I already mentioned, that's not efficient and would produce more than a billion possible combinations.

itertools.combinations(list, 9)

Thanks in advance#

I already found this solution, which was very good. However the output wasn't sorted which was my problem. import itertools import random

def combs(nums):
    result = set()
    for lower in nums:
        options = [n for n in nums if lower <= n <= lower + 150]
        result.update(itertools.combinations(options, 9))
    return result

print(combs([random.randrange(0, 5000) for _ in range(150)]))
Stately answered 24/11, 2019 at 14:49 Comment(15)
just a note: 150 choose 9 is is actually a few dozen trillions possible combinationsAnne
Yeah i know, 150^9, exactly 38 443 359 375 000 000 000Stately
Maybe that's NP-Complete. Could you share the real purpose of doing that? Maybe we can help you with a different approachLubricious
So the contents of your list can be any number? Are there any bounds on that? And you'd like combinations of 9 picks from that list where the lowest pick and highest pick are no more than 150 apart? Do you need all the combinations that meet this condition, or just some?Bollay
I will clarify: if your data is like [3, 4, 4, 5, 8, 31, 41, 57, 57, 61, 63, <135 sorted ints here> ,979, 987, 990, 994] you don't want to have combinations like (3, 4, 4, 5, 8, 31, 41, 57, 994) from it?Filide
Yeah it can be any number and all of the combinations have to meet this condition and every number has to be no more than 150 apart to the previous and next number.Stately
@Filide Yes, exactlyStately
When you say 'previous and next number' that implies an order. Combinations are not ordered, so are you actually thinking permutations or are you implicitly assuming they'll be ordered before applying this condition? Also, you say 'all combinations must meet this condition'. That's different to 'we must find all combinations which meet this condition'. I have a suspicion however you do this it's going to be very very resource intensive.Bollay
Permutations aren't ordered but combinations are, if the input list is ordered. *itertools.combinations(iterable, r) : It return r-length tuples in sorted order with no repeated elements. *Stately
Does your sorted data have repeated elements? E.g. [3, 4, 4, 5, 8, ...]. If so, should combination with equal numbers be skipped?Filide
@Filide No, every element is only once in the list.Stately
@JaKali - yes sorry I was unclear. I meant mathematically permutations are defined by their order as well as their contents whereas combinations have no intrinsic order; it just happens that they come out of itertools ordered. If you mean no more than 150 difference between adjacent elements when numerically ordered, that's clear enough. This constraint may reduce the number of combinations but it's not at all clear that it makes the process of finding the combinations any faster overall.Bollay
@JaKalli123, i've just checked your algorithm - it seems not working properly. I've fed him arrays from my tests. So check out my answerFilide
Possible duplicate of Generate all combinations with maximum difference of consecutive elementsMonaxial
I think actually the duplicate question is by somebody who copy/pasted the text of this question there; before I edited it for clarity, it looked very similar to this one, but this one was posted a couple of hours earlier. I'll flag the other one, and move my answer from there to here.Monaxial
F
3

Here it is:

from itertools import combinations, islice, takewhile

def mad_combinations(data, comb_lenth, diff, create_comb=tuple):
    assert comb_lenth >= 2
    sorted_nums = sorted(frozenset(data))
    stop_index = len(sorted_nums) # or use None - what is faster?
    combination = [None]*comb_lenth # common memory

    def last_combinator(start_index, right_max_number):
        """Last combination place loop"""
        return takewhile(right_max_number.__ge__, islice(sorted_nums, start_index, stop_index))
        # In other words:
        # for x in islice(sorted_nums, start_index, stop_index):
        #     if x <= right_max_number:
        #         yield x
        #     else: return

    def _create_combinator(next_place_combinator, current_combination_place):
        # this namespace should store variables above
        def combinator(start_index, right_max_number):
            """Main loop"""
            for i, combination[current_combination_place] in \
                enumerate(
                    takewhile(
                        right_max_number.__ge__,
                        islice(sorted_nums, start_index, stop_index)),
                    start_index + 1):
                yield from ( # it yields last combination place number
                    next_place_combinator(i, combination[current_combination_place] + diff))

        return combinator

    for combination_place in range(comb_lenth-2, 0, -1): # create chain of loops
        last_combinator = _create_combinator(last_combinator, combination_place)

    last_index = comb_lenth - 1
    # First combination place loop:
    for j, combination[0] in enumerate(sorted_nums, 1):
        for combination[last_index] in last_combinator(j, combination[0] + diff):
            yield create_comb(combination) # don't miss to create a copy!!!

The function above is roughly equivalent to:

def example_of_comb_length_3(data, diff):
    sorted_nums = sorted(frozenset(data))
    for i1, n1 in enumerate(sorted_nums, 1):
        for i2, n2 in enumerate(sorted_nums[i1:], i1 + 1):
            if n2 - n1 > diff:break
            for n3 in sorted_nums[i2:]:
                if n3 - n2 > diff:break
                yield (n1, n2, n3)

Versions that use filter:

def insane_combinations(data, comb_lenth, diff):
    assert comb_lenth >= 2
    for comb in combinations(sorted(frozenset(data)), comb_lenth):
        for left, right in zip(comb, islice(comb, 1, comb_lenth)):
            if right - left > diff:
                break
        else:
            yield comb


def crazy_combinations(data, comb_lenth, diff):
    assert comb_lenth >= 2
    last_index = comb_lenth - 1
    last_index_m1 = last_index - 1
    last_rule = (lambda comb: comb[last_index] - comb[last_index_m1] <= diff)
    _create_rule = (lambda next_rule, left, right:
        (lambda comb: (comb[right] - comb[left] <= diff) and next_rule(comb)))
    for combination_place in range(last_index_m1, 0, -1): 
        last_rule = _create_rule(last_rule, combination_place - 1, combination_place)
    return filter(last_rule, combinations(sorted(frozenset(data)), comb_lenth))

Tests:

def test(fetch, expected, comb_length, diff):
    fetch = tuple(fetch)
    assert list(insane_combinations(fetch, comb_length, diff)) == \
           list(crazy_combinations(fetch, comb_length, diff)) == \
           list(mad_combinations(fetch, comb_length, diff)) == list(expected)

if __name__ == '__main__':
    test([1,2,3,4,5,6],
         comb_length=3, diff=2,
         expected=[
            (1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5),
            (2, 4, 6), (3, 4, 5), (3, 4, 6), (3, 5, 6), (4, 5, 6)])

    test([1, 2, 3, 8, 9, 10, 11, 12, 13],
         comb_length=3, diff=3,
         expected=[
             (1, 2, 3), (8, 9, 10), (8, 9, 11), (8, 9, 12), (8, 10, 11), (8, 10, 12),
             (8, 10, 13), (8, 11, 12), (8, 11, 13), (9, 10, 11), (9, 10, 12), (9, 10, 13),
             (9, 11, 12), (9, 11, 13), (9, 12, 13), (10, 11, 12), (10, 11, 13), (10, 12, 13),
             (11, 12, 13)])

I did not bother much with edge cases!! And I've tested only these 2 fetches! If you will find my answer helpful, be sure to test all possible options and write about bugs found (many bugs, I think). To check your concrete fetch use mad_combinations(your_fetch, 9, 150).

Filide answered 24/11, 2019 at 15:58 Comment(2)
First of all, thank you very much for your answer. However, this doesn't help me that much because it does still have to generate all possible combinations which is not really efficient. That was my problem and I'm searching for a way, where it only generates combinations which meet the condition.Stately
@Stately according to my benchmarks, this solution is quite efficient; for the sample input of 150 random numbers in the range 0-5000, there are about 17 million combinations meeting the condition, and this solution runs in a little under 16 seconds on my machine. The "generate all and filter" approach should take multiple years to finish running.Monaxial
M
2

Here's a solution using a recursive generator function: the function combinations_max_diff takes a list of numbers nums, a number of elements k, and a maximum difference max_diff.

The helper function does all of the work; it takes a partial combination comb, a number of remaining elements r, a minimum list index i for the next element to be chosen in the combination, and a max_next which controls the maximum size of that next element.

def combinations_max_diff(nums, k, max_diff):
    # input list must be sorted
    nums = sorted(nums)
    n = len(nums)

    def helper(comb, r, i, max_next):
        if r == 0:
            yield comb
        else:
            for ii in range(i, n - r + 1):
                v = nums[ii]
                if v > max_next: break
                comb_v = comb + (v,)
                yield from helper(comb_v, r - 1, ii + 1, v + max_diff)

    return helper((), k, 0, nums[-1])

Example usage:

>>> nums = [1, 2, 3, 4, 5, 6, 7]
>>> for c in combinations_max_diff(nums, 3, 2):
...     print(c)
... 
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(1, 3, 5)
(2, 3, 4)
(2, 3, 5)
(2, 4, 5)
(2, 4, 6)
(3, 4, 5)
(3, 4, 6)
(3, 5, 6)
(3, 5, 7)
(4, 5, 6)
(4, 5, 7)
(4, 6, 7)
(5, 6, 7)

The question asks about efficiency, so here's some idea about that:

>>> import random, timeit
>>> nums = sorted(random.randrange(0, 5000) for _ in range(150))
>>> len(list(combinations_max_diff(nums, 9, 150)))
16932905
>>> timeit.timeit(lambda: list(combinations_max_diff(nums, 9, 150)), number=1)
15.906288493999455

So, about 16 seconds to generate about 17 million combinations, or a little under one microsecond per combination on my machine.

Monaxial answered 24/11, 2019 at 23:50 Comment(10)
as I understand, an input is sorted set of numbers. So probably it's better to test sorted(frozenset(random.randrange(0, 5000) for _ in range(150)))Filide
sorted returns a list anyway, and filtering out duplicates just means the size won't necessarily be 150. The algorithm works perfectly well in case there are any duplicates.Monaxial
yes, but it makes measurements depend on length of fetch. I've just tried to compare your and my resuls, and it seams unfair because my algo throws duplications out explicitly - it is much cheatingFilide
I've read through your solution and it seems to be the same idea as mine, but you used a mutable list where I used tuple concatenation, and you used islice and takewhile where I've used range and break. I removed the frozenset from yours to do a fair comparison, and yours is about 5% faster than mine. I tried using a mutable list with mine but that actually made it about 20% slower; if you're sufficiently interested, you might want to check whether tuple concatenation could speed up your solution.Monaxial
For sure. My results are too broad - from (my18sec, yours 26sec) to (my 3.2 sec, yours 4.1 sec)Filide
The running time will depend significantly on whether there are any dense clumps of numbers, because the number of combinations is O(n^k) in the worst case when all the numbers are closer together than max_diff, and I think it should average about O(n^max(k,d)) where d is the number of list elements that fit in a window of size max_diff. So different randomly generated lists could have quite different running times; I made sure to test both on the same lists.Monaxial
Of course! I'm also tried to measure my other filter-algorithms and itertools.combinations, but MemoryError stoped me)Filide
For itertools.combinations, we can generously assume that it would take 1-2 years to finish running. Anything that iterates over all (150 choose 9) combinations and rejects some of them has to contend with the fact that (150 choose 9) is about 10^14.Monaxial
@Monaxial how stable is the run time with your solution? I know my solution below is suboptimal in various ways but fundamentally if I get a continuous block that is too big (i.e. too many consecutive numbers without a gap of 150) the runtime just blows up as itertools.combinations(block,9) takes ages. Worst case scenario, the whole thing is one big block and you're back to 150 choose 9.Bollay
The complexity of yielding each combination that meets the constraints, for any algorithm, will always be at least linear in the number of combinations that meet the constraints. Since that number of combinations depends quite a lot on the distribution of nums, no algorithm is going to be both efficient and consistent. The more useful measure is running time per combination yielded.Monaxial

© 2022 - 2024 — McMap. All rights reserved.