Index into union of ranges on a single interval
Asked Answered
D

3

7

I have list of ranges which all of the ranges in that list has the same start and stop, but not the same step.
eg:

rr = [range(0, 10, 2), range(0, 10, 3)]

Of course, the list can contain more than only 2 ranges.
The sorted union of these ranges contains the following numbers:

u = [0, 2, 3, 4, 6, 8, 9]

I want to index into u (e.g., u[5] == 8).

The problem is that the range can be huge (millions) and I don't want to turn rr into u. I need somehow to calculate the value at an index without expanding the list.

Tried for several hours to think of an algorithm to do that, the best solution I found includes using binary search to do that, which I think it is not the ideal way of doing so.

Diplex answered 6/5, 2020 at 14:36 Comment(7)
will the list always contain 2 range iterablesDamaraland
So you basically don't want to use a static container but want the items ordered without duplicates?Melindamelinde
@Akash Karnatak, no it can contain more.Diplex
@ywbaek, I don't understand what do you mean by "static container", but yesDiplex
@DarkStorm97 what all information are available, like start, stop, all stepsDamaraland
@AkashKarnatak that information is available inside the range itself...Ideatum
Just how many ranges (divisors) are we talking about? I think any reasonable random-access algorithm will depend sensitively on that count (even though not on the size of the ranges).Moorehead
I
1

You can make a new range with the same start and end, and pack all steps to a new list. Now on each number from the range you can check if it matches any step. You can make this into a generator:

def steps_range(start, end, steps):
    for i in range(start, end):
        if any(i % x == 0 for x in steps):
            yield i

Now you can loop on that generator until you reach the relevant index. According to your example:

ranges = [range(0, 10, 2), range(0, 10, 3)]

start = ranges[0].start
end = ranges[0].stop
steps = [r.step for r in ranges]

target_index = 5

for i, num in enumerate(steps_range(start, end, steps)):
    print(num)
    if i == target_index:
        break

And this will print out:

0
2
3
4
6
8
Ideatum answered 6/5, 2020 at 14:47 Comment(2)
Again, the range can be huge. What if start = 0, stop = 1000000 and idx=900000. The for loop will run 900,000 timesDiplex
Well at least it will not create a list of size 1,000,000Ideatum
S
0

The key point is to use yield.

The difficulty is how to handle the situation when one yield ends. The final solution I choose is to use a dict, use min get the iterator who needs to move(next). And check if the iterator goes to the end. If so, move it out of the dict.

#!/usr/bin/env python3
import operator


class IterTool:
    def __init__(self, the_range):
        def foo(the_range):
            def bar():
                for i in the_range:
                    yield i
            return bar()

        self.the_range = foo(the_range)
        self.end = False

    def next_item(self):
        try:
            foo = next(self.the_range)
            return foo
        except StopIteration:
            self.end = True

    def is_end(self):
        return self.end


pool = {}
for i in [IterTool(range(0, 10000000000000, 2)), IterTool(range(0, 10000000000000, 3))]:
    pool[i] = i.next_item()
idx = 0
last_val = None
while all(map(lambda x: not x.is_end(), pool)):
    if len(pool) == 0:
        break
    key = min(pool.items(), key=operator.itemgetter(1))[0]
    val = pool[key]
    if val != last_val:
        if idx == 99999:
            print("%s=> %s" % (idx, val))
            break
        idx += 1
        last_val = val
    pool[key] = key.next_item()
    if key.is_end():
        del pool[key]

Result:

99999=> 149998

real    0m0.209s
user    0m0.200s
sys     0m0.004s
Scopp answered 6/5, 2020 at 15:43 Comment(0)
Y
0
def is_not_divisible_by_any(num, divs):
    return all(num % divisor for divisor in divs)

def get_idx_of_concated_list(the_list, idx):
    # Get the start, end and steps
    start, end = the_list[0].start, the_list[0].stop
    shifted_end = end - start
    steps = [r.step for r in the_list]

    # Get the number of numbers non-divisble by the steps until the given index (inclusive)
    num_not_divisibles = sum(is_not_divisible_by_any(num, steps) for num in range(idx+1))

    # The first candidate will be the given index + the above number of non-divisibles
    candidate = idx + num_not_divisibles

    # Forward by one till it is divisible by any of the steps
    while is_not_divisible_by_any(candidate, steps):
        candidate += 1

    # Check if the given index was valid
    if candidate > shifted_end:
        raise ValueError("The given index with those ranges exceed the stop value")

    # Since assumed start = 0, add the original start
    return candidate + start

# Example:
concat_list = [range(0, 1_000_000, 2), range(0, 1_000_000, 3), range(0, 1_000_000, 7)]
idx = 225_000
print(get_idx_of_concated_list(concat_list, idx))
# 289286

Explanation: without loss of generality, let's assume that the start is 0 (we can easily add the original start back at the end as you will see). Then, we essentially have the following sequence:

0, 1, 2, 3, 4, 5, ..., stop-1

If, for this sequence, we were asked to find value at x'th index, we'd directly say x as the answer. However, the steps of the ranges skip some values in this sequence. For example, if steps are 2 and 3, we would have 0, 2, 3, 4, 6, .. and so on. So if we can find the number of those skipped numbers that are not divisible by any of the given steps (till the given index, inclusive), then we would simply add it and get a candidate for the solution.

But the candidate still may not be divisible by any of the steps.(e.g consider your example in the question, where the number of non-divisibles would be 2 (1 and 5) and we add 2 to given index 5 and get 7; but this does not evenly divide 2 or 3). Therefore, we do an incremental search from the candidate and on, till we find our desired value. And lastly, since we assumed start as 0, we add the original start value back to get the result.

Edit: I added a check for the index so that it does not exceed the stop value.

Yogurt answered 6/5, 2020 at 15:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.