A recent similar question (isinstance(foo, types.GeneratorType) or inspect.isgenerator(foo)?) got me curious about how to implement this generically.
It seems like a generally useful thing to have: a generator-like object that caches items the first time through (somewhat like itertools.cycle), raises StopIteration, and then returns items from the cache on the next pass. If the object isn't a generator (i.e. a list or dict that already supports O(1) lookup), it shouldn't cache at all and should just give the same behaviour over the original container.
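In other words, the goal is to smooth over this difference between generators and containers (a minimal illustration, not from the original post):

gen = (i*i for i in range(4))
list(gen)   # -> [0, 1, 4, 9]
list(gen)   # -> []   the generator is exhausted after one pass

lst = [0, 1, 4, 9]
list(lst)   # -> [0, 1, 4, 9]
list(lst)   # -> [0, 1, 4, 9]   a list can be re-iterated for free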
Possibilities:
1) Modify itertools.cycle. It looks like this:
def cycle(iterable):
    saved = []
    try:
        saved.append(iterable.next())
        yield saved[-1]
        isiter = True
    except:
        saved = iterable
        isiter = False
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    for element in iterable:
        yield element
        if isiter:
            saved.append(element)
    # ??? What next?
If I could restart the generator, that would be perfect: raise StopIteration, and then on the next gen.next() start again from entry 0, i.e. `A B C D StopIteration A B C D StopIteration ...`, but it doesn't look like that's actually possible.
The second option would be that once StopIteration is hit, saved holds a complete cache, but there doesn't seem to be any way to get at the internal saved list from outside the generator. Maybe a class version of this?
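A sketch of what that class version might look like (my own rough attempt, not code from the question or its answers; the name ReplayIterable is made up, and it assumes each pass is run to exhaustion):

class ReplayIterable(object):
    # Hypothetical sketch: cache a generator's output on the first full pass and
    # replay the cache on later passes; lists/tuples/dicts are used as-is.
    def __init__(self, iterable):
        if isinstance(iterable, (list, tuple, dict)):
            self._cache = iterable       # already supports repeated iteration
            self._source = None
        else:
            self._cache = []
            self._source = iter(iterable)
    def __iter__(self):
        if self._source is None:
            # First pass already finished (or we were handed a container): replay.
            for item in self._cache:
                yield item
        else:
            # First pass: pull from the source, caching as we go.
            for item in self._source:
                self._cache.append(item)
                yield item
            self._source = None          # only reached if this pass ran to exhaustion

Here the cache is an ordinary attribute, so unlike the saved list trapped inside the generator it stays reachable after the first pass.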
2) Or I could pass in the list directly:
def cycle(iterable, saved=[]):
    saved.clear()
    try:
        saved.append(iterable.next())
        yield saved[-1]
        isiter = True
    except:
        saved = iterable
        isiter = False
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    for element in iterable:
        yield element
        if isiter:
            saved.append(element)
mysaved = []
myiter = cycle(someiter, mysaved)
But that just looks nasty. In C/C++ I could pass in a reference and repoint saved at iterable, but you can't actually do that in Python: assigning saved = iterable inside the generator only rebinds the local name. So this doesn't even work.
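To make the rebinding point concrete (a small illustration of Python's assignment semantics, not a fix for the approach above):

def rebind(saved):
    saved = [1, 2, 3]      # rebinds the local name only

def mutate(saved):
    saved[:] = [1, 2, 3]   # mutates the caller's list in place

mysaved = []
rebind(mysaved)
# mysaved is still [] -- the rebinding inside the function never reaches the caller
mutate(mysaved)
# mysaved is now [1, 2, 3]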
Other options?
Edit: More data. The CachingIterable approach appears to be too slow to be effective, but it did push me in a direction that might work. My attempt is slightly slower than the naive method (converting to a list myself), but appears not to take the hit when the input is already a list (or other container).
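CachingIterable isn't defined in this post; it appears to come from one of the answers. A rough sketch of that kind of "as you go" cacher might look like the following (the name LazyCache and the details are my reconstruction, not the benchmarked code):

class LazyCache(object):
    # Rough "as you go" cacher: items are cached lazily, so stopping a pass early
    # only caches what was actually consumed. Sequential passes only -- not safe
    # for interleaved iterators.
    def __init__(self, iterable):
        self._source = iter(iterable)
        self._cache = []
    def __iter__(self):
        # Serve whatever is already cached, then keep pulling from the source.
        for item in self._cache:
            yield item
        for item in self._source:
            self._cache.append(item)
            yield item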
Some code and data:
def cube_generator(max=100):
    i = 0
    while i < max:
        yield i*i*i
        i += 1
# Base case: use generator each time
%%timeit
cg = cube_generator(); [x for x in cg]
cg = cube_generator(); [x for x in cg]
cg = cube_generator(); [x for x in cg]
10000 loops, best of 3: 55.4 us per loop
# Fastest case: flatten to list, then iterate
%%timeit
cg = cube_generator()
cl = list(cg)
[x for x in cl]
[x for x in cl]
[x for x in cl]
10000 loops, best of 3: 27.4 us per loop
%%timeit
cg = cube_generator()
ci2 = CachingIterable(cg)
[x for x in ci2]
[x for x in ci2]
[x for x in ci2]
1000 loops, best of 3: 239 us per loop
# Another attempt, which is closer to the above
# Not exactly the original solution using next, but close enough I guess
class CacheGen(object):
    def __init__(self, iterable):
        # Keep lists/tuples/dicts as-is; flatten anything else (e.g. a generator) once
        if isinstance(iterable, (list, tuple, dict)):
            self._myiter = iterable
        else:
            self._myiter = list(iterable)
    def __iter__(self):
        return self._myiter.__iter__()
    def __contains__(self, key):
        return self._myiter.__contains__(key)
    def __getitem__(self, key):
        return self._myiter.__getitem__(key)
%%timeit
cg = cube_generator()
ci = CacheGen(cg)
[x for x in ci]
[x for x in ci]
[x for x in ci]
10000 loops, best of 3: 30.5 us per loop
# But if you start with a list, it is faster
cg = cube_generator()
cl = list(cg)
%%timeit
[x for x in cl]
[x for x in cl]
[x for x in cl]
100000 loops, best of 3: 11.6 us per loop
%%timeit
ci = CacheGen(cl)
[x for x in ci]
[x for x in ci]
[x for x in ci]
100000 loops, best of 3: 13.5 us per loop
Any faster recipes that can get closer to the 'pure' loop?
Comments:

Once StopIteration is raised, then by the generator specification, it should no longer yield anything... – Bassarisk

...list() and then iterating over the list, so your CacheGen would be the way to go. If ultimately you have to exhaust the whole iterator then you might as well do it all in one go at the beginning. But if you have infinite generators then you won't be able to do it that way, or if you might not iterate over the whole thing you'll waste resources. I've updated my answer with a more efficient "as you go" cacher, but still slower than the simple one. – Evan

ensure_list = lambda i: i if isinstance(i, (list, tuple, dict)) else list(i) =P. – Evan
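For completeness, the ensure_list one-liner from the last comment does the same job as CacheGen without the class; a quick illustration (not part of the original post):

ensure_list = lambda i: i if isinstance(i, (list, tuple, dict)) else list(i)

cl = [1, 8, 27]
assert ensure_list(cl) is cl            # an existing list is passed through untouched

cg = (x*x*x for x in range(3))
cached = ensure_list(cg)                # a generator is flattened once...
assert cached == [0, 1, 8]
assert list(cached) == [0, 1, 8]        # ...and can be iterated repeatedly afterwards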