Why doesn't iterating work the second time for iterators?
It does "work", in the sense that the for
loop in the examples does run. It simply performs zero iterations. This happens because the iterator is "exhausted"; it has already iterated over all of the elements.
Why does it work for other kinds of iterables?
Because, behind the scenes, a new iterator is created for each loop, based on that iterable. Creating the iterator from scratch means that it starts at the beginning.
This happens because iterating requires an iterator. If an iterator was already provided, it will be used as-is; but otherwise, a conversion is necessary, which creates a new object.
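For example, a list can be looped over repeatedly, while an iterator made from that same list cannot:
numbers = [1, 2, 3]        # a non-iterator iterable
for n in numbers:
    print(n)               # 1, 2, 3
for n in numbers:
    print(n)               # 1, 2, 3 again - a fresh iterator is made each time

exhausted = iter(numbers)  # an iterator over the same data
for n in exhausted:
    print(n)               # 1, 2, 3
for n in exhausted:
    print(n)               # nothing - the iterator is already exhausted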
Given an iterator, how can we iterate twice over the data?
By caching the data; starting over with a new iterator (assuming we can re-create the initial condition); or, if the iterator was specifically designed for it, seeking or resetting the iterator. Relatively few iterators offer seeking or resetting.
Caching
The only fully general approach is to remember what elements were seen (or determine what elements will be seen) the first time and iterate over them again. The simplest way is by creating a list or tuple from the iterator:
elements = list(iterator)
for element in elements:
    ...
for element in elements:
    ...
Since the list is a non-iterator iterable, each loop will create a new iterator that iterates over all the elements. If the iterator is already "part way through" an iteration when we do this, the list will only contain the "following" elements:
abstract = (x for x in range(10)) # represents integers from 0 to 9 inclusive
next(abstract) # skips the 0
concrete = list(abstract) # makes a list with the rest
for element in concrete:
    print(element) # starts at 1, because the list does
for element in concrete:
    print(element) # also starts at 1, because a new iterator is created
A more sophisticated way is using itertools.tee. This essentially creates a "buffer" of elements from the original source as they're iterated over, and then creates and returns several custom iterators that work by remembering an index, fetching from the buffer if possible, and appending to the buffer (using the original iterable) when necessary. (In the reference implementation of modern Python versions, this does not use native Python code.)
from itertools import tee
concrete = list(range(10)) # `tee` works on any iterable, iterator or not
x, y = tee(concrete, 2) # the second argument is the number of instances
for element in x:
    print(element)
    if element == 3:
        break
for element in y:
    print(element) # starts over at 0, taking 0, 1, 2, 3 from a buffer
Starting over
If we know, and can recreate, the conditions the iterator started from, that also solves the problem. This is implicitly what happens when iterating multiple times over a list: the "starting conditions for the iterator" are just the contents of the list, and all the iterators created from it give the same results. For another example, if a generator function does not depend on external state, we can simply call it again with the same parameters:
def powers_of(base, *range_args):
    for i in range(*range_args):
        yield base ** i

exhaustible = powers_of(2, 1, 12)
for value in exhaustible:
    print(value)
print('exhausted')
for value in exhaustible: # no results from here
    print(value)

# Want the same values again? Then call the generator function again:
print('replenished')
for value in powers_of(2, 1, 12):
    print(value)
Seekable or resettable iterators
Some specific iterators may make it possible to "reset" iteration to the beginning, or even to "seek" to a specific point in the iteration. In general, iterators need to have some kind of internal state in order to keep track of "where" they are in the iteration. Making an iterator "seekable" or "resettable" simply means allowing external access to, respectively, modify or re-initialize that state.
Nothing in Python disallows this, but in many cases it's not feasible to provide a simple interface; in most other cases, it just isn't supported even though it might be trivial. For generators, on the other hand, the internal state in question is quite complex, and it protects itself against modification.
The classic example of a seekable iterator is an open file object created using the built-in open function. The state in question is a position within the underlying file on disk; the .tell and .seek methods allow us to inspect and modify that position value - e.g. .seek(0) will set the position to the beginning of the file, effectively resetting the iterator. Similarly, csv.reader is a wrapper around a file; seeking within that file will therefore affect the subsequent results of iteration.
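For instance, a minimal sketch (the file name example.txt is only for illustration and assumed to exist):
with open('example.txt') as f:  # hypothetical file name
    for line in f:
        print(line, end='')
    f.seek(0)                   # reset the position to the start of the file
    for line in f:              # the same lines are produced again
        print(line, end='')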
In all but the simplest, deliberately-designed cases, rewinding an iterator will be anywhere from difficult to impossible. Even if the iterator is designed to be seekable, this leaves the question of figuring out where to seek to - i.e., what the internal state was at the desired point in the iteration. In the case of the powers_of generator shown above, that's straightforward: just modify i. For a file, we'd need to know what the file position was at the beginning of the desired line, not just the line number. That's why the file interface provides .tell as well as .seek.
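Here's a small sketch (the file name data.txt is hypothetical) of remembering a position with .tell and returning to it with .seek:
with open('data.txt') as f:        # hypothetical file name
    first = f.readline()           # consume the first line
    mark = f.tell()                # remember where the second line begins
    second = f.readline()
    f.seek(mark)                   # jump back to the start of the second line
    print(f.readline() == second)  # True - the same line is read again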
Here's a re-worked example of powers_of representing an unbounded sequence, and designed to be seekable, rewindable and resettable via an exponent property:
class PowersOf:
    def __init__(self, base):
        self._exponent = 0
        self._base = base

    def __iter__(self):
        return self

    def __next__(self):
        result = self._base ** self._exponent
        self._exponent += 1
        return result

    @property
    def exponent(self):
        return self._exponent

    @exponent.setter
    def exponent(self, new_value):
        if not isinstance(new_value, int):
            raise TypeError("must set with an integer")
        if new_value < 0:
            raise ValueError("can't set to negative value")
        self._exponent = new_value
Examples:
pot = PowersOf(2)
for i in pot:
    if i > 1000:
        break
    print(i)
pot.exponent = 5 # jump to this point in the (unbounded) sequence
print(next(pot)) # 32
print(next(pot)) # 64
Technical detail
Iterators vs. iterables
Recall that, briefly:
- "iteration" means looking at each element in turn, of some abstract, conceptual sequence of values. This can include:
- "iterable" means an object that represents such a sequence. (What the Python documentation calls a "sequence" is in fact more specific than that - basically it also needs to be finite and ordered.). Note that the elements do not need to be "stored" - in memory, disk or anywhere else; it is sufficient that we can determine them during the process of iteration.
- "iterator" means an object that represents a process of iteration; in some sense, it keeps track of "where we are" in the iteration.
Combining the definitions, an iterable is something that represents elements that can be examined in a specified order; an iterator is something that allows us to examine elements in a specified order. Certainly an iterator "represents" those elements - since we can find out what they are, by examining them - and certainly they can be examined in a specified order - since that's what the iterator enables. So, we can conclude that an iterator is a kind of iterable - and Python's definitions agree.
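We can check this relationship directly using the abstract base classes in the standard library:
from collections.abc import Iterable, Iterator

print(issubclass(Iterator, Iterable))         # True - every iterator is an iterable
print(isinstance([1, 2, 3], Iterator))        # False - a list is an iterable, but not an iterator
print(isinstance(iter([1, 2, 3]), Iterator))  # True - iter() gives us an iterator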
How iteration works
In order to iterate, we need an iterator; but in normal cases (i.e. except in poorly written user-defined code), any iterable is permissible, because Python will convert other iterables to corresponding iterators behind the scenes. The logic for this conversion is available via the built-in iter function. To iterate, Python repeatedly asks the iterator for a "next element" until the iterator raises a StopIteration exception. The logic for this is available via the built-in next function.
Generally, when iter is given a single argument that already is an iterator, that same object is returned unchanged. But if it's some other kind of iterable, a new iterator object will be created. This directly leads to the problem in the OP. User-defined types can break both of these rules, but they probably shouldn't.
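For example, a quick check of both rules:
numbers = [1, 2, 3]
print(iter(numbers) is iter(numbers))  # False - each call makes a new list iterator
it = iter(numbers)
print(iter(it) is it)                  # True - an iterator is returned unchanged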
The iterator protocol
Python roughly defines an "iterator protocol" that specifies how it decides whether a type is an iterable (or specifically an iterator), and how types can provide the iteration functionality. The details have changed slightly over the years, but the modern setup works like so:
Anything that has an __iter__ or a __getitem__ method is an iterable. Anything that defines an __iter__ method and a __next__ method is specifically an iterator. (Note in particular that if there is a __getitem__ and a __next__ but no __iter__, the __next__ has no particular meaning, and the object is a non-iterator iterable.)
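As a minimal sketch (the class name is made up for illustration), a class that only defines __getitem__ is a non-iterator iterable; iteration falls back to indexing from 0 until IndexError is raised:
class Squares:
    # no __iter__ and no __next__; only the legacy __getitem__ protocol
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError  # signals the end of iteration
        return index ** 2

for value in Squares():
    print(value)  # 0, 1, 4, 9, 16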
Given a single argument, iter will attempt to call the __iter__ method of that argument, verify that the result has a __next__ method, and return that result. (It does not ensure the presence of an __iter__ method on the result. Such objects can often be used in places where an iterator is expected, but will fail if e.g. iter is called on them.) If there is no __iter__, it will look for __getitem__, and use that to create an instance of a built-in iterator type. That iterator is roughly equivalent to
class Iterator:
    def __init__(self, bound_getitem):
        self._index = 0
        self._bound_getitem = bound_getitem

    def __iter__(self):
        return self

    def __next__(self):
        try:
            result = self._bound_getitem(self._index)
        except IndexError:
            raise StopIteration
        self._index += 1
        return result
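For instance, a quick sketch of using that equivalent class directly, passing it a bound __getitem__ from a list:
it = Iterator([10, 20, 30].__getitem__)
print(list(it))  # [10, 20, 30]
print(list(it))  # [] - like any iterator, it is now exhausted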
Given a single argument, next will attempt to call the __next__ method of that argument, allowing any StopIteration to propagate.
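For example:
it = iter(['a'])
print(next(it))  # a
try:
    next(it)
except StopIteration:
    print('exhausted')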
With all of this machinery in place, it is possible to implement a for loop in terms of while. Specifically, a loop like
for element in iterable:
    ...
will approximately translate to:
iterator = iter(iterable)
while True:
    try:
        element = next(iterator)
    except StopIteration:
        break
    ...
except that the iterator is not actually assigned any name (the syntax here is to emphasize that iter is only called once, and is called even if there are no iterations of the ... code).