What (if any) performance advantages are offered by using iterators? It seems like the 'Right Way' to solve many problems, but does it create faster/more memory-conscious code? I'm thinking specifically of Python, but don't restrict answers to just that.
There's actually a very good email on the Python mailing list about this: Iterators vs Lists. It's a bit dated (from 2003), but as far as I know, it's still valid.
Here's the summary:
For small datasets, iterator and list based approaches have similar performance. For larger datasets, iterators save both time and space.
What I would draw from it is this: prefer iterators over loading data into a list if possible. But unless you have a big dataset, don't contort your code to make something that naturally fits in a list work with an iterator.
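As a minimal sketch of that trade-off (my own example, not from the mail): both functions below compute the same sum, but the list-based one materializes every intermediate value, while the generator-based one holds only one at a time.

def sum_of_squares_list(n):
    return sum([i * i for i in range(n)])  # allocates a full list of n values

def sum_of_squares_iter(n):
    return sum(i * i for i in range(n))    # generator: one value alive at a time

print(sum_of_squares_list(1000) == sum_of_squares_iter(1000))  # True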
Iterators will be faster and have better memory efficiency. Just think of the example of range(1000) vs xrange(1000). (This has been changed in 3.0; range is now an iterator.) With range you pre-build your list, but xrange is an iterator and yields the next item when needed instead.
The performance difference isn't great for small collections, but as soon as you start working with larger and larger sets of data you'll notice it quite quickly. Besides having to generate the whole list before stepping through it, you also consume extra memory for the pre-built list, whereas with the iterator only one item exists at a time.
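A rough illustration of the memory claim (Python 3, where range is already lazy, so the list must be built explicitly; exact sizes vary by platform and interpreter):

import sys

n = 1000000
# The pre-built list holds a million references (about 8 MB for the
# pointer array alone on 64-bit CPython) ...
print(sys.getsizeof(list(range(n))))  # e.g. 8000056
# ... while the lazy range object and an iterator over it stay tiny,
# regardless of n.
print(sys.getsizeof(range(n)))        # e.g. 48
print(sys.getsizeof(iter(range(n))))  # e.g. 48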
range is not an iterator, it is an iterable. To prove this, try doing x = next(range(1000)). You will get a TypeError. You can get an iterator from range by doing iter(range(1000)). I think you meant to say that in 3.0, range no longer returns a list; it returns one item at a time as you iterate over it. – Crankcase
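A quick interactive check of the comment's point (Python 3):

>>> next(range(1000))   # range is an iterable, not an iterator
Traceback (most recent call last):
  ...
TypeError: 'range' object is not an iterator
>>> it = iter(range(1000))   # ask the iterable for an iterator
>>> next(it)
0
>>> next(it)
1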
The primary benefit of iterators is not performance. In my experience the most performant solution is an algorithm that embeds your data structure of choice. The benefit of iterators is that they allow you to decouple data and algorithm and, therefore, generalize and reuse both. If this can also be done without (or with little) performance degradation, then it's a net gain.
My favorite example of iterator usage can be found in the C++ Standard Template Library. It manages to demonstrate the power and beauty of the abstraction by cleanly separating container and algorithm without sacrificing performance. Understanding this design had a profound effect on the way I think about code.
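The same decoupling is easy to show in Python (an illustration of mine, not from the answer): one algorithm works unchanged over very different data sources, because each source merely supplies an iterator.

def longest_length(items):
    # The algorithm never cares where the data lives or how it is produced.
    return max(len(s) for s in items)

print(longest_length(["a", "bbb", "cc"]))           # 3, from a list
print(longest_length("x" * n for n in range(5)))    # 4, from a generator
# The same function would also accept an open file, a dict's keys, etc.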
There is one answer that I think confuses the concepts of generator and iterator a little bit, so I decided to try answering this question with a metaphor.
I work in a kitchen, and my boss gives me the task of adding up the weight of 10 (or 100, or a million) loaves of bread. I have a scale and a calculator (the magic tricks of my algorithm). Below are the iterable object, generator, and iterator approaches, and how they differ:
Iterable object: Each loaf is stored in a box (memory). I weigh the first (or the 0th) loaf, write down its weight, and put it back in its box, then go to the next one, weigh it and put it back, and so on. In the end, I have the overall weight, and the 10 (100, or a million) loaves are still there in their boxes.
Generator: There are not enough boxes to store all this bread, so I ask for the help of a baker (the generator). He makes the first loaf and gives it to me; I weigh it, write the result down, throw that loaf away, and ask him for another, and so on, until I get the last loaf (or the baker runs out of flour). In the end, I have the result and none of the bread is left. But who cares; my boss only asked me to weigh the bread, he didn't say I couldn't throw it away (what a brilliant busboy).
Iterator: I ask someone (the iterator) to move the first loaf onto the scale. I weigh it and write the result down. This someone then grabs the next one for measuring, and so on. I actually have no idea whether this someone (the iterator) gets the bread from a box or from a baker. In the end, I have the overall weight, and it doesn't matter to me where the bread came from.
Anyway, to sum up:
An iterable object needs memory to store the data to start with. In the end, the data is still there.
A generator doesn't need memory to store the data to start with; it generates the data on the go.
An iterator is a channel between an algorithm and its data. The data may already be there, stored in memory, or may be generated on the go by a generator. In the first case, that memory would be freed bit by bit as the iterator keeps iterating. So I agree a lot with the answer above: iterators are good because of the abstraction that enables the isolation of algorithm and data.
Python doesn't work exactly like this, but I hope it helps clarify things a little bit.
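Translated into Python, the three approaches from the metaphor might look roughly like this (names are illustrative):

# The boxes: an iterable object holding every loaf's weight up front.
weights_in_boxes = [450, 520, 480]      # list: all data in memory

# The baker: a generator that produces one weight at a time.
def baker(n):
    for _ in range(n):
        yield 500                       # bake a loaf, hand it over

# The helper: an iterator; sum() doesn't care where the items come from.
print(sum(weights_in_boxes))            # 1450; the list is still there
print(sum(baker(3)))                    # 1500; nothing was stored
print(sum(iter(weights_in_boxes)))      # 1450; same interface either way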
To back up @Christian Witts's answer, here are range vs. xrange timings:
python25 -mtimeit "for i in xrange(1000): pass"
10000 loops, best of 3: 56.3 usec per loop
python25 -mtimeit "for i in range(1000): pass"
10000 loops, best of 3: 80.9 usec per loop
python26 -mtimeit "for i in xrange(1000): pass"
10000 loops, best of 3: 48.8 usec per loop
python26 -mtimeit "for i in range(1000): pass"
10000 loops, best of 3: 68.6 usec per loop
By the way, neither range() nor xrange() is an iterator:
>>> hasattr(range(1), 'next')
False
>>> hasattr(xrange(1), 'next')
False
>>> iter(xrange(1))
<rangeiterator object at 0x0097A500>
>>> iter(range(1))
<listiterator object at 0x00A7BFD0>
>>> iter([])
<listiterator object at 0x00A7BE30>
>>> iter(i for i in (1,))
<generator object at 0x00A7F940>
>>> (i for i in (1,))
<generator object at 0x00A7FDC8>
There is no xrange in Python 3.x, therefore there's nothing to compare it to. – Elman
My inference from the many answers above is: "Use lists to code. If necessary, refactor using iterators." The difference is not apparent unless you have a large dataset.
Another thing to note is that, even when using lists often, the dataset we operate on tends to get progressively smaller and smaller.
Iterators are just classes that implement a particular interface, specifically an interface for advancing to the next element. In Python, lists, tuples, dicts, strings, and files all support this interface. If it is implemented poorly, that may result in poor performance, but there is nothing inherent to the interface that implies good or bad performance.
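As a minimal sketch of that interface (my own illustration): a class becomes an iterator simply by implementing __iter__ and __next__ (next in Python 2).

class Countdown:
    # A hand-rolled iterator implementing the iterator protocol.
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                 # an iterator returns itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration     # signals the end of iteration
        self.current -= 1
        return self.current + 1

for n in Countdown(3):
    print(n)                        # 3, 2, 1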
An iterator is simply an object that provides methods to allow traversing through a collection. You could traverse all of the elements of an array or all the nodes of a tree with the same interface. Trees and arrays are very different data structures and require different methods to traverse, but with an iterator you can loop through all of their elements in the same way.
For one type of collection, there may also be different ways to traverse it, and a single collection could have multiple iterators. You could have a depth-first iterator or a breadth-first iterator traversing a tree structure, returning the nodes in different orders. Iterators are not intended for performance; they are typically intended to provide a consistent interface for traversing structures.
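A sketch of that idea in Python (a hypothetical Tree class, not from the answer): one structure, two traversal iterators returning the nodes in different orders.

from collections import deque

class Tree:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def depth_first(self):
        # Yield node values depth-first: children before siblings.
        yield self.value
        for child in self.children:
            yield from child.depth_first()

    def breadth_first(self):
        # Yield node values level by level.
        queue = deque([self])
        while queue:
            node = queue.popleft()
            yield node.value
            queue.extend(node.children)

t = Tree(1, [Tree(2, [Tree(4)]), Tree(3)])
print(list(t.depth_first()))    # [1, 2, 4, 3]
print(list(t.breadth_first()))  # [1, 2, 3, 4]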
Slightly off-topic, but this adds more weight to the use of lists over iterators in general: with iterators it's easier to have side effects. Consider this:
from typing import Iterable

def foo(arg: Iterable[str]):
    print(list(arg))  # side effect: arg is exhausted at this point
    ...
You could say testing should catch this, but sometimes it doesn't. Lists do not have this problem, since they're stateless (in the sense of iteration).
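To see the exhaustion in action (a contrived demo using the same foo as above):

gen = (s.upper() for s in ["a", "b"])
foo(gen)          # prints ['A', 'B'] -- and silently exhausts gen
print(list(gen))  # []: the data is gone on the second pass

lst = ["a", "b"]
foo(lst)          # prints ['a', 'b']
print(list(lst))  # ['a', 'b']: a list can be iterated again safely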