Performance Advantages to Iterators?

What (if any) performance advantages are offered by using iterators? It seems like the 'Right Way' to solve many problems, but does it create faster/more memory-conscious code? I'm thinking specifically in Python, but don't restrict answers to just that.

Forepart answered 10/3, 2009 at 4:5 Comment(2)
Are you sure you don't mean "generator" instead? If you actually mean iterators, then the only way to avoid them is to have a while loop and increment index variables manually, which is... pretty awkward...Bordelaise
I think what the OP means to ask is what the performance benefits of using an iterator directly are vs loading data into a list and then using its iterator.Bitterweed

There's actually a very good mail on the python mailing list about this: Iterators vs Lists. It's a bit dated (from 2003), but as far as I know, it's still valid.

Here's the summary:

For small datasets, iterator and list based approaches have similar performance. For larger datasets, iterators save both time and space.

What I would draw from it is this: prefer iterators over loading data into a list when possible. But unless you have a big dataset, don't contort your code to make something that naturally fits in a list work with an iterator.
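To make the trade-off concrete, here is a small sketch (the variable names are illustrative; the size ratio holds on CPython, where I tested the idea):

```python
import sys

# Building the full list up front costs memory proportional to its length.
squares_list = [n * n for n in range(100_000)]

# A generator expression computes one value at a time, on demand,
# so its footprint stays constant no matter how large the range is.
squares_gen = (n * n for n in range(100_000))

print(sys.getsizeof(squares_list) > 100 * sys.getsizeof(squares_gen))  # True on CPython

# Both produce the same total.
print(sum(squares_list) == sum(n * n for n in range(100_000)))  # True
```

For a one-pass computation like `sum`, the generator gives the same answer without ever materializing the list.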

Bitterweed answered 10/3, 2009 at 18:16 Comment(0)

Iterators will be faster and have better memory efficiency. Just think of the example of range(1000) vs. xrange(1000). (This has changed in 3.0; range is now an iterator.) With range you pre-build the whole list, but xrange is an iterator and yields the next item only when needed.

The performance difference isn't great for small collections, but as soon as you start cranking out larger and larger sets of information you'll notice it quite quickly. Besides the cost of generating the list and then stepping through it, you also consume extra memory for the pre-built structure, whereas with the iterator only one item exists at a time.
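A rough sketch of what that xrange-style laziness looks like as a generator (my_range here is illustrative, not the real implementation):

```python
def my_range(stop):
    """Yield 0, 1, ..., stop - 1 one at a time, like xrange."""
    i = 0
    while i < stop:
        yield i  # the next value is produced only when the caller asks for it
        i += 1

# No list is ever built; only one integer exists at a time.
print(sum(my_range(1000)))  # 499500
```

The caller's loop looks identical either way; only the memory profile differs.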

Nyx answered 10/3, 2009 at 7:7 Comment(1)
range is not an iterator, it is an iterable. To prove this, try doing x = next(range(1000)). You will get a TypeError. You can get an iterator from range by doing iter(range(1000)). I think you meant to say that in 3.0 range no longer returns a list. It returns one item at a time as you iterate over it.Crankcase

The primary benefit of iterators is not one of performance. In my experience the most performant solution is creating an algorithm which embeds your data structure of choice. The benefit of iterators is that they allow you to decouple data and algorithm and, therefore, generalize and reuse both. If this can also be done without (or with little) performance degradation then it's a net gain.

My favorite example of iterator usage can be found in the C++ Standard Template Library. It manages to demonstrate the power and beauty of the abstraction by cleanly separating container and algorithm without sacrificing performance. Understanding this design had a profound effect on the way I think about code.
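The same decoupling carries over to Python: an algorithm written against the iterator protocol works unchanged with any container. A sketch (the function name is made up for the example):

```python
def largest(items):
    """Return the largest element of any non-empty iterable.

    Works on lists, tuples, sets, generators, files, ... because it
    only relies on the iterator protocol, not on any concrete container.
    """
    it = iter(items)
    best = next(it)
    for item in it:
        if item > best:
            best = item
    return best

print(largest([3, 1, 4]))                  # 4
print(largest((x * x for x in range(5))))  # 16
```

One algorithm, many data structures: that is the STL idea expressed in Python.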

Aekerly answered 10/3, 2009 at 5:0 Comment(0)

There is one answer that I think confuses the concepts of generator and iterator a little, so I decided to try answering this question with a metaphor.

I'm working in a kitchen, and my boss gives me the task of adding up the weight of 10 (or 100, or a million) loaves of bread. I have a scale and a calculator (the magic tricks of my algorithm). Below are the iterable object, the generator, and the iterator, and how the approaches differ:

  1. Iterable object: Each loaf is stored in a box (memory). I weigh the first (or the 0th) loaf, write down its weight, put it back in its box, then go to the next one, weigh it, and put it back, and so on. In the end, I have the total weight, and the 10 (100, or a million) loaves are still there in their boxes.

  2. Generator: There aren't enough boxes to store all these loaves, so I ask for the help of a baker (the generator). He makes the first loaf and gives it to me; I weigh it, write down the result, throw that loaf away, and ask him for another one, and so on, until I've handled the last loaf (or maybe the baker runs out of flour). In the end, I have the result and none of the loaves are left. But who cares: my boss only asked me to weigh the loaves; he never said I couldn't throw them away (what a brilliant busboy).

  3. Iterator: I ask someone (the iterator) to help me move the first loaf onto the scale; I weigh it and write down the result. That someone grabs the next one for measuring, and so on. I actually have no idea whether that someone (the iterator) gets the loaves from a box or from a baker. Eventually I have the total weight, and the source doesn't matter to me.

Anyway, to sum up:

  1. An iterable object needs memory to store the data to start with. In the end, the data is still there.

  2. A generator doesn't need memory to store the data up front; it generates the data on the fly.

  3. An iterator is a channel between an algorithm and its data. The data may already be there, stored in memory, or it may be generated on the fly by a generator. In the first case, that memory would be freed bit by bit as the iterator keeps iterating. So I agree a lot with the answer above that an iterator is good because its abstraction isolates the algorithm from the data.

Python doesn't work exactly like this, but I hope the metaphor helps clarify things a little.
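In Python terms, the metaphor might look like the following sketch (the bake function and the weights are made up for the example):

```python
def bake(n):
    """The baker: a generator that produces one loaf's weight at a time."""
    for i in range(n):
        yield 500 + i  # weight in grams; made-up numbers

# 1. Iterable object: all loaves sit in memory, and remain there afterwards.
boxes = [500 + i for i in range(10)]
total_from_boxes = sum(boxes)

# 2. Generator: loaves are produced and discarded one by one.
total_from_baker = sum(bake(10))

# 3. Iterator: the same summing loop works either way; the consumer
#    never knows whether the values came from a box or a baker.
total_via_iterator = sum(iter(boxes))

print(total_from_boxes == total_from_baker == total_via_iterator)  # True
```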

Spiegleman answered 4/11, 2016 at 17:25 Comment(0)

To back up @Christian Witts's answer:

range vs. xrange performance

python25 -mtimeit "for i in xrange(1000): pass"
10000 loops, best of 3: 56.3 usec per loop

python25 -mtimeit "for i in range(1000): pass"
10000 loops, best of 3: 80.9 usec per loop

python26 -mtimeit "for i in xrange(1000): pass"
10000 loops, best of 3: 48.8 usec per loop

python26 -mtimeit "for i in range(1000): pass"
10000 loops, best of 3: 68.6 usec per loop

btw, neither range() nor xrange() are iterators:

>>> hasattr(range(1), 'next')
False
>>> hasattr(xrange(1), 'next')
False
>>> iter(xrange(1))
<rangeiterator object at 0x0097A500>
>>> iter(range(1))
<listiterator object at 0x00A7BFD0>
>>> iter([])
<listiterator object at 0x00A7BE30>
>>> iter(i for i in (1,))
<generator object at 0x00A7F940>
>>> (i for i in (1,))
<generator object at 0x00A7FDC8>
Elman answered 10/3, 2009 at 19:31 Comment(5)
btw, answer for python30 is 31.5 usec, doesn't really fit into your comparison, but good to know, i reckonMillihenry
@SilentGhost: there is no xrange in Python 3.x therefore nothing to compare to.Elman
@SilentGhost: Also, unless you have access to J.F. Sebastian's computer, the comparison is not very useful..Verbal
should be noted that the times are microseconds... there are probably better places in your code to spend your time optimising (like the database access)Starofbethlehem
@Jim: 1. The OP does ask about performance advantages. 2. Measure first, optimize second (don't guess that it is the database access, prove it and only then optimize it).Elman

My inference from many of the answers above is: "Use lists when writing the code; refactor to iterators if necessary." The difference is not apparent unless you have a large dataset.

Another thing to note is that, even when lists are used heavily, the dataset we operate on tends to become progressively smaller and smaller.
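The "refactor later" step is usually mechanical. For instance, a list comprehension becomes a generator expression just by swapping brackets for parentheses (the names below are illustrative):

```python
words = ["alpha", "beta", "gamma"]

# First version: simple and easy to debug, builds the whole list.
lengths = [len(word) for word in words]
total = sum(lengths)

# Refactored: same logic, but values are streamed one at a time.
total_streamed = sum(len(word) for word in words)

print(total == total_streamed)  # True
```

Because the consuming code (`sum` here) only needs an iterable, nothing else has to change.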

Breathing answered 9/4, 2009 at 12:45 Comment(0)

Iterators are just classes that implement a particular interface, specifically an interface for advancing to the next item. In Python, lists, tuples, dicts, strings, and files can all be iterated through it. If the underlying class is implemented poorly, iteration may perform poorly, but there is nothing inherent in the interface that implies good or bad performance.
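A minimal sketch of that interface in Python (Countdown is a made-up class; any object with __iter__ and __next__ satisfies the protocol):

```python
class Countdown:
    """Implements the iterator protocol: __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator is its own iterable.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that iteration is finished
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]
```

The for loop, list(), sum(), and friends all drive exactly this interface: call __next__ until StopIteration is raised.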

De answered 10/3, 2009 at 4:42 Comment(2)
What you're saying is technically true to a point. However, I disagree that speed is a result of the quality of the underlying data structure. It depends more on whether the data structure is the right one for the task or if one is really even required.Bitterweed
My point is that none of this has to do with iterators as asked in the question. With an iterator, you call next() until StopIteration is raised. What that next() is doing is where your performance metric is. In the end, the accepted answer is about generators, not iterators so I guess it's moot.De

An iterator is simply an object that provides methods for traversing a collection. You can traverse all the elements of an array or all the nodes of a tree through the same interface. Trees and arrays are very different data structures and require different traversal methods, but with an iterator you can loop through all their elements in a uniform way.

For a single type of collection there may also be several ways to traverse it, and one collection can have multiple iterators: a depth-first iterator and a breadth-first iterator can traverse the same tree and return its nodes in different orders. Iterators are not intended for performance; they typically exist to provide a consistent interface for traversing structures.
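A sketch of that idea in Python: one tree, two iterators yielding the same nodes in different orders (the Node class and the traversal generators are illustrative):

```python
from collections import deque

class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def depth_first(node):
    """Yield node values depth-first."""
    yield node.value
    for child in node.children:
        yield from depth_first(child)

def breadth_first(node):
    """Yield node values level by level."""
    queue = deque([node])
    while queue:
        current = queue.popleft()
        yield current.value
        queue.extend(current.children)

tree = Node(1, [Node(2, [Node(4)]), Node(3)])
print(list(depth_first(tree)))    # [1, 2, 4, 3]
print(list(breadth_first(tree)))  # [1, 2, 3, 4]
```

Any code that consumes an iterator of values works with either traversal, which is exactly the consistency the answer describes.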

Stercoricolous answered 10/3, 2009 at 5:0 Comment(0)

Slightly off-topic, but it adds weight to preferring lists over iterators in general: with iterators it's easier to introduce side effects. Consider this:

from typing import Iterable

def foo(arg: Iterable[str]):
  print(list(arg))  # side effect: if arg is an iterator, it is exhausted at this point
  ...

You might say testing should catch this, but sometimes it doesn't. Lists do not have this problem, since they are stateless (with respect to iteration).
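A concrete demonstration of the exhaustion pitfall (foo here is just an example function):

```python
def foo(arg):
    first_pass = list(arg)   # side effect: consumes an iterator argument
    second_pass = list(arg)  # an exhausted iterator yields nothing more
    return first_pass, second_pass

# A list survives both passes; a generator is empty the second time.
print(foo([1, 2, 3]))             # ([1, 2, 3], [1, 2, 3])
print(foo(x for x in [1, 2, 3]))  # ([1, 2, 3], [])
```

The second call silently loses data, which is exactly the kind of bug that can slip past tests that only exercise list arguments.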

Toogood answered 20/11, 2017 at 23:43 Comment(0)
