Why is converting a list to a set faster than converting a generator to a set?
Here is an example:

>>> from timeit import timeit
>>> print(timeit('[y for y in range(100)]', number=100000))
0.7025867114395824
>>> print(timeit('(y for y in range(100))', number=100000))
0.09295392291478244
>>> print(timeit('set([y for y in range(100)])', number=100000))
1.0864544935180334
>>> print(timeit('set((y for y in range(100)))', number=100000))
1.1277489876506621

It is very confusing. The generator takes less time to create (and that is understandable), but why is converting the generator to a set slower than converting the list, when (at least to my knowledge) it should have been the opposite?

Kielty asked 6/9, 2017 at 16:30 Comment(2)
(y for y in range(100)) doesn't do anything but create a single generator object. No iteration is done, so that's really a worthless test compared to the other operations being timed.Communicable
The advantages of generators are in memory consumption and the possibility of early termination; iterating over an entire generator is not expected to be faster than iterating over an entire corresponding list comprehension.Sorely

First of all, there is no point in timing the creation of a generator expression. Creating a generator doesn't iterate over the contents, so it's very fast. Spot the differences between creating a generator expression over one element vs. over 10 million:

>>> print(timeit('(y for y in range(1))', number=100000))
0.060932624037377536
>>> print(timeit('(y for y in range(10000000))', number=100000))
0.06168231705669314
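
To see that creating a generator really does no work, note that the body of a generator expression doesn't run at all until the generator is iterated. A minimal illustration (the 1 / 0 is just a tripwire):

>>> gen = (1 / 0 for _ in range(1))  # creating the generator raises nothing
>>> next(gen)  # the body only executes once iteration starts
Traceback (most recent call last):
  ...
ZeroDivisionError: division by zero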

Generators take more time to iterate over than, say, a list object:

>>> from collections import deque
>>> def drain_iterable(it, _deque=deque):
...     # a maxlen=0 deque consumes the iterator at C speed without storing anything
...     _deque(it, maxlen=0)
...
>>> def produce_generator():
...     return (y for y in range(100))
...
>>> print(timeit('drain_iterable(next(generators))',
...              'from __main__ import drain_iterable, produce_generator;'
...              'generators=iter([produce_generator() for _ in range(100000)])',
...              number=100000))
0.5204695729771629
>>> print(timeit('[y for y in range(100)]', number=100000))
0.3088444779859856

Here I tested iteration over the generator expression by just discarding all elements as fast as possible.

That's because a generator is essentially a function that executes until it yields a value, is then paused, and is reactivated again for the next value, then paused again. See What does the "yield" keyword do? for a good overview. The administration involved in this pausing and resuming takes time. A list comprehension, by contrast, doesn't have to spend that time; it does all its looping without reactivating and deactivating a function for every value produced.
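
That suspend/resume cycle is easy to make visible with a plain generator function; this is only a sketch, with print calls added to show where execution pauses:

>>> def noisy_gen():
...     print('running up to the first yield')
...     yield 1
...     print('resumed, running up to the second yield')
...     yield 2
...
>>> g = noisy_gen()  # nothing printed: the function body hasn't started yet
>>> next(g)  # starts the body, runs until the first yield, then pauses
running up to the first yield
1
>>> next(g)  # reactivates the frame, runs until the next yield
resumed, running up to the second yield
2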

Generators are memory efficient, not execution efficient. They can sometimes save execution time, but usually that's because you avoid allocating and deallocating larger blocks of memory.
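
A quick way to see both the memory side and the early-termination side, assuming a 64-bit CPython build (exact sys.getsizeof numbers vary by version and platform, so the comparisons here are deliberately coarse):

>>> import sys
>>> sys.getsizeof([y for y in range(1000000)]) > 8000000  # a million references plus overallocation
True
>>> sys.getsizeof(y for y in range(1000000)) < 500  # just a small paused-frame object
True
>>> any(y > 10 for y in range(1000000))  # stops after 12 items, not a million
True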

Communicable answered 6/9, 2017 at 16:44 Comment(2)
Okay, so you mean that since a generator takes more time to iterate over than a list, converting a generator to a set is slower than converting a list to one. Thanks for the clarification. I didn't know (or I assumed wrongly) that generators are slower than lists.Kielty
@IsaacVassell: exactly; all a set() does is iterate over the input to add the resulting elements, so I focused on their speed differences.Communicable
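
To make that comment concrete: set(iterable) behaves roughly like the pure-Python sketch below (set_from_iterable is just an illustrative name; the real constructor is implemented in C and much faster, but the shape is the same single pass over the input):

>>> def set_from_iterable(it):
...     # illustrative only: one pass over the input, whatever its type
...     result = set()
...     for element in it:
...         result.add(element)
...     return result
...
>>> set_from_iterable(y for y in range(5))
{0, 1, 2, 3, 4}

The faster that loop can pull values out of the input, the faster the set is built, which is exactly the difference measured above.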
