Why is this list comprehension faster than the equivalent generator expression?

I'm using Python 3.3.1 64-bit on Windows and this code snippet:

len([None for n in range(1, 1000000) if n % 3 == 1])

executes in 136 ms, compared to this one:

sum(1 for n in range(1, 1000000) if n % 3 == 1)

which executes in 146 ms. Shouldn't a generator expression be at least as fast as the list comprehension in this case?

I quote from Guido van Rossum's From List Comprehensions to Generator Expressions:

...both list comprehensions and generator expressions in Python 3 are actually faster than they were in Python 2! (And there is no longer a speed difference between the two.)

EDIT:

I measured the time with timeit. I know that it is not very accurate, but I care only about relative speeds here, and I'm getting a consistently shorter time for the list comprehension version when I test with different numbers of iterations.
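
Roughly what I'm running, as a minimal sketch (the loop count of 10 is illustrative, not my exact invocation):

import timeit

t_list = timeit.timeit(
    "len([None for n in range(1, 1000000) if n % 3 == 1])", number=10)
t_gen = timeit.timeit(
    "sum(1 for n in range(1, 1000000) if n % 3 == 1)", number=10)
# average milliseconds per run
print(t_list / 10 * 1000, t_gen / 10 * 1000)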

Joinville asked 30/4, 2013 at 19:14 Comment(10)
And how did you measure the speed difference? – Baran
A difference of 7% is pretty trivial, especially if you're not timing very accurately. (A typical naive implementation with time or clock instead of timeit, for something that takes only 1/8th of a second, can easily have an error much, much larger than 7%.) – Goon
Why are you comparing len with sum? Counting elements is a lot faster than adding their contents. – Beagle
Somewhat surprisingly, in PyPy 1.9.0 (which is Python 2.7.2, and doesn't have any of the modern genexp improvements), the genexp version is almost twice as fast (26.6 ms vs. 49.7 ms). The adding probably doesn't matter there (because in PyPy, adding integers is a few orders of magnitude faster than iterating), but I'm still a bit surprised by the results. – Goon
@MartijnPieters I use timeit; I've edited the question. – Joinville
@TimPietzcker I'm adding 1, which is probably the same as visiting each list element and incrementing a counter. If a list is a regular structure, then counting is much faster, but what about the overhead of creating the list? – Joinville
@Goon I'm using timeit, and I tried a larger number of iterations with the same result. – Joinville
@PaulJurczak: Probably? I don't think sum() is smart enough to figure out that all it ever needs to add in this special case is just 1s... – Beagle
@PaulJurczak: You can read the source, but I'm willing to bet that sum does a PyNumber_InPlaceAdd for each element returned by PyIter_Next, so there's no way it can optimize the case of always adding 1. – Goon
A JIT-compiled implementation like PyPy or Jython can at least theoretically notice that the first few elements are all 1 and create an optimized fast path that it will keep using as long as the iterator keeps returning 1. But that will never happen in CPython. – Goon

I believe the difference here is entirely in the cost of the ~333K additions that sum performs. Testing with 64-bit Python.org 3.3.0 on Mac OS X:

In [698]: %timeit len([None for n in range(1, 1000000) if n % 3 == 1])
10 loops, best of 3: 127 ms per loop
In [699]: %timeit sum(1 for n in range(1, 1000000) if n % 3 == 1)
10 loops, best of 3: 138 ms per loop
In [700]: %timeit sum([1 for n in range(1, 1000000) if n % 3 == 1])
10 loops, best of 3: 139 ms per loop

So it's not that the comprehension is faster than the genexp; they both take about the same time. But calling len on a list is instant, while summing ~333K ones adds another 7% to the total time.

Throwing a few different numbers at it, this seems to hold up unless the list is very tiny (in which case it does seem to get faster), or large enough that memory allocation starts to become a significant factor (which it isn't yet, at 333K).
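
To reproduce the comparison outside IPython, here is a minimal stdlib-only sketch of the same three measurements (absolute timings will of course vary by machine and build):

import timeit

stmts = [
    "len([None for n in range(1, 1000000) if n % 3 == 1])",
    "sum(1 for n in range(1, 1000000) if n % 3 == 1)",
    "sum([1 for n in range(1, 1000000) if n % 3 == 1])",
]
for stmt in stmts:
    # best of 3 repeats, 10 loops each, mirroring %timeit's reporting
    best_ms = min(timeit.repeat(stmt, number=10, repeat=3)) / 10 * 1000
    print("%6.1f ms  %s" % (best_ms, stmt))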

Goon answered 30/4, 2013 at 19:26 Comment(10)
This is exactly what I found in my testing with Python 3.3.1 64-bit (Win7). +1 – Beagle
@TimPietzcker: Since you were apparently writing your comment at the same time I was writing my answer, I'm not surprised we were also running the exact same test simultaneously. :) – Goon
For the sake of adding data: with Python 3.2 32-bit (Win7) I find the generator expression consistently 2% slower. Trivial, but reproducible. – Unbeatable
I didn't know that len of a list is O(1); I've spent only a few days with Python so far. Thanks for pointing this out. – Joinville
@PaulJurczak: It's actually surprisingly hard to dig the performance guarantees out of the documentation. However, if you know that a list is just a resizeable array, and that [0,1,2][3] raises an IndexError instead of segfaulting, obviously it must be keeping the length around somewhere, right? (In CPython, it's in the PyVarObject header.) So it would be silly not to just return it immediately. – Goon
@StevenRumbalski: I don't know whether that's evidence that they've continued to improve genexps between 3.2 and 3.3, or that they've generally neglected optimizing 32-bit Python in favor of 64-bit, but it's an interesting data point. – Goon
@Goon "it must be keeping the length around somewhere" – it also means that the list length is incremented once per iteration, so technically the generator expression should be faster (no memory allocation and initialization), unless it is implemented as a list too. – Joinville
@PaulJurczak: Well (still talking CPython here), the embedded length is a C long, not a Python object, and doing ++(self->ob_size) is orders of magnitude faster than the other stuff that happens once per iteration while building a list. You can profile it, but this is one of those nearly-free parts that can be ignored. As for memory allocation: if the list gets big enough, the genexp is faster, as Niklas B. pointed out in a comment to a deleted answer (and I added into my answer), but at 333K there's enough unused space that the list gets it for free. – Goon
@PaulJurczak: Finally, for initialization: well, the genexp initializes 333333 objects (the elements, one by one); the listcomp initializes 333334 (the elements, plus the list itself). I wouldn't expect to be able to even measure that. (And in fact, the genexp probably initializes an iterator object of some kind under the covers, so there's probably no difference at all, not even a tiny one.) – Goon
Another data point: the listcomp is twice as fast as the genexpr ([True for _ in range(1000000)] vs. list(True for _ in range(1000000))) on my machine (Python 3.5.2, Ubuntu). ideone also shows that the listcomp is faster than the genexpr (though ideone is more a code example than an additional data point; comparing performance in a shared environment such as ideone is probably useless). – Telegraphese

Two things to consider, borrowed from this answer:

1. A Python list is indexable, and fetching its length takes O(1) time, so the speed of calling len() on a list does not depend on its size. A generator, on the other hand, has no length at all: calling len() on one raises a TypeError, and counting its items means consuming everything it generates, which is O(n). (See the small demonstration after this list.)

2. See the linked answer above. In CPython, a list comprehension runs as a tight C loop, whereas a generator has to keep a reference to its frame alive and be resumed via next() for every item it produces. This adds another layer of overhead to generators. At a small scale the performance difference between list comprehensions and generators can be safely ignored, but at a larger scale you have to take it into account.
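
A small demonstration of point 1 (a hypothetical snippet, not from the linked answer, using the same filter as the question):

items = [n for n in range(1, 1000000) if n % 3 == 1]
print(len(items))           # O(1): reads the size stored in the list header

gen = (n for n in range(1, 1000000) if n % 3 == 1)
# len(gen) would raise a TypeError; counting the items requires
# exhausting the generator, which is O(n)
print(sum(1 for _ in gen))  # 333333, after walking every item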

Unbelievable answered 3/9, 2018 at 5:6 Comment(0)
