How does List Comprehension exactly work in Python? [duplicate]

Asked 14/8, 2016 at 11:13 Answered 14/8, 2016 at 11:45

I am going trough the docs of Python 3.X, I have doubt about List Comprehension speed of execution and how it exactly work.

Let's take the following example:

Listing 1

...
L = range(0,10)
L = [x ** 2 for x in L]
...

Now in my knowledge this return a new listing, and it's equivalent to write down:

Listing 2

...
res = []
for x in L:
  res.append(x ** 2)
...

The main difference is the speed of execution if I am correct. Listing 1 is supposed to be performed at C language speed inside the interpreter, meanwhile Listing 2 is not.

But Listing 2 is what the list comprehension does internally (not sure), so why Listing 1 is executed at C Speed inside the interpreter & Listing 2 is not? Both are converted to byte code before being processed, or am I missing something?

Amnion answered 14/8, 2016 at 11:13 Comment(6)

both are converted to byte code before being processed yeah, sure, so what? They are not converted to same byte code, so why they should take same amount of time? – Benefic 14/8, 2016 at 11:20

@ŁukaszRogalski isn't listing 2 what list comprehension does internally? If so the should be converted to the same byte code I guess – Amnion 14/8, 2016 at 11:23

The main purpose of list comprehensions is to simplify complex list creation and improve readability to a one-liner. You could potentially compare if there is a difference in the compiled bytecode pyc. The Python source code is open, so if you read C, you can read through the algorithms corresponding to list comprehension. – Vedetta 14/8, 2016 at 11:25

You may use dis module to analyze byte codes. And no, second code snippet does much more. All in-between states have to available to interpreter on each iteration (even if they are not used). res.append invokes a function call (and they are not that cheap in Python). List comprehensions allows you for much less that explicit loop, but due to this limitations it allows for much efficient implementation. – Benefic 14/8, 2016 at 11:26

I'll check with that module and also have a look at C implementation, thanks guys! – Amnion 14/8, 2016 at 11:30

blog.cdleary.com/2010/04/efficiency-of-list-comprehensions – Tor 14/8, 2016 at 11:56

Look at the actual bytecode that is produced. I've put the two fragments of code into fuctions called f1 and f2.

The comprehension does this:

  3          15 LOAD_CONST               3 (<code object <listcomp> at 0x7fbf6c1b59c0, file "<stdin>", line 3>)
             18 LOAD_CONST               4 ('f1.<locals>.<listcomp>')
             21 MAKE_FUNCTION            0
             24 LOAD_FAST                0 (L)
             27 GET_ITER
             28 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             31 STORE_FAST               0 (L)

Notice there is no loop in the bytecode. The loop happens in C.

Now the for loop does this:

  4          21 SETUP_LOOP              31 (to 55)
             24 LOAD_FAST                0 (L)
             27 GET_ITER
        >>   28 FOR_ITER                23 (to 54)
             31 STORE_FAST               2 (x)
             34 LOAD_FAST                1 (res)
             37 LOAD_ATTR                1 (append)
             40 LOAD_FAST                2 (x)
             43 LOAD_CONST               3 (2)
             46 BINARY_POWER
             47 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             50 POP_TOP
             51 JUMP_ABSOLUTE           28
        >>   54 POP_BLOCK

In contrast to the comprehension, the loop is clearly here in the bytecode. So the loop occurs in python.

The bytecodes are different, and the first should be faster.

Crass answered 14/8, 2016 at 11:45 Comment(3)

Yes that's it! I've just checked the byte code as suggested by you & Lukasz and got the same result. Actually yes the two listing are the same, and list comprehension does the same thing as L2 but in C therefore is obviously faster, roughly twice. Thanks once again! – Amnion 14/8, 2016 at 12:2

@Amnion there is a loop in the bytecode, it is in 3 (<code object <listcomp> at 0x7fbf6c1b59c0, file "<stdin>", line 3>). You can see this by introspecting the code object – Silage 2/4, 2019 at 17:28

e.g. dis.dis(f1.__code__.co_consts[3]) (or it will automatically recurse in Python 3.7). The speed advantage comes from a special opcode for the .append operation, but you could get almost the same speed-up by "caching" the method resolution, l_append = L.append – Silage 2/4, 2019 at 17:35

The answer is actually in your question.

When you run any built in python function you are running something that has been written in C and compiled into machine code.

When you write your own version of it, that code must be converted into CPython objects which are handled by the interpreter.

In consequence the built-in approach or function is always quicker (or takes less space) in Python than writing your own function.

Hydrocarbon answered 14/8, 2016 at 11:22 Comment(0)

Recommended topics

Hot tags