How do python Set Comprehensions work?
Asked Answered
C

1

14

Q1 - Is the following a set() of a generator expression or a set comprehension? (Or are they same? If so, are list & dict comprehensions also corresponding type-cast on generators?)

my_set = {x for x in range(10)}

Q2 - Does the evaluation consider duplicate values & then remove them by applying set()?

dup_set = {x for x in [0, 1, 2, 0, 1, 2]}

Does the comprehension perform (speed-wise) better than regular for loops?

Update - I tried using timeit for speed comparisons. Am not sure if I am being just (fair) about it.

C:\>python -m timeit "s = set()" "for x in range(10):" "
  s.add(x)"
100000 loops, best of 3: 2.3 usec per loop

C:\>python -m timeit "s = {x for x in range(10)}"
1000000 loops, best of 3: 1.68 usec per loop

Now, using some conditionals

C:\>python -m timeit "s = set()" "for x in range(10):" "
  if x%2: s.add(x)"
100000 loops, best of 3: 2.27 usec per loop

C:\>python -m timeit "s = {x for x in range(10) if x%2}"
1000000 loops, best of 3: 1.83 usec per loop

So, there is quite some difference, is it due to the functionality being hardcoded in c?

Cowans answered 10/12, 2013 at 14:1 Comment(1)
Maybe you could use timeit(docs.python.org/2/library/timeit.html) or build a code timing function with the time module to find out the time/speed differencesLet
E
8

Q1 : Yes, yes, yes and yes. Or at least they behave like this. It's a little bit different if you're taking a look at the bytecode. Let's disassembly this code (Python 2.7) :

def list_comp(l):
    return [x+1 for x in l]

def dict_comp(l):
    return {x+1:0 for x in l}

def set_comp(l):
    return {x+1 for x in l}

def generator(l):
    return (x+1 for x in l)

This is what you get:

Disassembly of list_comp:
  2           0 BUILD_LIST              0
              3 LOAD_FAST               0 (l)
              6 GET_ITER            
        >>    7 FOR_ITER               16 (to 26)
             10 STORE_FAST              1 (x)
             13 LOAD_FAST               1 (x)
             16 LOAD_CONST              1 (1)
             19 BINARY_ADD          
             20 LIST_APPEND             2
             23 JUMP_ABSOLUTE           7
        >>   26 RETURN_VALUE
Disassembly of dict_comp:
  5           0 LOAD_CONST              1 (<code object <dictcomp> at 029DEE30)
              3 MAKE_FUNCTION           0
              6 LOAD_FAST               0 (l)
              9 GET_ITER            
             10 CALL_FUNCTION           1
             13 RETURN_VALUE  
Disassembly of set_comp:
  8           0 LOAD_CONST              1 (<code object <setcomp> at 029DECC8)
              3 MAKE_FUNCTION           0
              6 LOAD_FAST               0 (l)
              9 GET_ITER            
             10 CALL_FUNCTION           1
             13 RETURN_VALUE  
Disassembly of generator:
 11           0 LOAD_CONST              1 (<code object <genexpr> at 02A8FD58)
              3 MAKE_FUNCTION           0
              6 LOAD_FAST               0 (l)
              9 GET_ITER            
             10 CALL_FUNCTION           1
             13 RETURN_VALUE                     

The bytecode is barely the same for the dict comprenhension, the set comprehension and the generator. They all load a code object (<dictcomp>, <setcomp> or <genexpr>) and then make a callable function out of it. The list comprehension is different because it generates the bytecode corresponding to your list comprehension. This time it is interpreted and thus not native.

Q2 : It doesn't really consider duplicate values since it creates a comprehension with the list you gave. And then it creates the set with the comprehension.

About timing : List/Dict/Set comprehensions tend to be faster than anything else. Even if they're interpreted, the bytecode generated is optimized for most of the cases with special bytecode instructions like SET_ADD, LIST_APPEND or MAP_ADD.

Equilibrist answered 10/12, 2013 at 14:29 Comment(4)
I'm not sure, I'll check that.Equilibrist
obviously what it's written in will depend on which python you are using (CPython, IronPython, Jython, PyPy, etc). however, the important point is that it is native i.e. not intepreted.Teutonic
In addition, the "for loop with add/append calls" version causes one function call per iteration, which means extra stack frames (not to mention the function's name has to be resolved in the first place, which is one array lookup and one dict lookup)... The comprehensions do none of that.Milkmaid
And in Python 3, list comprehensions got their own scope too.Degreeday

© 2022 - 2024 — McMap. All rights reserved.