"Tuple comprehensions" and the star splat/unpack operator * [duplicate]

I just read the question Why is there no tuple comprehension in Python?

In the comments of the accepted answer, it is stated that there are no true "tuple comprehensions". Instead, our current option is to use a generator expression and pass the resulting generator object to the tuple constructor:

tuple(thing for thing in things)

Alternatively, we can create a list using a list comprehension and then pass the list to the tuple constructor:

tuple([thing for thing in things])

Lastly and to the contrary of the accepted answer, a more recent answer stated that tuple comprehensions are indeed a thing (since Python 3.5) using the following syntax:

*(thing for thing in things),
  • To me, it seems like the second example is also one where a generator object is created first. Is this correct?

  • Is there any difference between these expressions in terms of what goes on behind the scenes? In terms of performance? I assume the first and third could have latency issues while the second could have memory issues (as is discussed in the linked comments).

  • Comparing the first one and the last, which one is more pythonic?

Update:

As expected, the list comprehension is indeed much faster. However, I don't understand why the first one is faster than the third one. Any thoughts?

>>> from timeit import timeit

>>> a = 'tuple(i for i in range(10000))'
>>> b = 'tuple([i for i in range(10000)])'
>>> c = '*(i for i in range(10000)),'

>>> print('A:', timeit(a, number=1000000))
>>> print('B:', timeit(b, number=1000000))
>>> print('C:', timeit(c, number=1000000))

A: 438.98362647295824
B: 271.7554752581845
C: 455.59842588083677
Abfarad answered 30/11, 2017 at 12:23 Comment(6)
You ask about performance. Test them. Try %timeit in ipython. Find out which is better on your specific machine. - Bludgeon
The x for y in z in the list comprehension may look like a generator, but it isn't. The inner workings are different. E.g. a StopIteration raised in the x part will stop a generator but will bubble out of the list comprehension. - Nesta
I'd say neither is very pythonic, because tuples are generally used to represent a statically known, possibly heterogeneous set of items (which you can e.g. destructure over), with some semantic meaning associated with each position. Lists are more suited to indeterminate, homogeneous multitudes where operations like iterating make sense. That's just my opinion though. - Verecund
While the last can technically be used, it is the slowest of the options, and having to add a stray comma just so the interpreter understands that it needs to unpack into a tuple is, in my humble view, not very "pythonic". - Helse
Done! I updated the question @JohnZwinck. Also @schwobaseggl, I'm not sure I understand: I've used x for x in y and not x for y in z. With regard to the other points raised here, I agree with you all. - Abfarad
The questions about performance are addressed in the original, and "pythonicness" is subjective and off-topic; therefore I'm closing this as a duplicate back to the original. - Putupon

To me, it seems like the second example is also one where a generator object is created first. Is this correct?

Yes, you're correct. Check out the CPython bytecode:

>>> import dis
>>> dis.dis("*(thing for thing in thing),")
  1           0 LOAD_CONST               0 (<code object <genexpr> at 0x7f56e9347ed0, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<genexpr>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (thing)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 BUILD_TUPLE_UNPACK       1
             14 POP_TOP
             16 LOAD_CONST               2 (None)
             18 RETURN_VALUE
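
To compare all three spellings rather than only the starred one, a minimal sketch along these lines could be used. The helper name `nested_code_names` and the labels in `forms` are made up for illustration, and the opcodes printed by `dis` vary between CPython versions, so the listing you get may not match the one above.

```python
import dis
import types

def nested_code_names(source):
    """Compile `source` and return the names of the code objects nested in it."""
    code = compile(source, "<example>", "exec")
    return [c.co_name for c in code.co_consts if isinstance(c, types.CodeType)]

# The three spellings from the question (labels are illustrative only).
forms = {
    "genexpr + tuple()": "tuple(thing for thing in things)",
    "listcomp + tuple()": "tuple([thing for thing in things])",
    "star unpacking": "*(thing for thing in things),",
}

for label, source in forms.items():
    # A nested '<genexpr>' code object means a generator is created first.
    print(label, "->", nested_code_names(source))
    dis.dis(source)  # the exact opcodes depend on the CPython version
```

Both the first and the third form should report a nested `<genexpr>` code object, while the list comprehension form should not.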

Is there any difference between these expressions in terms of what goes on behind the scenes? In terms of performance? I assume the first and third could have latency issues while the second could have memory issues (as is discussed in the linked comments).

My timings suggest the first one is slightly faster, presumably because unpacking via BUILD_TUPLE_UNPACK is more expensive than the tuple() call:

>>> from timeit import timeit
>>> def f1(): tuple(thing for thing in range(100000))
... 
>>> def f2(): *(thing for thing in range(100000)),
... 
>>> timeit(lambda: f1(), number=100)
0.5535585517063737
>>> timeit(lambda: f2(), number=100)
0.6043887557461858
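
Single timing runs like these can be noisy. As a rough sketch, the three variants could be compared by taking the best of several repeats with `timeit.repeat`; the input size, repeat counts and the helper `best_time` are arbitrary choices for illustration, and the results will differ by machine and Python version.

```python
from timeit import repeat

SETUP = "things = range(10000)"  # hypothetical input size
STATEMENTS = {
    "A": "tuple(thing for thing in things)",
    "B": "tuple([thing for thing in things])",
    "C": "*(thing for thing in things),",
}

def best_time(stmt, number=100, rounds=5):
    """Return the fastest of `rounds` runs, each executing `stmt` `number` times."""
    return min(repeat(stmt, setup=SETUP, number=number, repeat=rounds))

results = {label: best_time(stmt) for label, stmt in STATEMENTS.items()}

# Print the variants from fastest to slowest.
for label, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(label, round(seconds, 4))
```

Taking the minimum is the usual way to reduce the influence of other processes on the measurement.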

Comparing the first one and the last, which one is more pythonic?

The first one seems far more readable to me, and it also works on Python versions before 3.5, where the unpacking syntax is not available.
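
For reference, here is a small sketch showing that all three spellings build the same tuple. The list `things` is made-up sample data, and the unpacking form needs Python 3.5 or later.

```python
things = ["a", "b", "c"]  # hypothetical sample data

via_genexpr = tuple(thing for thing in things)
via_listcomp = tuple([thing for thing in things])
via_unpacking = (*(thing for thing in things),)  # note the trailing comma

# All three spellings produce an equal tuple.
assert via_genexpr == via_listcomp == via_unpacking == ("a", "b", "c")

# When no per-item transformation is needed, tuple() accepts the iterable directly.
assert tuple(things) == via_genexpr
```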

Abnegate answered 30/11, 2017 at 12:43 Comment(0)
