Do Python coders have a bias towards list over tuple? [closed]
Asked Answered
T

2

5

Basic Facts

  • Lists are mutable (supporting inserts, appending etc.), Tuples are not
  • Tuples are more memory efficient, and faster to iterate over

So it would seem their use-cases are clear. Functionally speaking, lists offer a superset of operations, tuples are more performant at what they do.

Observation

Most arrays that my team creates in the course of a program, are in fact, perfectly fine as immutable. We iterate over them, apply map, reduce, filter on them, may be insert into a database from them etc. all without insertion, popping or appending on the array-like-structure.

Question

Yet, a list seems to be not only the default (and only) choice among my developers, but seems even favoured by many library APIs to pass data around (like polars, tensorflow etc. which I use heavily).

And not even like using Tuples require some special skill, knowledge or understanding another-data-structure, it's really the same in terms of necessary syntax to subscript, or iterate.

What am I missing in the reasoning here?

Tana answered 8/7 at 2:16 Comment(3)
I don't think the performance difference matters in python. Well I've never seen it make a difference beyond micro-optimization. Tuples are hashable though.Manche
List comprehensions have a dedicated syntax, for one. (Also, even though there’s nothing at the language semantics level to dictate this – or there wasn’t before typing, anyway – tuples feel heterogeneous and lists homogeneous, which influences writing for readability.)Adolescence
I like your question. I learned from reading the answers. I think rephrased it could be made not "opinion-based". I personally default to list mostly because I usually don't know exactly every way I'll use some sequence later, and may decide to mutate it for some purpose, so I only use tuples when I know I definitely won't need to mutate.Omni
A
5

It's not a matter of bias. By convention, lists are used for homogeneous data, and tuples are used for heterogeneous data, unless a requirement for mutability or hashability forces the opposite.

You can see this convention stated in places like the documentation for built-in types, where lists are described as

mutable sequences, typically used to store collections of homogeneous items

and tuples are described as

immutable sequences, typically used to store collections of heterogeneous data

So for example, if seq[2] represents "the third thing" and seq[3] represents "the fourth thing", you use a list, while if seq[2] represents "income" and seq[3] represents "birthday", you use a tuple.

Anorak answered 8/7 at 2:25 Comment(1)
Thanks, this convention was something I was not aware of, as both of them support heterogeneous data. For homogeneous data though, I have long delegated to libraries like polars and tensorflow (where possible, depending on the types available), which even store the elements in contiguous memory locations (making them more performant than python native data types).Tana
S
5

I've been using Python since the start, and the intended use cases between lists and tuples have always been a bit fuzzy. Year after year, though, I tend to use tuples more.

While I have no compelling way to argue this case, I always thought a lot of it came down to dislike of the syntax in the first simple cases a programmer tried. Parentheses are "overused" in Python's syntax.

x = ()

looks more like a syntax error at first ("which function did they intend to call?"), while

y = 42,

still looks like a syntax error to my eyes ;-)

The corresponding cases for lists are self-evident at first sight:

x = []
y = [42]

"Readability counts", and first impressions are hard to shake off.

EDIT: BTW, there's another underappreciated reason to use tuples for "very large" sequences, when possible: when CPython's cyclic gc runs and determines that a tuple, and all its components, are immutable "all the way down", the entire tuple is exempted from being scanned in future runs of cyclic gc (it's been proved that it can never become part of a cycle). The same isn't true of lists. Even if a list is immutable "all the way down", there's nothing to stop the programmer from doing, e.g., L[0] = L next, making it part of a cycle.

Exempting large sequences from being scanned by cyclic gc can save lots of cycles in long-running programs. For that reason, e.g., I routinely create tuples with millions of ints rather than use lists.

Example:

>>> import gc
>>> t = tuple(range(100))
>>> gc.is_tracked(t)
True
>>> gc.collect()
0
>>> gc.is_tracked(t) # gc determined `t` can never be in a cycle
False
Stomatitis answered 8/7 at 2:50 Comment(3)
The requirement for untracking a tuple isn't quite immutability "all the way down". For example, a tuple of frozensets of ints won't get untracked, even though it's immutable "all the way down". All the tuple's elements have to be untracked tuples, or objects that don't support GC tracking at all.Anorak
On the other hand, a tuple of bytearrays will get untracked, even though it isn't immutable "all the way down", because bytearrays don't support GC tracking.Anorak
The point to giving an example was to show people how they can use gc.is_tracked() to check for themselves. There is, of course, no guarantee that the exact "rules" won't change from one release to the next.Stomatitis

© 2022 - 2024 — McMap. All rights reserved.