What is the cost of the len() function for Python built-ins (list/tuple/string/dictionary)?
It's O(1) (constant time, independent of the actual length of the object, so very fast) on every type you've mentioned, plus set and others such as array.array.
O(1)? – Enkindle
Calling len() on those data types is O(1) in CPython, the official and most common implementation of the Python language. Here's a link to a table that provides the algorithmic complexity of many different functions in CPython:
All those objects keep track of their own length. The time to extract the length is small (O(1) in big-O notation) and mostly consists of the following [rough description, written in Python terms, not C terms]: look up "len" in a dictionary and dispatch to the built-in len function, which looks up the object's __len__ method and calls it. All that method has to do is return self.length.
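As a rough illustration of "keeping track of their own length", here is a minimal sketch in pure Python. The class CountedBag and its methods are hypothetical names invented for this example; the built-in types do the equivalent bookkeeping in C.

```python
class CountedBag:
    """Hypothetical container that updates a stored count on every mutation."""

    def __init__(self, items=()):
        self._items = []
        self._length = 0  # stored count, kept in sync by add/discard_last
        for item in items:
            self.add(item)

    def add(self, item):
        self._items.append(item)
        self._length += 1

    def discard_last(self):
        if self._length:
            self._items.pop()
            self._length -= 1

    def __len__(self):
        # No counting happens here: the stored value is simply returned.
        return self._length


bag = CountedBag(range(1000))
print(len(bag))  # 1000
bag.discard_last()
print(len(bag))  # 999
```

The cost of maintaining the count is paid a little at a time in add and discard_last, which is why asking for the length later does not depend on how many items are stored.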
Why doesn't length show up in dictionary by dir(list)? – Vaccinate
list.length variable is implemented in C, not Python. – Proterozoic
len(x) doesn't need to perform a dict lookup on CPython; the length-getting function is stored in a dedicated slot in the PyTypeObject struct that can be looked up at a constant offset, so finding that function involves: 1) extracting the type from the object (a lookup at offset in C), 2) looking up tp_len in the type (another lookup at offset in C), 3) calling said function (C function call through function pointer), which 4) looks up the actual length from the original instance (a final lookup at offset in C), then 5) converts to Python level int. – Ronni
No dict involved (aside from the global ones to find the built-in len itself, and modern CPython uses caches so it doesn't even do that most of the time); all the offsets are baked into the compiled code, so no strings are involved, let alone hashing and bucket lookups. A dict lookup is involved if you get the length by calling x.__len__(), which is why the __len__ method is slower than the len function. – Ronni
The below measurements provide evidence that len() is O(1) for oft-used data structures.
A note regarding timeit: when the -s flag is used and two strings are passed to timeit, the first string is executed only once and is not timed.
List (in Python 2, range() returns a list; on Python 3 use list(range(n)) instead):
$ python -m timeit -s "l = range(10);" "len(l)"
10000000 loops, best of 3: 0.0677 usec per loop
$ python -m timeit -s "l = range(1000000);" "len(l)"
10000000 loops, best of 3: 0.0688 usec per loop
Tuple:
$ python -m timeit -s "t = (1,)*10;" "len(t)"
10000000 loops, best of 3: 0.0712 usec per loop
$ python -m timeit -s "t = (1,)*1000000;" "len(t)"
10000000 loops, best of 3: 0.0699 usec per loop
String:
$ python -m timeit -s "s = '1'*10;" "len(s)"
10000000 loops, best of 3: 0.0713 usec per loop
$ python -m timeit -s "s = '1'*1000000;" "len(s)"
10000000 loops, best of 3: 0.0686 usec per loop
Dictionary (dictionary-comprehension available in 2.7+):
$ python -mtimeit -s"d = {i:j for i,j in enumerate(range(10))};" "len(d)"
10000000 loops, best of 3: 0.0711 usec per loop
$ python -mtimeit -s"d = {i:j for i,j in enumerate(range(1000000))};" "len(d)"
10000000 loops, best of 3: 0.0727 usec per loop
Array:
$ python -mtimeit -s"import array;a=array.array('i',range(10));" "len(a)"
10000000 loops, best of 3: 0.0682 usec per loop
$ python -mtimeit -s"import array;a=array.array('i',range(1000000));" "len(a)"
10000000 loops, best of 3: 0.0753 usec per loop
Set (set-comprehension available in 2.7+):
$ python -mtimeit -s"s = {i for i in range(10)};" "len(s)"
10000000 loops, best of 3: 0.0754 usec per loop
$ python -mtimeit -s"s = {i for i in range(1000000)};" "len(s)"
10000000 loops, best of 3: 0.0713 usec per loop
Deque:
$ python -mtimeit -s"from collections import deque;d=deque(range(10));" "len(d)"
100000000 loops, best of 3: 0.0163 usec per loop
$ python -mtimeit -s"from collections import deque;d=deque(range(1000000));" "len(d)"
100000000 loops, best of 3: 0.0163 usec per loop
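The same comparison can be scripted with the timeit module from inside Python 3. This is only a sketch: the helper names time_len and build_containers are made up for this example, and the absolute numbers will vary by machine and Python version. What matters is that the small and large containers land in the same range.

```python
import array
import timeit
from collections import deque


def time_len(obj, number=200_000):
    """Return the average seconds per len(obj) call over `number` calls."""
    total = timeit.timeit("len(obj)", globals={"obj": obj}, number=number)
    return total / number


def build_containers(n):
    """Build one container of each kind holding n elements."""
    return {
        "list": list(range(n)),
        "tuple": tuple(range(n)),
        "str": "1" * n,
        "dict": dict.fromkeys(range(n)),
        "array": array.array("i", range(n)),
        "set": set(range(n)),
        "deque": deque(range(n)),
    }


for n in (10, 1_000_000):
    for name, container in build_containers(n).items():
        nsec = time_len(container) * 1e9
        print(f"{name:>5} n={n:<8} {nsec:6.1f} nsec per len() call")
```

Building the containers happens outside the timed statement, which plays the same role as the -s setup string in the command-line runs above.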
len(), and also fixed the measurements to properly use the -s flag. – Kuhlman
python -m timeit -s "l = range(10000);" "len(l); len(l); len(l)"
223 nsec per loop
python -m timeit -s "l = range(100);" "len(l)"
66.2 nsec per loop
– Mattias
len is O(1) because in RAM, lists are stored as tables (series of contiguous addresses). To know where the table stops, the computer needs two things: the length and the start point. That is why len() is O(1): the computer stores the value, so it just needs to look it up.
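For contrast, here is a sketch of what finding a length looks like when nothing stores it. Node, build_chain and count_by_walking are hypothetical names for this illustration and are not part of Python or CPython.

```python
class Node:
    """Hypothetical singly linked list node; no length is stored anywhere."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_chain(n):
    """Build a chain of n nodes and return its head (None if n == 0)."""
    head = None
    for value in range(n):
        head = Node(value, head)
    return head


def count_by_walking(head):
    """Count nodes by visiting each one, so the work grows with the length."""
    count = 0
    node = head
    while node is not None:
        count += 1
        node = node.next
    return count


items = list(range(5))
print(len(items))                        # 5, read from the stored length
print(count_by_walking(build_chain(5)))  # 5, found by visiting all 5 nodes
```

A built-in list never has to do the walk in count_by_walking, because the length sits next to the start of the table.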
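It is O(1) in CPython because the length is read from the ob_size field of the PyObject representing the list. See [1], [2] and [3] in that order ([1] is the length hint of a list iterator; len() on the list itself goes through list_length, which returns the same stored size):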
[1]:
static PyObject *
listiter_len(_PyListIterObject *it, PyObject *Py_UNUSED(ignored))
{
    Py_ssize_t len;
    if (it->it_seq) {
        len = PyList_GET_SIZE(it->it_seq) - it->it_index;
        if (len >= 0)
            return PyLong_FromSsize_t(len);
    }
    return PyLong_FromLong(0);
}
[2]:
static inline Py_ssize_t PyList_GET_SIZE(PyObject *op) {
    PyListObject *list = _PyList_CAST(op);
    return Py_SIZE(list);
}
[3]:
static inline Py_ssize_t Py_SIZE(PyObject *ob) {
    assert(ob->ob_type != &PyLong_Type);
    assert(ob->ob_type != &PyBool_Type);
    PyVarObject *var_ob = _PyVarObject_CAST(ob);
    return var_ob->ob_size;
}
[1] listiter_len
[2] PyList_GET_SIZE
[3] Py_SIZE
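To see that stored field from Python, the sketch below reads ob_size directly out of memory with ctypes. It rests on assumptions that hold only for CPython: id(obj) is the object's address, and ob_size sits immediately after the PyObject header, whose size is taken from object.__basicsize__. The helper name read_ob_size is made up for this example, and this is a demonstration of the layout, not something to use in real code.

```python
import ctypes
import sys


def read_ob_size(obj):
    """Read the ob_size field of a list or tuple straight from memory.

    Assumes CPython, where id(obj) is the object's address and ob_size
    sits immediately after the PyObject header in a PyVarObject.
    """
    if sys.implementation.name != "cpython":
        raise RuntimeError("this sketch relies on CPython's object layout")
    if type(obj) not in (list, tuple):
        raise TypeError("only exact list and tuple objects are supported here")
    header_size = object.__basicsize__  # sizeof(PyObject) on this build
    return ctypes.c_ssize_t.from_address(id(obj) + header_size).value


for sample in ([], [1, 2, 3], tuple(range(1000))):
    # Both columns should agree: len() returns the value stored in ob_size.
    print(len(sample), read_ob_size(sample))
```

The read is a single fixed-offset memory access, independent of how many elements the list or tuple holds.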