When writing a function or method that takes an items argument, I often prefer Iterable to Sequence.
Here is why; I hope it helps clarify the difference.
Say my_func_1 is:
    from typing import Iterable
    def my_func_1(items: Iterable[int]) -> None:
        for item in items:
            ...
            if condition:
                break
        return
Iterable offers the maximum possibilities to the caller. Correct calls include:
    my_func_1((1, 2, 3)) # tuple is Sequence, Collection, Iterable
    my_func_1([1, 2, 3]) # list is MutableSequence, Sequence, Collection, Iterable
    my_func_1({1, 2, 3}) # set is MutableSet, Set, Collection, Iterable
    my_func_1(my_dict) # dict is MutableMapping, Mapping, Collection, Iterable
    my_func_1(my_dict.keys()) # dict.keys() is KeysView, Set, Collection, Iterable
    my_func_1(range(10)) # range is Sequence, Collection, Iterable
    my_func_1(x**2 for x in range(100)) # 'strict' Iterator, i.e. Iterable but neither a Collection nor a Sequence
...
... because all are Iterable.
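To check which abstract base classes a given object satisfies, a runtime isinstance test against collections.abc is enough. The sketch below is only an illustration; the helper name describe is hypothetical and it looks at just four of the ABCs.

```python
from collections.abc import Collection, Iterable, Iterator, Sequence
from typing import List


def describe(obj: object) -> List[str]:
    """Return the names of the ABCs (among four) that obj satisfies at runtime."""
    kinds = (Iterable, Iterator, Collection, Sequence)
    return [kind.__name__ for kind in kinds if isinstance(obj, kind)]


print(describe((1, 2, 3)))                # ['Iterable', 'Collection', 'Sequence']
print(describe({1, 2, 3}))                # ['Iterable', 'Collection']
print(describe(x**2 for x in range(3)))   # ['Iterable', 'Iterator']
```

Every object in the list of calls reports Iterable, which is why all of them are accepted by my_func_1.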
The implicit message to a function caller is: transfer data "as-is", just don't transform it.
If the caller does not already have the data as a Sequence (e.g. tuple, list) or as a non-Sequence Collection (e.g. set), passing a 'strict' Iterator is also more efficient: because the loop breaks before reaching StopIteration, the items after the break point are never produced.
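A small sketch can make the saving visible. The names find_first_over and counting_squares are hypothetical, and the counter exists only to show how many items were actually produced before the early exit.

```python
from typing import Iterable, Iterator, List, Optional


def find_first_over(items: Iterable[int], limit: int) -> Optional[int]:
    """Return the first item greater than limit, stopping as soon as it is found."""
    for item in items:
        if item > limit:
            return item
    return None


def counting_squares(n: int, counter: List[int]) -> Iterator[int]:
    """Yield x**2 for x in range(n), recording each produced item in counter[0]."""
    for x in range(n):
        counter[0] += 1
        yield x**2


calls = [0]
result = find_first_over(counting_squares(1_000_000, calls), 50)
print(result, calls[0])  # 64 9
```

Only 9 of the 1,000,000 possible squares are computed. Wrapping the generator in tuple(...) first would have computed all of them before the loop even started.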
However, if the function's algorithm (say my_func_2) requires more than one iteration, then Iterable is the wrong annotation: if the caller provides a 'strict' Iterator, the first loop exhausts it and the second loop silently sees no items. Hence use a Collection:
    from typing import Collection
    def my_func_2(items: Collection[int]) -> None:
        for item in items:
            ...
        for item in items:
            ...
        return
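The exhaustion problem is easy to reproduce. In this sketch the function count_twice is a hypothetical stand-in that is deliberately annotated with Iterable even though it iterates twice, to show what goes wrong.

```python
from typing import Iterable, Tuple


def count_twice(items: Iterable[int]) -> Tuple[int, int]:
    """Count the items in two separate passes over the same argument."""
    first = sum(1 for _ in items)
    second = sum(1 for _ in items)  # sees nothing if items was a one-shot iterator
    return first, second


print(count_twice([1, 2, 3]))             # (3, 3)
print(count_twice(x for x in [1, 2, 3]))  # (3, 0)
```

No exception is raised in the second call; the second pass is simply empty, which is what makes this bug hard to notice without the Collection annotation.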
If the function's algorithm (my_func_3) has to access specific items by index, then both Iterable and Collection are too permissive: the function will break if the caller provides a set, a Mapping or a 'strict' Iterator.
Hence use a Sequence:
    from typing import Sequence
    def my_func_3(items: Sequence[int]) -> int:
        return items[5]
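To see what happens at runtime when a non-Sequence slips through, here is a minimal sketch. The function sixth is a hypothetical equivalent of the indexing above, and the type: ignore comment only silences the static checker for the sake of the demonstration.

```python
from typing import Sequence


def sixth(items: Sequence[int]) -> int:
    """Return the item at index 5."""
    return items[5]


print(sixth(range(10)))  # 5

try:
    sixth({0, 1, 2, 3, 4, 5})  # type: ignore[arg-type]
except TypeError as exc:
    # Sets do not support indexing, so the call fails only when it runs.
    print(type(exc).__name__)  # TypeError
```

A static type checker would flag the second call before the code runs, which is the whole point of choosing the annotation carefully.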
Conclusion: the strategy is "use the most generic type that the function can handle". Don't forget that all this is only about typing, which helps a static type checker report incorrect calls (e.g. passing a set when a Sequence is required). It is then the caller's responsibility to transform the data when necessary, such as:
my_func_3(tuple(x**2 for x in range(100)))
Actually, all this is really about performance as the length of items scales.
Always prefer Iterable when possible, so that callers can pass a lazy Iterator. Performance should be handled as a daily task, not as a firefighting exercise.
Along those lines, you will probably face the situation where a function handles only the empty case itself and delegates the others, and you don't want to transform items into a Collection or a Sequence. Then do something like this:
    from typing import Iterable
    from more_itertools import spy
    def my_func_4(items: Iterable[int]) -> None:
        (first, items) = spy(items)  # first is a list holding at most one item
        if not first: # i.e. items is empty
            ...
        else:
            my_func_1(items) # Here 'items' is always a 'strict' Iterator
        return
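If adding the more_itertools dependency is not an option, the same peek-then-delegate idea can be sketched with the standard library only. The helper peek below is hypothetical and only mimics what spy does with its default of one item, as I understand it: it returns the head as a list plus an iterator that still yields every item.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Tuple


def peek(items: Iterable[int]) -> Tuple[List[int], Iterator[int]]:
    """Return ([first item] or [], an iterator over all the items)."""
    iterator = iter(items)
    try:
        first = next(iterator)
    except StopIteration:
        return [], iter(())
    # Put the consumed item back in front of the remaining ones.
    return [first], chain([first], iterator)


head, rest = peek(x**2 for x in range(2, 5))
print(head, list(rest))  # [4] [4, 9, 16]

head, rest = peek(iter([]))
print(head, list(rest))  # [] []
```

Only one item is consumed to decide whether the input is empty, so the rest of the data stays lazy.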