What is the difference between type-hinting a variable as an Iterable versus a Sequence?
I don't understand the difference when hinting Iterable and Sequence.

What is the main difference between those two and when to use which?

I think set is an Iterable but not a Sequence. Is there any built-in data type that is a Sequence but not an Iterable?

from typing import Iterable, Sequence

def foo(baz: Sequence[float]):
  ...

# What is the difference?
def bar(baz: Iterable[float]):
  ...
Counterproposal answered 8/5, 2022 at 0:36 Comment(1)
Related, with a meaningful answer: quora.com/… - Fortenberry
V
79

The Sequence and Iterable abstract base classes (can also be used as type annotations) mostly* follow Python's definition of sequence and iterable. To be specific:

  • Iterable is any object that defines __iter__ or __getitem__.
  • Sequence is any object that defines __getitem__ (taking integer indices) and __len__. By definition, any sequence is an iterable. The Sequence class also provides other methods, such as __contains__ and __reversed__, that are implemented by calling the two required methods.

Some examples:

  • list, tuple, str are the most common sequences.
  • Some built-in iterables are not sequences. For example, reversed returns a reversed object (or list_reverseiterator for lists) that cannot be subscripted.
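These examples can be checked at runtime with isinstance against the abstract base classes in collections.abc. The snippet below is only a sketch; the samples dictionary and its labels are made up for illustration.

```python
from collections.abc import Iterable, Sequence

# Hypothetical sample objects, chosen to mirror the examples above.
samples = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "str": "abc",
    "range": range(3),
    "set": {1, 2, 3},
    "dict": {"a": 1},
    "reversed(list)": reversed([1, 2, 3]),
}

# Map each label to (is an Iterable, is a Sequence).
results = {
    label: (isinstance(value, Iterable), isinstance(value, Sequence))
    for label, value in samples.items()
}

for label, (is_iterable, is_sequence) in results.items():
    print(f"{label:16} Iterable={is_iterable!s:5} Sequence={is_sequence}")
```

Every object in the table is an Iterable, but only list, tuple, str and range are reported as a Sequence.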

* Iterable does not exactly conform to Python's definition of iterables: it only checks whether the object defines __iter__, and does not work for objects that are only iterable via __getitem__ (see this table for details). The gold standard for checking whether an object is iterable is calling the iter builtin on it.
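To make the footnote concrete, here is a minimal sketch. The Squares class and the is_iterable helper are hypothetical names introduced only for this illustration.

```python
from collections.abc import Iterable


class Squares:
    """Hypothetical class that is iterable only through __getitem__."""

    def __getitem__(self, index: int) -> int:
        if index >= 3:
            raise IndexError(index)  # tells iter() where the data ends
        return index * index


def is_iterable(obj: object) -> bool:
    """Check iterability the way a for loop does: by calling iter()."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


legacy = Squares()
print(isinstance(legacy, Iterable))  # False: the ABC only looks for __iter__
print(is_iterable(legacy))           # True: iter() falls back to __getitem__
print(list(legacy))                  # [0, 1, 4]
print(is_iterable(42))               # False
```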

Voluntaryism answered 8/5, 2022 at 4:34 Comment(10)
"Some built-in iterators are not sequences" - I think all built-in iterators are not sequences. Or do you know one that is? - Cayman
@joel Yes, that's possible, although they did use an iterator as an example. If they meant to talk about iterables, I'd say dictionaries or sets would be more prominent examples. (Now that I'm re-reading their answer, I see their paraphrased definition of "sequence" is actually wrong. Maybe that's why they didn't use dictionaries as an example.) - Cayman
@KellyBundy An example would be range; isinstance(range(5), collections.abc.Sequence) returns True. set is not a sequence because it doesn't define __getitem__. dict is an interesting example because it does define both __getitem__ and __len__, but the docs explicitly mention that it's not a sequence, because its __getitem__ takes arbitrary types instead of just int. - Voluntaryism
range(5) isn't an iterator. iter(range(5)) is (it's a range_iterator), and isinstance(iter(range(5)), collections.abc.Sequence) as expected returns False. - Cayman
Ah, I see, thanks for correcting me. I should change my answer to say iterable instead of iterator. - Voluntaryism
Iterable is any object that defines __iter__ only (an object of a class with only a __getitem__ method is not an Iterable instance). Details: docs.python.org - Snooze
@Snooze The link you shared is the docs for collections.abc.Iterable, which is not the official definition for what an iterable is. In fact, that paragraph explicitly mentions that "it does not detect classes that iterate with the __getitem__() method." - Voluntaryism
@ZecongHu As far as I can see, the question was about Iterable (typing.Iterable or collections.abc.Iterable), not about iterability in general. And Iterable does not check for __getitem__. Perhaps the answer needs to be rephrased a bit, for example by mentioning that the iter() builtin is the only way to detect iterability through either __iter__ or __getitem__. - Snooze
As for the difference between Iterable and Sequence, I would point to this table. - Snooze
@Snooze Good point. I made an edit to the answer explaining the differences between collections.abc.Iterable and iterable as defined in the glossary, and added a reference to that table. Thanks for pointing this out! - Voluntaryism
D
21

When writing a function/method with an items argument, I often prefer Iterable to Sequence. Here is why; I hope it helps clarify the difference.

Say my_func_1 is:

from typing import Iterable

def my_func_1(items: Iterable[int]) -> None:
    for item in items:
        ...  # process the item
        if item < 0:  # placeholder stop condition, assumed for illustration
            break

Iterable gives the caller the most freedom. Correct calls include:

my_dict = {1: "one", 2: "two"}  # hypothetical example dict

my_func_1((1, 2, 3)) # tuple is Sequence, Collection, Iterable
my_func_1([1, 2, 3]) # list is MutableSequence, Sequence, Collection, Iterable
my_func_1({1, 2, 3}) # set is Collection, Iterable
my_func_1(my_dict) # dict is Mapping, Collection, Iterable
my_func_1(my_dict.keys()) # dict.keys() is KeysView, Set, Collection, Iterable
my_func_1(range(10)) # range is Sequence, Collection, Iterable
my_func_1(x**2 for x in range(100)) # 'strict' Iterator, i.e. an Iterable that is neither a Collection nor a Sequence
...

... because all are Iterable.

The implicit message to the function's caller is: pass your data "as-is"; there is no need to transform it first.

If the caller does not already have the data as a Sequence (e.g. tuple, list) or as a non-Sequence Collection (e.g. set), passing a 'strict' Iterator is also more efficient: because the loop can break before StopIteration, the remaining items are never computed.
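A small sketch of that laziness, with made-up names (numbers, first_negative, produced) used only for illustration: the generator records every value it actually yields, so we can see that nothing after the break is produced.

```python
from typing import Iterable, Iterator, List, Optional

produced: List[int] = []  # records what the generator really computed


def numbers() -> Iterator[int]:
    """Hypothetical lazy data source."""
    for n in (3, 1, -4, 1, 5):
        produced.append(n)
        yield n


def first_negative(items: Iterable[int]) -> Optional[int]:
    """Stop at the first negative item, like the break in my_func_1."""
    for item in items:
        if item < 0:
            return item
    return None


result = first_negative(numbers())
print(result)    # -4
print(produced)  # [3, 1, -4]: the trailing 1 and 5 were never generated
```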

However, if the function's algorithm (say my_func_2) needs to iterate more than once, Iterable is too permissive: if the caller provides a 'strict' Iterator, the first loop exhausts it and the second loop silently sees no items. Hence use a Collection:

from typing import Collection
def my_func_2(items: Collection[int]) -> None:
    for item in items:
        ...
    for item in items:
        ...
    return
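The exhaustion problem can be seen in a short sketch. The function total_and_max is a hypothetical example that deliberately uses the too-permissive Iterable annotation while consuming its argument twice.

```python
from typing import Iterable, Optional, Tuple


def total_and_max(items: Iterable[int]) -> Tuple[int, Optional[int]]:
    """Consumes items twice, so Iterable is too loose an annotation here."""
    total = sum(items)                  # first pass
    largest = max(items, default=None)  # second pass
    return total, largest


from_list = total_and_max([1, 2, 3])
from_generator = total_and_max(x for x in [1, 2, 3])

print(from_list)       # (6, 3)
print(from_generator)  # (6, None): the generator was already exhausted
```

No exception is raised in the second call, which is why annotating the parameter as Collection (and letting a type checker reject the generator) is the safer choice.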

If the function's algorithm (my_func_3) needs to access specific items by index, then both Iterable and Collection are too permissive: the function will fail if the caller provides a set, a Mapping or a 'strict' Iterator. Hence use a Sequence:

from typing import Sequence
def my_func_3(items: Sequence[int]) -> int:
    return items[5]

Conclusion: use the most generic type that the function can handle. Don't forget that all this is only about typing, to help a static type checker report incorrect calls (e.g. using a set when a Sequence is required). It is then the caller's responsibility to transform the data when necessary, such as:

my_func_3(tuple(x**2 for x in range(100)))

Actually, all this is really about performance as the length of items grows. Prefer Iterable whenever possible, so that callers can pass lazy iterators. Performance should be handled as a daily task, not as a firefighting exercise.

Along the same lines, you will probably run into a situation where a function handles only the empty case itself and delegates everything else, and you don't want to convert items into a Collection or a Sequence. Then do something like this:

from typing import Iterable
from more_itertools import spy  # third-party package

def my_func_4(items: Iterable[int]) -> None:
    (first, items) = spy(items)  # first is a list holding at most one item
    if not first: # i.e. items is empty
        ...
    else:
        my_func_1(items) # Here 'items' is always a 'strict' Iterator
    return
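If depending on more_itertools is not an option, roughly the same peek can be sketched with the standard library only. The helper name split_first is an assumption made for this illustration and is not part of any library.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Tuple


def split_first(items: Iterable[int]) -> Tuple[List[int], Iterator[int]]:
    """Return ([first item] or [], an iterator over all the items)."""
    iterator = iter(items)
    sentinel = object()  # unique marker, so falsy items like 0 are handled
    first = next(iterator, sentinel)
    if first is sentinel:
        return [], iter(())
    # Put the consumed item back in front of the remaining ones.
    return [first], chain([first], iterator)


head, rest = split_first(x * x for x in range(4))
print(head)        # [0]
print(list(rest))  # [0, 1, 4, 9]

empty_head, empty_rest = split_first(iter([]))
print(empty_head)  # []
```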
Dnepropetrovsk answered 17/11, 2022 at 15:9 Comment(3)
Great answer. This should be accepted; it explains the theory and the application. - Maihem
The question had two parts, what are they and when to use them. The accepted answer only answers the first part. This answers the second and by doing so also clarifies the first part. - Krimmer
It would be an improvement to update the examples with the PEP-585 collections equivalents of the old typing aliases: https://docs.python.org/3/library/typing.html#typing.Iterable - Krimmer
Q
3
  • Iterable is anything that, roughly said, can be iterated over (looped over using a for loop).
  • Sequence is an iterable with some additional properties: you can get its size with len(sequence), access its elements by position with sequence[n], and so on.

When adding type hints to your function, you should follow two basic rules:

  1. Types of input arguments should be as general as possible.
  2. Output type should be as specific as possible.

Look at these examples:

from typing import Iterable, List

def foo(numbers: Iterable[int]) -> List[int]:
    digits = list()

    for number in numbers:
        if 0 <= number <= 9:
            digits.append(number)

    return digits

The function foo searches for single-digit numbers. If you used the stricter type hint Sequence[int] for the argument numbers, it would tell users of the function that non-sequence iterables such as set are not allowed. But they work perfectly fine! You want your function to support as wide a range of inputs as possible (to make most users happy), so it makes no sense to add such a restriction.

The output type List[int], on the other hand, is very specific so users of the function can have detailed information about the output they get.
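As a quick usage sketch (the function is repeated here so the snippet runs on its own, and the input values are made up), the same function accepts a list, a set and a generator alike, and always hands back a list:

```python
from typing import Iterable, List


def foo(numbers: Iterable[int]) -> List[int]:
    """Same logic as above: keep only the single-digit numbers."""
    return [number for number in numbers if 0 <= number <= 9]


from_list = foo([3, 12, 7])
from_set = foo({3, 12, 7})  # a set has no guaranteed order
from_generator = foo(n * 5 for n in range(4))

print(from_list)         # [3, 7]
print(sorted(from_set))  # [3, 7]
print(from_generator)    # [0, 5]
```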

from typing import Sequence

def bar(numbers: Sequence[int]) -> bool:
    is_sorted = True  # avoid shadowing the built-in sorted()

    for i in range(len(numbers) - 1):
        if numbers[i] > numbers[i+1]:
            is_sorted = False

    return is_sorted

The function bar checks if numbers are in ascending order. Here the stricter type hint Sequence[int] is required, since the code uses features of sequences that are not supported by all iterables. If you used the type hint Iterable[int] instead, the user might be tempted to pass a set as an argument and be surprised by the exception TypeError: 'set' object is not subscriptable.
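The runtime failure can be reproduced with a short sketch. The name is_ascending is a hypothetical stand-in for bar, and the call with a set is exactly what a static type checker would flag against the Sequence[int] annotation.

```python
from typing import Optional, Sequence


def is_ascending(numbers: Sequence[int]) -> bool:
    """Indexing and len() are why Sequence, not Iterable, is required."""
    return all(numbers[i] <= numbers[i + 1] for i in range(len(numbers) - 1))


print(is_ascending([1, 2, 3]))  # True
print(is_ascending((3, 1, 2)))  # False

caught: Optional[TypeError] = None
try:
    is_ascending({1, 2, 3})  # type checkers reject this call
except TypeError as error:
    caught = error
    print(f"TypeError: {error}")
```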

Quinze answered 9/8, 2023 at 13:56 Comment(0)
