tee() function from itertools library

Here is a simple example that gets the min, max, and average (here, the median) values from a list. The two functions below return the same result. What is the difference between them? Why use itertools.tee(), and what advantage does it provide?

from statistics import median
from itertools import tee

purchases = [1, 2, 3, 4, 5]

def process_purchases(purchases):
    min_, max_, avg = tee(purchases, 3)
    return min(min_), max(max_), median(avg)

def _process_purchases(purchases):
    return min(purchases), max(purchases), median(purchases)

def main():
    stats = process_purchases(purchases=purchases)
    print("Result:", stats)
    stats = _process_purchases(purchases=purchases)
    print("Result:", stats)

if __name__ == '__main__':
    main()
Sasha answered 17/5, 2020 at 16:10 Comment(5)
tee() is used in case purchases is an iterable which can be exhausted.Smutty
Can you turn this into a running example? A couple of imports at the top, calling the functions at the bottom.Aviator
Try _process_purchases(int(var) for var in ["1", "2", "3"])Aviator
Pretty much what quamrana said. The first one is more versatile and can be used for both containers (like a list) and generator expressions, map objects, filter objects, zip objects etc where the second cannot.Critic
Using tee brings little benefit in this case, though: because min has to exhaust its iterator, all the values it produced have to be kept in memory for max and median. One could just as well create a list and run min, max, and median on it. I just timed it, and it's even a little bit faster.Hassiehassin
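The list-based alternative from the last comment could look roughly like the sketch below. The function name process_purchases_list is made up for illustration and does not appear in the original post; the idea is simply to consume the iterable once and reuse the resulting list.

```python
from statistics import median


def process_purchases_list(purchases):
    # Hypothetical name, not from the original post.
    # Materialize the iterable exactly once; the list can then be
    # scanned as many times as needed.
    values = list(purchases)
    return min(values), max(values), median(values)


# Works for a list and for a one-shot iterator alike.
double = map(lambda n: n * 2, [1, 2, 3, 4, 5])
print(process_purchases_list(double))  # (2, 10, 6)
```

Like tee() in this situation, this holds every value in memory, so it only fits inputs that are finite and reasonably small.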

Iterators can only be iterated once in Python. After that they are "exhausted" and don't return any more values.

You can see this with the iterators returned by functions like map(), zip(), filter() and many others:

purchases = [1, 2, 3, 4, 5]

double = map(lambda n: n*2, purchases)

print(list(double))
# [2, 4, 6, 8, 10]

print(list(double))
# [] <-- can't use it twice

You can see the difference between your two functions if you pass them an iterator, such as the return value from map(). In this case _process_purchases() fails because min() exhausts the iterator and leaves no values for max() and median().

tee() takes an iterable and returns two or more independent iterators over its values, allowing you to use the iterator passed into the function more than once:

from itertools import tee
from statistics import median

purchases = [1, 2, 3, 4, 5]

def process_purchases(purchases):
    min_, max_, avg = tee(purchases, 3)
    return min(min_), max(max_), median(avg)


def _process_purchases(purchases):
    return min(purchases), max(purchases), median(purchases)

double = map(lambda n: n*2, purchases)
_process_purchases(double)
# ValueError: max() arg is an empty sequence

double = map(lambda n: n*2, purchases)
process_purchases(double)
# (2, 10, 6)
Delmadelmar answered 17/5, 2020 at 16:21 Comment(3)
Or just create a generator double = (n*2 for n in purchases) for those trying to keep map at bay.Aviator
Isn't it the same as...? min_, max_, avg = iter(purchases), iter(purchases), iter(purchases)Tayler
@Tayler in this simple case there is no practical difference. However, tee is useful in situations where it's not so simple (for example, pulling items off a stream). Consider the generator factory it = lambda: (random.randint(0, 10) for _ in range(10)). Doing list(it()), list(it()) will give you two different lists, while using a, b = tee(it()) will give you two iterators over the same values.Delmadelmar
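A small sketch of the point made in these two comments, assuming a helper named make_stream (a hypothetical name, introduced only for this illustration): calling iter() on something that is already an iterator just returns the same object, whereas tee() gives independent iterators that replay the same values.

```python
import random
from itertools import tee


def make_stream():
    # Hypothetical helper: every call builds a fresh generator of random ints.
    return (random.randint(0, 10) for _ in range(10))


stream = make_stream()

# iter() on an iterator returns that same iterator, so calling it three
# times would not give three independent passes over the data.
print(iter(stream) is stream)  # True

# tee() buffers the values, so both iterators see the same sequence.
a, b = tee(stream)
first, second = list(a), list(b)
print(first == second)  # True
```

By contrast, list(make_stream()) and list(make_stream()) are two separate random draws and will usually differ. Once tee() has been called, the original stream should not be consumed directly, since that would advance it without the tee iterators being informed.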
