How does itertools.tee work? Can an 'itertools.tee' object be duplicated in order to save its "status"?

Below are some tests about itertools.tee:

    >>> import itertools
    >>> li = [x for x in range(10)]
    >>> ite = iter(li)
==================================================
    >>> it = itertools.tee(ite, 5)
    >>> type(ite)
    <type 'listiterator'>
    >>> type(it)
    <type 'tuple'>
    >>> type(it[0])
    <type 'itertools.tee'>
    >>> 

    >>> list(ite)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(it[0])          # here I got nothing after 'list(ite)', why?
    []
    >>> list(it[1])
    []
====================play again===================
    >>> ite = iter(li)
    >>> it = itertools.tee(ite, 5)
    >>> list(it[1])
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(it[2])
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(it[3])
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(it[4])
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(ite)
    []                       # why is this empty, and why does the next line still have data?
    >>> list(it[0])
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(it[0])
    []
====================play again===================    
    >>> ite = iter(li)
    >>> itt = itertools.tee(it[0], 5)    # tee the first of the tee iterators
    >>> list(itt[0])
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(itt[1])
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(it[0])
    []                               # why this has no data?
    >>> list(it[1])
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(ite)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  

My questions are:

  1. How does tee work, and why does the original iterator sometimes 'have data' and other times not?
  2. Can I keep a deep copy of an iterator as a "status seed" that preserves the raw iterator's state, and tee it for later use?
  3. Can I swap two iterators or two itertools.tee objects?

Thanks!

Mukul answered 18/10, 2010 at 7:34 Comment(1)
I'm not sure I understand your question, but I think you should not touch the original iterator after you use itertools.tee to multiply it. Instead, get n+1 iterators from tee and use one of those as the original to 'track the status', whatever you mean by that. - Thetic

tee takes over the original iterator; once you tee an iterator, discard the original iterator since the tee owns it (unless you really know what you're doing).
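A minimal sketch of why touching the original is a problem, which also accounts for the empty lists in the question. The variable names here are illustrative only. Any value you pull directly from the original bypasses the tee, so none of the tee iterators ever sees it:

```python
import itertools

source = iter(range(5))
a, b = itertools.tee(source)

first = next(a)        # a pulls 0 from source; the tee buffers it for b
leaked = next(source)  # reading the original directly steals 1 from the tee
rest_a = list(a)       # a sees only what is left in source: [2, 3, 4]
all_b = list(b)        # b gets the buffered 0, then 2, 3, 4 (never 1)
```

The same mechanism works in the other direction: once any tee iterator has been run to the end, the original has been drained and `list(source)` returns an empty list.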

You can make a copy of a tee with the copy module:

    import copy, itertools
    it = [1, 2, 3, 4]
    a, b = itertools.tee(it)
    c = copy.copy(a)  # c starts at a's current position and advances on its own

... or by calling a.__copy__().
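As a rough illustration of using such a copy as a "status seed", assuming the tee object's `__copy__` support described above is available in your Python version (the names `saved`, `rest_of_a`, and `replayed` are made up for this sketch):

```python
import copy
import itertools

a, b = itertools.tee(iter([1, 2, 3, 4]))

next(a)                 # advance a past the first value
saved = copy.copy(a)    # snapshot of a's position, just after 1

rest_of_a = list(a)     # consuming a does not move the snapshot
replayed = list(saved)  # the snapshot replays from where it was taken
untouched_b = list(b)   # b was never advanced, so it still sees everything
```

The copy remembers a position in the shared stream rather than holding its own copy of the data, so it only replays values from the point where it was taken.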

Beware that tee works by buffering every value it has read from the original iterator for as long as some copy may still need to consume it.

For example,

    a = [1, 2, 3, 4]
    b, c = itertools.tee(a)
    next(b)  # returns 1; the tee keeps it buffered until c has read it too

At this point, the tee object underlying b and c has read one value, 1. It's storing that in memory, since it has to remember it for when c is iterated. It has to keep every value in memory until it's consumed by all copies of the tee.

The consequence of this is that you need to be careful with "saving state" by copying a tee. If you don't actually consume any values from the "saved state" tee, you're going to cause the tee to keep every value returned by the iterator in memory forever (until the copied tee is discarded and collected).
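To make the buffering visible, here is a small sketch using a hypothetical generator `source` that records each value actually fetched from it. Each value is pulled from the source only once, and the lagging iterator is served from the tee's buffer:

```python
import itertools

pulled = []  # every value actually fetched from the underlying source

def source():
    for i in range(4):
        pulled.append(i)
        yield i

fast, slow = itertools.tee(source())

fast_values = list(fast)          # drains the source; all 4 values are now buffered for slow
pulled_after_fast = list(pulled)  # snapshot of what has been fetched so far
slow_values = list(slow)          # replayed from the buffer; nothing new is fetched
```

If `slow` were kept around as a "saved state" and never consumed, those buffered values would have to stay in memory for as long as `slow` is alive. When one iterator consumes most or all of the data before the other starts, building a plain list is usually the simpler choice.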

Chalkboard answered 18/10, 2010 at 8:4 Comment(2)
Thanks @Glenn. So tee can be thought of as a data buffer that behaves like an iterator, right? In that case it may not be suitable for large datasets. Is there a way to duplicate the "pure" iterator for a single sequence? I know deep copy does not work on an iterator. - Mukul
No, there's no way in general to copy an iterator. Iterators can expose __copy__ as tee instances do, but usually do not. - Chalkboard
