How to convert an iterable to a stream?
If I've got an iterable containing strings, is there a simple way to turn it into a stream? I want to do something like this:

def make_file():
    yield "hello\n"
    yield "world\n"

output = tarfile.TarFile(…)
stream = iterable_to_stream(make_file())
output.addfile(…, stream)
Jobey asked 11/7, 2011 at 23:18 Comment(4)
I don't know streams well, but do you want stream = io.StringIO("".join(make_file())) ?Abhorrence
Nope — I don't want that. make_file() may return a large file, and I'd rather not load it into memory.Jobey
interesting link: hg.python.org/cpython/file/ab162f925761/Lib/tarfile.py#l249Altman
@TokenMacGuy: Sorry, I don't think I see the significance of that link…Jobey

Here's my streaming iterator from an experimental branch of urllib3 that supports streaming chunked requests via iterables:

class IterStreamer(object):
    """
    File-like streaming iterator.
    """
    def __init__(self, generator):
        self.generator = generator
        self.iterator = iter(generator)
        self.leftover = ''

    def __len__(self):
        return self.generator.__len__()

    def __iter__(self):
        return self.iterator

    def next(self):
        return self.iterator.next()

    def read(self, size):
        # Start with whatever was left over from the previous read.
        data = self.leftover
        count = len(self.leftover)

        # Pull chunks from the iterator until we have at least `size`
        # characters or the iterator is exhausted.
        if count < size:
            try:
                while count < size:
                    chunk = self.next()
                    data += chunk
                    count += len(chunk)
            except StopIteration:
                pass

        # Keep anything beyond `size` for the next call.
        self.leftover = data[size:]

        return data[:size]

Source with context: https://github.com/shazow/urllib3/blob/filepost-stream/urllib3/filepost.py#L23

Related unit tests: https://github.com/shazow/urllib3/blob/filepost-stream/test/test_filepost.py#L9

Alas this code hasn't made it into the stable branch yet as sizeless chunked requests are poorly supported, but it should be a good foundation for what you're trying to do. See the source link for examples showing how it can be used.
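
For reference, here is a minimal usage sketch under Python 2 (the class relies on iterator.next()), reusing the generator from the question; it is not taken from the linked source:

def make_file():
    yield "hello\n"
    yield "world\n"

stream = IterStreamer(make_file())
chunk = stream.read(4)    # 'hell'
rest = stream.read(100)   # 'o\nworld\n' (short read once the data runs out)
done = stream.read(4)     # '' signals end of data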

Bilbrey answered 12/7, 2011 at 2:40 Comment(10)
This has a bug where it will continue to emit the last leftover bit of data forever.Subchaser
Swap out the pass for return data and the bug is gone.Latimer
No. Swap out the pass for self.leftover = ''; return data and the bug is gone.Banyan
Fixed the bug you guys mentioned. Sorry for the lack of response, didn't notice Stackoverflow's notifications for a long time. :)Bilbrey
Hyperlinks are broken.Moisture
@Mechanicalsnail Thanks for letting me know. Alas I no longer have a mirror of that branch. :( Updated the answer accordingly.Bilbrey
@Mechanicalsnail Good news. Found a backup. Updated the links. :)Bilbrey
Your __iter__ implementation is incorrect as it should iterate over the lines, not the chunks of the iterator.Chantellechanter
read still has a bug with stale leftover, fixed by the changes in this diff github.com/jennyyuejin/Kaggle/commit/…Lotty
Thank you James -- your version seems to be the only one that completely fixes the stale leftover. Funny how difficult this has proven to be.Melliemelliferous

Python 3 has a new I/O stream API (library docs), replacing the old file-like object protocol. (The new API is also available in Python 2 in the io module, and it's backwards-compatible with the file-like object protocol.)

Here's an implementation for the new API, in Python 2 and 3:

import io

def iterable_to_stream(iterable, buffer_size=io.DEFAULT_BUFFER_SIZE):
    """
    Lets you use an iterable (e.g. a generator) that yields bytestrings as a read-only
    input stream.

    The stream implements Python 3's newer I/O API (available in Python 2's io module).
    For efficiency, the stream is buffered.
    """
    class IterStream(io.RawIOBase):
        def __init__(self):
            self.leftover = None
        def readable(self):
            return True
        def readinto(self, b):
            try:
                l = len(b)  # We're supposed to return at most this much
                chunk = self.leftover or next(iterable)
                output, self.leftover = chunk[:l], chunk[l:]
                b[:len(output)] = output
                return len(output)
            except StopIteration:
                return 0    # indicate EOF
    return io.BufferedReader(IterStream(), buffer_size=buffer_size)

Example usage:

with iterable_to_stream(str(x**2).encode('utf8') for x in range(11)) as s:
    print(s.read())
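
To connect this back to the question, here is a hedged sketch of feeding such a stream to tarfile.addfile(); it assumes a bytes-yielding variant of the question's make_file() generator and consumes the generator once up front only to compute the size that TarInfo requires:

import tarfile

def make_file():
    # Bytes-yielding variant of the question's generator (assumption).
    yield b"hello\n"
    yield b"world\n"

info = tarfile.TarInfo(name="greeting.txt")
info.size = sum(len(chunk) for chunk in make_file())  # addfile() reads exactly info.size bytes

with tarfile.open("example.tar", "w") as tar:
    tar.addfile(info, iterable_to_stream(make_file()))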
Moisture answered 28/11, 2013 at 7:22 Comment(1)
In 2020 and with Python 3.8, is still the best way to do it? Tried it and it still works, but maybe it can be simplified?Supernatural

Since it doesn't look like there is a "standard" way of doing it, I've banged together a simple implementation:

class iter_to_stream(object):
    def __init__(self, iterable):
        self.buffered = ""
        self.iter = iter(iterable)

    def read(self, size):
        result = ""
        while size > 0:
            # Serve buffered leftover first, otherwise pull the next chunk.
            data = self.buffered or next(self.iter, None)
            self.buffered = ""
            if data is None:  # iterator exhausted
                break
            size -= len(data)
            if size < 0:
                # The chunk overshot the request; keep the excess for later.
                data, self.buffered = data[:size], data[size:]
            result += data
        return result
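
A quick sketch of how this might be used with the generator from the question (not part of the original answer):

def make_file():
    yield "hello\n"
    yield "world\n"

stream = iter_to_stream(make_file())
first = stream.read(8)    # 'hello\nwo'
second = stream.read(8)   # 'rld\n' (short read, iterator exhausted)
third = stream.read(8)    # '' signals end of data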
Jobey answered 12/7, 2011 at 0:01 Comment(0)

A starting point:

class iterable_to_stream:
    def __init__(self, iterable):
        self.iter = iter(iterable)

    def read(self):
        try:
            return self.iter.next()
        except StopIteration:
            return ""
Northman answered 11/7, 2011 at 23:23 Comment(3)
Hhmm… While that would most certainly explode on its own (what if next(iter) returns ""? What if someone has the audacity to pass a size into read(…))… I guess I could use a BufferedReader to take care of those details…Jobey
Sorry dude, this appears to be unworkable. BufferedReader needs an instance of RawIOBase, and this doesn't come anywhere near to implementing that interface… And it doesn't even implement the basic stream API (eg, read() doesn't accept a size).Jobey
@David Wolever: Seems like coding a RawIOBase-like wrapper for your iterable and passing that to BufferedReader would be feasible. RawIOBase objects only have 4 methods and you might be able to get away with only implementing the 3 read...() ones.Henghold

A slightly modified version of Mechanical snail's great answer. Here, the readinto(b) implementation makes multiple calls to the underlying iterator in order to gather as many bytes as possible for the given writable bytes-like object b.

import io
from typing import Optional


class IteratorReader(io.RawIOBase):

    def __init__(self, iterator):
        self.iterator = iterator
        self.leftover = []

    def readinto(self, buffer: bytearray) -> Optional[int]:
        size = len(buffer)
        while len(self.leftover) < size:
            try:
                self.leftover.extend(next(self.iterator))
            except StopIteration:
                break

        if len(self.leftover) == 0:
            return 0

        output, self.leftover = self.leftover[:size], self.leftover[size:]
        buffer[:len(output)] = output
        return len(output)

    def readable(self) -> bool:
        return True

and usage:

def iterator1():
    for i in ('a', 'b', 'c', 'd', 'e', 'f', 'g'):
        res = i * 3
        yield res.encode("utf8")


iterreader = IteratorReader(iterator1())
while True:
    r = iterreader.read(4)
    if not r:
        break
    print(r)
Actress answered 14/8, 2020 at 18:58 Comment(0)

TarFile takes anything that provides a file-like interface -- so you could either use StringIO (io.StringIO if you are using Python 3.X) to yield what you need to TarFile.addfile() or you could create your own class that provides a file-like interface and yields what you need.
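
For completeness, a minimal in-memory sketch of this approach (the one the asker wants to avoid for large inputs), assuming a bytes-yielding variant of make_file(); note that in Python 3 tarfile reads bytes, so io.BytesIO is used here rather than StringIO:

import io
import tarfile

def make_file():
    # Bytes-yielding variant of the question's generator (assumption).
    yield b"hello\n"
    yield b"world\n"

data = b"".join(make_file())  # loads the whole iterable into memory at once
info = tarfile.TarInfo(name="greeting.txt")
info.size = len(data)

with tarfile.open("small.tar", "w") as tar:
    tar.addfile(info, io.BytesIO(data))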

Ozellaozen answered 11/7, 2011 at 23:27 Comment(2)
Right — but is there any way to stream an iterator through a StringIO? I'd rather not load the entire input file into memory before writing it to the StringIO.Jobey
@David -- not that I know of. I'd give you an example of wrapping a class around StringIO, but it looks like you've got what you need already :-)Ozellaozen
