Why is StringIO object slower than real file object?

Asked 30/8, 2014 at 9:28 Answered 3/2, 2016 at 22:14

I'm looking through the source of StringIO, where it has these notes:

Using a real file is often faster (but less convenient).
There's also a much faster implementation in C, called cStringIO, but it's not subclassable.

StringIO is just an in-memory file object, so why is it slower than a real file object?

Herrmann answered 30/8, 2014 at 9:28 Comment(0)

Python's file handling is implemented entirely in C. This means that it's quite fast (at least in the same order of magnitude as native C code).

The StringIO library, however, is written in Python. The module itself is thus interpreted, with the associated performance penalties.

As you know, there is another module, cStringIO, with a similar interface, which you can use in performance-sensitive code. It isn't subclassable because it's written in C.

Formally answered 30/8, 2014 at 9:51 Comment(1)

@Formally - is there something similar for BytesIO ? – Tickler 20/7, 2020 at 3:53

This is not actually about Python's interpreted nature: BytesIO is implemented in Python*, same as StringIO, but still beats file I/O.

In fact, StringIO is faster than file I/O under StringIO's ideal use case (a single write to the beginning of an empty buffer). Actually, if the write is big enough it'll even beat cStringIO. See my question here.
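The "single write to an empty buffer" case can be timed with a small harness like the one below. This is only a sketch under stated assumptions: it uses Python 3's `io.StringIO` (which is implemented in C, unlike the Python 2 `StringIO` module discussed here), and the payload size and repeat count are arbitrary values picked for illustration, so the numbers will not match the ones from the original question.

```python
import io
import os
import tempfile
import timeit

# Hypothetical payload size, chosen only for illustration.
PAYLOAD = "x" * 1_000_000


def write_in_memory() -> int:
    """Do one write to the start of an empty in-memory text buffer."""
    buf = io.StringIO()
    buf.write(PAYLOAD)
    return len(buf.getvalue())


def write_to_file() -> int:
    """Do the same single write, but to a temporary file on disk."""
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False) as f:
        f.write(PAYLOAD)
        path = f.name
    # The file is closed (and flushed) here, so its size is final.
    size = os.path.getsize(path)
    os.remove(path)
    return size


if __name__ == "__main__":
    # Repeat count is arbitrary; raise it for more stable timings.
    print("in-memory:", timeit.timeit(write_in_memory, number=20))
    print("real file:", timeit.timeit(write_to_file, number=20))
```

Timings depend heavily on the OS, the filesystem cache, and the Python version, so treat any result as a local measurement rather than a general ranking.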

So why is StringIO considered "slow"? StringIO's real problem is being backed by immutable sequences, whether str or unicode. This is fine if you only write once, obviously. But, as pointed out by tdelaney's answer to my question, it slows down a ton (like, 10-100x) when writing to random locations, since every time it gets a write in the middle it has to copy the entire backing sequence.

BytesIO doesn't have this problem since it's backed by a (mutable) bytearray instead. Likewise, whatever cStringIO does, it seems to handle random writes much more easily. I'd guess that it breaks the immutability rule internally, since C strings are mutable.
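The difference between the two backing stores can be sketched as follows. This is not the actual StringIO or BytesIO code; the two helper functions are hypothetical and only illustrate why overwriting the middle of an immutable string forces a full copy, while a bytearray can be patched in place.

```python
def overwrite_immutable(buf: str, pos: int, data: str) -> str:
    """Overwrite part of an immutable string.

    Strings cannot be changed in place, so a completely new string is
    built from three pieces. The cost grows with len(buf), not len(data).
    """
    return buf[:pos] + data + buf[pos + len(data):]


def overwrite_mutable(buf: bytearray, pos: int, data: bytes) -> None:
    """Overwrite part of a bytearray in place.

    Slice assignment of equal length only touches len(data) bytes.
    """
    buf[pos:pos + len(data)] = data


text = "a" * 10
patched_text = overwrite_immutable(text, 3, "XYZ")  # new object, old one unchanged

raw = bytearray(b"a" * 10)
overwrite_mutable(raw, 3, b"XYZ")  # same object, modified in place
```

With a large buffer and many writes at scattered positions, the first approach copies the whole buffer on every write, which is consistent with the slowdown described above.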

* Well, the version in _pyio is, anyway. The standard library version in io is written in C.

Golda answered 3/2, 2016 at 22:14 Comment(0)

It is not necessarily obvious from the source, but Python file objects are built directly on the C library functions, likely with only a thin layer of Python, or even a C wrapper, to present a Python class. The native C library is highly optimised for reading bytes and blocks from disk. The Python StringIO library is pure Python code, which is slower than native C code.

Dahliadahlstrom answered 30/8, 2014 at 9:35 Comment(2)

ok - re-reading the question it is contradictory. A small edit is required - done :-) – Dahliadahlstrom 30/8, 2014 at 9:43

sorry for my poor english, my meaning is look through, not look for. – Herrmann 30/8, 2014 at 9:53
