This is not actually about Python's interpreted nature: BytesIO is implemented in Python*, same as StringIO, but still beats file I/O.
In fact, StringIO is faster than file I/O in its ideal use case (a single write to the beginning of an empty buffer). If the write is big enough, it will even beat cStringIO. See my question here.
So why is StringIO considered "slow"? StringIO's real problem is that it is backed by an immutable sequence, whether str or unicode. This is fine if you only write once. But, as pointed out in tdelaney's answer to my question, it slows down dramatically (on the order of 10-100x) when writing to random locations, since every write into the middle of the buffer has to copy the entire backing sequence.
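To make the copying cost concrete, here is a minimal sketch. It is not StringIO's actual code: write_immutable is a hypothetical helper, and the NUL padding on a seek past the end is an assumption made for illustration. It overwrites part of an immutable str the only way that is possible, by building a whole new string.

```python
def write_immutable(buf, pos, data):
    """Overwrite len(data) characters of buf starting at pos.

    Hypothetical helper, not StringIO's real implementation. Because str
    is immutable, the result has to be a brand-new string, so every call
    copies the whole buffer: O(len(buf)) work even for a 1-character write.
    """
    if pos > len(buf):
        # Assumed behaviour: pad with NULs when writing past the end.
        buf += "\0" * (pos - len(buf))
    # Three slices are joined into a freshly allocated string.
    return buf[:pos] + data + buf[pos + len(data):]


buf = write_immutable("", 0, "hello world")  # single write: one copy
buf = write_immutable(buf, 6, "W")           # mid-buffer write: full copy again
print(buf)  # hello World
```

With a large buffer and many small writes at random offsets, that full copy on every call is what dominates the running time.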
BytesIO doesn't have this problem, since it's backed by a (mutable) bytearray instead. Likewise, whatever cStringIO does, it seems to handle random writes much more easily. I'd guess that it breaks the immutability rule internally, since C strings are mutable.
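For contrast, a mutable buffer can be patched in place. Again, this is only a sketch under assumptions: write_mutable is a hypothetical helper, not BytesIO's real code, and the NUL padding is assumed for illustration.

```python
def write_mutable(buf, pos, data):
    """Overwrite part of a bytearray in place (hypothetical helper).

    Slice assignment mutates the existing object, so a write into the
    middle touches only len(data) bytes instead of copying the buffer.
    """
    if pos > len(buf):
        # Assumed behaviour: pad with NULs when writing past the end.
        buf.extend(b"\0" * (pos - len(buf)))
    buf[pos:pos + len(data)] = data
    return buf


buf = bytearray(b"hello world")
same = write_mutable(buf, 6, b"W")
print(same is buf, bytes(buf))  # True b'hello World'
```

The same object comes back after the write, which is the property an immutable str or unicode backing store cannot offer.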
* Well, the version in _pyio is, anyway. The standard library version in io is written in C.