How do you use StringIO in Python3?

9

645

I am using Python 3.2.1 and I can't import the StringIO module. I use io.StringIO and it works, but I can't use it with numpy's genfromtxt() like this:

x="1 3\n 4.5 8"        
numpy.genfromtxt(io.StringIO(x))

I get the following error:

TypeError: Can't convert 'bytes' object to str implicitly  

and when I write import StringIO it says

ImportError: No module named 'StringIO'
Bosk answered 11/8, 2012 at 11:53 Comment(0)
S
1075

When I write import StringIO it says there is no such module.

From What’s New In Python 3.0:

The StringIO and cStringIO modules are gone. Instead, import the io module and use io.StringIO or io.BytesIO for text and data respectively.
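
As a minimal sketch of the distinction in that quote (the variable names are illustrative and not taken from the docs): io.StringIO works with str, io.BytesIO works with bytes, and passing bytes to io.StringIO raises a TypeError.

```python
import io

text_stream = io.StringIO("1 3\n4.5 8")      # text: takes and returns str
byte_stream = io.BytesIO(b"1 3\n4.5 8")      # binary data: takes and returns bytes

first_text_line = text_stream.readline()     # a str
first_byte_line = byte_stream.readline()     # a bytes object

# Mixing the two up fails immediately in Python 3.
try:
    io.StringIO(b"1 3\n4.5 8")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

This is why swapping only the import can still leave code failing with a TypeError: the stream type has to match the kind of data the consumer expects.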



A possibly useful method of fixing some Python 2 code to also work in Python 3 (caveat emptor):

try:
    from StringIO import StringIO ## for Python 2
except ImportError:
    from io import StringIO ## for Python 3

Note: This example may be tangential to the main issue of the question and is included only as something to consider when generically addressing the missing StringIO module. For a more direct solution to the message TypeError: Can't convert 'bytes' object to str implicitly, see this answer.

Stockdale answered 17/8, 2013 at 3:43 Comment(16)
Worth mentioning these are not the same, so you can end up with TypeErrors (string argument expected, got 'bytes') if you make this change in isolation. You need to carefully distinguish bytes and str (unicode) in Python 3.Mane
For newbs like me: from io import StringIO means you call it as StringIO(), not io.StringIO().Racer
How to actually be compatible with Python 2 and 3: just from io import StringIODita
THIS IS SIMPLY WRONG for numpy.genfromtxt() in python 3. Please refer to the answer from Roman Shapovalov.Ouphe
@realtemper: Do you mean that the first part of the answer is wrong (it's a quote from the docs), or that the example doesn't apply to the question.Stockdale
@nobar: The latter. The original question uses Python 3.x, from which the module StringIO is gone, and from io import BytesIO should be applied instead. Tested myself on Python 3.5 @ eclipse pyDev + win7 x64. Please let me know if I am wrong, thanks.Ouphe
Seconding @realtemper. This answered the question I googled, but not the question OP asked. (Presumably that is why it has all the upvotes.)Throe
@josePhoenix: RomanShapovalov had already answered part of the question. I just provided a bit more information (from the docs), which directly addressed a different aspect of the question: "I can't import the StringIO module".Stockdale
I can't vouch for the example because it was added by someone else and I haven't tested it, but it looks like it is a general example of how to write an import that will work with both Python2 and Python3. The example is probably not a direct solution to the numpy aspect of the question.Stockdale
Please delete this answer, it is misleading. Downvoted.Genetics
@OlehPrypin: It didn't turn out to be compatible for me. Python 2 io.StringIO only supported unicode strings, not 8-bit strings.Sporty
@HelgaIliashenko This answer doesn't deserve downvotes. In an ideal world the other answer simply has more upvotes.Sporty
@ImperishableNight in an ideal world, this answer does not exist anymore.Adrenalin
Worked for me like a charm!Guddle
@AndyHayden, so I encountered the issue you mentioned, but how do I address this?Alathia
Six is a Python 2 and 3 compatibility library.Stockdale
C
174

In my case I have used:

from io import StringIO
Copra answered 17/3, 2016 at 10:15 Comment(0)
C
79

On Python 3 numpy.genfromtxt expects a bytes stream. Use the following:

numpy.genfromtxt(io.BytesIO(x.encode()))
Canicula answered 15/8, 2012 at 13:44 Comment(0)
A
27

Roman Shapovalov's code should work in Python 3.x as well as Python 2.6/2.7. Here it is again with the complete example:

import io
import numpy
x = "1 3\n 4.5 8"
numpy.genfromtxt(io.BytesIO(x.encode()))

Output:

array([[ 1. ,  3. ],
       [ 4.5,  8. ]])

Explanation for Python 3.x:

  • numpy.genfromtxt takes a byte stream (a file-like object interpreted as bytes instead of Unicode).
  • io.BytesIO takes a byte string and returns a byte stream. io.StringIO, on the other hand, would take a Unicode string and return a Unicode stream.
  • x gets assigned a string literal, which in Python 3.x is a Unicode string.
  • encode() takes the Unicode string x and makes a byte string out of it, thus giving io.BytesIO a valid argument.

The only difference for Python 2.6/2.7 is that x is a byte string (assuming from __future__ import unicode_literals is not used), and then encode() takes the byte string x and still makes the same byte string out of it. So the result is the same.


Since this is one of SO's most popular questions regarding StringIO, here's some more explanation on the import statements and different Python versions.

Here are the classes which take a string and return a stream:

  • io.BytesIO (Python 2.6, 2.7, and 3.x) - Takes a byte string. Returns a byte stream.
  • io.StringIO (Python 2.6, 2.7, and 3.x) - Takes a Unicode string. Returns a Unicode stream.
  • StringIO.StringIO (Python 2.x) - Takes a byte string or Unicode string. If byte string, returns a byte stream. If Unicode string, returns a Unicode stream.
  • cStringIO.StringIO (Python 2.x) - Faster version of StringIO.StringIO, but can't take Unicode strings which contain non-ASCII characters.

Note that StringIO.StringIO is imported as from StringIO import StringIO, then used as StringIO(...). Either that, or you do import StringIO and then use StringIO.StringIO(...). The module name and class name just happen to be the same. It's similar to datetime that way.
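
The module-versus-class naming point can be sketched with datetime, which still exists in Python 3. The alias DateTimeClass is used here only to keep both spellings visible in one file; it is not a standard name.

```python
import io
import datetime
from datetime import datetime as DateTimeClass  # alias only to avoid a name clash

# Module-qualified access: module name first, then the class name.
a = datetime.datetime(2012, 8, 11)

# Direct access to the class after a "from ... import ..." statement.
b = DateTimeClass(2012, 8, 11)

# The io module follows the same two patterns, minus the name collision.
from io import StringIO
same_class = io.StringIO is StringIO
```

Both spellings reach the same class; which one applies depends only on the form of the import statement.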

What to use, depending on your supported Python versions:

  • If you only support Python 3.x: Just use io.BytesIO or io.StringIO depending on what kind of data you're working with.

  • If you support both Python 2.6/2.7 and 3.x, or are trying to transition your code from 2.6/2.7 to 3.x: The easiest option is still to use io.BytesIO or io.StringIO. Although StringIO.StringIO is flexible and thus seems preferred for 2.6/2.7, that flexibility could mask bugs that will manifest in 3.x. For example, I had some code which used StringIO.StringIO or io.StringIO depending on Python version, but I was actually passing a byte string, so when I got around to testing it in Python 3.x it failed and had to be fixed.

    Another advantage of using io.StringIO is the support for universal newlines. If you pass the keyword argument newline='' into io.StringIO, it will be able to split lines on any of \n, \r\n, or \r. I found that StringIO.StringIO would trip up on \r in particular.

    Note that if you import BytesIO or StringIO from six, you get StringIO.StringIO in Python 2.x and the appropriate class from io in Python 3.x. If you agree with my previous paragraphs' assessment, this is actually one case where you should avoid six and just import from io instead.

  • If you support Python 2.5 or lower and 3.x: You'll need StringIO.StringIO for 2.5 or lower, so you might as well use six. But realize that it's generally very difficult to support both 2.5 and 3.x, so you should consider bumping your lowest supported version to 2.6 if at all possible.
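
The universal-newlines behaviour mentioned above can be sketched as follows. The sample string is made up for illustration; the comparison is between the default newline setting of io.StringIO and newline=''.

```python
import io

mixed = "a\rb\r\nc\n"   # one string containing all three line-ending styles

# Default (newline="\n"): only "\n" ends a line.
default_lines = io.StringIO(mixed).readlines()

# newline="": "\r", "\r\n" and "\n" all end a line, and the endings
# are returned untranslated.
universal_lines = io.StringIO(mixed, newline="").readlines()
```

With the default setting the lone \r stays inside the first line, whereas newline='' splits the text into three lines.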

Atlante answered 11/12, 2018 at 22:15 Comment(0)
B
26

Thank you OP for your question, and Roman for your answer. I had to search a bit to find this; I hope the following helps others.

Python 2.7

See: https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

import numpy as np
from StringIO import StringIO

data = "1, abc , 2\n 3, xxx, 4"

print type(data)
"""
<type 'str'>
"""

print '\n', np.genfromtxt(StringIO(data), delimiter=",", dtype="|S3", autostrip=True)
"""
[['1' 'abc' '2']
 ['3' 'xxx' '4']]
"""

print '\n', type(data)
"""
<type 'str'>
"""

print '\n', np.genfromtxt(StringIO(data), delimiter=",", autostrip=True)
"""
[[  1.  nan   2.]
 [  3.  nan   4.]]
"""

Python 3.5:

import numpy as np
from io import StringIO
import io

data = "1, abc , 2\n 3, xxx, 4"
#print(data)
"""
1, abc , 2
 3, xxx, 4
"""

#print(type(data))
"""
<class 'str'>
"""

#np.genfromtxt(StringIO(data), delimiter=",", autostrip=True)
# TypeError: Can't convert 'bytes' object to str implicitly

print('\n')
print(np.genfromtxt(io.BytesIO(data.encode()), delimiter=",", dtype="|S3", autostrip=True))
"""
[[b'1' b'abc' b'2']
 [b'3' b'xxx' b'4']]
"""

print('\n')
print(np.genfromtxt(io.BytesIO(data.encode()), delimiter=",", autostrip=True))
"""
[[  1.  nan   2.]
 [  3.  nan   4.]]
"""

Aside:

dtype="|Sx", where x = any of { 1, 2, 3, ...}:

dtypes. Difference between S1 and S2 in Python

"The |S1 and |S2 strings are data type descriptors; the first means the array holds strings of length 1, the second of length 2. ..."

Borras answered 22/5, 2016 at 23:34 Comment(0)
B
24

You can use the StringIO from the six module:

import six
import numpy

x = "1 3\n 4.5 8"
numpy.genfromtxt(six.StringIO(x))
Barranquilla answered 31/5, 2016 at 16:4 Comment(0)
A
7

To make the examples from here work with Python 3.5.2, you can rewrite them as follows:

import io
import numpy

data = io.BytesIO(b"1, 2, 3\n4, 5, 6")
numpy.genfromtxt(data, delimiter=",")

The reason for the change may be that a file's content is raw data (bytes), which does not become text until it is decoded somehow. genfrombytes may be a better name than genfromtxt.

Aigneis answered 19/12, 2016 at 16:5 Comment(0)
S
2

Here is another example for Python 3. It uses two functions to add two numbers, then uses cProfile to save a .prof file. It then loads the saved file with pstats.Stats and uses StringIO to capture the report as a string for further use.

main.py

import cProfile
import time
import pstats
from io import StringIO

def add_slow(a, b):
    time.sleep(0.5)
    return a+b

def add_fast(a, b):
    return a+b

prof = cProfile.Profile()

def main_func():
    arr = []
    prof.enable()
    for i in range(10):
        if i%2==0:
            arr.append(add_slow(i,i))
        else:
            arr.append(add_fast(i,i))
    prof.disable()
    #prof.print_stats(sort='time')
    prof.dump_stats("main_funcs.prof")
    return arr

main_func()
stream = StringIO()
stats = pstats.Stats("main_funcs.prof", stream=stream)
stats.print_stats()
stream.seek(0)
print(16*'=', "RESULTS", 16*'=')
print(stream.read())

Usage:

python3 main.py

Output:

================ RESULTS ================
Tue Jul  6 17:36:21 2021    main_funcs.prof

         26 function calls in 2.507 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        5    2.507    0.501    2.507    0.501 {built-in method time.sleep}
        5    0.000    0.000    2.507    0.501 profiler.py:39(add_slow)
        5    0.000    0.000    0.000    0.000 profiler.py:43(add_fast)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Comments: In the output above, the five time.sleep calls account for about 2.507 seconds in total, which is essentially the whole run time.
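
A possible variant, sketched here as an assumption rather than as part of the original example: pstats.Stats also accepts a Profile object directly, so the intermediate .prof file can be skipped, and getvalue() returns the whole buffer without a seek(0). The work function is a hypothetical stand-in workload.

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical stand-in workload; any callable would do.
    return sum(i * i for i in range(1000))


prof = cProfile.Profile()
prof.enable()
result = work()
prof.disable()

stream = io.StringIO()
# Stats accepts a Profile object as well as a filename.
stats = pstats.Stats(prof, stream=stream)
stats.sort_stats("cumulative").print_stats()

# The whole report as one str, ready for logging or parsing.
report = stream.getvalue()
```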

Suspense answered 6/7, 2021 at 9:37 Comment(0)
G
-2

I hope this will meet your requirement

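import io
import PyPDF4

# Open the PDF in binary mode; the with-block closes the file afterwards.
with open(r'test.pdf', 'rb') as pdfFile:
    pdfReader = PyPDF4.PdfFileReader(pdfFile)
    pageObj = pdfReader.getPage(1)  # pages are zero-indexed, so this is the second page
    pagetext = pageObj.extractText()

# Wrap the extracted text in StringIO to iterate over it line by line.
for line in io.StringIO(pagetext):
    print(line)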
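
The line-iteration part does not depend on any PDF library. A minimal sketch with a made-up string shows that each line yielded by io.StringIO keeps its trailing newline, so stripping it before printing avoids blank lines in the output:

```python
import io

text = "first line\nsecond line\nlast line"

# Iterating a StringIO yields one line at a time, newline included.
raw_lines = list(io.StringIO(text))

# Strip the trailing newline before printing or further processing.
clean_lines = [line.rstrip("\n") for line in io.StringIO(text)]
```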
Gastrolith answered 6/12, 2020 at 10:28 Comment(0)
