Python 2.7: reload(sys) disables error messages and print in Windows

I'm making a script that requires me to change the encoding format to "UTF-8". I found a topic here on Stack Overflow that said I could use:

import sys
# Python 2 only: site.py deletes sys.setdefaultencoding at startup; reloading sys restores it
reload(sys)
sys.setdefaultencoding('utf-8')

It works great in OS X 10.8 (maybe earlier versions too), but in Windows XP and Windows 7 (probably Vista and 8 too) it disables all feedback in the interpreter. The script still runs, but I can't print anything or see if anything goes wrong.

Is there a way to patch the current code or is there an alternate way to change the encoding?
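For context, here is a minimal sketch (an illustration added here, not the asker's code) of what the default encoding actually controls: the codec Python 2 uses when it implicitly mixes str and unicode. The helper name `concat` and the sample character are hypothetical; the code runs on Python 2.7 and 3.x and only reads `sys`, it never reloads it.

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and 3.x; sys is only read, never reloaded.
import sys


def concat(byte_str, text):
    """Join bytes and text with no explicit decode; return the error name on failure."""
    try:
        return byte_str + text
    except (TypeError, UnicodeDecodeError) as exc:
        return type(exc).__name__


utf8_bytes = u"Æ".encode("utf-8")   # two bytes: 0xC3 0x86

# 'ascii' on Python 2 (the value the reload(sys) hack changes), 'utf-8' on Python 3.
print(sys.getdefaultencoding())

# Python 2 implicitly decodes utf8_bytes with the ASCII default codec and gets
# UnicodeDecodeError; Python 3 refuses to mix bytes and str and raises TypeError.
print(concat(utf8_bytes, u"!"))

# Portable fix: decode explicitly at the boundary, then work with text only.
fixed = utf8_bytes.decode("utf-8") + u"!"
```

With `sys.setdefaultencoding('utf-8')` in place, Python 2 would silently succeed in `concat`, which is why the hack appears to "fix" things while hiding where the decoding really happens.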

Handle answered 3/11, 2012 at 14:40 Comment(12)
What exactly do you mean by "disables all feedback"? - Extramundane
Might it be because cmd.exe doesn't use UTF-8 by default? - Pulpit
Could you elaborate on "I'm making a script that requires me to change the encoding format..."? Why? - Electroanalysis
@Extramundane I don't get any error messages, and print statements don't show anything in the interpreter. - Handle
@JakobBowyer And how does your comment help me? - Handle
@JonClements I'm importing my school's website (yes, I have permission) as a text file and parsing it to find information and index it. It is a Danish website, so it contains Ø, Æ and Å, which don't work by default for me. - Handle
How do you import a website into a Python script? - Melodist
According to the Python developers M.-A. Lemburg and Martin v. Löwis, changing setdefaultencoding is not a supported way to solve any problem. It will make your Python scripts incompatible with those of most other Python users, and may lead to unexpected behavior or mojibake. - Holiday
setdefaultencoding affects the way Python 2 does implicit conversion between str and unicode. This can happen in lots of places, so to help you fix your script the proper way, we'd need to see your code. In general, you just have to keep track of what is str and what is unicode and not mix them willy-nilly. Usually you'd want to convert user-supplied strs to unicode, work with unicode everywhere, and encode your unicode to UTF-8 (or whatever is appropriate) only on output. - Holiday
@Holiday Hmm. Is there a good alternative? I must confess I didn't read all the responses in your link. - Handle
There is no easy alternative. Python 3 will force programmers to pay much closer attention to what is bytes (that is, str in Python 2) and what is str (what Python 2 calls unicode). Instead of implicitly converting between the two using the ASCII encoding, Python 3 will often just raise an exception. So in the long run it will pay off to learn the absolute minimum needed to deal with unicode, as well as some practical advice on handling unicode in Python. - Holiday
I agree with the other commenters. You definitely need to convert your data into unicode objects and then work with those. - Extramundane
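The "decode on input, work with unicode, encode on output" advice in these comments can be sketched as follows. This is a minimal illustration under assumptions: the helper names `decode_input` and `encode_output` and the Danish sample string are hypothetical, and the input is assumed to be UTF-8. It runs on Python 2.7 and 3.x without touching `sys.setdefaultencoding`.

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and 3.x without touching sys.setdefaultencoding.


def decode_input(raw_bytes, encoding="utf-8"):
    """Decode bytes once, where they enter the program (file, socket, page)."""
    return raw_bytes.decode(encoding)


def encode_output(text, encoding="utf-8"):
    """Encode text only where it leaves the program."""
    return text.encode(encoding)


# Hypothetical sample: bytes as they might arrive from a UTF-8 Danish web page.
raw = u"Ærø, Øster Åby".encode("utf-8")

text = decode_input(raw)        # unicode (Python 2) / str (Python 3) from here on
words = text.split(u", ")       # every processing step works on text, not bytes
n_chars = len(text)             # 14 characters, while raw is 18 bytes
out = encode_output(text)       # back to UTF-8 bytes, e.g. for writing to a file
```

Because the conversion happens explicitly at the two edges, no implicit ASCII decode is ever triggered, so the default encoding never matters.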

Maybe what happens to you is related to IDLE, since IDLE replaces the default sys.stdin, sys.stdout and sys.stderr with its own objects. After you call reload(sys), these three file objects are reset to the interpreter's original ones, so you can no longer see any output in IDLE.
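A quick way to check whether this explanation applies is to compare each stream with the original the interpreter keeps in `sys.__stdin__`, `sys.__stdout__` and `sys.__stderr__`. This is a minimal diagnostic sketch; the function name `replaced_streams` is hypothetical, and the expected results in the comment are typical rather than guaranteed.

```python
import sys

STREAM_NAMES = ("stdin", "stdout", "stderr")


def replaced_streams():
    """Return the names of std streams that differ from the interpreter's originals.

    sys.__stdin__/__stdout__/__stderr__ keep the objects the interpreter started
    with; IDLE (and tools such as pytest's capture) install their own objects in
    sys.stdin/stdout/stderr instead.
    """
    return [name for name in STREAM_NAMES
            if getattr(sys, name) is not getattr(sys, "__%s__" % name)]


if __name__ == "__main__":
    # Typically [] in a plain console and all three names inside IDLE's shell.
    print(replaced_streams())
```

If this reports replaced streams before `reload(sys)` and none afterwards, the reload is what made the output disappear.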

You can solve it by restoring the three streams after reload(sys):

import sys
# Save the stream objects IDLE installed, because reload(sys) resets them
stdin, stdout, stderr = sys.stdin, sys.stdout, sys.stderr
reload(sys)  # Python 2 only
sys.stdin, sys.stdout, sys.stderr = stdin, stdout, stderr
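The save/restore lines above can be wrapped in a small context manager so the streams come back even if something inside the block raises. This is a sketch of the same idea, not part of the original answer; the name `preserved_std_streams` is hypothetical, and it only uses `contextlib.contextmanager`, which exists in Python 2.7 and 3.x.

```python
import sys
from contextlib import contextmanager


@contextmanager
def preserved_std_streams():
    """Save sys.stdin/stdout/stderr and put them back when the block exits.

    Whatever happens inside the block (for example reload(sys) on Python 2),
    the caller's streams, such as IDLE's or pytest's replacements, are
    restored, including when the block raises an exception.
    """
    saved = sys.stdin, sys.stdout, sys.stderr
    try:
        yield
    finally:
        sys.stdin, sys.stdout, sys.stderr = saved
```

On Python 2 you would put `reload(sys)` (and anything that follows it, such as the `sys.setdefaultencoding` call the comments above advise against) inside `with preserved_std_streams():`.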
Kroon answered 24/5, 2014 at 16:30 Comment(1)
Thanks! This hack also solved another problem for me: standard stream capture (capsys) was broken in pytest when the tested code called reload(sys). Making sure the streams are kept across the reload fixed it. - Spinney

To be frank, I have zero idea why you would want to alter Python's default encoding just to read and parse a single file (or even a great number of files, for that matter). Python can easily read and handle UTF-8 without such drastic measures, and this very site already has some great methods for doing so. This question is close to a duplicate of: Unicode (UTF-8) reading and writing to files in Python

On that question, the best answer is https://mcmap.net/q/86083/-unicode-utf-8-reading-and-writing-to-files-in-python, which relies on Python's codecs module.

Using this approach, you can do the following:

import codecs
with codecs.open("SomeFile", "rb", "utf-8") as inFile: 
    text = inFile.read()
# Do something with 'text' here
with codecs.open("DifferentFile", "wb", "utf-8") as outFile:
    outFile.write(text)

This reads a UTF-8 encoded file and then writes it back out as UTF-8. The variable 'text' will be a unicode string, so you can write it back out as UTF-8, UTF-16 or any other encoding that can represent its characters.
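A close alternative is `io.open` (swapped in here instead of the answer's `codecs.open`): it is available from Python 2.6 on, is the same function as the built-in `open` in Python 3, and in text mode also handles newline translation. The sketch below assumes hypothetical file names and a made-up sample line, and re-encodes the text to UTF-16 to show that the output encoding is chosen independently of the input.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def copy_text(src_path, dst_path, src_encoding="utf-8", dst_encoding="utf-8"):
    """Read a text file as unicode and write it back out in another encoding."""
    with io.open(src_path, "r", encoding=src_encoding) as in_file:
        text = in_file.read()          # unicode in Python 2, str in Python 3
    # ... parse / index 'text' here ...
    with io.open(dst_path, "w", encoding=dst_encoding) as out_file:
        out_file.write(text)
    return text


# Hypothetical files in a temporary directory (left in place for inspection).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "page.txt")
dst = os.path.join(tmp_dir, "page-utf16.txt")

with io.open(src, "w", encoding="utf-8") as f:
    f.write(u"Skolens side: Æ Ø Å\n")

text = copy_text(src, dst, dst_encoding="utf-16")
```

Python's "utf-16" codec writes a byte order mark and reads it back, so the file round-trips without specifying the byte order.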

Sutlej answered 14/2, 2013 at 5:56 Comment(0)
