I spent a few angry hours looking for the problem with Unicode strings that was broken down to something that Python (2.7) hides from me and I still don't understand. First, I tried to use u".."
strings consistently in my code, but that resulted in the infamous UnicodeEncodeError
. I tried using .encode('utf8')
, but that didn't help either. Finally, it turned out I shouldn't use either and it all works out automagically. However, I (here I need to give credit to a friend who helped me) did notice something weird while banging my head against the wall. sys.getdefaultencoding()
returns ascii, while sys.stdout.encoding
returns UTF-8. 1. in the code below works fine without any modifications to sys
and 2. raises a UnicodeEncodeError
. If I change the default system encoding with reload(sys).setdefaultencoding("utf8")
, then 2. works fine. My question is why the two encoding variables are different in the first place and how do I manage to use the wrong encoding in this simple piece of code? Please, don't send me to the Unicode HOWTO, I've read that obviously in the tens of questions about UnicodeEncodeError
.
# -*- coding: utf-8 -*-
import sys
class Token:
def __init__(self, string, final=False):
self.value = string
self.final = final
def __str__(self):
return self.value
def __repr__(self):
return self.value
print(sys.getdefaultencoding())
print(sys.stdout.encoding)
# 1.
myString = "I need 20 000€."
tok = Token(myString)
print(tok)
reload(sys).setdefaultencoding("utf8")
# 2.
myString = u"I need 20 000€."
tok = Token(myString)
print(tok)