Python encoding error only when called as an external process

A simple file like

$ cat x.py
x = u'Gen\xe8ve'
print x

When run, it gives me:

$ python x.py
Genève

However, when run inside a command substitution, it gives:

$ echo $(python x.py)
...
UnicodeEncodeError: 'ascii' codec...

I've tried different terminal emulators (xterm, gnome-term) and the console on a ttyS, with both bash and sh, and with Python 2.4 and 2.7. I've tried setting LC_ALL or LANG to a UTF-8 locale before running Python, and I've checked sys.getdefaultencoding(). Nothing helped.

The problem also arises when the script is called from another process (such as a Java program), but the command above was the easiest way I found to reproduce it.

I don't understand the difference between the two calls. Can anyone help?

Fidgety answered 7/8, 2012 at 11:20
OK. I can definitely reproduce this. (Dinkins)

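The problem is that in the second call stdout is a pipe rather than a terminal. A pipe is a file-like object that only accepts bytestrings, and because Python 2 cannot tell which encoding the reader expects, it falls back to ASCII when printing a unicode string. The same happens if you redirect the output to a file: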

$ python x.py > my_file
Traceback (most recent call last):
File "x.py", line 2, in <module>
    print x
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 3: ordinal not in range(128)
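One way to confirm that stdout being a pipe is what changes between the two calls is to ask Python what it knows about the stream. The sketch below is a hypothetical diagnostic (the function name `stream_info` and the file name `diag.py` are made up for illustration); it runs on Python 2.6+ and 3 and reports on stderr so the report stays visible when stdout is piped.

```python
from __future__ import print_function

import io
import sys


def stream_info(stream):
    """Report what Python knows about a stream's encoding (diagnostic sketch)."""
    # Streams without isatty() are treated as non-terminals.
    isatty = getattr(stream, "isatty", lambda: False)()
    return {
        "isatty": isatty,
        # On Python 2 this is None when stdout is a pipe or a regular file.
        "encoding": getattr(stream, "encoding", None),
        # Codec for implicit str/unicode conversions; not the same as stdout's.
        "default_encoding": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    # Write the report to stderr so it is not swallowed by a pipe on stdout.
    print(stream_info(sys.stdout), file=sys.stderr)
```

Under Python 2, running `python diag.py` and then `python diag.py | cat` should show `isatty` turning False and `encoding` changing from the terminal's encoding to None, while `default_encoding` stays `ascii` in both runs. Python 3 behaves differently: it usually takes a pipe's encoding from the locale, so `encoding` is normally not None there.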

Because the receiver only understands bytestrings, not unicode characters, you must first encode the unicode string into a bytestring using its encode method:

x = u'Gen\xe8ve'.encode('utf-8') 
print x

This prints the unicode string encoded as a UTF-8 bytestring (a sequence of bytes), which can be written to a file-like object such as a pipe:

$ echo $(python x.py)
Genève
$ python x.py
Genève
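The error message and the fix can both be reproduced without any pipe, because printing unicode to a non-terminal in Python 2 amounts to an implicit encode with ASCII. This minimal sketch (runs on Python 2.6+ and 3) encodes the same string both ways:

```python
text = u"Gen\xe8ve"

# Encoding with ASCII fails the same way the implicit encode during print did.
ascii_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc
    print("ascii failed at position %d" % exc.start)

# Encoding explicitly with UTF-8 yields a byte string that a pipe can carry.
utf8_bytes = text.encode("utf-8")
print(repr(utf8_bytes))

# Decoding with the same codec on the receiving side restores the text.
assert utf8_bytes.decode("utf-8") == text
```

The failure position, 3, matches the traceback above, and the UTF-8 result is 7 bytes long because è (U+00E8) becomes the two bytes `\xc3\xa8`.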
Workmanship answered 7/8, 2012 at 11:33
Usually a terminal accepts only bytestrings too (on Windows it might accept Unicode directly). The difference is whether Python knows an appropriate character encoding or defaults to ASCII. (Serica)

As you suspect, Python 2 doesn't know which encoding to use for unicode output when its standard output is not a terminal (sys.stdout.encoding is None in that case), so it falls back to ASCII. Consider encoding the string explicitly before printing it:

# coding: utf-8
x = u'Gen\xe8ve'
print x.encode("utf-8")
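Hard-coding UTF-8 works, but it also overrides the terminal's encoding when one is known. A minimal sketch of a middle ground, assuming UTF-8 is an acceptable default for pipes and files: use the stream's declared encoding if there is one, otherwise fall back. `pick_encoding` and `write_text` are hypothetical helper names, and the code is written to run on Python 2.6+ and 3.

```python
import io
import sys


def pick_encoding(stream, fallback="utf-8"):
    # Prefer the encoding the stream declares (e.g. a terminal's); Python 2
    # reports None for pipes and files, so use the assumed fallback instead.
    return getattr(stream, "encoding", None) or fallback


def write_text(text, stream=None, fallback="utf-8"):
    """Encode text explicitly and write the resulting bytes to stream (sketch)."""
    if stream is None:
        stream = sys.stdout
    data = text.encode(pick_encoding(stream, fallback))
    # Flush any text already buffered so the output keeps its order.
    stream.flush()
    # Python 3 text streams expose their byte layer as .buffer; Python 2 file
    # objects (and binary streams such as io.BytesIO) accept bytes directly.
    raw = getattr(stream, "buffer", stream)
    raw.write(data)
    raw.flush()
    return data
```

Calling `write_text(u"Gen\xe8ve\n")` writes to stdout whether or not it is a terminal. On Python 3, a declared encoding that cannot represent the text (for example ASCII under a C locale) still raises UnicodeEncodeError, just as print would.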

Note that the invoking program (for example, the Java process) and your script need to agree on a common encoding.
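When you control the invoking process, another option is to tell the child Python which codec to use through the `PYTHONIOENCODING` environment variable (available since Python 2.6, so not on the 2.4 interpreter mentioned in the question) and to decode with the same codec on the calling side. The sketch below plays the role of the caller in Python, standing in for the Java program; `run_child` and `CHILD` are hypothetical names.

```python
import os
import subprocess
import sys

# The question's script as a one-liner that runs on both Python 2 and 3.
CHILD = 'print(u"Gen\\xe8ve")'


def run_child(encoding="utf-8"):
    env = dict(os.environ)
    # PYTHONIOENCODING sets the codec for the child's stdin/stdout/stderr,
    # even when stdout is a pipe rather than a terminal.
    env["PYTHONIOENCODING"] = encoding
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, env=env)
    out, _ = proc.communicate()
    # The caller must decode with the same codec the child used.
    return out.decode(encoding).strip()


if __name__ == "__main__":
    print(repr(run_child()))
```

A non-Python caller would do the equivalent: set the variable in the child's environment and decode the child's output with the same charset.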

Sharolynsharon answered 7/8, 2012 at 11:31
