How to print non-ASCII characters in Python
H

5

7

I have a problem printing (or writing to a file) non-ASCII characters in Python. I've worked around it by overriding the __str__ method in my own objects and calling x.encode('utf-8') inside it, where x is a property of the object.

But if I receive a third-party object and call str(object), and that object contains a non-ASCII character, it fails.

So the question is: is there a generic way to tell the str method that the object is UTF-8 encoded? I'm working with Python 2.5.4.
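A minimal sketch of the failure described in the question, assuming the cause is the implicit ASCII encoding that Python 2 applies when str() is handed unicode text. The explicit encode call stands in for that implicit step, and the sample value is made up for illustration. It is written so that it also runs under Python 3.

```python
# Stand-in for what Python 2's str() does implicitly with unicode text:
# it encodes the text with the default "ascii" codec.
text = u"caf\u00e9"  # hypothetical sample value with one non-ASCII character

try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError:
    # "ascii" cannot represent U+00E9, so the conversion is rejected.
    failed = True

# Encoding explicitly with UTF-8 succeeds and yields bytes.
utf8_bytes = text.encode("utf-8")
print(failed)
print(repr(utf8_bytes))
```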

Habergeon answered 10/11, 2009 at 10:48 Comment(2)
What does "receive a third-party object" mean? What third-party object? And why can't this mysterious object be trusted to produce proper string values?Waite
I'm interacting with other programs which are not made by me. Those programs can have objects with string properties which can contain non-ascii charactersHabergeon
H
2

I would like to add that I've found a solution on Unix systems: exporting an environment variable, like this:

export LC_CTYPE="es_ES.UTF-8"

With this locale, output is encoded as UTF-8, so I can print or whatever and it works fine.
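One way to see what such an environment variable changes, sketched under the assumption that Python derives its preferred encoding from the locale. The printed values depend entirely on the machine, so no particular output is claimed here.

```python
import locale
import sys

# Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG).
# A locale name the system does not know makes setlocale raise locale.Error.
try:
    locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    pass  # keep the current locale if the environment value is not usable

preferred = locale.getpreferredencoding()
# Can be None, for example in Python 2 when output is piped.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print("preferred encoding: %s" % preferred)
print("stdout encoding: %s" % stdout_encoding)
```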

Habergeon answered 10/11, 2009 at 12:12 Comment(1)
What does this have to do with your question? Or with python?Dramaturge
E
10

There is no way to make str() work with Unicode in Python < 3.0.

Use repr(obj) instead of str(obj). repr() will convert the result to ASCII, properly escaping everything that isn't in the ASCII code range.
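A small sketch of the escaping idea. In Python 2, repr() of a unicode string gives an ASCII-only result with \u escapes; in Python 3 the built-in ascii() plays that role. The backslashreplace error handler used below is an alternative that behaves the same in both versions; it is shown as an illustration, not as what this answer prescribes.

```python
text = u"\u0411"  # CYRILLIC CAPITAL LETTER BE, the character used in this answer

# Replace every character outside ASCII with a backslash escape.
escaped = text.encode("ascii", "backslashreplace")

print(escaped.decode("ascii"))
```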

Other than that, use a file object which allows unicode. So don't encode at the input side but at the output side:

import codecs
fileObj = codecs.open("someFile", "w", "utf-8")

Now you can write unicode strings to fileObj and they will be converted as needed. To make the same happen with print, you need to wrap sys.stdout:

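import sys, codecs, locale
# Show the encoding Python detected for stdout (None if it could not tell)
print str(sys.stdout.encoding)
# Wrap stdout so unicode strings are encoded with the locale's preferred encoding
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
line = u"\u0411\n"
print type(line), len(line)
# Both calls now accept unicode and encode it on the way out
sys.stdout.write(line)
print line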
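To make the wrapping step easier to inspect, here is a sketch that uses an in-memory byte stream in place of sys.stdout, so the encoded bytes can be looked at afterwards. The choice of UTF-8 is an assumption for the example; the code above uses the locale's preferred encoding instead.

```python
import codecs
import io

# An in-memory byte stream stands in for sys.stdout or a binary file.
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; calling it wraps the stream.
writer = codecs.getwriter("utf-8")(raw)
writer.write(u"\u0411\n")  # text goes in

encoded = raw.getvalue()   # encoded bytes come out
print(repr(encoded))
```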
Emprise answered 10/11, 2009 at 11:5 Comment(2)
But I have the same problem when I use print(object), because internally it calls str, so if the object has a non-ASCII character it will fail. I've seen that I can put this in the first line of my .py files: # -*- coding: utf-8 -*- but it doesn't workHabergeon
The encoding of the source file has nothing to do with what str() supports. str() only supports unicode characters in py3k, so either use repr() or unicode() everywhere.Emprise
L
4
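# -*- coding: utf-8 -*-
# Python 2 only: save this file as UTF-8 so the literal below holds UTF-8 bytes.
none_ascii = '''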
        ███╗   ███╗ ██████╗ ██╗   ██╗██╗███████╗███████╗ 
        ████╗ ████║██╔═══██╗██║   ██║██║██╔════╝██╔════╝ 
        ██╔████╔██║██║   ██║██║   ██║██║█████╗  ███████╗ 
        ██║╚██╔╝██║██║   ██║╚██╗ ██╔╝██║██╔══╝  ╚════██║ 
        ██║ ╚═╝ ██║╚██████╔╝ ╚████╔╝ ██║███████╗███████║ 
        ╚═╝     ╚═╝ ╚═════╝   ╚═══╝  ╚═╝╚══════╝╚══════╝ 
'''

# Decode the UTF-8 bytes to unicode; print then encodes it for the terminal.
print(none_ascii.decode('utf-8'))
Libido answered 1/3, 2017 at 3:10 Comment(0)
D
3

How about using unicode(object) and defining a __unicode__ method on your classes?

Then you know it's unicode, and you can encode it any way you want when writing to a file.
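A rough sketch of that suggestion. The Person class and its name attribute are hypothetical, and the PY2 switch is only there so the same code also runs under Python 3, where __str__ must return text rather than bytes.

```python
import sys

PY2 = sys.version_info[0] == 2


class Person(object):  # hypothetical class, for illustration only
    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        # Always return text (unicode), never encoded bytes.
        return u"Person(%s)" % self.name

    def __str__(self):
        text = self.__unicode__()
        # Python 2 expects bytes from __str__; Python 3 expects text.
        return text.encode("utf-8") if PY2 else text


person = Person(u"Jos\u00e9")
as_text = person.__unicode__()       # what unicode(person) returns in Python 2
as_bytes = as_text.encode("utf-8")   # encode only at the point of writing out
```

Keeping the value as unicode until the last moment means the encoding is chosen in one place, at output time.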

Dramaturge answered 10/11, 2009 at 10:51 Comment(5)
But then I'm in the same problem: if I receive a third party object and I use "unicode(object)", and the object has a non-ascii character, it will fail, won't it?Habergeon
Besides, when I use "print(object)", internally it calls str method, so I can't use unicodeHabergeon
One more question: if I use Python 3, will I still have these problems? Does Python 3 do the conversion by itself? Does it accept non-ASCII characters by default?Habergeon
All Python 3 strings are (what used to be) unicode by default.Codee
First, please realize that if you receive an array of bytes, which Python strings essentially are, there is no way to be sure what encoding it is in. If third-party objects give you strings in a non-standard encoding, they should also tell you which encoding that is.Dramaturge
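The point about bytes carrying no encoding information can be illustrated with a short sketch. The to_text helper is a hypothetical name; it only decodes with whatever encoding the caller supplies.

```python
def to_text(raw_bytes, encoding):
    """Decode bytes using an encoding supplied by whoever produced them."""
    return raw_bytes.decode(encoding)


data = b"\xc3\xa9"  # two bytes whose meaning depends on the encoding

as_utf8 = to_text(data, "utf-8")      # one character: U+00E9
as_latin1 = to_text(data, "latin-1")  # two characters: U+00C3 and U+00A9
```

Both calls succeed, which is why the bytes alone cannot reveal which reading was intended.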
I
0

Just paste these two lines at the top of your code:

  #!/usr/local/bin/python
  # coding: latin-1

See this link for further details: https://www.python.org/dev/peps/pep-0263/
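A sketch of what the coding line does, under the assumption that compile() honours it for byte-string source the same way the interpreter does for a file. The declaration only tells Python how to decode the bytes of the source text; it does not change what str() or print accept. The helper name literal_from_source is hypothetical.

```python
def literal_from_source(source_bytes):
    """Compile source bytes (honouring their coding line) and return variable s."""
    namespace = {}
    exec(compile(source_bytes, "<example>", "exec"), namespace)
    return namespace["s"]


# The same character stored as different bytes, each with a matching declaration.
latin1_value = literal_from_source(b"# coding: latin-1\ns = u'\xe9'\n")
utf8_value = literal_from_source(b"# coding: utf-8\ns = u'\xc3\xa9'\n")

# UTF-8 bytes read under a latin-1 declaration become two wrong characters.
mismatched = literal_from_source(b"# coding: latin-1\ns = u'\xc3\xa9'\n")
```

The declaration therefore has to match the encoding the file was actually saved in.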

Innocence answered 2/5, 2015 at 20:20 Comment(0)
