Convert Unicode/UTF-8 string to lower/upper case using pure & pythonic library
I use Google App Engine and cannot use any C/C++ extensions, so I need a pure-Python way to convert Unicode/UTF-8 strings to lower/upper case. str.lower() and string.lowercase() don't work for non-ASCII characters.

Marra answered 27/1, 2010 at 9:51 Comment(0)

In Python 2, a str holding UTF-8 encoded bytes and a unicode object are two different types. Don't use the string module; call the appropriate method on the unicode object:

>>> print u'ĉ'.upper()
Ĉ

If you have a UTF-8 encoded str, decode it to unicode before changing case:

>>> print 'ĉ'.decode('utf-8').upper()
Ĉ
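
For readers on Python 3, here is a minimal sketch of the same idea, assuming the input is either a str (already Unicode in Python 3) or UTF-8 encoded bytes. The helper names to_upper and to_lower are hypothetical, not part of any library:

```python
def _as_text(value, encoding='utf-8'):
    """Return value as a Unicode str, decoding bytes if necessary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_upper(value):
    # Hypothetical helper: upper-case a str or UTF-8 bytes.
    return _as_text(value).upper()


def to_lower(value):
    # Hypothetical helper: lower-case a str or UTF-8 bytes.
    return _as_text(value).lower()


print(to_upper('ĉ'))                  # Ĉ
print(to_upper('ĉ'.encode('utf-8')))  # Ĉ
```

Both calls use only the built-in str methods, so no C extension is involved.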
Priester answered 27/1, 2010 at 9:54 Comment(6)
Thanks. Is this applicable to Vietnamese? - Marra
It should be. It's not hard to test in the interactive interpreter. - Priester
My code does not work for Russian and Vietnamese. I don't know about other languages. oladic.appspot.com/add/ОИЧУНКАЛС oladic.appspot.com/add/TÌNH%20YÊU oladic.appspot.com/add/ĉĉĉĉ - Marra
Finally it worked! Thank you very much! I wish I could vote more! - Marra
Viet: you probably want to URL-encode Unicode characters if you're putting them in a URL (although it's probably easier to just POST them as UTF-8, assuming you're using a form to submit them). - Baguette
Python 3: 'str' object has no attribute 'decode' (str is already Unicode there, so .decode() is only needed on bytes). - Antin
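
The URL-encoding suggestion in the comments can be sketched roughly as follows, assuming Python 3 and the standard urllib.parse module. The function names encode_segment and decode_and_lower are hypothetical and only illustrate the round trip:

```python
from urllib.parse import quote, unquote


def encode_segment(text):
    """Percent-encode the UTF-8 bytes of text for use in a URL path segment."""
    return quote(text, safe='')


def decode_and_lower(segment):
    """Undo the percent-encoding, then lower-case the resulting Unicode text."""
    return unquote(segment).lower()


segment = encode_segment('TÌNH YÊU')
print(segment)
print(decode_and_lower(segment))
```

Case conversion happens only after the segment has been decoded back to Unicode text, which matches the advice in the answer above.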
