Django stores a unicode string as a sequence of code points and marks the string as unicode for further processing.
UTF-8 encodes each code point as one to four 8-bit bytes, so the unicode string that Django uses has to be encoded from code points to UTF-8 bytes at some point.
In the case of Åland Islands, what seems to be happening is that something takes the UTF-8 bytes and interprets each byte as a code point when converting the string.
The string django_countries returns is most likely u'\xc5land Islands', where \xc5 is the Unicode code point of Å (U+00C5). In UTF-8, \xc5 becomes the two bytes \xc3\x85, where \xc3 and \x85 are each one 8-bit byte. See:
http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc5&mode=hex
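The difference between the code point and its UTF-8 bytes can be checked in a Python shell. This is only a small sketch; the variable names are mine, and the string is the one I assume django_countries returns.

```python
# Assumed value returned by django_countries (see above).
name = u'\xc5land Islands'

# The first character is a single code point, U+00C5.
first_code_point = ord(name[0])

# Encoding that one character to UTF-8 produces two bytes.
utf8_bytes = bytearray(name[0].encode('utf-8'))

print(hex(first_code_point))          # 0xc5
print([hex(b) for b in utf8_bytes])   # ['0xc3', '0x85']
```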
Or you can use country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name.encode('utf-8') to go from u'\xc5land Islands' to '\xc3\x85land Islands'.
If you then take each byte and treat it as a character of its own, the way a single-byte encoding such as Windows-1252 does, you'll see it gives you these characters: Ã…
See: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc3&mode=hex
And: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=x85&mode=hex
See code snippet with html notation of these characters.
<div id="test">Ã…Å</div>
So I'm guessing you have two different encodings in your application. One way to get from u'\xc5land Islands' to u'\xc3\x85land Islands' would be to encode to UTF-8, which converts u'\xc5' to '\xc3\x85', and then decode back to unicode as ISO-8859-1, which gives u'\xc3\x85land Islands'. But since that's not in the code you're providing, I'm guessing it happens somewhere between the moment you set country_label and the moment your output is displayed incorrectly, either automatically because of encoding settings or through an explicit assignment somewhere.
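Here is a minimal sketch of that round trip, so you can compare it with what you see. The variable names are hypothetical, and I'm assuming the wrong decoding step is ISO-8859-1 or Windows-1252; your application may be doing something else.

```python
original = u'\xc5land Islands'

# Step 1: encode the unicode string to UTF-8 bytes.
utf8_bytes = original.encode('utf-8')

# Step 2: decode those bytes with the wrong (single-byte) encoding.
mojibake = utf8_bytes.decode('iso-8859-1')

# Windows-1252 maps byte 0x85 to the ellipsis character,
# which is why the broken text tends to show up as "Ã…".
as_cp1252 = utf8_bytes.decode('cp1252')

# Reversing both steps recovers the original string.
repaired = mojibake.encode('iso-8859-1').decode('utf-8')

print(repr(mojibake))
print(repaired == original)
```

If the broken value in your app matches mojibake or as_cp1252, the double conversion is happening somewhere in your pipeline.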
FIRST EDIT:
To set the encoding for your app, add # -*- coding: utf-8 -*- at the top of your py file (this declares the encoding of the source file itself) and <meta charset="UTF-8"> in the <head> of your template.
And to get a unicode string from a django.utils.functional.__proxy__ object you can call unicode(). Like this:
country_label = unicode(fields.Country(form.cleaned_data.get('country')[0:2]).name)
SECOND EDIT:
One other way to figure out where the problem is would be to use force_bytes (https://docs.djangoproject.com/en/1.8/ref/utils/#module-django.utils.encoding). Like this:
from django.utils.encoding import force_bytes
country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name
forced_country_label = force_bytes(country_label, encoding='utf-8', strings_only=False, errors='strict')
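To see what the value actually holds at each stage (right after you set country_label, after any conversion, just before rendering), a small helper can be useful. The describe function below is hypothetical, not part of Django or django_countries; it is just a sketch that reports the type and the numeric value of every character or byte.

```python
def describe(value):
    """Return the type name and the hex value of each character or byte."""
    if isinstance(value, bytes):
        # Byte string: list the raw byte values.
        units = [hex(b) for b in bytearray(value)]
    else:
        # Unicode string: list the code points.
        units = [hex(ord(ch)) for ch in value]
    return type(value).__name__, units


print(describe(u'\xc5'))                  # one code point: 0xc5
print(describe(u'\xc5'.encode('utf-8')))  # two bytes: 0xc3, 0x85
```

A unicode value that reports 0xc3 followed by 0x85 (or 0x2026) as code points has already been decoded with the wrong encoding before it reached that line.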
But since you already tried many conversions without success, maybe the problem is more complex. Can you share your versions of django_countries and Python, and your Django app's language settings?
You can also look directly in your django_countries package (it should be in your Python directory): find the file data.py and open it to see what it looks like. Maybe the data itself is corrupted.
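If you want to check the country names from that file programmatically rather than by eye, a heuristic like the following could flag strings that look like UTF-8 read as ISO-8859-1. The looks_like_mojibake function is a hypothetical helper of mine, and it is only a rough test: it can miss text that was mangled through a different encoding.

```python
def looks_like_mojibake(text):
    """Heuristic: True if text seems to be UTF-8 bytes decoded as ISO-8859-1."""
    try:
        # Undo the suspected wrong decoding, then decode properly.
        candidate = text.encode('iso-8859-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The round trip is impossible, so the text was not mangled this way.
        return False
    # Plain ASCII survives unchanged; mangled text comes back different.
    return candidate != text


print(looks_like_mojibake(u'\xc3\x85land Islands'))  # True
print(looks_like_mojibake(u'\xc5land Islands'))      # False
```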
country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name.encode('utf-8') in the code but still it rendered as Ã…land. I am using render method to get the template. – Psittacosis