django countries encoding is not giving correct name

I am using the django_countries module for the country list. The problem is that a couple of countries have special characters in their names, such as 'Åland Islands' and 'Saint Barthélemy'.

I am calling this method to get the country name:

country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name

I know that country_label is a lazily translated proxy object from django.utils, but it does not give the right name; instead it gives 'Ã…land Islands'. Any suggestions, please?

Psittacosis answered 4/6, 2015 at 7:44 Comment(0)

Django stores a unicode string as a sequence of code points and marks it as unicode for further processing. UTF-8 encodes each code point as one to four 8-bit bytes, so the unicode string Django uses has to be encoded from code points into UTF-8 bytes at some point before output. In the case of Åland Islands, what seems to be happening is that the UTF-8 bytes are being taken and interpreted as if each byte were a code point.

The string django_countries returns is most likely u'\xc5land Islands', where \xc5 is the Unicode code point of Å (U+00C5). In UTF-8, \xc5 becomes the two bytes \xc3\x85, where \xc3 and \x85 are each one 8-bit byte. See: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc5&mode=hex

You can use country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name.encode('utf-8') to go from u'\xc5land Islands' to '\xc3\x85land Islands'.

If you then take each byte and use it as a code point, you get these characters: Ã… (\xc3 is Ã; \x85 is a control character in ISO-8859-1, and Windows-1252 displays it as …). See: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc3&mode=hex And: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=x85&mode=hex

The following snippet shows these characters in HTML notation:

<div id="test">&#xC3;&#x85;&#xC5;</div>

So I'm guessing you have two different encodings in your application. One way to get from u'\xc5land Islands' to u'\xc3\x85land Islands' would be to encode to UTF-8, which converts u'\xc5' to '\xc3\x85', and then decode those bytes as ISO-8859-1, which gives u'\xc3\x85land Islands'. Since that is not in the code you're providing, I'm guessing it happens somewhere between the moment you set country_label and the moment your output is displayed incorrectly, either automatically because of encoding settings or through an explicit assignment somewhere.
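The encode-then-misdecode path described above can be reproduced in isolation. This is a minimal sketch, not the asker's code: it is written for Python 3, where str is the unicode type (the thread itself uses Python 2), and the variable names are made up for illustration.

```python
# Python 3 sketch: str here corresponds to u'...' / unicode in Python 2.
name = "\u00c5land Islands"              # the correct name; Å is U+00C5

# Encoding to UTF-8 turns the single code point U+00C5 into two bytes.
utf8_bytes = name.encode("utf-8")        # b'\xc3\x85land Islands'

# Reading those bytes back with a single-byte codec yields the mojibake.
as_latin1 = utf8_bytes.decode("latin-1")  # '\xc3\x85land Islands'
as_cp1252 = utf8_bytes.decode("cp1252")   # 0x85 maps to the ellipsis character

print(repr(utf8_bytes))
print(repr(as_latin1))
print(as_cp1252)
```

The Windows-1252 result matches the 'Ã…land Islands' reported in the question, which suggests the bytes are being read as Windows-1252 (or a browser is treating ISO-8859-1 as Windows-1252) somewhere on the way to the page.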

FIRST EDIT:

To set the encoding for your app, add # -*- coding: utf-8 -*- at the top of your py file and <meta charset="UTF-8"> in the <head> of your template. To get a unicode string from a django.utils.functional proxy object, you can call unicode() on it, like this:

country_label = unicode(fields.Country(form.cleaned_data.get('country')[0:2]).name)

SECOND EDIT:

Another way to figure out where the problem is would be to use force_bytes (https://docs.djangoproject.com/en/1.8/ref/utils/#module-django.utils.encoding), like this:

from django.utils.encoding import force_bytes
country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name
forced_country_label = force_bytes(country_label, encoding='utf-8', strings_only=False, errors='strict') 

But since you have already tried many conversions without success, maybe the problem is more complex. Can you share your versions of django_countries and Python, and your Django app's language settings? You can also look directly in your django_countries package (it should be in your Python directory), find the file data.py and open it to see what it looks like. Maybe the data itself is corrupted.

Transversal answered 8/6, 2015 at 15:12 Comment(5)
I used country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name.encode('utf-8') in the code, but it still rendered as Ã…land. I am using the render method to get the template. Psittacosis
See edit. I'm supposing country_label goes straight to the context and isn't saved in the db before being rendered? Galliwasp
@mad_programmer What happens if you pass an encoding argument to unicode(), like this: unicode(fields.Country(...).name, 'UTF-8')? Patroon
@Patroon Your solution gives the error TypeError: coercing to Unicode: need string or buffer, __proxy__ found. Psittacosis
@JulienGrégoire No, the solution you suggested doesn't work; it still gives the same string. Yes, it goes to the context through the render method and I use it directly in the template. It is not stored anywhere in the db. Psittacosis

Try:

from __future__ import unicode_literals  # Place as the first import.

AND / OR

country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name.encode('latin1').decode('utf8')
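For context, the latin1/utf8 round trip above only helps when the string is already mangled; on a correct string it raises. A minimal Python 3 sketch of that behaviour follows (the helper name repair_mojibake is hypothetical, not part of Django or django_countries):

```python
def repair_mojibake(text):
    """Undo a UTF-8-read-as-Latin-1 mix-up.

    Returns the text unchanged when the round trip is not possible,
    which is what happens for a string that was never mangled.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


correct = "\u00c5land Islands"
# Build a mangled copy by decoding the UTF-8 bytes as Latin-1.
mangled = correct.encode("utf-8").decode("latin-1")

print(repair_mojibake(mangled) == correct)   # the mangled copy is repaired
print(repair_mojibake(correct) == correct)   # the correct string is left alone
```

If the round trip raises UnicodeDecodeError on the value in the view, the Python string is probably already correct and the corruption happens later, for example in the template file's encoding or the response charset.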
Abramson answered 4/6, 2015 at 9:11 Comment(1)
Neither solution works. The second one raises an exception: UnicodeDecodeError: 'utf8' codec can't decode byte 0xc5 in position 0: invalid continuation byte Psittacosis

Just this week I encountered a similar encoding error. I believe the problem is that the machine's encoding differs from the one Python uses. Try adding this to your .bashrc or .zshrc:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

Then, open up a new terminal and run the Django app again.
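To check whether the locale change took effect, one option is to print the encodings Python picks up from the environment. This is a small sketch using only the standard library; the report dictionary is just an illustrative name.

```python
import locale
import sys

# Encodings Python derives from the environment (LANG / LC_ALL and friends).
report = {
    "locale preferred encoding": locale.getpreferredencoding(False),
    "filesystem encoding": sys.getfilesystemencoding(),
    "stdout encoding": getattr(sys.stdout, "encoding", None),
}

for label, value in report.items():
    print("%s: %s" % (label, value))
```

Running it in the same shell that starts the Django app, before and after the export lines, shows whether anything other than UTF-8 is being used.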

Saltus answered 10/6, 2015 at 1:42 Comment(0)
