Simple ASCII URL encoding with Python
B

6

1

Look at this:

import urllib
print urllib.urlencode(dict(bla='Ã'))

the output is

bla=%C3%BC

What I want is simple: I want the output in ASCII instead of UTF-8, so I need this output:

bla=%C3

If I try:

urllib.urlencode(dict(bla='Ã'.decode('iso-8859-1')))

it doesn't work (all my Python files are UTF-8 encoded):

'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

In production, the input arrives as Unicode.

Begrime answered 24/6, 2010 at 21:52 Comment(4)
A+tilde converted to ASCII (?) is 0xC3? I don't think so. - Manara
It might be worth rephrasing the title. ASCII does not include an 'Ã' character. - Syncom
"bla=%C3%BC" contains no non-ASCII characters. You need to explain what you really want/need and why you think that you need it. - Crank
@mykhal: U+00C3 is LATIN CAPITAL LETTER A WITH TILDE. "\xC3" is mapped to U+00C3 in ISO-8859-1 and cp1252. What are you trying to say? - Crank
B
0

Thanks for all the solutions; all of you converge on the same point. I made a mess by changing the correct code

.encode('iso-8859-1')

to

.decode('iso-8859-1')

Switching back to .encode('iso-8859-1') makes it work.

Begrime answered 25/6, 2010 at 17:51 Comment(0)
F
4

Have a look at unicode transliteration in python:

from unidecode import unidecode
print unidecode(u"\u5317\u4EB0")

# That prints: Bei Jing

In your case:

bla = u'\u00C3'  # Ã, passed as a unicode object rather than a byte string
print unidecode(bla)
# That prints: A

This is a third-party library, which can be easily installed via:

$ git clone http://code.zemanta.com/tsolc/git/unidecode
$ cd unidecode
$ python setup.py install
Fantan answered 24/6, 2010 at 22:8 Comment(0)
E
2

I want the output in ascii instead of utf-8

That's not ASCII, which has no characters mapped at or above 0x80. You're talking about ISO-8859-1, or possibly code page 1252 (the Windows encoding based on it).

'Ã'.decode('iso-8859-1')

Well, that depends on what encoding you've used to save the character Ã in the source, doesn't it? It sounds like your text editor has saved it as UTF-8. (That's a good thing, because locale-specific encodings like ISO-8859-1 need to go away ASAP.)
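As a rough illustration of why the decode attempt fails, here is a Python 3 sketch (the thread itself uses Python 2, and the variable names are only illustrative). It assumes the literal was saved as UTF-8, so the one letter is really two bytes:

```python
# Python 3 sketch; in Python 2, a plain 'Ã' literal in a UTF-8 source file
# already is this two-byte string.
utf8_bytes = '\u00c3'.encode('utf-8')  # the letter Ã as UTF-8 bytes

# Decoding UTF-8 bytes as ISO-8859-1 maps every byte to its own character,
# so one letter turns into two code points: U+00C3 and U+0083.
misdecoded = utf8_bytes.decode('iso-8859-1')

# Neither code point is ASCII, so an implicit ASCII encode fails on both.
try:
    misdecoded.encode('ascii')
    failure = None
except UnicodeEncodeError as exc:
    failure = exc

print(len(misdecoded), failure)
```

The failure covers positions 0-1, which matches the two-character range in the error message from the question.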

Tell Python that the source file you've saved is in UTF-8 as per PEP 263:

# coding=utf-8

urllib.quote(u'Ã'.encode('iso-8859-1'))    # -> %C3

Or, if you don't want that hassle, use a backslash escape:

urllib.quote(u'\u00C3'.encode('iso-8859-1'))    # -> %C3

Although, either way, a modern webapp should be using UTF-8 for its input rather than ISO-8859-1/cp1252.
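For readers on Python 3, where `urllib.quote` moved to `urllib.parse.quote`, a minimal sketch of the same idea follows. This is an adaptation, not the code from the answer above: it relies on the `encoding` parameter that `quote` and `urlencode` accept in Python 3, and the variable names are illustrative.

```python
from urllib.parse import quote, urlencode

value = '\u00c3'  # Ã, written as an escape so the file encoding does not matter

# Percent-encode using ISO-8859-1 bytes instead of the default UTF-8.
latin1_quoted = quote(value, encoding='iso-8859-1')

# The default (UTF-8) produces two escaped bytes for the same character.
utf8_quoted = quote(value)

# urlencode accepts the same encoding argument for whole query strings.
latin1_query = urlencode({'bla': value}, encoding='iso-8859-1')

print(latin1_quoted, utf8_quoted, latin1_query)
```

Characters that ISO-8859-1 cannot represent raise `UnicodeEncodeError` unless an `errors` argument is also passed.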

Ento answered 24/6, 2010 at 22:4 Comment(1)
The entire webapp is already UTF-8, but the external URL-based web service that I am trying to communicate with doesn't recognize %C3%BC, only %C3. Your solution works fine. - Begrime
M
2

A reasonably effective way to ASCII-fy text is this:

import unicodedata
unicodedata.normalize('NFKD', 'Ã'.decode('UTF-8')).encode('ascii', 'ignore')
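A Python 3 version of the same approach might look like the sketch below; the helper name `asciify` is hypothetical. NFKD splits an accented letter into a base letter plus a combining mark, and encoding with `'ignore'` then drops whatever is not ASCII.

```python
import unicodedata

def asciify(text: str) -> str:
    """Strip accents by decomposing, then dropping non-ASCII code points."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(asciify('\u00c3'))       # Ã
print(asciify('stra\u00dfe'))  # straße
```

Characters without a decomposition, such as ß, are removed silently rather than transliterated, so this is lossy in a way unidecode is not.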
Manara answered 24/6, 2010 at 22:4 Comment(0)
M
1

If your input is actually UTF-8 and you want ISO-8859-1 as output (which is not ASCII), what you need is:

'ñ'.decode('utf-8').encode('iso-8859-1')
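In Python 3 the same transcoding works on `bytes` rather than `str`. A small sketch, assuming the input arrives as raw UTF-8 bytes (the function name is illustrative only):

```python
from urllib.parse import quote

def utf8_to_latin1(raw: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1 bytes."""
    return raw.decode('utf-8').encode('iso-8859-1')

latin1_bytes = utf8_to_latin1(b'\xc3\xb1')  # the UTF-8 bytes for ñ

# quote() accepts bytes directly and escapes each byte as-is.
escaped = quote(latin1_bytes)

print(latin1_bytes, escaped)
```

Input containing characters outside ISO-8859-1 raises `UnicodeEncodeError`, so callers may want to decide up front how to handle those.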
Mildew answered 24/6, 2010 at 22:2 Comment(0)
B
0

The unihandecode package provides US-ASCII transliterations of Unicode text. It is an improved version of Python's unidecode, which is itself a Python port of the Text::Unidecode Perl module by Sean M. Burke.

pip install Unihandecode

Then, in Python:

import unihandecode
print(unihandecode.unidecode(u'Ã'))

prints A.

Benedix answered 4/6, 2015 at 17:5 Comment(0)
