urllib.quote() throws KeyError

To encode the URI, I used urllib.quote("schönefeld"), but when the string contains non-ASCII characters, it throws

KeyError: u'\xe9'

at this line: return ''.join(map(quoter, s))

My input strings are köln, brønshøj, schönefeld, etc.

On Windows (Python 2.7, PyScripter IDE), just printing the statements works, but on Linux it raises the exception (I guess the platform doesn't matter).

This is what I am trying:

from commands import getstatusoutput
from urllib import quote

queryParams = "schönefeld"
cmdString = "http://baseurl" + quote(queryParams)
print getstatusoutput(cmdString)

Looking into the cause: inside urllib.quote(), the exception is actually thrown at return ''.join(map(quoter, s)).

The code in urllib is:

def quote(s, safe='/'):
    if not s:
        if s is None:
            raise TypeError('None object cannot be quoted')
        return s
    cachekey = (safe, always_safe)
    try:
        (quoter, safe) = _safe_quoters[cachekey]
    except KeyError:
        safe_map = _safe_map.copy()
        safe_map.update([(c, c) for c in safe])
        quoter = safe_map.__getitem__
        safe = always_safe + safe
        _safe_quoters[cachekey] = (quoter, safe)
    if not s.rstrip(safe):
        return s
    return ''.join(map(quoter, s))

The exception comes from ''.join(map(quoter, s)): quoter is called for every element of s, and the results are joined with '' and returned.

For a non-ASCII character such as è, _safe_map contains the byte key '\xe8', which maps to the escape '%E8'. But when I call quote(u'è'), the lookup uses the unicode key u'\xe8', which does not match the byte key, so the key is not found and the exception is thrown.

So I modified quote() to run s = [el.upper().replace("\\X","%") for el in s] before calling ''.join(map(quoter, s)), inside a try-except block. Now it works fine.

But I am unsure whether what I have done is the correct approach, or whether it will create other issues. Also, I have 200+ Linux instances, and deploying this fix to all of them would be very tough.

Mcglone answered 27/2, 2013 at 15:14 Comment(4)
Is this Python 2 with unicode values? It works fine for already-encoded data.Cornfield
You do not get an error for urllib.quote('sch\xe9nefeld'). You only get the error for urllib.quote(u'sch\xe9nefeld') (note the u'' unicode literal).Cornfield
@MartijnPieters so cmdString = "http://baseurl" + quote("schönefeld") this should be like cmdString=u"http://baseurl"+quote(u"schönefeld")?Mcglone
No, you misunderstand me. I am stating that the error only occurs when you give quote() unicode values. For byte strings (already encoded) this doesn't happen.Cornfield

You are trying to quote Unicode data, so you need to decide how to turn that into URL-safe bytes.

Encode the string to bytes first. UTF-8 is often used:

>>> import urllib
>>> urllib.quote(u'sch\xe9nefeld')
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py:1268: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(quoter, s))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1268, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xe9'
>>> urllib.quote(u'sch\xe9nefeld'.encode('utf8'))
'sch%C3%A9nefeld'

However, the encoding depends on what the server will accept. It's best to stick to the encoding the original form was sent with.
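A minimal cross-version sketch of this advice, assuming the text should be percent-encoded as UTF-8 unless the caller names another encoding. The helper name quote_text is hypothetical; it only relies on quote() from urllib (Python 2) or urllib.parse (Python 3), both of which accept byte strings.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def quote_text(value, encoding='utf-8'):
    """Percent-encode value, converting text to bytes explicitly first.

    Text (unicode in Python 2, str in Python 3) is encoded with
    `encoding`; byte strings are assumed to be encoded already and are
    passed to quote() unchanged.
    """
    if not isinstance(value, bytes):
        value = value.encode(encoding)
    return quote(value)


# köln, brønshøj, schönefeld from the question
for city in (u'k\xf6ln', u'br\xf8nsh\xf8j', u'sch\xf6nefeld'):
    print(quote_text(city))                      # UTF-8 escapes
    print(quote_text(city, encoding='latin-1'))  # Latin-1 escapes
```

The pass-through branch for byte strings matches the comments on the question: quote() only fails for unicode input, so already-encoded data is quoted as-is. Which encoding to pass remains a decision about what the server expects.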

Cornfield answered 27/2, 2013 at 15:19 Comment(2)
utf-8 has stronger case than your answer implies. All major browsers use utf-8 before percent-encoding while constructing URIs. IRI to URI must be converted using utf-8. Other encodings are used on legacy servers.Fun
@J.F.Sebastian: Sure, the path elements of URIs use UTF-8. But this is the query part instead. What a browser uses for encoding in the query string is less well defined, and has been, in the past, based on the encoding of the HTML page the form stems from.Cornfield

By just converting the string to unicode I resolved the issue.

Here is the snippet:

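try:
    # Check whether mystring is plain ASCII (the decoded result is discarded)
    unicode(mystring, "ascii")
except UnicodeError:
    # Not plain ASCII: decode the UTF-8 bytes into a unicode object
    mystring = unicode(mystring, "utf-8")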

A detailed description of the solution can be found at http://effbot.org/pyfaq/what-does-unicodeerror-ascii-decoding-encoding-error-ordinal-not-in-range-128-mean.htm
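The same decode step can be sketched as a small function that runs on Python 2 and 3 (the name to_text is hypothetical). Because UTF-8 is a superset of ASCII, decoding with UTF-8 directly gives the same result as trying "ascii" first; unlike the snippet, this version also returns ASCII input as text rather than leaving it as bytes. It assumes non-ASCII byte input is UTF-8.

```python
def to_text(value, encoding='utf-8'):
    """Return value as a text (unicode) string.

    Byte strings are decoded with `encoding` (UTF-8 by default, which also
    covers plain ASCII); text is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_text(b'k\xc3\xb6ln') == u'k\xf6ln')  # True
```

Note that the result is text, so passing it straight to Python 2's urllib.quote() would reproduce the KeyError from the question; it still needs to be encoded (for example with .encode('utf-8')) before quoting.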

Mcglone answered 17/12, 2013 at 11:42 Comment(0)

I had exactly the same error as @underscore, but in my case the problem was that map(quoter, s) tried to look up the key u'\xe9', which was not in _safe_map. However, '\xe9' was, so I solved the issue by replacing u'\xe9' with '\xe9' in s.

Moreover, shouldn't the return statement be inside the try/except? I also had to change this to solve the problem completely.
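Replacing u'\xe9' with the byte '\xe9' amounts to encoding that character as Latin-1 (ISO-8859-1), where every code point below 256 maps to the byte of the same value. A sketch, assuming Latin-1 is what the server expects, that does this for the whole string before calling quote() instead of patching urllib:

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

text = u'sch\xe9nefeld'

# Latin-1 turns u'\xe9' into the single byte 0xE9, like the manual replacement.
latin1 = text.encode('latin-1')
print(quote(latin1))  # sch%E9nefeld

# UTF-8 gives a different, two-byte escape for the same character.
print(quote(text.encode('utf-8')))  # sch%C3%A9nefeld
```

Characters outside Latin-1 (for example the euro sign, U+20AC) raise UnicodeEncodeError with this encoding, and the single-byte escapes differ from the UTF-8 ones, so the server has to agree on which encoding is in use.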

Ko answered 28/7, 2015 at 14:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.