Using Python's urllib.quote_plus on UTF-8 strings with a 'safe' argument

Asked 14/3, 2014 at 20:41 Answered 22/1, 2016 at 13:47

Tags: python, utf-8, sparql, urllib, unicode-escapes (solved)

I have a unicode string in my Python code:

name = u'Mayte_Martín'

I would like to use it in a SPARQL query, which means I need to encode the string as UTF-8 and then apply urllib.quote_plus or requests.quote to it. However, both quote functions behave strangely, as the calls below with and without the 'safe' argument show.

from urllib import quote_plus

Without the 'safe' argument:

quote_plus(name.encode('utf-8'))
Output: 'Mayte_Mart%C3%ADn'
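For comparison, here is a minimal Python 3 sketch of the same call. It assumes Python 3, where these functions live in urllib.parse rather than in the Python 2 urllib module used in the question. It shows that í becomes the two UTF-8 bytes 0xC3 0xAD, each of which is percent-escaped, while ASCII letters and _ pass through unchanged.

```python
from urllib.parse import quote_plus

name = 'Mayte_Martín'

# 'í' (U+00ED) is encoded as the two UTF-8 bytes C3 and AD.
encoded = name.encode('utf-8')
print(encoded)                          # b'Mayte_Mart\xc3\xadn'

# Non-ASCII bytes become %XX escapes; letters and '_' are always safe.
print(quote_plus(encoded))              # Mayte_Mart%C3%ADn

# Python 3 also accepts a str 'safe' alongside bytes input, so no
# UnicodeDecodeError is raised here.
print(quote_plus(encoded, safe=':/'))   # Mayte_Mart%C3%ADn
```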

With the 'safe' argument:

quote_plus(name.encode('utf-8'), safe=':/')
Output: 
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-164-556248391ee1> in <module>()
----> 1 quote_plus(v, safe=':/')

/usr/lib/python2.7/urllib.pyc in quote_plus(s, safe)
   1273         s = quote(s, safe + ' ')
   1274         return s.replace(' ', '+')
-> 1275     return quote(s, safe)
   1276 
   1277 def urlencode(query, doseq=0):

/usr/lib/python2.7/urllib.pyc in quote(s, safe)
   1264         safe = always_safe + safe
   1265         _safe_quoters[cachekey] = (quoter, safe)
-> 1266     if not s.rstrip(safe):
   1267         return s
   1268     return ''.join(map(quoter, s))

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

The problem seems to be in the rstrip call. I tried changing the call to:

quote_plus(name.encode('utf-8'), safe=u':/'.encode('utf-8'))

But that did not solve it. What could be causing this?
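The traceback points at the mechanism. Inside quote, the line safe = always_safe + safe produces a unicode object whenever the caller's safe is unicode. In Python 2, calling rstrip on a byte str with a unicode argument first decodes the byte str with the ASCII codec, and the 0xC3 byte at index 10 (the first byte of í) cannot be decoded. Python 3 never performs that implicit decode, so the sketch below imitates it explicitly. py2_rstrip is a hypothetical helper written only for this illustration, and ALWAYS_SAFE copies the always_safe set from Python 2.7's urllib.

```python
# Copy of Python 2.7's urllib.always_safe.
ALWAYS_SAFE = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
               'abcdefghijklmnopqrstuvwxyz'
               '0123456789' '_.-')


def py2_rstrip(s: bytes, chars: str) -> str:
    """Hypothetical helper: imitate Python 2's str.rstrip(unicode).

    Python 2 silently decoded the byte string with the default 'ascii'
    codec before stripping; Python 3 requires this to be spelled out.
    """
    return s.decode('ascii').rstrip(chars)


safe = ALWAYS_SAFE + ':/'                 # unicode in Python 2, str here
encoded = 'Mayte_Martín'.encode('utf-8')  # b'Mayte_Mart\xc3\xadn'

try:
    py2_rstrip(encoded, safe)
except UnicodeDecodeError as exc:
    print(exc)
# 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
```

One further observation, offered as an inference rather than a confirmed diagnosis: quote caches its (quoter, safe) pair in _safe_quoters under a key built from safe, and in Python 2 u':/' == ':/' with equal hashes. A unicode entry cached by an earlier call could therefore be reused by a later byte-string call in the same process. That would be consistent with the error disappearing after a kernel restart.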

Tripos asked 14/3, 2014 at 20:41 (5 comments)

I just tried your code with Python 2.7.4 and IPython 1.1.0 with no problems at all. – Mathematics 14/3, 2014 at 20:53

With the 'safe' argument? I have Python v2.7.3 and IPython v1.2.1. – Tripos 14/3, 2014 at 22:44

Since it works for you, I created a clean environment and tried again. It works! So something else in my workspace (another module or earlier activity) must be interfering. I'll try to figure out what it is and post it here. – Tripos 14/3, 2014 at 22:55

It is perplexing: after restarting my IPython notebook's kernel, I can no longer reproduce it. – Tripos 14/3, 2014 at 23:09

There is a bug report for this: bugs.python.org/issue23885. It seems the Python dev team will not fix it. – Desrochers 22/1, 2016 at 3:47

I'm answering my own question, so that it may help others who face the same issue.

This particular issue arises when you make the following import in the current workspace before executing anything else.

from __future__ import unicode_literals

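With that import, every plain string literal in the module becomes a unicode object, including the ':/' passed as safe, and that turns out to be incompatible with the following sequence of code.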

from urllib import quote_plus

name = u'Mayte_Martín'
quote_plus(name.encode('utf-8'), safe=':/')

The same code without importing unicode_literals works fine.

Tripos answered 20/3, 2014 at 14:52 (4 comments)

I'm receiving the UnicodeDecodeError when passing any non-ASCII characters, regardless of whether unicode_literals was imported. – Heteronomous 6/5, 2015 at 15:25

Interesting. Can you quote the exact code you were using to do it? – Tripos 7/5, 2015 at 16:23

urllib.quote_plus(u'Mayte_Martín'.encode('utf-8'), safe=':/') gives UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128) on Python 2.7.6. – Heteronomous 7/5, 2015 at 17:12

Encoding both arguments works; see my answer for an example. – Desrochers 22/1, 2016 at 13:49

According to this bug report (bugs.python.org/issue23885), here is the workaround:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from urllib import quote_plus
name = u'Mayte_Martín'
# Under unicode_literals ':/' is unicode too, so encode it to a byte str as well.
print quote_plus(name.encode('utf-8'), safe=':/'.encode('utf-8'))

You must encode both arguments of quote or quote_plus to UTF-8.
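The same rule can be wrapped in a small helper. quote_plus_utf8 is a hypothetical name, not a library function: it encodes both text and safe to UTF-8 bytes before calling quote_plus. It picks the import by Python version so the module runs on Python 2 and 3. In Python 3, quote_plus lives in urllib.parse and accepts bytes for both arguments.

```python
try:
    from urllib import quote_plus          # Python 2
except ImportError:
    from urllib.parse import quote_plus    # Python 3


def quote_plus_utf8(text, safe=''):
    """Hypothetical helper: UTF-8-encode both arguments, then quote_plus."""
    # On Python 2, bytes is str, so byte strings pass through untouched
    # and only unicode objects are encoded.
    if not isinstance(text, bytes):
        text = text.encode('utf-8')
    if not isinstance(safe, bytes):
        safe = safe.encode('utf-8')
    return quote_plus(text, safe=safe)


print(quote_plus_utf8(u'Mayte_Martín', safe=u':/'))      # Mayte_Mart%C3%ADn
print(quote_plus_utf8(u'Mayte Martín/bio', safe=u'/'))   # Mayte+Mart%C3%ADn/bio
```

Keeping the encoding step in one place means callers never have to remember which of the two arguments still needs converting.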

Desrochers answered 22/1, 2016 at 13:47 (1 comment)

Works great, thanks! I'm decoding with urllib.unquote(encoded_name).decode('utf-8') – Osteoplastic 12/9, 2016 at 16:28
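The decoding step in this comment has a direct Python 3 counterpart. Here is a minimal sketch, assuming the value was produced by quote_plus. unquote_plus also turns + back into a space, whereas unquote (like Python 2's urllib.unquote) leaves + alone. That difference only matters when the original text contained spaces.

```python
from urllib.parse import quote_plus, unquote, unquote_plus, unquote_to_bytes

encoded = quote_plus('Mayte Martín'.encode('utf-8'), safe=':/')
print(encoded)                                     # Mayte+Mart%C3%ADn

# unquote_plus decodes %XX escapes as UTF-8 by default and maps '+' to ' '.
print(unquote_plus(encoded))                       # Mayte Martín

# unquote leaves '+' untouched, so spaces are not restored.
print(unquote(encoded))                            # Mayte+Martín

# Closest analogue of unquote(...).decode('utf-8'): get raw bytes first.
print(unquote_to_bytes(encoded).decode('utf-8'))   # Mayte+Martín
```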

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import urllib
name = u'Mayte_Martín'
print urllib.quote_plus(name.encode('utf-8'), safe=':/')

This works without problems for me (Python 2.7.9, Debian).

(I don't know the answer, but I don't have enough reputation to post a comment.)

Ringmaster answered 20/5, 2015 at 10:52
