Python encoding characters with urllib.quote

I'm trying to encode non-ASCII characters so I can put them inside a URL and use them in urlopen. The problem is that I want an encoding like JavaScript's (which, for example, encodes ó as %C3%B3):

encodeURIComponent('ó')
'%C3%B3'

But urllib.quote in Python returns %F3 for ó:

urllib.quote('ó')
'%F3'

I want to know how to get an encoding like JavaScript's encodeURIComponent in Python, and also whether I can encode characters outside ISO 8859-1, such as Chinese. Thanks!

Trey asked 21/6, 2011 at 19:41
related: #6338969 (Languor)

Make sure you start from a unicode string and encode it as UTF-8 before passing it to urllib.quote.

Example:

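# -*- coding: utf-8 -*-
import urllib

s = u"ó"  # a unicode string
# Encode to UTF-8 bytes first; quote() then escapes each byte
print urllib.quote(s.encode("utf-8"))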

Outputs:

%C3%B3

Guertin answered 21/6, 2011 at 20:00
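
A Python 3 sketch of the same idea, added as an illustration (not part of the answer above). quote() accepts bytes as well as str, and the bytes you feed it decide the escapes: the %F3 in the question is what Latin-1 (ISO 8859-1) produces for ó, while UTF-8 produces the %C3%B3 that JavaScript gives. UTF-8 also covers Chinese, which Latin-1 cannot represent at all. The variable names are illustrative only.

```python
from urllib.parse import quote

s = "ó"

# Latin-1 (ISO 8859-1) maps ó to the single byte 0xF3, hence '%F3'
latin1_escaped = quote(s.encode("latin-1"))

# UTF-8 maps ó to the two bytes 0xC3 0xB3, matching encodeURIComponent
utf8_escaped = quote(s.encode("utf-8"))

# Chinese characters have no Latin-1 byte at all, but UTF-8 covers them
chinese_escaped = quote("中文".encode("utf-8"))

# Show that Latin-1 really cannot encode the Chinese text
try:
    "中文".encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(latin1_escaped)   # %F3
print(utf8_escaped)     # %C3%B3
print(chinese_escaped)  # %E4%B8%AD%E6%96%87
print(latin1_ok)        # False
```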

In Python 3, urllib.quote has moved to urllib.parse.quote.

Also, in Python 3 all strings are Unicode strings (byte strings are a separate type, bytes), and quote() encodes a str argument as UTF-8 by default.

Example:

from urllib.parse import quote

print(quote('ó'))
# output: %C3%B3
Secretariat answered 2/8, 2018 at 9:09
Upvoted; I think the Python 3 version of the answer is more relevant these days. (Afrika)
Note that encodeURIComponent("!") != quote("!"), so it doesn't quite match the JavaScript behavior the OP asked for. (Regiment)
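
To see exactly where the two disagree, here is a small check (an illustrative sketch; the names JS_UNESCAPED and differences_from_js are hypothetical). It assumes the ECMAScript rule that encodeURIComponent leaves only ASCII letters, digits and - _ . ! ~ * ' ( ) unescaped, and compares that with quote() called with its default safe='/'.

```python
import string
from urllib.parse import quote

# Characters encodeURIComponent leaves unescaped, per the ECMAScript spec:
# ASCII letters, digits, and - _ . ! ~ * ' ( )
JS_UNESCAPED = set(string.ascii_letters + string.digits + "-_.!~*'()")


def differences_from_js(chars=string.punctuation):
    """Return (char, side_that_leaves_it_alone) where JS and quote() disagree."""
    diffs = []
    for ch in chars:
        js_keeps = ch in JS_UNESCAPED
        py_keeps = quote(ch) == ch  # quote() with its default safe='/'
        if js_keeps != py_keeps:
            diffs.append((ch, "JS" if js_keeps else "Python"))
    return diffs


for ch, keeper in differences_from_js():
    print(repr(ch), "left as-is only by", keeper)
```

On Python 3.7+ this reports ! ' ( ) * (left alone only by JavaScript) and / (left alone only by Python). Older Python 3 versions also escaped ~, so it shows up there too.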

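Note that encodeURIComponent() does not encode the characters A-Z a-z 0-9 - _ . ! ~ * ' ( ). By default, urllib.parse.quote() does encode some of these (! * ' ( ) at least), so you need to pass them in the safe argument to get an equivalent encoder in Python. Passing safe also replaces quote()'s default of safe='/', so / gets encoded, just as encodeURIComponent encodes it.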

In Python 3, the equivalent is:

from urllib.parse import quote

quote("ó", safe="!~*'()")
Regiment answered 15/11, 2022 at 1:06
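
Wrapping that call in a helper makes the intent explicit. This is a minimal sketch, not a standard-library API: encode_uri_component is a hypothetical name, and it assumes Python 3.7+, where letters, digits and _ . - ~ are always left unescaped by quote(). Because UTF-8 is used, it also handles characters outside ISO 8859-1, such as Chinese.

```python
from urllib.parse import quote


def encode_uri_component(text: str) -> str:
    """Approximate JavaScript's encodeURIComponent (hypothetical helper).

    safe= replaces quote()'s default of "/", so "/" is escaped as in JS.
    Letters, digits and "_.-~" are always safe in quote() on Python 3.7+;
    the remaining marks JS leaves alone are listed explicitly.
    """
    return quote(text, safe="!~*'()", encoding="utf-8")


print(encode_uri_component("ó"))          # %C3%B3
print(encode_uri_component("中文"))        # %E4%B8%AD%E6%96%87
print(encode_uri_component("a/b c!(x)"))  # a%2Fb%20c!(x)
```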

© 2022 - 2024 — McMap. All rights reserved.