Is there an alternative to parse_qs that handles semi-colons?

TL;DR

What libraries/calls are available to handle query strings containing semi-colons differently than parse_qs?

>>> urlparse.parse_qs("tagged=python;ruby")
{'tagged': ['python']}
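Two possible alternatives, sketched for Python 3 (on Python 2.7, unquote_plus lives in urllib instead of urllib.parse). The first is a small hand-rolled parser, here called parse_qs_amp_only (a hypothetical name, not a library function), that splits pairs on '&' only and otherwise tries to mirror parse_qs's handling of blank values. The second relies on the `separator` argument that parse_qs gained in Python 3.10 (also backported to security releases of some earlier 3.x versions): '&' is the default and only separator there, so ';' stays inside the value. That argument does not exist on Python 2.7.

```python
from urllib.parse import parse_qs, unquote_plus


def parse_qs_amp_only(qs, keep_blank_values=False):
    """Like parse_qs, but split pairs on '&' only.

    Semicolons are treated as ordinary characters in names and values.
    """
    result = {}
    for pair in qs.split("&"):
        name, _, value = pair.partition("=")
        # Mirror parse_qs: skip empty segments, and skip blank values
        # (including names with no '=') unless keep_blank_values is set.
        if not pair or (not value and not keep_blank_values):
            continue
        result.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return result


# Hand-rolled version, usable on any Python 3:
print(parse_qs_amp_only("tagged=python;ruby"))  # {'tagged': ['python;ruby']}

# On Python 3.10+, the stdlib parser only splits on the given separator:
stdlib_result = parse_qs("tagged=python;ruby", separator="&")
print(stdlib_result)
```

The hand-rolled parser is only a sketch: it does not support parse_qs's strict_parsing, encoding, or max_num_fields options.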

Full Background

I'm working with the StackExchange API to search for tagged questions.

Search is laid out like so, with tags separated by semi-colons:

/2.1/search?order=desc&sort=activity&tagged=python;ruby&site=stackoverflow
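To see what actually goes over the wire, here is a small sketch that builds that query string with urlencode (urllib.parse in Python 3; urllib.urlencode on Python 2). As far as I know, requests encodes a `params` dict the same way, so the semicolon reaches the server (and httpretty) as %3B rather than as a literal ';'. The tag list is just the example from the URL above.

```python
from urllib.parse import urlencode

tags = ["python", "ruby"]
params = {
    "order": "desc",
    "sort": "activity",
    "tagged": ";".join(tags),  # the API expects tags joined with ';'
    "site": "stackoverflow",
}

# urlencode percent-encodes ';' as %3B, so the semicolon travels encoded
query = urlencode(params)
url = "https://api.stackexchange.com/2.1/search?" + query
print(url)
```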

Interacting with the API is just fine. The problem comes in when I want to test the calls, particularly when using httpretty to mock HTTP.

Under the hood, httpretty uses urlparse.parse_qs from the Python standard library to parse the query string.

>>> urlparse.parse_qs("tagged=python;ruby")
{'tagged': ['python']}

That silently drops ruby. That's the minimal example; here's the same problem with httpretty itself (outside of a test context):

import requests
import httpretty

httpretty.enable()

# Fake the search endpoint; requests sends the ';' percent-encoded as %3B
httpretty.register_uri(httpretty.GET, "https://api.stackexchange.com/2.1/search", body='{"items":[]}')
resp = requests.get("https://api.stackexchange.com/2.1/search", params={"tagged": "python;ruby"})

# httpretty parses the captured query string with parse_qs
httpretty_request = httpretty.last_request()
print(httpretty_request.querystring)

httpretty.disable()
httpretty.reset()

I want to use the machinery from httpretty, but need a workaround for parse_qs. I can monkey-patch httpretty for now, but would love to see what other options there are.

Lenorelenox answered 3/1, 2014 at 18:21 Comment(2)
Unfortunately, ';' is hard-coded as a separator in urlparse, and there's no way to override it via arguments. See: hg.python.org/cpython/file/2.7/Lib/urlparse.py#l150-157 - Kabuki
Oh hey, thanks for linking to the source. It looks like it's actually hard-coded in parse_qsl. - Lenorelenox

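To get around this, I temporarily monkey-patched httpretty.core.unquote_utf8 (technically httpretty.compat.unquote_utf8). requests sends the semicolon percent-encoded as %3B, and httpretty unquotes the query string before passing it to parse_qs. Turning that unquote step into a no-op keeps the %3B intact, so parse_qs doesn't split on it and only decodes it afterwards, as part of the value.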

import httpretty

#
# To get around how parse_qs works (urlparse, under the hood of
# httpretty), we'll leave the semicolon quoted: with unquote_utf8 as a
# no-op, "%3B" reaches parse_qs intact and is decoded only after the
# query string has been split into pairs.
#
# See https://github.com/gabrielfalcao/HTTPretty/issues/134
orig_unquote = httpretty.core.unquote_utf8
httpretty.core.unquote_utf8 = (lambda x: x)

try:
    # It should handle tags as a list
    # (param_check_callback, search_questions and since come from the
    # surrounding test module)
    httpretty.register_uri(httpretty.GET,
                           "https://api.stackexchange.com/2.1/search",
                           body=param_check_callback({'tagged': 'python;dog'}))
    search_questions(since=since, tags=["python", "dog"], site="pets")

    ...
finally:
    # Back to normal for the rest, even if a check above fails
    httpretty.core.unquote_utf8 = orig_unquote

# Test the test by making sure this is back to normal
assert httpretty.core.unquote_utf8("%3B") == ";"

This assumes you don't need anything else in the query string unquoted. Another option is to unquote everything except the semicolons, so that only they are still percent-encoded when the string reaches parse_qs.
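A minimal sketch of that second option, in Python 3: unquote_except_semicolons is a hypothetical helper (not part of httpretty) that could stand in for the unquote step, assuming it receives a str. httpretty's real unquote_utf8 also deals with bytes/str decoding, which this sketch ignores. Like httpretty's own unquote-then-parse_qs pipeline, it decodes twice, so a literal '+' or '%25' in the original query may still be mangled.

```python
import re
from urllib.parse import parse_qs, unquote


def unquote_except_semicolons(qs):
    """Percent-decode qs, but leave encoded semicolons (%3B/%3b) alone."""
    # Split on encoded semicolons, decode each piece, re-join with %3B
    pieces = re.split(r"%3[bB]", qs)
    return "%3B".join(unquote(piece) for piece in pieces)


raw = "title=caf%C3%A9&tagged=python%3Bruby"
expanded = unquote_except_semicolons(raw)  # 'title=café&tagged=python%3Bruby'
# parse_qs splits on '&' first, then decodes %3B inside the value
print(parse_qs(expanded))
```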

Lenorelenox answered 3/1, 2014 at 21:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.